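Introduction to the file command
The most common use of file is to check which platform an executable was built for: ARM, x86, or MIPS? One look at the output tells you.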
    $ file a.out
    a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0xa240b1958136fc294a6ee5833de2a0fc8c9e0bd4, not stripped
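Information you can read from this output: the word size of the binary (64-bit; this is the ELF class of the file, not the bitness of the running OS), the byte order (LSB means little-endian, MSB means big-endian), the file type (executable, relocatable for object files, shared object for dynamic libraries), the instruction set (x86-64, Intel 80386, MIPS, ARM), and whether the symbol table has been removed (not stripped; release builds are usually stripped, which shrinks the file and makes reverse engineering somewhat harder).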
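Most of these fields come straight from the first bytes of the ELF header. The sketch below decodes them by hand; it is only an illustration of where the information lives, not how file is actually implemented. The function name describe_elf and the small lookup tables are made up for this example, and the sample bytes are synthetic rather than read from a real a.out.

```python
import struct

# Lookup tables for a handful of well-known ELF constants (not exhaustive).
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}
E_TYPE = {1: "relocatable", 2: "executable", 3: "shared object"}
E_MACHINE = {3: "Intel 80386", 8: "MIPS", 40: "ARM", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF file, roughly like `file` does."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_size = EI_CLASS.get(header[4], "unknown class")
    byte_order = EI_DATA.get(header[5], "unknown byte order")
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    endian = "<" if header[5] == 1 else ">"
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        word_size,
        byte_order,
        E_TYPE.get(e_type, "unknown type"),
        E_MACHINE.get(e_machine, "unknown machine"),
    )


# Synthetic header: 64-bit, little-endian, ET_EXEC (2), EM_X86_64 (62).
sample = bytes.fromhex("7f454c46020101" + "00" * 9) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

To try it on a real binary, read the first 20 bytes of the file in binary mode and pass them to describe_elf.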
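Using objdump, with test analysis (x86-64 Ubuntu)
Structure of ELF files on Linux (executables, shared libraries, and relocatable object files; a static library is an ar archive of relocatable object files):
ELF header (note that the a.out inspected in this part is a 32-bit i386 build, unlike the 64-bit binary in the file example)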
    $ readelf -h a.out
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x8048320
      Start of program headers:          52 (bytes into file)
      Start of section headers:          4960 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         9
      Size of section headers:           40 (bytes)
      Number of section headers:         36
      Section header string table index: 33
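The fields printed by readelf -h map one to one onto the 52-byte Elf32_Ehdr structure. As a rough sketch, the layout can be expressed with Python's struct module. The field names follow the ELF specification; the helper parse_elf32_header is made up for this example, and the sample bytes are rebuilt from the values printed above rather than read from the real file.

```python
import struct
from collections import namedtuple

# Elf32_Ehdr, little-endian: 16-byte e_ident followed by 13 numeric fields.
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
Elf32Header = namedtuple(
    "Elf32Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf32_header(data: bytes) -> Elf32Header:
    """Unpack the first 52 bytes of a little-endian 32-bit ELF file."""
    return Elf32Header._make(ELF32_EHDR.unpack_from(data, 0))


# Rebuild a header from the readelf -h values shown above (illustration only).
ident = bytes.fromhex("7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00")
sample = ELF32_EHDR.pack(ident, 2, 3, 1, 0x8048320, 52, 4960, 0, 52, 32, 9, 40, 36, 33)

hdr = parse_elf32_header(sample)
print("header size:", ELF32_EHDR.size)
print("entry point:", hex(hdr.e_entry))
print("section headers start at:", hex(hdr.e_shoff))
```

With a real file, pass the first 52 bytes read in binary mode instead of sample.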
SHT (section header table): a table with one entry describing each section in the ELF file
    $ readelf -S a.out
    There are 36 section headers, starting at offset 0x1360:
    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .interp           PROGBITS        08048154 000154 000013 00   A  0   0  1
      [ 2] .note.ABI-tag     NOTE            08048168 000168 000020 00   A  0   0  4
      [ 3] .note.gnu.build-i NOTE            08048188 000188 000024 00   A  0   0  4
      [ 4] .gnu.hash         GNU_HASH        080481ac 0001ac 000020 04   A  5   0  4
      [ 5] .dynsym           DYNSYM          080481cc 0001cc 000050 10   A  6   1  4
      [ 6] .dynstr           STRTAB          0804821c 00021c 00004a 00   A  0   0  1
      [ 7] .gnu.version      VERSYM          08048266 000266 00000a 02   A  5   0  2
      [ 8] .gnu.version_r    VERNEED         08048270 000270 000020 00   A  6   1  4
      [ 9] .rel.dyn          REL             08048290 000290 000008 08   A  5   0  4
      [10] .rel.plt          REL             08048298 000298 000018 08   A  5  12  4
      [11] .init             PROGBITS        080482b0 0002b0 00002e 00  AX  0   0  4
      [12] .plt              PROGBITS        080482e0 0002e0 000040 04  AX  0   0 16
      [13] .text             PROGBITS        08048320 000320 00017c 00  AX  0   0 16
      [14] .fini             PROGBITS        0804849c 00049c 00001a 00  AX  0   0  4
      [15] .rodata           PROGBITS        080484b8 0004b8 000014 00   A  0   0  4
      [16] .eh_frame_hdr     PROGBITS        080484cc 0004cc 000034 00   A  0   0  4
      [17] .eh_frame         PROGBITS        08048500 000500 0000c4 00   A  0   0  4
      [18] .ctors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
      [19] .dtors            PROGBITS        08049f1c 000f1c 000008 00  WA  0   0  4
      [20] .jcr              PROGBITS        08049f24 000f24 000004 00  WA  0   0  4
      [21] .dynamic          DYNAMIC         08049f28 000f28 0000c8 08  WA  6   0  4
      [22] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
      [23] .got.plt          PROGBITS        08049ff4 000ff4 000018 04  WA  0   0  4
      [24] .data             PROGBITS        0804a00c 00100c 000008 00  WA  0   0  4
      [25] .bss              NOBITS          0804a014 001014 000008 00  WA  0   0  4
      [26] .comment          PROGBITS        00000000 001014 00002a 01  MS  0   0  1
      [27] .debug_aranges    PROGBITS        00000000 00103e 000020 00      0   0  1
      [28] .debug_info       PROGBITS        00000000 00105e 00008b 00      0   0  1
      [29] .debug_abbrev     PROGBITS        00000000 0010e9 00003f 00      0   0  1
      [30] .debug_line       PROGBITS        00000000 001128 000038 00      0   0  1
      [31] .debug_str        PROGBITS        00000000 001160 00007e 01  MS  0   0  1
      [32] .debug_loc        PROGBITS        00000000 0011de 000038 00      0   0  1
      [33] .shstrtab         STRTAB          00000000 001216 000147 00      0   0  1
      [34] .symtab           SYMTAB          00000000 001900 000470 10     35  51  4
      [35] .strtab           STRTAB          00000000 001d70 0001fb 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)
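The numbers in the ELF header and in this table can be cross-checked with a little arithmetic. The sketch below uses only values copied from the two readelf outputs; the variable names are made up for the example. It shows that the section header table (36 entries of 40 bytes starting at 0x1360) ends exactly where .symtab begins, and that .strtab follows .symtab directly.

```python
# Values copied from `readelf -h` and `readelf -S` above.
shoff, shentsize, shnum = 0x1360, 40, 36
symtab_off, symtab_size, symtab_entsize = 0x1900, 0x470, 0x10
strtab_off, strtab_size = 0x1D70, 0x1FB

# The section header table occupies shnum entries of shentsize bytes each.
sht_end = shoff + shentsize * shnum
print("section header table ends at", hex(sht_end))

# .symtab holds fixed-size entries (the ES column), so its size gives the symbol count.
symbol_count = symtab_size // symtab_entsize
print("symbols in .symtab:", symbol_count)

# .strtab is the last thing in this file, so its end is the file size.
file_size = strtab_off + strtab_size
print("file size in bytes:", file_size)
```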
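Some commonly used sections:
1) the .text section holds the executable code;
2) the .data section holds initialized data;
3) the .bss section holds uninitialized data (it takes up no space in the file, hence the NOBITS type);
4) sections whose names start with .rel hold relocation entries;
5) the .symtab and .dynsym sections hold symbol information;
6) the .strtab and .dynstr sections hold string data;
A Flg value containing A marks a section the process needs, which is allocated in memory at run time. Sections without A are not loaded, and many of them (for example .symtab, .strtab, and the .debug_* sections) can be removed with strip.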
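The letters in the Flg column are just bits of the sh_flags field. As a small sketch, the following decodes those bits and picks out the sections that are not loaded into memory. The numeric flag values are the standard SHF_* constants; the function name and the hand-copied subset of sections are made up for the example. Whether strip actually drops a given non-alloc section is up to strip itself (it keeps .shstrtab, for instance), so the result is only a list of candidates.

```python
# Standard sh_flags bits and the letters readelf prints for them.
SHF_LETTERS = [(0x1, "W"), (0x2, "A"), (0x4, "X"), (0x10, "M"), (0x20, "S")]


def flag_string(sh_flags: int) -> str:
    """Render sh_flags the way the Flg column of `readelf -S` does."""
    return "".join(letter for bit, letter in SHF_LETTERS if sh_flags & bit)


# A hand-copied subset of the section table above: (name, sh_flags).
sections = [
    (".text", 0x6),
    (".data", 0x3),
    (".bss", 0x3),
    (".comment", 0x30),
    (".debug_info", 0x0),
    (".symtab", 0x0),
    (".strtab", 0x0),
]

SHF_ALLOC = 0x2
not_loaded = [name for name, flags in sections if not flags & SHF_ALLOC]

for name, flags in sections:
    print("{:<12} {}".format(name, flag_string(flags) or "-"))
print("not loaded at run time:", not_loaded)
```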
section (the smallest container that holds content inside an ELF file)
Disassembling the code section (.text) of the executable with objdump:
    $ objdump -d -j .text a.out
    a.out:     file format elf32-i386
    Disassembly of section .text:
    08048320 <_start>:
     8048320:  31 ed                 xor    %ebp,%ebp
     8048322:  5e                    pop    %esi
     8048323:  89 e1                 mov    %esp,%ecx
     8048325:  83 e4 f0              and    $0xfffffff0,%esp
     8048328:  50                    push   %eax
     8048329:  54                    push   %esp
     804832a:  52                    push   %edx
     804832b:  68 60 84 04 08        push   $0x8048460
     8048330:  68 f0 83 04 08        push   $0x80483f0
     8048335:  51                    push   %ecx
     8048336:  56                    push   %esi
     8048337:  68 d4 83 04 08        push   $0x80483d4
     804833c:  e8 cf ff ff ff        call   8048310 <__libc_start_main@plt>
     8048341:  f4                    hlt
     8048342:  90                    nop
     8048343:  90                    nop
     8048344:  90                    nop
     8048345:  90                    nop
     8048346:  90                    nop
     8048347:  90                    nop
     8048348:  90                    nop
     8048349:  90                    nop
     804834a:  90                    nop
     804834b:  90                    nop
     804834c:  90                    nop
     804834d:  90                    nop
     804834e:  90                    nop
     804834f:  90                    nop
    08048350 <__do_global_dtors_aux>:
     8048350:  55                    push   %ebp
     8048351:  89 e5                 mov    %esp,%ebp
     8048353:  53                    push   %ebx
     8048354:  83 ec 04              sub    $0x4,%esp
     8048357:  80 3d 14 a0 04 08 00  cmpb   $0x0,0x804a014
     804835e:  75 3f                 jne    804839f <__do_global_dtors_aux+0x4f>
     8048360:  a1 18 a0 04 08        mov    0x804a018,%eax
     8048365:  bb 20 9f 04 08        mov    $0x8049f20,%ebx
     804836a:  81 eb 1c 9f 04 08     sub    $0x8049f1c,%ebx
     8048370:  c1 fb 02              sar    $0x2,%ebx
     8048373:  83 eb 01              sub    $0x1,%ebx
     8048376:  39 d8                 cmp    %ebx,%eax
     8048378:  73 1e                 jae    8048398 <__do_global_dtors_aux+0x48>
     804837a:  8d b6 00 00 00 00     lea    0x0(%esi),%esi
     8048380:  83 c0 01              add    $0x1,%eax
     8048383:  a3 18 a0 04 08        mov    %eax,0x804a018
     8048388:  ff 14 85 1c 9f 04 08  call   *0x8049f1c(,%eax,4)
     804838f:  a1 18 a0 04 08        mov    0x804a018,%eax
     8048394:  39 d8                 cmp    %ebx,%eax
     8048396:  72 e8                 jb     8048380 <__do_global_dtors_aux+0x30>
     8048398:  c6 05 14 a0 04 08 01  movb   $0x1,0x804a014
     804839f:  83 c4 04              add    $0x4,%esp
     80483a2:  5b                    pop    %ebx
     80483a3:  5d                    pop    %ebp
     80483a4:  c3                    ret
     80483a5:  8d 74 26 00           lea    0x0(%esi,%eiz,1),%esi
     80483a9:  8d bc 27 00 00 00 00  lea    0x0(%edi,%eiz,1),%edi
    080483b0 <frame_dummy>:
     80483b0:  55                    push   %ebp
     80483b1:  89 e5                 mov    %esp,%ebp
     80483b3:  83 ec 18              sub    $0x18,%esp
     80483b6:  a1 24 9f 04 08        mov    0x8049f24,%eax
     80483bb:  85 c0                 test   %eax,%eax
     80483bd:  74 12                 je     80483d1 <frame_dummy+0x21>
     80483bf:  b8 00 00 00 00        mov    $0x0,%eax
     80483c4:  85 c0                 test   %eax,%eax
     80483c6:  74 09                 je     80483d1 <frame_dummy+0x21>
     80483c8:  c7 04 24 24 9f 04 08  movl   $0x8049f24,(%esp)
     80483cf:  ff d0                 call   *%eax
     80483d1:  c9                    leave
     80483d2:  c3                    ret
     80483d3:  90                    nop
    080483d4 <main>:
     80483d4:  55                    push   %ebp
     80483d5:  89 e5                 mov    %esp,%ebp
     80483d7:  83 e4 f0              and    $0xfffffff0,%esp
     80483da:  83 ec 10              sub    $0x10,%esp
     80483dd:  c7 04 24 c0 84 04 08  movl   $0x80484c0,(%esp)
     80483e4:  e8 07 ff ff ff        call   80482f0 <puts@plt>
     80483e9:  b8 00 00 00 00        mov    $0x0,%eax
     80483ee:  c9                    leave
     80483ef:  c3                    ret
    080483f0 <__libc_csu_init>:
     80483f0:  55                    push   %ebp
     80483f1:  57                    push   %edi
     80483f2:  56                    push   %esi
     80483f3:  53                    push   %ebx
     80483f4:  e8 69 00 00 00        call   8048462 <__i686.get_pc_thunk.bx>
     80483f9:  81 c3 fb 1b 00 00     add    $0x1bfb,%ebx
     80483ff:  83 ec 1c              sub    $0x1c,%esp
     8048402:  8b 6c 24 30           mov    0x30(%esp),%ebp
     8048406:  8d bb 20 ff ff ff     lea    -0xe0(%ebx),%edi
     804840c:  e8 9f fe ff ff        call   80482b0 <_init>
     8048411:  8d 83 20 ff ff ff     lea    -0xe0(%ebx),%eax
     8048417:  29 c7                 sub    %eax,%edi
     8048419:  c1 ff 02              sar    $0x2,%edi
     804841c:  85 ff                 test   %edi,%edi
     804841e:  74 29                 je     8048449 <__libc_csu_init+0x59>
     8048420:  31 f6                 xor    %esi,%esi
     8048422:  8d b6 00 00 00 00     lea    0x0(%esi),%esi
     8048428:  8b 44 24 38           mov    0x38(%esp),%eax
     804842c:  89 2c 24              mov    %ebp,(%esp)
     804842f:  89 44 24 08           mov    %eax,0x8(%esp)
     8048433:  8b 44 24 34           mov    0x34(%esp),%eax
     8048437:  89 44 24 04           mov    %eax,0x4(%esp)
     804843b:  ff 94 b3 20 ff ff ff  call   *-0xe0(%ebx,%esi,4)
     8048442:  83 c6 01              add    $0x1,%esi
     8048445:  39 fe                 cmp    %edi,%esi
     8048447:  75 df                 jne    8048428 <__libc_csu_init+0x38>
     8048449:  83 c4 1c              add    $0x1c,%esp
     804844c:  5b                    pop    %ebx
     804844d:  5e                    pop    %esi
     804844e:  5f                    pop    %edi
     804844f:  5d                    pop    %ebp
     8048450:  c3                    ret
     8048451:  eb 0d                 jmp    8048460 <__libc_csu_fini>
     8048453:  90                    nop
     8048454:  90                    nop
     8048455:  90                    nop
     8048456:  90                    nop
     8048457:  90                    nop
     8048458:  90                    nop
     8048459:  90                    nop
     804845a:  90                    nop
     804845b:  90                    nop
     804845c:  90                    nop
     804845d:  90                    nop
     804845e:  90                    nop
     804845f:  90                    nop
    08048460 <__libc_csu_fini>:
     8048460:  f3 c3                 repz ret
    08048462 <__i686.get_pc_thunk.bx>:
     8048462:  8b 1c 24              mov    (%esp),%ebx
     8048465:  c3                    ret
     8048466:  90                    nop
     8048467:  90                    nop
     8048468:  90                    nop
     8048469:  90                    nop
     804846a:  90                    nop
     804846b:  90                    nop
     804846c:  90                    nop
     804846d:  90                    nop
     804846e:  90                    nop
     804846f:  90                    nop
    08048470 <__do_global_ctors_aux>:
     8048470:  55                    push   %ebp
     8048471:  89 e5                 mov    %esp,%ebp
     8048473:  53                    push   %ebx
     8048474:  83 ec 04              sub    $0x4,%esp
     8048477:  a1 14 9f 04 08        mov    0x8049f14,%eax
     804847c:  83 f8 ff              cmp    $0xffffffff,%eax
     804847f:  74 13                 je     8048494 <__do_global_ctors_aux+0x24>
     8048481:  bb 14 9f 04 08        mov    $0x8049f14,%ebx
     8048486:  66 90                 xchg   %ax,%ax
     8048488:  83 eb 04              sub    $0x4,%ebx
     804848b:  ff d0                 call   *%eax
     804848d:  8b 03                 mov    (%ebx),%eax
     804848f:  83 f8 ff              cmp    $0xffffffff,%eax
     8048492:  75 f4                 jne    8048488 <__do_global_ctors_aux+0x18>
     8048494:  83 c4 04              add    $0x4,%esp
     8048497:  5b                    pop    %ebx
     8048498:  5d                    pop    %ebp
     8048499:  c3                    ret
     804849a:  90                    nop
     804849b:  90                    nop
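One detail of the listing that is easy to miss: the bytes after the e8 opcode of a direct call are not the target address but a signed 32-bit little-endian displacement, relative to the address of the next instruction. objdump does that arithmetic and prints the resolved target. The sketch below reproduces it for three call lines from the listing; the helper name call_target is made up for the example.

```python
def call_target(address: int, instruction_hex: str) -> int:
    """Resolve the target of an x86 `call rel32` (opcode e8) instruction."""
    raw = bytes.fromhex(instruction_hex)
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call instruction")
    # rel32 is a signed little-endian displacement from the next instruction.
    displacement = int.from_bytes(raw[1:], "little", signed=True)
    return address + len(raw) + displacement


# (address, instruction bytes) pairs copied from the listing above.
calls = [
    (0x804833C, "e8 cf ff ff ff"),  # call __libc_start_main@plt
    (0x80483E4, "e8 07 ff ff ff"),  # call puts@plt
    (0x80483F4, "e8 69 00 00 00"),  # call __i686.get_pc_thunk.bx
]
for address, instruction in calls:
    print(hex(address), "->", hex(call_target(address, instruction)))
```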
Viewing the shared libraries an executable depends on
    $ readelf -d /bin/dd
    Dynamic section at offset 0xc964 contains 25 entries:
      Tag        Type                         Name/Value
     0x00000001 (NEEDED)                     Shared library: [librt.so.1]
     0x00000001 (NEEDED)                     Shared library: [libc.so.6]
     0x0000000c (INIT)                       0x8048ecc
     0x0000000d (FINI)                       0x80514ac
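When only the library names are needed, the NEEDED lines can be filtered out of this output. A minimal sketch, assuming the output format shown above; the function name needed_libraries is made up for the example, and it works on captured text rather than invoking readelf itself.

```python
import re

NEEDED_LINE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> list:
    """Extract shared library names from `readelf -d` output."""
    return NEEDED_LINE.findall(readelf_output)


# The (truncated) output shown above, pasted in as text.
sample_output = """\
Dynamic section at offset 0xc964 contains 25 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [librt.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x8048ecc
 0x0000000d (FINI)                       0x80514ac
"""
print(needed_libraries(sample_output))
```

To run it against a live binary, capture the text with subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout and pass that in.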
Analyzing the hello world assembly generated by GCC
Write a hello world program in C, main1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        printf("hello world\n");
        return 0;
    }
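Generate the assembly code. With -S and no -o, gcc writes it to main1.s. Note that the optimization flag is spelled with an uppercase O (-O1); a lowercase -o1 would instead name the output file 1. The listing below keeps the frame pointer setup, which looks like unoptimized output.
    $ gcc -S main1.c
Open the assembly file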
        .file   "main1.c"
        .section    .rodata         # .rodata holds read-only data; the string "hello world" is stored here
    .LC0:                           # a label; the name itself can be changed
        .string "hello world"
        .text
        .globl  main
        .type   main, @function     # declare main as a function
    main:
    .LFB0:
        .cfi_startproc              # marks the start of the function
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %edi         # put the address of "hello world" in edi, the first argument of the call
        call    puts                # call the C library function puts (gcc replaced printf with it; this is not a system call)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc                # marks the end of the function
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
        .section    .note.GNU-stack,"",@progbits
Code disassembled by objdump
    00000000004004f4 <main>:
      4004f4:  55              push   %rbp
      4004f5:  48 89 e5        mov    %rsp,%rbp
      4004f8:  bf fc 05 40 00  mov    $0x4005fc,%edi      # what is stored at the string address 0x4005fc?
      4004fd:  e8 ee fe ff ff  callq  4003f0 <puts@plt>   # 0x4003f0 is the PLT entry for the library function; the name in angle brackets is the matching symbol
      400502:  b8 00 00 00 00  mov    $0x0,%eax
      400507:  5d              pop    %rbp
      400508:  c3              retq
      400509:  90              nop
      40050a:  90              nop
      40050b:  90              nop
      40050c:  90              nop
      40050d:  90              nop
      40050e:  90              nop
      40050f:  90              nop
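What is stored at the string address $0x4005fc?
The assembly file above shows that the string is kept in the .rodata section, and running objdump on the executable shows the contents of .rodata. The dump starts at 0x4005f8, so 0x4005fc is 4 bytes in, which is exactly where "hello world" begins. Without the assembly file there is no label to guide you, and you have to work the location out from the addresses alone.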
    $ objdump -d -j .rodata a.out
    a.out:     file format elf64-x86-64
    Disassembly of section .rodata:
    00000000004005f8 <_IO_stdin_used>:
      4005f8:  01 00 02 00 68 65 6c 6c 6f 20 77 6f 72 6c 64 00     ....hello world.
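The lookup can be done mechanically: subtract the start address of the dump from the address loaded into edi, then read bytes up to the terminating NUL. A small sketch using the 16 bytes from the dump above; the names rodata_start and c_string_at are made up for the example.

```python
# Start address and raw bytes of .rodata, copied from the objdump output above.
rodata_start = 0x4005F8
rodata = bytes.fromhex("01 00 02 00 68 65 6c 6c 6f 20 77 6f 72 6c 64 00")


def c_string_at(address: int) -> str:
    """Return the NUL-terminated string stored at a virtual address in .rodata."""
    offset = address - rodata_start
    if not 0 <= offset < len(rodata):
        raise ValueError("address is outside the dumped section")
    end = rodata.index(b"\x00", offset)
    return rodata[offset:end].decode("ascii")


print("offset into .rodata:", 0x4005FC - rodata_start)
print("string:", c_string_at(0x4005FC))
```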
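A hand-written hello world assembly program
This example comes from a book on AT&T-syntax assembly (Linux汇编AT&A 汇 编.pdf) and works well as a first exercise. The resulting executable really is much smaller than the one gcc produces (gcc: 7K, asm: 352 bytes).
Write the assembly file hello.s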
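    # hello.s
    .data                           # data section
    msg:    .string "Hello, world!\n"   # the string to print (.string appends a terminating NUL)
            len = . - msg           # string length: 15 bytes, NUL included
    .text                           # code section
    .global _start                  # declare the entry point
    _start:
            # print a string on the screen
            movl    $len, %edx      # argument 3: string length
            movl    $msg, %ecx      # argument 2: address of the string to print
            movl    $1, %ebx        # argument 1: file descriptor (stdout)
            movl    $4, %eax        # system call number (sys_write)
            int     $0x80           # raise the interrupt to enter the kernel
            # exit the program
            movl    $0, %ebx        # argument 1: exit code
            movl    $1, %eax        # system call number (sys_exit)
            int     $0x80           # call into the kernel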
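Assemble it into the relocatable object file hello.o with the assembler as (on an x86-64 host, add --32 here and -m elf_i386 to the ld command to get 32-bit output like the elf32-i386 listing that follows)
    $ as hello.s -o hello.o
Link it into the executable hello with the linker ld
    $ ld hello.o -o hello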
Once the executable is built, disassemble it with objdump to see what it looks like
    $ objdump -d hello
    hello:     file format elf32-i386
    Disassembly of section .text:
    08048074 <_start>:
     8048074:  ba 0f 00 00 00  mov    $0xf,%edx
     8048079:  b9 98 90 04 08  mov    $0x8049098,%ecx
     804807e:  bb 01 00 00 00  mov    $0x1,%ebx
     8048083:  b8 04 00 00 00  mov    $0x4,%eax
     8048088:  cd 80           int    $0x80
     804808a:  bb 00 00 00 00  mov    $0x0,%ebx
     804808f:  b8 01 00 00 00  mov    $0x1,%eax
     8048094:  cd 80           int    $0x80
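The symbolic names from hello.s have turned into plain numbers here. Each five-byte mov is one opcode byte (b8 plus a register number) followed by a 32-bit little-endian immediate, which can be checked by hand. The sketch below decodes four of the instructions from the listing; the helper decode_mov_imm32 is made up for the example and handles only this one instruction form.

```python
# Register numbers encoded in the low three bits of the b8..bf opcodes.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(instruction_hex: str):
    """Decode `mov $imm32, %reg` (opcodes b8..bf) into (register, immediate)."""
    raw = bytes.fromhex(instruction_hex)
    if len(raw) != 5 or not 0xB8 <= raw[0] <= 0xBF:
        raise ValueError("not a mov imm32 instruction")
    return REGISTERS[raw[0] - 0xB8], int.from_bytes(raw[1:], "little")


for text in ["ba 0f 00 00 00", "b9 98 90 04 08", "bb 01 00 00 00", "b8 04 00 00 00"]:
    register, value = decode_mov_imm32(text)
    print("mov ${:#x},%{}".format(value, register))

# $len resolved to 0xf: the 14 visible characters plus the NUL added by .string.
message = b"Hello, world!\n\x00"
print("len =", len(message))
```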
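When the int $0x80 instruction executes, eax holds the system call number, and the arguments to the system call must be placed, in order, in ebx, ecx, edx, esi, and edi. When the system call completes, its return value is available in eax.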
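To make the convention concrete, the sketch below first lays out which register receives which value for the two calls in hello.s, then performs the equivalent write through Python's os.write. The helper register_layout is made up for the example. os.write reaches the kernel's write service through the C library, not through int $0x80, and the example writes to a pipe rather than stdout so the result can be inspected, so treat it only as an illustration of the return value (the number of bytes written).

```python
import os

# Argument registers for the 32-bit int $0x80 convention, in order.
SYSCALL_ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi")


def register_layout(number, *args):
    """Show which register holds what before `int $0x80` (illustration only)."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("too many arguments for this sketch")
    layout = {"eax": number}
    layout.update(zip(SYSCALL_ARG_REGISTERS, args))
    return layout


write_regs = register_layout(4, 1, "msg", 15)  # sys_write(stdout, msg, len)
exit_regs = register_layout(1, 0)              # sys_exit(0)
print(write_regs)
print(exit_regs)

# The same kernel service via os.write, sent to a pipe so it can be read back.
read_end, write_end = os.pipe()
returned = os.write(write_end, b"Hello, world!\n\x00")
os.close(write_end)
received = os.read(read_end, 64)
os.close(read_end)
print("write returned", returned)
```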