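ELF (Executable and Linking Format) is a standard file format for executables, object files, shared libraries, and core dumps. It was developed and published by UNIX System Laboratories as part of the ABI (Application Binary Interface).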

A brief history:

- UNIX: originally used the a.out format, which was replaced by COFF in System V and finally by ELF in SVR4.

- Windows: uses PE, a variant of the COFF format.

- Mac OS X: uses the Mach-O format.

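ELF files come in four types:

1. Relocatable file: a .o file produced by the compiler and assembler, which needs further processing by the linker

2. Executable file: all relocations are done and all symbols are resolved, except possibly shared-library symbols that must be resolved at run time

3. Shared object file: a dynamic library (.so)

4. Core file: a dump of a process's memory image, typically written when the process terminates abnormally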

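1. ELF File Structure

The structure of an ELF file can be described from two viewpoints:

- Compilers, assemblers, linkers: the file is a set of sections described by the section header table

- System loader: the file is a set of segments described by the program header table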


TIP:
- A single segment usually consists of several sections.
- Relocatable files must have a section header table. Executable files must have a program header table (a section header table is optional, though usually present). Shared object files have both.
- Sections are intended for further processing by a linker, while segments are intended to be mapped into memory.
- Only the ELF header has a fixed position, at the very start of the file. The positions of the program header table and the section header table are given by the ELF header.

ELF data representation: six data types (32-bit)


| Name | Size (bytes) | Alignment (bytes) | Purpose |
|---|---|---|---|
| Elf32_Addr | 4 | 4 | Unsigned program address |
| Elf32_Off | 4 | 4 | Unsigned file offset |
| Elf32_Half | 2 | 2 | Unsigned medium integer |
| Elf32_Word | 4 | 4 | Unsigned integer |
| Elf32_Sword | 4 | 4 | Signed integer |
| unsigned char | 1 | 1 | Unsigned small integer |

@1: 

ELF header: located at the start of the file, it describes the organization of the whole file and occupies 52 bytes in the 32-bit format.



#define EI_NIDENT (16)
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
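
As a rough illustration of how a tool might use e_ident, the sketch below checks the magic bytes and reads the class and data-encoding bytes from a raw buffer. The function names (elf_has_magic, elf_class, elf_data) are hypothetical helpers invented for this example and are not part of any library; the byte indices EI_CLASS = 4 and EI_DATA = 5 follow the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EI_NIDENT 16
#define EI_CLASS  4   /* index of the file-class byte in e_ident */
#define EI_DATA   5   /* index of the data-encoding byte in e_ident */

/* Hypothetical helper: returns 1 if buf holds at least EI_NIDENT bytes
 * and starts with the magic number 0x7f 'E' 'L' 'F', otherwise 0. */
int elf_has_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* The literal is split so that "\x7f" is not parsed as "\x7fE". */
    return len >= EI_NIDENT && memcmp(buf, "\x7f" "ELF", 4) == 0;
}

/* Hypothetical helper: file class, 1 = ELF32, 2 = ELF64. */
int elf_class(const unsigned char *ident)
{
    assert(ident != NULL);
    return ident[EI_CLASS];
}

/* Hypothetical helper: data encoding, 1 = little endian, 2 = big endian. */
int elf_data(const unsigned char *ident)
{
    assert(ident != NULL);
    return ident[EI_DATA];
}
```

For a 32-bit little-endian file, whose identification bytes begin 7f 45 4c 46 01 01, both elf_class and elf_data would report 1.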


Let's look at a basic ELF header:



[root@bogon ~]# readelf -h a.out 
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x80482a0 /* e_entry */
Start of program headers: 52 (bytes into file) /* e_phoff */
Start of section headers: 1992 (bytes into file) /* e_shoff: See Starting address of section headers */
Flags: 0x0
Size of this header: 52 (bytes) /* e_ehsize */
Size of program headers: 32 (bytes) /* e_phentsize */
Number of program headers: 8 /* e_phnum */
Size of section headers: 40 (bytes) /* e_shentsize */
Number of section headers: 29 /* e_shnum */
Section header string table index: 26 /* e_shstrndx */


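From this ELF header we can read the following: the file is a 32-bit, little-endian executable for the Intel 80386; execution starts at 0x80482a0; the program header table starts at file offset 52 and holds 8 entries of 32 bytes each; the section header table starts at file offset 1992 and holds 29 entries of 40 bytes each; and section 26 holds the section names.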
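
The table sizes implied by the readelf -h output above can be checked with a little arithmetic: a table ends at its start offset plus entry size times entry count. The sketch below does only that; table_end is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: file offset one past the end of a header table
 * that starts at `off` and holds `num` entries of `entsize` bytes each. */
uint32_t table_end(uint32_t off, uint16_t entsize, uint16_t num)
{
    assert(entsize != 0);
    return off + (uint32_t)entsize * num;
}
```

With the values from the example, table_end(52, 32, 8) gives 308 (0x134) for the program header table, which is exactly where the .interp section begins in the section listing of this article. table_end(1992, 40, 29) gives 3152 (0xc50) for the section header table, which is where .symtab begins.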

@2:

Section header table: holds the information describing each section.

Each section header occupies 40 bytes (the value of e_shentsize).



/* Section header.  */
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;


Section Type(*sh_type*) 



PROGBITS: Holds program contents, including code, data, and debugger information.
NOBITS: Like PROGBITS, but occupies no space in the file.
SYMTAB and DYNSYM: These hold symbol tables. [See below]
STRTAB: A string table, like the one used in a.out. [See below]
REL and RELA: These hold relocation information.
DYNAMIC and HASH: These hold information related to dynamic linking.


Some common sections are listed below:



.text: (PROGBITS: ALLOC+EXECINSTR)
Executable code
.data: (PROGBITS: ALLOC+WRITE)
Initialized data
.rodata: (PROGBITS: ALLOC)
Read-only data
.bss: (NOBITS: ALLOC+WRITE)
Uninitialized data, set to zero at run time
.rel.text, .rel.data, and .rel.rodata: (REL)
Relocation information for static linking
.rel.plt: (REL)
The list of PLT entries that are subject to relocation during dynamic linking (if the PLT is used)
.rel.dyn: (REL)
Relocations applied by the dynamic linker outside the PLT (for example, data references through the GOT)
.symtab:
Symbol table
.strtab:
String table
.shstrtab:
Section header string table, which holds the section names
.init, .fini: (PROGBITS: ALLOC+EXECINSTR)
Program initialization and termination code
.interp: (PROGBITS: ALLOC)
Holds the pathname of a program interpreter. At present this is used to run the run-time dynamic linker, which loads the program and links in any required shared libraries.
.got, .plt: (PROGBITS)
The global offset table and the procedure linkage table used for dynamic linking.


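TIP: the difference between the symbol table (symtab) and the string table (strtab)
strtab stores the null-terminated name strings used in the ELF file, such as variable and function names
symtab records the functions and variables themselves (the symbols), and is mainly used at link time so that object files can refer to each other's addresses

Below is a basic section header table (0x7c8 = 1992):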



[root@bogon ~]# readelf -S a.out 
There are 29 section headers, starting at offset 0x7c8:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048134 000134 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048148 000148 000020 00 A 0 0 4
[ 3] .hash HASH 08048168 000168 000024 04 A 4 0 4
[ 4] .dynsym DYNSYM 0804818c 00018c 000040 10 A 5 1 4
[ 5] .dynstr STRTAB 080481cc 0001cc 000045 00 A 0 0 1
[ 6] .gnu.version VERSYM 08048212 000212 000008 02 A 4 0 2
[ 7] .gnu.version_r VERNEED 0804821c 00021c 000020 00 A 5 1 4
[ 8] .rel.dyn REL 0804823c 00023c 000008 08 A 4 0 4
[ 9] .rel.plt REL 08048244 000244 000010 08 A 4 11 4
[10] .init PROGBITS 08048254 000254 000017 00 AX 0 0 4
[11] .plt PROGBITS 0804826c 00026c 000030 04 AX 0 0 4
[12] .text PROGBITS 080482a0 0002a0 000198 00 AX 0 0 16
[13] .fini PROGBITS 08048438 000438 00001c 00 AX 0 0 4
[14] .rodata PROGBITS 08048454 000454 00000c 00 A 0 0 4
[15] .eh_frame_hdr PROGBITS 08048460 000460 00001c 00 A 0 0 4
[16] .eh_frame PROGBITS 0804847c 00047c 000058 00 A 0 0 4
[17] .ctors PROGBITS 080494d4 0004d4 000008 00 WA 0 0 4
[18] .dtors PROGBITS 080494dc 0004dc 000008 00 WA 0 0 4
[19] .jcr PROGBITS 080494e4 0004e4 000004 00 WA 0 0 4
[20] .dynamic DYNAMIC 080494e8 0004e8 0000c8 08 WA 5 0 4
[21] .got PROGBITS 080495b0 0005b0 000004 04 WA 0 0 4
[22] .got.plt PROGBITS 080495b4 0005b4 000014 04 WA 0 0 4
[23] .data PROGBITS 080495c8 0005c8 000004 00 WA 0 0 4
[24] .bss NOBITS 080495cc 0005cc 000008 00 WA 0 0 4
[25] .comment PROGBITS 00000000 0005cc 000114 00 0 0 1
[26] .shstrtab STRTAB 00000000 0006e0 0000e5 00 0 0 1
[27] .symtab SYMTAB 00000000 000c50 000440 10 28 49 4
[28] .strtab STRTAB 00000000 001090 000249 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
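
The Flg column in the listing is a textual rendering of the sh_flags bit mask. A minimal sketch of that decoding for the three basic flags follows. The bit values for SHF_WRITE, SHF_ALLOC, and SHF_EXECINSTR come from the ELF specification; shf_letters is a hypothetical helper name, and the sketch ignores all other flag bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHF_WRITE     0x1u  /* W: writable at run time */
#define SHF_ALLOC     0x2u  /* A: occupies memory during execution */
#define SHF_EXECINSTR 0x4u  /* X: contains executable instructions */

/* Hypothetical helper: writes the letters for the three basic flags into
 * out (at least 4 bytes) in the order W, A, X, then a terminating null. */
void shf_letters(uint32_t sh_flags, char out[4])
{
    size_t n = 0;
    assert(out != NULL);
    if (sh_flags & SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}
```

Under these assumptions, a .text section with sh_flags = 0x6 renders as AX, and a .data section with sh_flags = 0x3 renders as WA.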


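String table:

A string here is a null-terminated sequence of characters used for the name of a symbol or a section. Each string is referenced by its index (byte offset) into the table.

For the section name string table (.shstrtab), the e_shstrndx member of the ELF header gives the index of the section that holds it,

and each Elf32_Shdr stores its name's index into that table in sh_name.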
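
The lookup described above can be sketched as a bounds-checked pointer calculation: the name is simply the null-terminated string that starts at the given byte offset of the table. strtab_get is a hypothetical helper written for this illustration, and it assumes the whole table has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the string stored at byte offset `index`
 * of a string table of `size` bytes, or NULL if the index is out of
 * range or the string is not null-terminated inside the table. */
const char *strtab_get(const char *tab, size_t size, uint32_t index)
{
    assert(tab != NULL);
    if (index >= size)
        return NULL;
    if (memchr(tab + index, '\0', size - index) == NULL)
        return NULL;
    return tab + index;
}
```

To resolve a section name, a caller would pass the contents of the section selected by e_shstrndx as tab and a header's sh_name as index.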


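Symbol table:

Holds the information needed to locate and relocate a program's symbolic definitions and references.

Relocation table:

Records the locations that must be patched once symbols are resolved; it is stored in sections of type REL and RELA.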

@3: 

Program header table: tells the system how to create the process image. It contains one entry per segment.

Each program header occupies 32 bytes (the value of e_phentsize).



typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;


Type of segment(*p_type*)



PT_PHDR: Specifies the location and size of the program header table itself, both in the file and in the memory image of the program.
PT_LOAD: A loadable segment.
PT_DYNAMIC: Specifies dynamic linking information.
PT_INTERP: Specifies the location and size of a null-terminated path name to invoke as an interpreter.


Below is an example of a program header table:



[root@bogon ~]# readelf -l a.out 
Elf file type is EXEC (Executable file)
Entry point 0x80482a0
There are 8 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x08048134 0x08048134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x004d4 0x004d4 R E 0x1000
LOAD 0x0004d4 0x080494d4 0x080494d4 0x000f8 0x00100 RW 0x1000
DYNAMIC 0x0004e8 0x080494e8 0x080494e8 0x000c8 0x000c8 RW 0x4
NOTE 0x000148 0x08048148 0x08048148 0x00020 0x00020 R 0x4
GNU_EH_FRAME 0x000460 0x08048460 0x08048460 0x0001c 0x0001c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
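
To show how the LOAD entries relate virtual addresses to file offsets, here is a minimal sketch that searches the PT_LOAD segments for the one whose file-backed range contains an address. The struct seg type and the vaddr_to_offset function are hypothetical names for this example. The struct carries only the Elf32_Phdr fields the sketch needs, and PT_LOAD = 1 is the value from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1  /* loadable segment */

/* Hypothetical reduced view of Elf32_Phdr. */
struct seg {
    uint32_t type;    /* p_type */
    uint32_t offset;  /* p_offset */
    uint32_t vaddr;   /* p_vaddr */
    uint32_t filesz;  /* p_filesz */
    uint32_t memsz;   /* p_memsz */
};

/* Hypothetical helper: maps a virtual address to a file offset.
 * Returns 0 and stores the offset in *off on success, or returns -1 if
 * no PT_LOAD segment has file contents at that address. */
int vaddr_to_offset(const struct seg *segs, size_t n, uint32_t vaddr,
                    uint32_t *off)
{
    assert(segs != NULL && off != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct seg *s = &segs[i];
        if (s->type == PT_LOAD && vaddr >= s->vaddr &&
            vaddr - s->vaddr < s->filesz) {
            *off = s->offset + (vaddr - s->vaddr);
            return 0;
        }
    }
    return -1;
}
```

Fed the two LOAD entries from the listing, the entry point 0x80482a0 maps to file offset 0x2a0, which is where .text starts. The address 0x080495cc (.bss) has no file offset, because the second LOAD segment has FileSiz 0xf8 but MemSiz 0x100; the extra 8 bytes exist only in memory.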


@4:

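Section: holds the actual contents of the object file (instructions, data, symbol table, relocation information, and so on)

2. Analyzing ELF Files

Many tools can be used to analyze ELF files.

Besides readelf, used above, there are objdump, objcopy, and others.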



# objdump -x /bin/ls                         # show all headers, including the section headers
# objdump -j .data -s /bin/ls # show the contents of the given section
#
# objcopy -O binary -j .text a.out text.bin # extract the .text section into the file text.bin


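3. Parsing ELF Files

ELF parsers can be found in many places:

- How Linux loads an ELF file: execve() -> sys_execve() -> do_execve() -> search_binary_handler() -elf-> load_elf_binary()/load_elf_library()

- readelf in binutils, which parses ELF files in a very readable way

- The open-source project ELFToolChain

- atratus/coLinux/LINE: their ELF loaders are worth studying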

