This is a complex book that examines the innards of the executable file generated by the compiler of any .Net language. The programs that form the core of this book are written in C#, so the reader is expected to be well versed in the C# programming language.
Before exploring the main concept of MetaData, which is the mainstay of the .Net world, we need to understand the Portable Executable (PE) file format. Metadata is stored inside the PE file, and it is the headers of the PE file that lead the .Net infrastructure to it.
We do not launch into a lengthy discussion of MetaData right away, since this chapter focuses primarily on unveiling the Portable Executable file format.
To begin with, create a directory named mdata in the root drive. Within it, create a new file named b.cs with the following contents:
b.cs
public class zzz
{
    public static void Main()
    {
        System.Console.WriteLine("hello");
    }
}
>csc b.cs
When b.cs is compiled with the csc command, an executable file named b.exe is generated. Our focus here is on gaining insight into the bytes that constitute this executable file. To examine it, we need another program, a.cs, which peeks into b.exe and displays its contents section by section.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        byte a, b;
        a = r.ReadByte();
        b = r.ReadByte();
        Console.WriteLine("{0}{1}", (char)a, (char)b);
    }
}
Output
MZ
On running a.exe, the program compiled from a.cs, the output MZ is displayed. What follows is a detailed account of how and why this occurs, so let us explore the intricacies of the exe file.
The .Net framework supplies its code in the form of classes. For handling files, there is a FileStream class among the file handling classes. The constructor used here takes two parameters: the first is the name of the file to be worked on, viz. b.exe; the second is an enum that tells the constructor how the file is to be opened.
For instance, it conveys whether an existing file is to be opened, a new one created, or data appended to the end. Since our intent is to read from an existing file, the 'Open' member of the enum FileMode is used.
Once the file has been opened, we would like to read from it, either one byte at a time or one int, i.e. 4 bytes, at a time. The FileStream class reads only raw bytes; it has no methods that return a short or an int. Therefore, we employ another class, BinaryReader, which can read one byte, one short, or one int at a time.
At the most primitive level, a file merely contains numbers ranging from 0 to 255. Therefore, one byte is adequate for storing a value. Using the ReadByte method from the BinaryReader class, two bytes are read, one at a time and stored in byte variables a and b, respectively. Prior to printing the values, the variables are cast to char, so that the ASCII equivalent is displayed in place of numbers.
The first two bytes of any executable file in the Microsoft world are M and Z. These two bytes have a historical significance. Before the world of Windows descended upon us, there was Microsoft's first operating system, DOS. It was derived from a product originally called QDOS, which aptly stood for 'Quick and Dirty Operating System'. Every EXE file under DOS started with a series of bytes known as a header, which described the contents stored in the rest of the file.
Some form of identification was needed to determine the type of file. So Mark Zbikowski, one of the developers of DOS, placed his initials, MZ, as the magic number of the file. Along the same lines, every compiled class file in the Java programming language begins with the magic number 0xCAFEBABE, reputedly a nod to the coffee that kept programmers going late into the night. The people who designed the MetaData promptly adopted the concept too, since they obviously did not wish to lag behind.
You are sure to wonder why we dwell on DOS when we should actually be working under Windows!
The reality is that every executable program under Windows begins as though it were a DOS program: it carries the same header as a DOS program does.
We have downloaded and installed a program called UltraEdit-32 to view the contents of the file in hex mode. On opening b.exe in this program, we see a screen that shows each byte in hex along with its character equivalent. Screen 1.1 shows the first few bytes of the file, which clearly begin with the characters MZ.
With the release of Windows, Microsoft introduced a new file format called the PE or Portable Executable file format, which differs radically from the file format used under DOS. Since Microsoft was uncertain whether Windows would gain acceptance as an operating system, it left the DOS file format unaltered. This resulted in two file formats existing side by side. Nobody could have foreseen at that stage that Windows would evolve into the most dominant operating system in the world.
It was quite probable that a user would run a Windows PE file under DOS. Microsoft realized that an annoying error message in such a situation could put the user off Windows. Therefore, it mandated that every PE file also be a valid DOS program. When such a program is executed under DOS, it displays a polite message notifying the user that the program is actually a Windows program, and then exits gracefully.
Bear in mind that this courtesy exists solely for the DOS user, even though only a handful of DOS users remain in the world today.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        Console.WriteLine(s.Position);
        s.Seek(60, SeekOrigin.Begin);
        Console.WriteLine(s.Position);
        int i = r.ReadInt32();
        Console.WriteLine(i);
        Console.WriteLine("0x" + i.ToString("X"));
        Console.WriteLine(s.Position);
    }
}
Output
0
60
128
0x80
64
The Position property gives the current location within a file. A freshly opened file has a position of 0, which signifies the beginning of the file. To jump to any other part of the file, the Seek method of the FileStream object is pressed into action.
This method takes two parameters:
• The first parameter is an offset, i.e. a number of bytes counted from the origin chosen by the second parameter.
• The second parameter must be a member of the enum SeekOrigin.
The Begin member of this enum means that the offset is counted from the beginning of the file. Thus, s.Seek(60, SeekOrigin.Begin); signifies moving 60 bytes away from the beginning of the file.
The other two members of the enum are End, which denotes the end of the file, and Current, which denotes the current position. With Current or End, the offset may be negative as well as positive, so that the file pointer can also move backwards.
The Seek function makes position 60 active. In technical terms, it places the file pointer at offset 60 in the file, a fact confirmed by printing the Position property.
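The three origins can be compared side by side in a small sketch. It uses a 100-byte MemoryStream as a stand-in for a file on disk, and the class name SeekDemo and the helper PositionAfter are our own inventions for illustration; Seek itself returns the new position.

```csharp
using System;
using System.IO;

public static class SeekDemo
{
    // Seeks and returns the resulting position (Seek returns it directly).
    public static long PositionAfter(Stream s, long offset, SeekOrigin origin)
    {
        return s.Seek(offset, origin);
    }

    public static void Main()
    {
        // Assumption: a 100-byte in-memory stream stands in for a real file.
        using (MemoryStream s = new MemoryStream(new byte[100]))
        {
            Console.WriteLine(PositionAfter(s, 60, SeekOrigin.Begin));    // 60
            Console.WriteLine(PositionAfter(s, -20, SeekOrigin.Current)); // 40
            Console.WriteLine(PositionAfter(s, 4, SeekOrigin.Current));   // 44
            Console.WriteLine(PositionAfter(s, -10, SeekOrigin.End));     // 90
        }
    }
}
```

The same calls work unchanged on a FileStream, since both classes derive from Stream.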
The four consecutive bytes at offset 60, which belong to the DOS header, form an int that holds the offset at which the PE header begins. To extract this value, the ReadInt32 method is used. The value returned is 128 (0x80), which is stored in the variable i. This shows that the PE header of b.exe begins at an offset of 128.
The last WriteLine shows that after the four bytes have been read, the file pointer has moved ahead accordingly, from 60 to 64.
We will not ponder over the other bytes of the DOS header, since they apply to DOS and not to Windows.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        s.Seek(60, SeekOrigin.Begin);
        int i = r.ReadInt32();
        s.Seek(i, SeekOrigin.Begin);
        byte a, b, c, d;
        a = r.ReadByte();
        b = r.ReadByte();
        c = r.ReadByte();
        d = r.ReadByte();
        Console.WriteLine("{0}{1} {2} {3}", (char)a, (char)b, c, d);
    }
}
Output
PE 0 0
After jumping to byte 128, i.e. 0x80, the program reads the next 4 bytes individually and stores them in byte variables.
The value found at this location is the letters P and E followed by two zero bytes, which is the magic number or signature of a PE file. If either the MZ or the PE magic number is changed, the operating system refuses to load the file and reports an error.
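Both magic numbers can be checked in one place. The following is a minimal sketch, not part of the book's programs: the class SignatureSketch, its methods and the made-up 132-byte image built by FakeImage are assumptions of ours. It reads the offset from byte 60 instead of assuming 128, and works on any Stream, so a path such as C:\mdata\b.exe can be passed on the command line.

```csharp
using System;
using System.IO;

public static class SignatureSketch
{
    // Checks for "MZ" at the start, then returns the int stored at offset 60.
    public static int ReadPeOffset(BinaryReader r)
    {
        r.BaseStream.Seek(0, SeekOrigin.Begin);
        if (r.ReadByte() != (byte)'M' || r.ReadByte() != (byte)'Z')
            throw new InvalidDataException("MZ signature not found");
        r.BaseStream.Seek(60, SeekOrigin.Begin);
        return r.ReadInt32();
    }

    // True when the four bytes at peOffset are 'P', 'E', 0, 0.
    public static bool HasPeSignature(BinaryReader r, int peOffset)
    {
        r.BaseStream.Seek(peOffset, SeekOrigin.Begin);
        // Read as one little-endian value: 0x50 'P', 0x45 'E', 0, 0.
        return r.ReadUInt32() == 0x00004550;
    }

    // Hypothetical test data: just enough bytes to hold both signatures.
    public static byte[] FakeImage()
    {
        byte[] b = new byte[132];
        b[0] = (byte)'M';
        b[1] = (byte)'Z';
        b[60] = 0x80;          // 128, stored as 80 00 00 00
        b[128] = (byte)'P';
        b[129] = (byte)'E';    // bytes 130 and 131 stay zero
        return b;
    }

    public static void Main(string[] args)
    {
        Stream s = args.Length > 0
            ? (Stream)new FileStream(args[0], FileMode.Open, FileAccess.Read)
            : new MemoryStream(FakeImage());
        using (BinaryReader r = new BinaryReader(s))
        {
            int peOffset = ReadPeOffset(r);
            Console.WriteLine("PE header offset {0}", peOffset);
            Console.WriteLine("PE signature present {0}", HasPeSignature(r, peOffset));
        }
    }
}
```

Reading the offset each time costs two extra statements, but it keeps the sketch valid for files whose PE header does not happen to start at 128.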
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        s.Seek(128 + 4, SeekOrigin.Begin);
        short machine = r.ReadInt16();
        Console.WriteLine("Machine {0}", machine.ToString("X"));
        short sections = r.ReadInt16();
        Console.WriteLine("Sections {0}", sections);
        int time = r.ReadInt32();
        Console.WriteLine("Date Time Stamp {0}", time.ToString("X"));
        int pointer = r.ReadInt32();
        Console.WriteLine("Pointer {0}", pointer.ToString("X"));
        int symbols = r.ReadInt32();
        Console.WriteLine("Symbols {0}", symbols.ToString("X"));
        int headersize = r.ReadInt16();
        Console.WriteLine("Size of Optional Header {0}", headersize);
        int characteristics = r.ReadInt16();
        Console.WriteLine("Characteristics {0}", characteristics.ToString("X"));
    }
}
Output
Machine 14C
Sections 3
Date Time Stamp 3C82927F
Pointer 0
Symbols 0
Size of Optional Header 224
Characteristics 10E
This program displays the PE header, which follows the 4-byte PE signature at byte 128. Microsoft ensured that the PE file format remained consistent across Windows operating systems running on different types of processor chips. To that end, it has documented every byte in the header of the executable file, which is precisely how we were able to ascertain what the header comprises.
The first short, which consists of two bytes, identifies the machine or processor. A value of 0x014C denotes the Intel 386 family. Other valid values are 0x162 for MIPS R3000, 0x166 for MIPS R4000, and 0x184 for DEC Alpha AXP. The 64-bit Intel Itanium processor has a value of 0x200, and x64 processors use 0x8664.
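A lookup of the machine field might be sketched as follows. The class MachineSketch and the descriptive strings are our own; the numeric constants are the ones published for the PE format (note that DEC Alpha AXP is 0x184 there). The parameter is a ushort, so that a value such as 0x8664 is not mistaken for a negative number.

```csharp
using System;

public static class MachineSketch
{
    // Maps the 2-byte machine field to a readable name.
    public static string MachineName(ushort machine)
    {
        switch (machine)
        {
            case 0x014C: return "Intel 386 and compatible";
            case 0x0162: return "MIPS R3000";
            case 0x0166: return "MIPS R4000";
            case 0x0184: return "DEC Alpha AXP";
            case 0x0200: return "Intel Itanium (IA-64)";
            case 0x8664: return "x64 (AMD64)";
            default: return "Unknown (0x" + machine.ToString("X") + ")";
        }
    }

    public static void Main()
    {
        Console.WriteLine(MachineName(0x014C)); // the value found in b.exe
    }
}
```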
An executable file stores different entities. Three such entities are the global data of our programs, the actual code, and resources like menus, graphic files etc. The PE file assigns a separate area to each of these entities, and these areas are termed sections. The machine type is followed by a short that contains the number of sections in the file.
This is followed by the date and time at which the PE file was created. The number is stored in a 4-byte int, which holds the number of seconds that have elapsed since 1st January 1970, Greenwich Mean Time (GMT). There are functions aplenty that can translate this number into a human readable date.
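One way of translating the time stamp, shown here as a sketch with a class name (TimeStampSketch) of our own choosing, is to add the seconds to a DateTime that represents 1st January 1970 in UTC. The value used is the one printed for b.exe.

```csharp
using System;
using System.Globalization;

public static class TimeStampSketch
{
    // Converts seconds elapsed since 1 January 1970 (UTC) into a DateTime.
    public static DateTime ToUtc(uint secondsSince1970)
    {
        DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(secondsSince1970);
    }

    public static void Main()
    {
        DateTime stamp = ToUtc(0x3C82927F);
        // Invariant culture keeps the separators fixed on every machine.
        Console.WriteLine(stamp.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
        // prints 2002-03-03 21:15:43
    }
}
```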
The PE file format is shared with OBJ files, i.e. the object files created by the C/C++ compilers. An obj file normally contains a table of symbols, which is the technical term for the names of functions and variables. The next two ints give the file offset of this symbol table and the number of symbols in it. Since we are dealing with an exe file at this stage, both have been zeroed out.
After this structure or header comes another header called the Image Optional Header. Its size is 224 bytes for 32 bit files and 240 bytes for 64 bit files. This size is stored in the PE header after the symbols count. As per the documentation this value may change, but we have never encountered a 32 bit PE file with an optional header larger than 224 bytes.
The last part of the PE header is a field called characteristics. The value is displayed in hex format using the ToString function. We will elucidate the characteristics field after a short diversion.
Bit-Wise Anding
a.cs
using System;
public class zzz
{
    public static void Main()
    {
        Console.Write(7 & 0x0a);
    }
}
Output
2
The AND operator (&) yields True only when both of the values compared are logically true. Here, instead of whole values, the entities compared are the individual bits of the two numbers, position by position.
The value 7 is 0111 in binary, so its first three bits (counting from the right) are 1, whereas the hex value 0x0A is 1010, so its second and fourth bits are 1, i.e. these bits are set on.
On ANDing the two values 7 and 0x0A, only the second bit position is on in both. Therefore, the answer is 0010, i.e. 2.
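The bit patterns can be printed to make the operation visible. In this small sketch, the class AndSketch and the helper Bits4 are names of our own; Convert.ToString with a base of 2 produces the binary digits.

```csharp
using System;

public static class AndSketch
{
    // Formats the low four bits of a value as binary, e.g. 7 -> "0111".
    public static string Bits4(int value)
    {
        return Convert.ToString(value & 0xF, 2).PadLeft(4, '0');
    }

    public static void Main()
    {
        Console.WriteLine("  " + Bits4(7));        //   0111
        Console.WriteLine("& " + Bits4(0x0A));     // & 1010
        Console.WriteLine("= " + Bits4(7 & 0x0A)); // = 0010
    }
}
```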
The next program uses the AND operator extensively to explain the value contained in the characteristics field of the PE Header.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        s.Seek(128 + 4, SeekOrigin.Begin);
        s.Seek(2 + 2 + 4 + 4 + 4 + 2, SeekOrigin.Current);
        int characteristics = r.ReadInt16();
        Console.WriteLine("Characteristics {0}", characteristics.ToString("X"));
        int i = characteristics & 0x001;
        if (i == 1)
            Console.WriteLine("Relocs Stripped {0}", i);
        if ((characteristics & 0x002) == 2)
            Console.Write("Executable Image ");
        if ((characteristics & 0x004) == 0x004)
            Console.Write("Line Numbers Stripped ");
        if ((characteristics & 0x008) == 0x008)
            Console.Write("Local Symbols Stripped ");
        if ((characteristics & 0x010) == 0x010)
            Console.Write("Trim Local Set ");
        if ((characteristics & 0x020) == 0x020)
            Console.Write("Can Handle Address Larger than 2Gb ");
        if ((characteristics & 0x080) == 0x080)
            Console.Write("Bytes Reversed ");
        if ((characteristics & 0x0100) == 0x0100)
            Console.Write("32 Bit Machine ");
        if ((characteristics & 0x0200) == 0x0200)
            Console.Write("Debugging Info Stripped ");
        if ((characteristics & 0x0400) == 0x0400)
            Console.Write("Removable Media Swap ");
        if ((characteristics & 0x0800) == 0x0800)
            Console.Write("Net Swap ");
        if ((characteristics & 0x1000) == 0x1000)
            Console.Write("System File ");
        if ((characteristics & 0x2000) == 0x2000)
            Console.Write("Dll ");
        if ((characteristics & 0x4000) == 0x4000)
            Console.Write("Uni-Processor Only ");
        if ((characteristics & 0x8000) == 0x8000)
            Console.Write("High Bytes Reversed");
    }
}
Output
Characteristics 10E
Executable Image Line Numbers Stripped Local Symbols Stripped 32 Bit Machine
In the above example, we have hard-coded most of the values. Since we have already explored the DOS header and the location of the PE signature, we do not write code for them again. Instead, using constant values, we jump directly to the position that is of significance to us.
After reaching the PE header, we skip the 18 bytes (2 + 2 + 4 + 4 + 4 + 2) of the preceding fields to arrive at the characteristics field. This field is 16 bits wide and it represents the nature of the file. The file can be an executable file with the extension .exe, or a library file with the extension .dll.
To pack this information compactly, every bit of the characteristics field represents a single property. By checking which bits are on with a bit-wise AND, the properties pertaining to the file can be ascertained.
In bit-wise operations, when we AND a bit with 1, we obtain the original bit, whereas when we AND a bit with 0, the result is 0. Thus, to check whether a certain bit is on, the field is ANDed with a number, known as a mask, in which only that bit is on and all the other bits are off. If the result is 0, the original bit was off. If the result is the same as the mask, the bit was on.
For the first bit the mask is 0x001, so an answer of 1 signifies that the bit is on.
Therefore, the first 'if' statement checks the value of the variable i. If it is 1, the Relocs Stripped property is present.
In the subsequent checks, we dispense with the variable i and place the expression in the 'if' statement itself. Here, the brackets are imperative, since the & operator has a lower precedence than the == operator.
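The chain of 'if' statements can also be driven by a table. The sketch below is only an alternative arrangement under our own naming: the enum ImageCharacteristics, its member names and the class CharacteristicsSketch are invented for illustration, while the mask values are those used in the program above.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum ImageCharacteristics
{
    RelocsStripped = 0x0001,
    ExecutableImage = 0x0002,
    LineNumbersStripped = 0x0004,
    LocalSymbolsStripped = 0x0008,
    TrimLocalSet = 0x0010,
    LargeAddressAware = 0x0020,
    BytesReversed = 0x0080,
    Machine32Bit = 0x0100,
    DebuggingInfoStripped = 0x0200,
    RemovableMediaSwap = 0x0400,
    NetSwap = 0x0800,
    SystemFile = 0x1000,
    Dll = 0x2000,
    UniProcessorOnly = 0x4000,
    HighBytesReversed = 0x8000
}

public static class CharacteristicsSketch
{
    // Returns the names of all flags whose bit is on in the field.
    public static List<string> Decode(int characteristics)
    {
        List<string> names = new List<string>();
        foreach (ImageCharacteristics flag in Enum.GetValues(typeof(ImageCharacteristics)))
        {
            int mask = (int)flag;
            // Same test as in the text: the result equals the mask when the bit is on.
            if ((characteristics & mask) == mask)
                names.Add(flag.ToString());
        }
        return names;
    }

    public static void Main()
    {
        // 0x10E is the value read from b.exe.
        Console.WriteLine(string.Join(" ", Decode(0x10E).ToArray()));
    }
}
```

For 0x10E, the list holds the same four properties that the program above prints.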
Let us now analyse each one of these bits individually.
The first bit in the characteristics field pertains to relocation; it is on when the relocation information has been stripped from the file. The next bit denotes whether the file is an executable image. Since the bit is on, the file is an executable image.
The next bit indicates whether the line numbers have been stripped from the file, which shrinks the size of the file. Line numbers are used mainly for debugging purposes. Since the bit is on, the line numbers have been stripped from the file. The bit after it is also on, so the local symbols have been stripped as well.
The bit that asks the operating system to trim the Working Set aggressively is off.
The next bit is also off, so this file does not claim to handle addresses beyond 2 gigabytes.
The 'Bytes Reversed' bit states that the bytes of a word are stored in reverse order. It deals with the Little endian versus Big endian problem, i.e. whether the low byte is stored first or second. It is off for our file.
The next bit states that the file is meant for a machine with 32-bit words, and it is on. The subsequent bit is on when the debugging information has been stripped from the file. Debugging information augments the size of the file. This bit is off in our output.
The next two bits deal with an exe file that resides on removable media or on the net. If the respective bit is on, the file should be copied from the media or the network and then executed from the swap file.
The following two bits indicate whether the file is a System file or a DLL. Within this field, the DLL bit is what tells an exe file and a dll file apart.
This is followed by a flag that indicates whether the file should be run only on a uni-processor machine. We have not imposed any such restriction. The last bit is the bytes reversal bit for the high bytes.
The characteristics of a .Net file are laid down in Partition II 24.2.2.1 of the specifications, which are available as a PDF file on the ECMA site. In our output, the four flags that are on mark the file as an Executable Image for a 32 Bit Machine whose Line Numbers and Local symbols have been stripped off. The Debugging Info Stripped bit (0x0200) is not among them.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        s.Seek(128 + 4 + 20, SeekOrigin.Begin);
        int magic = r.ReadInt16();
        Console.WriteLine("Magic Number {0}", magic.ToString("X"));
        int major = r.ReadByte();
        Console.WriteLine("Major Linker Version {0}", major);
        int minor = r.ReadByte();
        Console.WriteLine("Minor Linker Version {0}", minor);
        int sizeofcode = r.ReadInt32();
        Console.WriteLine("Size of code {0}", sizeofcode);
        int sizeofdata = r.ReadInt32();
        Console.WriteLine("Size of Data {0}", sizeofdata);
        int sizeofudata = r.ReadInt32();
        Console.WriteLine("Size of Data {0}", sizeofudata);
        int entrypoint = r.ReadInt32();
        Console.WriteLine("Memory Address {0}", entrypoint.ToString("X"));
        int baseofcode = r.ReadInt32();
        Console.WriteLine("Base of Code {0}", baseofcode.ToString("X"));
        int baseofdata = r.ReadInt32();
        Console.WriteLine("Base of Data {0}", baseofdata.ToString("X"));
        int ImageBase = r.ReadInt32();
        Console.WriteLine("Image base {0}", ImageBase.ToString("X"));
        int sectiona = r.ReadInt32();
        Console.WriteLine("Section Alignment {0}", sectiona.ToString("X"));
        int filea = r.ReadInt32();
        Console.WriteLine("File Alignment {0}", filea.ToString("X"));
        int majoros = r.ReadInt16();
        Console.WriteLine("Major Operating System Version {0}", majoros.ToString("X"));
        int minoros = r.ReadInt16();
        Console.WriteLine("Minor Operating System Version {0}", minoros.ToString("X"));
        int majorimage = r.ReadInt16();
        Console.WriteLine("Major Image Version {0}", majorimage.ToString("X"));
        int minorimage = r.ReadInt16();
        Console.WriteLine("Minor Image Version {0}", minorimage.ToString("X"));
        int majorsubsystem = r.ReadInt16();
        Console.WriteLine("Major Subsystem Version {0}", majorsubsystem.ToString("X"));
        int minorsubsystem = r.ReadInt16();
        Console.WriteLine("Minor Subsystem Version {0}", minorsubsystem.ToString("X"));
        int version = r.ReadInt32();
        Console.WriteLine("Version {0}", version.ToString("X"));
        int imagesize = r.ReadInt32();
        Console.WriteLine("Image Size {0}", imagesize);
        int sizeofheaders = r.ReadInt32();
        Console.WriteLine("Size of Headers {0}", sizeofheaders);
        int checksum = r.ReadInt32();
        Console.WriteLine("CheckSum {0}", checksum);
        int subsystem = r.ReadInt16();
        Console.WriteLine("Subsystem {0}", subsystem);
        int dllflags = r.ReadInt16();
        Console.WriteLine("Dll flags {0}", dllflags);
        int stackreserve = r.ReadInt32();
        Console.WriteLine("Stack Reserve {0}", stackreserve.ToString("X"));
        int stackcommit = r.ReadInt32();
        Console.WriteLine("Stack Commit {0}", stackcommit.ToString("X"));
        int heapreserve = r.ReadInt32();
        Console.WriteLine("Heap Reserve {0}", heapreserve.ToString("X"));
        int heapcommit = r.ReadInt32();
        Console.WriteLine("Heap Commit {0}", heapcommit.ToString("X"));
        int loader = r.ReadInt32();
        Console.WriteLine("Loader flags {0}", loader.ToString("X"));
        int datad = r.ReadInt32();
        Console.WriteLine("Number of Data Directories {0}", datad);
    }
}
Output
Magic Number 10B
Major Linker Version 6
Minor Linker Version 0
Size of code 1024
Size of Data 1536
Size of Data 0
Memory Address 22BE
Base of Code 2000
Base of Data 4000
Image base 400000
Section Alignment 2000
File Alignment 200
Major Operating System Version 4
Minor Operating System Version 0
Major Image Version 0
Minor Image Version 0
Major Subsystem Version 4
Minor Subsystem Version 0
Version 0
Image Size 32768
Size of Headers 512
CheckSum 0
Subsystem 3
Dll flags 0
Stack Reserve 100000
Stack Commit 1000
Heap Reserve 100000
Heap Commit 1000
Loader flags 0
Number of Data Directories 16
Close on the heels of the 20-byte PE header comes the Image Optional header, which is 224 bytes large. The header is optional only in the case of obj and lib files; an executable always has one.
The first two bytes are a magic number. A value of 0x10B signifies a 32-bit header, while a value of 0x20B signifies a 64-bit header, which is what we aspire to graduate to some day.
The next two bytes hold the major and minor version numbers of the linker that created the file. In the days preceding .Net, the compiler and the linker were separate products: the linker worked with diverse programming languages, while the compiler was language specific. In the .Net world the two have merged, and the compiler fills in this field itself. Earlier, the number normally denoted the version of Visual Studio that created the file.
The next int is the Size of code field. As the name suggests, it gives the combined size of the code present in all the sections.
Following it are two fields that hold the sizes of the initialized and un-initialized data, respectively. Our program displays both under the label 'Size of Data'. In most cases, the size of the un-initialized data is zero.
The next int is the location, relative to the start of the loaded file, of the first byte of code to be executed at runtime. This is called the address of the entry point. A dll file that has no entry point carries a value of zero here.
The next int gives the memory location, again relative to the start of the loaded file, at which the section carrying code is loaded. Succeeding this is the corresponding location for the data section. You may notice that the code section begins 0x2000 bytes from the loaded position of the file, and the data section begins 0x4000 bytes from it.
The Image base stores the preferred location in memory at which the PE file is loaded. For exe files the customary value is 0x400000; if that address is unavailable, the loader may place the file elsewhere and adjust it using the relocation information.
The section alignment is the boundary on which every section starts once it is loaded in memory. It is set to 0x2000, which is why the code section starts at 0x2000 and the next section starts at 0x4000.
The file alignment plays the same role for the sections as they are stored in the file on disk. It is set to 512 bytes or 0x200. Thus, every section starts at a multiple of 512 bytes in the file; the first section starts 512 bytes from the start, immediately after the headers.
When an executable file is loaded into memory, the header bytes of the file are placed at the image base of 0x400000. Thereafter, the code section, which starts on a 512-byte boundary on disk, is loaded 0x2000 bytes from the image base. The data section, which starts on a later 512-byte boundary on disk, is loaded 0x4000 bytes from the image base.
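Both alignments amount to rounding a size up to the next multiple of a boundary. A minimal sketch of that arithmetic follows; the class AlignSketch and the method AlignUp are hypothetical names, and the sample sizes other than the 1024 bytes of code reported above are chosen purely for illustration.

```csharp
using System;

public static class AlignSketch
{
    // Rounds value up to the next multiple of alignment (alignment > 0).
    public static int AlignUp(int value, int alignment)
    {
        return (value + alignment - 1) / alignment * alignment;
    }

    public static void Main()
    {
        // On disk, with a file alignment of 0x200 (512 bytes):
        Console.WriteLine(AlignUp(400, 0x200));    // 512
        Console.WriteLine(AlignUp(1024, 0x200));   // 1024
        // In memory, with a section alignment of 0x2000 (8192 bytes):
        Console.WriteLine(AlignUp(1024, 0x2000));  // 8192
    }
}
```

The last line illustrates why 1024 bytes of code still push the next section a full 0x2000 bytes further along in memory.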
The alignment is followed by the major and minor version numbers of the operating system that the file requires. These have lost most of their significance, since the versions of Windows keep changing with regular frequency. The next two shorts are the major and minor image version numbers that are set by the linker. These bytes are of no utility in the .Net file.
The next two shorts are the major and minor versions of the Subsystem required to run the PE file. These values too are of no use here. One more unused field is the Win32 version number, which is always set to 0.
The image size field that follows indicates the amount of memory that the operating system needs to hold the entire image. The size of headers is the combined size of all the headers, i.e. the DOS header, the PE header, the optional header, as well as the section headers, which we have not touched upon yet. The value is rounded up to a multiple of the file alignment, i.e. 512 bytes.
The checksum is used to verify whether a file is intact or has been corrupted. Even though the PE file has a field for it, ordinary executables leave it unused and hence, its value is zero.
The Subsystem field refers to the user interface type required from Windows. The value of 3 signifies that the program runs as a Windows Console (character mode) application, which is what b.exe is. Other possible values are 2 for a Windows GUI program and 1 for Native. The DLL flags field has a value of zero.
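The subsystem values can be turned into names with a simple switch. In this sketch, the class SubsystemSketch and the wording of the names are our own; the numbers follow the published PE constants, in which 1 is Native, 2 is the Windows GUI and 3 is the Windows console. Other values exist but are omitted here.

```csharp
using System;

public static class SubsystemSketch
{
    // Maps the Subsystem field of the optional header to a readable name.
    public static string Name(int subsystem)
    {
        switch (subsystem)
        {
            case 1: return "Native";
            case 2: return "Windows GUI";
            case 3: return "Windows Console";
            default: return "Other (" + subsystem + ")";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Name(3)); // the value found in b.exe
    }
}
```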
The Stack reserve size field determines the stack area that a thread can use. Normally, the value is 1 MB (0x100000). However, the application is not allocated all of it at startup time. The Stack Commit is the amount of memory that the stack is actually assigned at startup, here 0x1000 bytes.
The stack is where the variables created in functions are stored, whereas the heap is used for storing objects along with their instance variables. The next two fields determine the reserve and commit sizes of the heap area; their values are the same as those for the stack. This is followed by the Loader flags field, which has been rendered obsolete today.
The last field within these 96 bytes of header details is the count of Data Directories present in the file. There are a total of 16 Data Directories, and their entries begin at offset 96 of the optional header. The details of the Data Directories are highlighted in the next program.
a.cs
using System;
using System.IO;
public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }
    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);
        s.Seek(128 + 4 + 20 + 96, SeekOrigin.Begin);
        int rva, size;
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Export Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Import Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Win32 Resource Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Exception Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Certificate Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Base Relocation Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Debug Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Copyright Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Mips Global Ptr RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("TLS RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Load Config RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Bound Import RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("IAT RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Delay Import Descriptor RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("CLR Header RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32(); size = r.ReadInt32();
        Console.WriteLine("Reserved RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
    }
}
Output
Export Table RVA=0 Size=0
Import Table RVA=226C Size=4F
Win32 Resource Table RVA=4000 Size=318
Exception Table RVA=0 Size=0
Certificate Table RVA=0 Size=0
Base Relocation Table RVA=6000 Size=C
Debug Table RVA=0 Size=0
Copyright Table RVA=0 Size=0
Mips Global Ptr RVA=0 Size=0
TLS RVA=0 Size=0
Load Config RVA=0 Size=0
Bound Import RVA=0 Size=0
IAT RVA=2000 Size=8
Delay Import Descriptor RVA=0 Size=0
CLR Header RVA=2008 Size=48
Reserved RVA=0 Size=0
The optional header also contains 16 data directory structures, each of which is 8 bytes large. The structures contain significant information about certain areas of the PE file. The 8 bytes in each structure comprise two ints, one known as the Relative Virtual Address (RVA) and the second as the Size.
The above program displays the 16 structures. One of them is the CLR header. CLR is the acronym for Common Language Runtime. Its RVA is shown as 2008 and its size as 0x48. This implies that when b.exe gets executed, the CLR header will be 0x2008 bytes from the image base of 0x400000; in other words, it shall be positioned at memory location 0x402008.
The section alignment in memory is 0x2000, so the first section is loaded at RVA 0x2000. On subtracting 0x2000 from the RVA of the CLR header, 0x2008, the difference comes to 8. Thus, the CLR header is placed 8 bytes from the start of the section.
A file on disk has an alignment of 512 bytes, and the first section starts at position 512 from the start of the file. As the CLR header is 8 bytes from the section start, 8 is added to 512 (the section start in the file on disk), thereby arriving at a value of 520. The 72 bytes (0x48) from this position onwards constitute the CLR header, and they are loaded at location 0x402008.
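The arithmetic can be written down as two small functions. This is a sketch under the simplifying assumptions of the text, namely that the RVA lies in the first section, which is loaded at RVA 0x2000 and stored at file offset 512; the class RvaSketch and its method names are ours. It uses the image base of 0x400000 printed by the earlier program.

```csharp
using System;

public static class RvaSketch
{
    // File offset = distance of the RVA from its section start in memory,
    // added to the position of that section in the file on disk.
    public static int ToFileOffset(int rva, int sectionRva, int sectionFileOffset)
    {
        return rva - sectionRva + sectionFileOffset;
    }

    // Memory address = image base + RVA.
    public static long ToMemoryAddress(long imageBase, int rva)
    {
        return imageBase + rva;
    }

    public static void Main()
    {
        // Values for the CLR header of b.exe, as discussed in the text.
        Console.WriteLine(ToFileOffset(0x2008, 0x2000, 512));                 // 520
        Console.WriteLine(ToMemoryAddress(0x400000, 0x2008).ToString("X"));   // 402008
    }
}
```

For an RVA in a later section, the second and third arguments would have to be the start of that section in memory and in the file, respectively.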
We shall now briefly explicate the Data Directories displayed above.
A PE file allows other PE files to call its functions, provided they are marked as Exports. In the same manner, an executable calls code or imports code from other DLL or EXE files.
The first two tables list these Exports and Imports. The third entry in the Data Directory points to the resources residing in the PE file. The next entry points to the table of exceptions, which is used on architectures other than 32-bit x86. The Certificate table is next in sequence; its address field is a file offset, not an RVA. It is followed by the Base Relocation table. Relocation is the method by means of which PE files can be loaded anywhere in memory.
This is followed by the Debug directory and the Copyright directory; in some cases, the latter holds architecture-specific data. The Global Ptr entry matters only on architectures that have a global pointer register, as the label Mips Global Ptr in the listing suggests. Threads use the Thread Local Storage initialization section. The Load Config is used only in Windows NT, Windows 2000 and Windows XP.
The Bound Import contains the details of the DLL files that this PE file is bound to.
The Import Address Table (IAT) entry points to the first Import Address Table and deals with DLLs. Delay loading of DLLs is implemented by the linker together with the runtime libraries; the operating system has no cognizance of it. The CLR header, or the COM descriptor, is the most important table from our viewpoint, as it points to the first .Net header.
The next program focuses primarily on the CLR structure.
a.cs
using System;
using System.IO;

public class zzz
{
    public static void Main()
    {
        zzz a = new zzz();
        a.abc();
    }

    public void abc()
    {
        FileStream s = new FileStream("C:\\mdata\\b.exe", FileMode.Open);
        BinaryReader r = new BinaryReader(s);

        // Skip to the 15th data directory entry (the CLR header).
        s.Seek(128 + 4 + 20 + 96 + 112, SeekOrigin.Begin);
        int rva, size;
        rva = r.ReadInt32();
        size = r.ReadInt32();

        // Convert the RVA to a file offset (first section only).
        int where = rva % 0x2000 + 512;
        Console.WriteLine(where);
        s.Seek(where, SeekOrigin.Begin);

        size = r.ReadInt32();
        Console.WriteLine("CLR Header size {0}", size);
        int majorruntimeversion;
        majorruntimeversion = r.ReadInt16();
        Console.WriteLine("Major Runtime Version {0}", majorruntimeversion);
        int minorruntimeversion;
        minorruntimeversion = r.ReadInt16();
        Console.WriteLine("Minor Runtime Version {0}", minorruntimeversion);

        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("MetaData RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));

        // The Flags field is tested one bit at a time.
        int flags = r.ReadInt32();
        Console.Write("Flags ");
        if ((flags & 0x01) == 0x01)
            Console.Write("ILONLY ");
        if ((flags & 0x02) == 0x02)
            Console.Write("32 Bit Required ");
        if ((flags & 0x08) == 0x08)
            Console.Write("Strong Name Signature ");
        if ((flags & 0x010000) == 0x010000)
            Console.Write("Track Debug Data ");
        Console.WriteLine();

        int entrypointtoken = r.ReadInt32();
        Console.WriteLine("Entry Point Token {0}", entrypointtoken.ToString("X"));

        // Six RVA/Size pairs follow the entry point token.
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("Resources RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("Strong Name Signature RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("Code Manager Table RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("VTable Fixups RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("Export Address Table Jumps RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
        rva = r.ReadInt32();
        size = r.ReadInt32();
        Console.WriteLine("Managed Native Header RVA={0} Size={1}", rva.ToString("X"), size.ToString("X"));
    }
}
Output
520
CLR Header size 72
Major Runtime Version 2
Minor Runtime Version 0
MetaData RVA=207C Size=1F0
Flags ILONLY
Entry Point Token 6000001
Resources RVA=0 Size=0
Strong Name Signature RVA=0 Size=0
Code Manager Table RVA=0 Size=0
VTable Fixups RVA=0 Size=0
Export Address Table Jumps RVA=0 Size=0
Managed Native Header RVA=0 Size=0
The second last Data Directory entry is a structure that contains the RVA and the size of the CLR header. Its index among the Data Directories is fixed, but the byte offset used here applies to 32 bit (PE32) files; in the 64 bit (PE32+) format the optional header fields that precede the Data Directories are larger, so the offset differs. It is another matter altogether that we are yet to meet the proud owner of a 64 bit machine, if there is one at all.
In the file, we place the file pointer at offset 360 in order to read the RVA and Size of the CLR header. We arrive at 360 by adding the 248 bytes that precede the Data Directories (128 + 4 + 20 + 96) to the 112 bytes of the 14 structures preceding the CLR header entry. Every structure has a size of 8 bytes; hence, the number of bytes to be skipped is 14 × 8 = 112.
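The offset calculation can be sketched as follows. The class DirectoryOffsets and its member names are illustrative only. The value 128 for the position of the PE signature is taken from the Seek call in the program above, and the 96-byte figure is assumed to apply to 32 bit (PE32) files such as b.exe.

```csharp
using System;

public static class DirectoryOffsets
{
    public const int PeSignatureOffset = 128;      // start of the PE signature in b.exe
    public const int SignatureSize = 4;            // the PE signature itself
    public const int CoffHeaderSize = 20;          // header that follows the signature
    public const int OptionalHeaderFixedSize = 96; // PE32 fields before the directories
    public const int DirectoryEntrySize = 8;       // one int for RVA, one int for Size

    // Zero-based index: 0 is the first Data Directory entry, 14 is the CLR header.
    public static int OffsetOfDirectory(int index)
    {
        int start = PeSignatureOffset + SignatureSize
                  + CoffHeaderSize + OptionalHeaderFixedSize;
        return start + index * DirectoryEntrySize;
    }

    public static void Main()
    {
        Console.WriteLine("First directory entry at {0}", OffsetOfDirectory(0));
        Console.WriteLine("CLR header entry at      {0}", OffsetOfDirectory(14));
    }
}
```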
The RVA gives the memory location 0x2008. In order to ascertain the location within the section, we take this value modulo 0x2000. The remainder of 8 is the offset of the header from the start of the section. To pick up data from the file, the file alignment of 512 is added to these 8 bytes, since the first 512 bytes of the file contain all the header details and are skipped. Thus, the CLR header starts 520 bytes from the start of the file.
This header starts with a size field. Thus, the CLR header has a size of 0x48 or 72. This is followed by two shorts, which denote the major and minor runtime version numbers.
The values of 2 and 0, stored in the two fields, are the version numbers of the runtime that are expected when the executable is run.
This is followed by the RVA and the size of the metadata. Ultimately, we have arrived at the very quintessence of this book. The next chapter is devoted to the explanation of the metadata.
Next in sequence is the Flags field that, like the Characteristics field, works at the bit level.
In our case, only the first bit is on, indicating that the image is an IL image. The second bit reveals the system on which the code can be executed, i.e. either 32 bits or 64 bits. If the bit is on, the executable will run only on a 32 bit machine, and a 64 bit runtime would not load the program.
The fourth bit (0x08) refers to a strong name signature. The track debug data flag sits much higher, at mask 0x10000 (the seventeenth bit), and is always zero.
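As an alternative to the chain of if statements, the same bit tests can be expressed with a [Flags] enum. This is only a sketch: the enum ClrFlags, its member names and the class FlagsDemo are invented for illustration, while the mask values are the ones used in the program above.

```csharp
using System;

[Flags]
public enum ClrFlags
{
    None = 0,
    ILOnly = 0x00000001,
    Required32Bit = 0x00000002,
    StrongNameSigned = 0x00000008,
    TrackDebugData = 0x00010000
}

public static class FlagsDemo
{
    // Casts the raw int read from the header and lets the enum name the set bits.
    public static string Describe(int raw)
    {
        return ((ClrFlags)raw).ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe(0x01)); // the value found in b.exe
        Console.WriteLine(Describe(0x09)); // a hypothetical strong-named image
    }
}
```

If the raw value contains a bit that the enum does not name, ToString falls back to printing the number, so unknown flags are not silently dropped.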
We shall revisit the Entry Point Token in due course.
Following the flag field is a series of Data Directories, where the first one refers to the resources and the second one refers to the Strong Name Signature. The latter structure points to the hash data for the PE file, which is used by the loader for binding and versioning.
As per the documentation, the CodeManagerTable structure that follows next, shall always be zero. Virtual functions in a class use a VTable to perform their magic, and thus, a VtableFixups is vital. The last two Data Directories, which are the Export Address Table Jumps and Managed Native Header, always have a value of zero each.
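Since each of these Data Directories is read in the same way, the repeated pair of ReadInt32 calls could be folded into one helper. The sketch below is an assumption about how that might look; the names DirectoryReader and ReadDirectory are hypothetical, and a MemoryStream holding eight sample bytes stands in for b.exe so that the example runs on its own.

```csharp
using System;
using System.IO;

public static class DirectoryReader
{
    // Reads one 8-byte entry (RVA, then Size) and formats it like the program above.
    public static string ReadDirectory(BinaryReader r, string name)
    {
        int rva = r.ReadInt32();
        int size = r.ReadInt32();
        return string.Format("{0} RVA={1:X} Size={2:X}", name, rva, size);
    }

    public static void Main()
    {
        // Little-endian bytes for RVA 0x207C and Size 0x1F0,
        // the MetaData values shown in the output above.
        byte[] sample = { 0x7C, 0x20, 0x00, 0x00, 0xF0, 0x01, 0x00, 0x00 };
        using (BinaryReader r = new BinaryReader(new MemoryStream(sample)))
        {
            Console.WriteLine(ReadDirectory(r, "MetaData"));
        }
    }
}
```

With the real file, the same helper would be called once per entry, after seeking to the start of the CLR header and reading the fields that precede each directory.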
We have left many things unattended here, but with an assurance that we shall expound each one of them, before this book reaches its culmination.