1, Obtaining physical memory information

 Interrupt 0x15, function numbers: E820H, E801H, 88H

See file: linux/arch/i386/boot/setup.S

# Try three different memory detection schemes. First, try
# e820h, which lets us assemble a memory map, then try e801h,
# which returns a 32-bit memory size, and finally 88h, which
# returns 0-64m

# method E820H:
# the memory map from hell. e820h returns memory classified into
# a whole bunch of different types, and allows memory holes and
# everything. We scan through this memory map and build a list
# of the first 32 memory areas, which we return at [E820MAP].
# This is documented at http://www.teleport.com/~acpi/acpihtml/topic245.htm

#define SMAP 0x534d4150

meme820:
        xorl    %ebx, %ebx              # continuation counter
        movw    $E820MAP, %di           # point into the whitelist
                                        # so we can have the bios
                                        # directly write into it.

jmpe820:
        movl    $0x0000e820, %eax       # e820, upper word zeroed
        movl    $SMAP, %edx             # ascii 'SMAP'
        movl    $20, %ecx               # size of the e820rec
        pushw   %ds                     # data record.
        popw    %es
        int     $0x15                   # make the call
        jc      bail820                 # fall to e801 if it fails

        cmpl    $SMAP, %eax             # check the return is `SMAP'
        jne     bail820                 # fall to e801 if it fails

#       cmpl    $1, 16(%di)             # is this usable memory?
#       jne     again820

        # If this is usable memory, we save it by simply advancing %di by
        # sizeof(e820rec).
        #
good820:
        movb    (E820NR), %al           # up to 32 entries
        cmpb    $E820MAX, %al
        jnl     bail820

        incb    (E820NR)
        movw    %di, %ax
        addw    $20, %ax
        movw    %ax, %di
again820:
        cmpl    $0, %ebx                # check to see if
        jne     jmpe820                 # %ebx is set to EOF
bail820:

# method E801H:
# memory size is in 1k chunksizes, to avoid confusing loadlin.
# we store the 0xe801 memory size in a completely different place,
# because it will most likely be longer than 16 bits.
# (use 1e0 because that's what Larry Augustine uses in his
# alternative new memory detection scheme, and it's sensible
# to write everything into the same place.)

meme801:
        stc                             # fix to work around buggy
        xorw    %cx,%cx                 # BIOSes which dont clear/set
        xorw    %dx,%dx                 # carry on pass/error of
                                        # e801h memory size call
                                        # or merely pass cx,dx though
                                        # without changing them.
        movw    $0xe801, %ax
        int     $0x15
        jc      mem88

        cmpw    $0x0, %cx               # Kludge to handle BIOSes
        jne     e801usecxdx             # which report their extended
        cmpw    $0x0, %dx               # memory in AX/BX rather than
        jne     e801usecxdx             # CX/DX. The spec I have read
        movw    %ax, %cx                # seems to indicate AX/BX
        movw    %bx, %dx                # are more reasonable anyway...

e801usecxdx:
        andl    $0xffff, %edx           # clear sign extend
        shll    $6, %edx                # and go from 64k to 1k chunks
        movl    %edx, (0x1e0)           # store extended memory size
        andl    $0xffff, %ecx           # clear sign extend
        addl    %ecx, (0x1e0)           # and add lower memory into
                                        # total size.

# Ye Olde Traditional Methode. Returns the memory size (up to 16mb or
# 64mb, depending on the bios) in ax.
mem88:

#endif
        movb    $0x88, %ah
        int     $0x15
        movw    %ax, (2)
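
The E801H arithmetic in the listing (mask to 16 bits, shift the 64KB count left by 6, add the 1KB count) can be restated in C. This is only a sketch: the function name e801_size_kb is made up for illustration, and the reading of CX as "1KB units between 1MB and 16MB" and DX as "64KB units above 16MB" is the usual description of this BIOS call rather than something the listing itself spells out.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel function): combine the two 16-bit
 * values returned by INT 0x15, AX=0xE801 into one size in KB, the same
 * way the setup.S code fills the dword at offset 0x1e0.
 *   cx: extended memory between 1MB and 16MB, in 1KB units (assumed)
 *   dx: extended memory above 16MB, in 64KB units (assumed) */
uint32_t e801_size_kb(uint16_t cx, uint16_t dx)
{
    uint32_t kb = (uint32_t)dx << 6;   /* 64KB chunks -> 1KB chunks */
    kb += cx;                          /* add the part below 16MB */
    return kb;
}
```

For a machine with 128MB of RAM one would expect roughly cx = 0x3c00 (15MB) and dx = 0x0700 (112MB), which gives 130048KB, i.e. 127MB of extended memory above the first megabyte.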

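      After the setup.S code above runs, the memory information is stored as a series of entries starting at E820MAP. Each entry is 20 bytes long and describes one memory range; the number of entries is stored at E820NR. In effect, the entries end up in empty_zero_page at offset E820MAP (0x2d0), and the entry count at offset E820NR (0x1e8).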
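
The 20-byte entry layout can be sketched as a C struct: an 8-byte base address, an 8-byte length and a 4-byte type. The names e820_entry and e820_usable_bytes below are illustrative, the packed attribute assumes GCC or Clang, and the type value 1 for usable RAM matches the commented-out "cmpl $1, 16(%di)" test in the listing.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1   /* type value for usable memory */

/* One 20-byte record as the BIOS writes it: base, length, type.
 * The struct name is illustrative; packing is a GCC/Clang extension. */
struct e820_entry {
    uint64_t addr;   /* start of the range */
    uint64_t size;   /* length of the range in bytes */
    uint32_t type;   /* 1 = usable RAM, other values = reserved etc. */
} __attribute__((packed));

/* Sum the bytes of all usable (type 1) ranges in the first nr records. */
uint64_t e820_usable_bytes(const struct e820_entry *map, size_t nr)
{
    uint64_t total = 0;
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_RAM)
            total += map[i].size;
    return total;
}
```

Walking the table this way is all that later code needs in order to answer questions such as "how much usable RAM is there" or "what is the highest usable page frame".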

 

2, start_kernel()->setup_arch() 

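       The memory-related work in setup_arch() is mainly the initialization of the data structures used to manage physical pages: the e820 variable that describes all of physical memory, the pg_data_t object that describes a memory node, the bootmem_data object of the boot-time low-memory allocator, and finally the zone-buddy system, which is the main manager of physical pages.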

 

void __init setup_arch(char **cmdline_p)
{
        unsigned long max_low_pfn;
        ... ...
        setup_memory_region();
        ... ...
        max_low_pfn = setup_memory();
        ... ...
        paging_init();
        ... ...
        register_memory(max_low_pfn);
        ... ...
}

 

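      setup_memory_region() mainly copies the memory information obtained in step 1 from empty_zero_page into the e820 variable. The ranges the BIOS reports can be messy (overlapping, out of order and so on), so setup_memory_region() first calls sanitize_e820_map() to tidy the entries in place, and then calls copy_e820_map() to copy them into e820. If the information turns out to be invalid, Linux initializes e820 with a conservative estimate made of just two memory ranges.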
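
The conservative fallback can be sketched as follows. This is a user-space approximation written from memory of the i386 code, not the kernel source: the struct and function names are made up, and the two boundary constants (conventional memory ending at 0x9f000, extended memory starting at 1MB) are assumptions. The extended memory size is taken as the larger of the E801H and 88H results saved by setup.S.

```c
#include <stdint.h>

#define E820MAX      32
#define E820_RAM     1
#define LOWMEM_END   0x9f000u    /* assumed end of conventional memory */
#define HIGH_MEMORY  0x100000u   /* assumed start of extended memory (1MB) */

struct region  { uint64_t addr, size; uint32_t type; };
struct mem_map { int nr_map; struct region map[E820MAX]; };

/* Append one region; returns 0 on success, -1 if the table is full. */
int add_region(struct mem_map *m, uint64_t addr, uint64_t size, uint32_t type)
{
    if (m->nr_map >= E820MAX)
        return -1;
    m->map[m->nr_map].addr = addr;
    m->map[m->nr_map].size = size;
    m->map[m->nr_map].type = type;
    m->nr_map++;
    return 0;
}

/* Build the two-region map from the E801H and 88H sizes (both in KB of
 * extended memory), trusting whichever BIOS call reported more. */
void build_fallback_map(struct mem_map *m, uint32_t e801_kb, uint32_t ext88_kb)
{
    uint32_t mem_kb = e801_kb > ext88_kb ? e801_kb : ext88_kb;

    m->nr_map = 0;
    add_region(m, 0, LOWMEM_END, E820_RAM);                       /* below 640KB */
    add_region(m, HIGH_MEMORY, (uint64_t)mem_kb << 10, E820_RAM); /* from 1MB up */
}
```

The hole between the two regions covers the legacy video and BIOS area, which the fallback simply leaves out rather than trying to describe.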

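      setup_memory() mainly splits memory into low and high memory, initializes the bootmem_data object of the boot-time low-memory allocator, and hands the low memory described in e820 over to that allocator.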

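      setup_memory() calls find_max_low_pfn() to split memory into low and high memory. Low memory is the memory the kernel maps directly; high memory is the memory it does not map directly. Linux uses the 1GB of linear addresses from 3GB to 4GB for kernel space, and the 128MB at the top of the 4GB range is set aside for vmalloc and the fixed-address special mappings, which leaves 1GB-128MB=896MB of linear addresses for mapping physical memory directly. So when enough physical memory is present, up to 896MB of it is mapped directly into kernel space and is called low memory; memory above 896MB cannot be mapped directly (there are not enough linear addresses) and is called high memory. 896MB is therefore the upper bound on low memory. The split must also be adjustable by hand (through a boot parameter), especially when physical memory is smaller than 896MB: in that case nothing forces the kernel to map all of it directly into 3GB~4GB, so the highmem_pages variable can specify how much memory is "not" used for direct mapping, that is, how much is treated as high memory. After find_max_low_pfn() returns, the following variables are set: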

      max_low_pfn: the page frame number at which low memory ends

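      max_pfn:         the page frame number at which memory ends, i.e. the number of physical page frames actually used. It can be smaller than the number of page frames that physically exist (the highest page frame in e820). For example, when there is more physical memory above 896MB than the amount of high memory the user asked for, the user's value wins, so many page frames are simply never used.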
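
The split can be sketched in a few lines of C. This is a simplified illustration, not the kernel's find_max_low_pfn(): the names pfn_split and split_memory are made up, 4KB pages are assumed, highmem_pages = -1 stands for "not specified", and the sanity checks the real code performs on the user's request (for example keeping a minimum amount of low memory) are left out.

```c
#define PAGE_SHIFT  12                              /* assumed 4KB pages */
#define MAXMEM_PFN  ((896ul << 20) >> PAGE_SHIFT)   /* 896MB in page frames */

struct pfn_split {
    unsigned long max_low_pfn;   /* end of low memory */
    unsigned long max_pfn;       /* end of the memory actually used */
};

/* phys_pfn:      page frames reported by the memory map
 * highmem_pages: user request in pages, or -1 for "not specified" */
struct pfn_split split_memory(unsigned long phys_pfn, long highmem_pages)
{
    struct pfn_split s = { phys_pfn, phys_pfn };

    if (phys_pfn > MAXMEM_PFN) {
        unsigned long avail = phys_pfn - MAXMEM_PFN;

        /* Use less high memory than exists if the user asked for less. */
        if (highmem_pages >= 0 && (unsigned long)highmem_pages < avail)
            s.max_pfn = MAXMEM_PFN + (unsigned long)highmem_pages;
        s.max_low_pfn = MAXMEM_PFN;
    } else if (highmem_pages > 0 && (unsigned long)highmem_pages < phys_pfn) {
        /* Small machine: carve the requested pages off the top. */
        s.max_low_pfn = phys_pfn - (unsigned long)highmem_pages;
    }
    return s;
}
```

With 2GB of RAM (524288 page frames) and no request, low memory ends at frame 229376 (896MB) and everything above it is high memory; asking for only 256MB of high memory lowers max_pfn to 294912 and leaves the remaining frames unused.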

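     Next, setup_memory() sets the high memory boundaries: highstart_pfn = max_low_pfn; highend_pfn = max_pfn; so [max_low_pfn, max_pfn) is high memory. Then init_bootmem() initializes the boot-time memory allocator, which serves allocations before the buddy system and the slab allocator exist. Once the allocator is initialized, register_bootmem_low_pages() places all low-memory page frames under its management. From this point on the allocator can be used to mark pages with special roles as reserved (i.e. permanently occupied), to allocate memory, and so on.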
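
The idea behind the boot-time allocator is a bitmap with one bit per page frame. The sketch below is a toy model with made-up names (boot_bitmap, bm_*), not the kernel's bootmem code: initialization marks every page reserved, registering low memory clears the bits of usable pages, and reserving or allocating sets bits again. The real allocator works in bytes with alignment and goal addresses, which this sketch ignores.

```c
#include <stdint.h>
#include <string.h>

#define NPAGES 64   /* tiny example: 64 low-memory page frames */

/* One bit per page frame; a set bit means reserved or in use. */
struct boot_bitmap { uint8_t bits[NPAGES / 8]; };

/* Start with every page reserved, as the allocator does at init time. */
void bm_init(struct boot_bitmap *b)
{
    memset(b->bits, 0xff, sizeof b->bits);
}

/* Mark page frames [start, end) as free (usable RAM being registered). */
void bm_free_range(struct boot_bitmap *b, unsigned start, unsigned end)
{
    for (unsigned i = start; i < end && i < NPAGES; i++)
        b->bits[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Mark page frames [start, end) as reserved (permanently occupied). */
void bm_reserve_range(struct boot_bitmap *b, unsigned start, unsigned end)
{
    for (unsigned i = start; i < end && i < NPAGES; i++)
        b->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Allocate the first free page frame; returns its index or -1 if none. */
int bm_alloc_page(struct boot_bitmap *b)
{
    for (unsigned i = 0; i < NPAGES; i++) {
        if (!(b->bits[i / 8] & (1u << (i % 8)))) {
            b->bits[i / 8] |= (uint8_t)(1u << (i % 8));
            return (int)i;
        }
    }
    return -1;
}

/* Count page frames whose bit is clear. */
unsigned bm_count_free(const struct boot_bitmap *b)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NPAGES; i++)
        n += !(b->bits[i / 8] & (1u << (i % 8)));
    return n;
}
```

Starting from "everything reserved" is the safe default: a page only becomes allocatable after something has positively identified it as usable RAM.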

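      paging_init() mainly sets up the page global directory swapper_pg_dir, loads it into cr3, and initializes the buddy system's management structures, namely the memory node object and its zones. pagetable_init() fills in swapper_pg_dir, which also serves as the template from which the kernel-space part of every other process's page global directory is created. It is also worth noting what swapper_pg_dir contains at the switch from real mode to protected mode, and where the pages swapper_pg_dir, pg0, pg1 and empty_zero_page are located (they are the 2nd, 3rd, 4th and 5th pages from the start of the kernel mapping, respectively). At the switch from real mode to protected mode, both linear ranges 0~8MB and 0xc0000000~0xc0000000+8MB are mapped onto physical memory 0~8MB; the page tables for this 8MB mapping are pg0 and pg1, i.e. swapper_pg_dir[0]/swapper_pg_dir[768] point to pg0 and swapper_pg_dir[1]/swapper_pg_dir[769] point to pg1. The call chain paging_init()->zone_sizes_init()->free_area_init()->free_area_init_core() initializes the memory node and its zones. At that point the buddy system's management structures are ready and only wait for pages to be put in.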
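
The directory indices 0, 1, 768 and 769 follow directly from the address arithmetic. The sketch below assumes classic two-level i386 paging without PAE, where each page directory entry covers 4MB; the function names pgd_slot and direct_map_pa are made up for illustration.

```c
#include <stdint.h>

#define PGDIR_SHIFT  22            /* assumed non-PAE i386: 4MB per entry */
#define PAGE_OFFSET  0xC0000000u   /* kernel space starts at 3GB */

/* Index of the page directory entry that covers a linear address. */
unsigned pgd_slot(uint32_t vaddr)
{
    return vaddr >> PGDIR_SHIFT;
}

/* Physical address behind a directly mapped kernel linear address. */
uint32_t direct_map_pa(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}
```

Since 0xC0000000 >> 22 is 768, the first 8MB of kernel space uses entries 768 and 769, while the identity mapping of 0~8MB uses entries 0 and 1; pointing both pairs at pg0 and pg1 gives the two views of the same physical 8MB.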

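      register_memory() mainly registers memory in the bus address space. The thing to focus on here is how the bus address space is managed, through the two resource trees (iomem_resource/ioport_resource).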
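
A resource tree of this kind can be sketched as nodes with parent, sibling and child links, where siblings are kept sorted by address and may not overlap. The code below is a user-space illustration with made-up names (res_node, res_request); it follows the general shape of such an insertion routine as I understand it and should not be read as the kernel's implementation.

```c
#include <stddef.h>

struct res_node {
    const char *name;
    unsigned long start, end;                  /* inclusive address range */
    struct res_node *parent, *sibling, *child;
};

/* Insert new_res as a child of root, keeping siblings sorted by address.
 * Returns NULL on success, or the node it conflicts with (root itself
 * if new_res is malformed or does not fit inside root). */
struct res_node *res_request(struct res_node *root, struct res_node *new_res)
{
    struct res_node **p = &root->child;

    if (new_res->end < new_res->start ||
        new_res->start < root->start || new_res->end > root->end)
        return root;

    for (;;) {
        struct res_node *cur = *p;

        if (cur == NULL || cur->start > new_res->end) {
            /* Free gap found: link in before cur. */
            new_res->sibling = cur;
            new_res->parent = root;
            *p = new_res;
            return NULL;
        }
        p = &cur->sibling;
        if (cur->end < new_res->start)
            continue;          /* cur lies entirely below: keep walking */
        return cur;            /* ranges overlap */
    }
}
```

Nesting comes from calling the same routine with a different root: a "System RAM" range is inserted under the memory root, and a range such as "Kernel code" is then inserted under "System RAM".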

 

3, start_kernel()->mem_init()

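      By this point the buddy system's management data structures are fully initialized, and all that remains is to put the physical pages into them. That is the main job of mem_init(). It calls free_pages_init(), which on one hand moves the low-memory pages that are still free in the boot-time allocator into the buddy system, and on the other hand puts the high-memory pages in [highstart_pfn, highend_pfn) directly into the buddy system.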

 

void __init mem_init(void)
{
        int codesize, reservedpages, datasize, initsize;

        if (!mem_map)
                BUG();

        set_max_mapnr_init();

        high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);

        memset(empty_zero_page, 0, PAGE_SIZE);

        reservedpages = free_pages_init();
        ...
}
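
What "putting a page into the buddy system" means can be shown with a toy model: each page is freed as an order-0 block, and a block merges with its buddy (the block whose index differs in exactly one bit) whenever that buddy is also free. The names toy_zone, zone_init, zone_free and zone_count are made up, and boolean arrays stand in for the kernel's per-order free lists; this is a sketch of the technique, not of the kernel code.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ORDER 4    /* orders 0..3 in this toy example */
#define NPAGES    16   /* page frames managed by the toy zone */

/* free_head[o][i] is true when a free block of order o starts at page i. */
struct toy_zone { bool free_head[MAX_ORDER][NPAGES]; };

/* Start with no free blocks at all. */
void zone_init(struct toy_zone *z)
{
    memset(z, 0, sizeof *z);
}

/* Return one block to the zone, merging with its buddy while possible. */
void zone_free(struct toy_zone *z, unsigned idx, unsigned order)
{
    while (order < MAX_ORDER - 1) {
        unsigned buddy = idx ^ (1u << order);   /* buddy differs in one bit */

        if (buddy >= NPAGES || !z->free_head[order][buddy])
            break;                              /* buddy busy: stop merging */
        z->free_head[order][buddy] = false;     /* take buddy off its order */
        idx &= ~(1u << order);                  /* merged block starts lower */
        order++;
    }
    z->free_head[order][idx] = true;
}

/* Number of free blocks currently recorded at the given order. */
unsigned zone_count(const struct toy_zone *z, unsigned order)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NPAGES; i++)
        n += z->free_head[order][i];
    return n;
}
```

Freeing pages 0 through 7 one at a time ends with a single order-3 block at page 0 and nothing left at the lower orders, which is how page-by-page freeing at boot builds up large contiguous blocks.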


Once the physical pages are in the zone-buddy system, everything after that is much simpler.