ACPI_BIOS_USING_OS_MEMORY


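Recently our BIOS hit a strange bug. At first, with 4 GB of memory installed, BIOS Setup showed only 3 GB. After the BIOS code was changed, the Setup menu finally showed 4 GB. The display was now correct, but the board could no longer boot into the OS: every attempt ended in a blue screen with white text. The crash is shown in Figure 1.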


 

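Probably because the SWPM knew I had written an article about ACPI debugging, she asked me to help analyze this problem. To be honest, I had no experience with blue screen analysis and had never done one :(. But I was not very busy at the time, so I treated it as a practice exercise. I set up WinDbg and plugged a 1394-to-PCIe adapter into the board (the development board has no 1394 port). Then I let the debuggee run; as soon as it blue-screens, WinDbg breaks in. Let's look at the blue screen information.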

Waiting to reconnect...
Connected to Windows 6001 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established.
Symbol search path is: D:\Vista32-sp1symbol;D:\websymbols-sp1;C:\WINNT\Symbols
Executable search path is: 
Windows Kernel Version 6001 MP (1 procs) Free x86 compatible
Built by: 6001.18000.x86fre.longhorn_rtm.080118-1840
Kernel base = 0x8203d000 PsLoadedModuleList = 0x82154c70
System Uptime: not available
 
*** Fatal System Error: 0x000000a5
                       (0x00001000,0x00000000,0xFFFFFF00,0x00000105)
 
Break instruction exception - code 80000003 (first chance)
 
A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.
 
A fatal system error has occurred.
 
Connected to Windows 6001 x86 compatible target, ptr64 FALSE
Loading Kernel Symbols
........................................
Loading User Symbols
 
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
Use !analyze -v to get detailed debugging information.
 
BugCheck A5, {1000, 0, ffffff00, 105}
 
Probably caused by : acpi.sys ( acpi!MapPhysMem+39 )
 
Followup: MachineOwner
---------
 
nt!RtlpBreakWithStatusInstruction:
820f5514 cc              int     3

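From the output above we know that a fatal system error occurred with error code 0x000000a5 (Fatal System Error: 0x000000a5), and that it was probably caused by acpi.sys (acpi!MapPhysMem+39). That is all the information available so far. For more detail, enter the following command: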

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
ACPI_BIOS_ERROR (a5)
The ACPI Bios in the system is not fully compliant with the ACPI specification.
The first value indicates where the incompatibility lies:
This bug check covers a great variety of ACPI problems.  If a kernel debugger
is attached, use "!analyze -v".  This command will analyze the precise problem,
and display whatever information is most useful for debugging the specific
error.
Arguments:
Arg1: 00001000, ACPI_BIOS_USING_OS_MEMORY
   ACPI had a fatal error when processing a memory 
operation region.The memory operation region tried to 
map memory that has been allocated for OS usage.
Arg2: 00000000, The high portion of the physical address 
 of the memory region.
Arg3: ffffff00, The low portion of the physical address 
 of the memory region.
Arg4: 00000105, The length of memory being mapped.
 
Debugging Details:
------------------
 
 
DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
 
BUGCHECK_STR:  0xA5
 
PROCESS_NAME:  System
 
CURRENT_IRQL:  2
 
LOCK_ADDRESS:  821715e0 -- (!locks 821715e0)
 
Resource @ nt!PiEngineLock (0x821715e0)    Exclusively owned
     Threads: 85d28668-01<*> 
1 total locks, 1 locks currently held
 
PNP_TRIAGE: 
   Lock address  : 0x821715e0
   Thread Count  : 1
   Thread address: 0x85d28668
   Thread wait   : 0x28
 
LAST_CONTROL_TRANSFER:  from 8210a2d7 to 820f5514
 
STACK_TEXT:  
8039920c 8210a2d7 00000003 bac8867b 00000000 nt!RtlpBreakWithStatusInstruction
8039925c 8210adbd 00000003 00000105 ffffff00 nt!KiBugCheckDebugBreak+0x1c
80399628 8210a163 000000a5 00001000 00000000 nt!KeBugCheck2+0x66d
8039964c 80627376 000000a5 00001000 00000000 nt!KeBugCheckEx+0x1e
80399674 80627f9b ffffff00 00000105 85d60e58 acpi!MapPhysMem+0x39
80399690 8062aba1 85d5f000 ffffff00 00000105 acpi!MapUnmapPhysMem+0x2f
803996b4 8062f4cb 85d5f000 00000000 00008000 acpi!OpRegion+0xcb
803996d0 806287dc 85d5f000 85d60e58 00000000 acpi!ParseTerm+0x14d
803996f8 80629c75 00000000 00000000 806374b8 acpi!RunContext+0x65
80399710 80629d40 85d5f000 00000000 806374b8 acpi!InsertReadyQueue+0xa7
8039972c 8062991b 85d5f000 00000000 0000205b acpi!RestartContext+0x27
80399768 80623af9 85d5f000 86121380 85d5f000 acpi!SyncLoadDDB+0xde
8039977c 8063c4b9 ffd19010 8039979c 80636fc4 acpi!AMLILoadDDB+0x66
80399794 8063c521 86a972b0 00000000 8200df60 acpi!ACPIInitializeDDB+0x37
803997b0 8063c645 8200dec0 80637b60 80637d00 acpi!ACPIInitializeDDBs+0x47
803997c4 8061b43c 86a97bf8 86a97a38 00000000 acpi!ACPIInitialize+0xe9
803997f4 80640f7c 86a97bf8 86a7e3b0 80640ec8 acpi!ACPIInitStartACPI+0x6a
80399820 80616e4b 86a97bf8 86a7e3b0 86a7e3b0 acpi!ACPIRootIrpStartDevice+0xb4
80399850 820f9053 86a97bf8 86a97a38 803998cc acpi!ACPIDispatchIrp+0xff
80399868 821a1605 00000000 85d104c0 86a97878 nt!IofCallDriver+0x63
80399884 8204912a 803998a8 82048f47 86a97878 nt!PnpAsynchronousCall+0x96
803998d0 821a24f6 82048f47 86a97878 86a7edc8 nt!PnpStartDevice+0xb7
8039992c 821a23b1 86a97878 00000012 00000000 nt!PnpStartDeviceNode+0x13a
80399948 8219f4db 00000000 00000000 8216f530 nt!PipProcessStartPhase1+0x65
80399b44 820489e8 85d58260 00000000 80399b88 nt!PipProcessDevNodeTree+0x187
80399b9c 820488f8 00000000 86a7e5d0 833fa3b8 nt!PnpDeviceActionWorker+0xde
80399bb8 82396078 00000000 00000007 00000000 nt!PnpRequestDeviceAction+0x127
80399c34 82398ff1 808108c4 8080e430 00000000 nt!IopInitializeBootDrivers+0x3b0
80399c94 8239ccb3 808108c4 85d28990 85d28668 nt!IoInitSystem+0x5af
80399d74 82195af1 80399dc0 82212a1c 808108c4 nt!Phase1InitializationDiscard+0xb86
80399d7c 82212a1c 808108c4 bac889e7 00000000 nt!Phase1Initialization+0xd
80399dc0 8206ba3e 82195ae4 808108c4 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
 
 
STACK_COMMAND:  kb
 
FOLLOWUP_IP: 
acpi!MapPhysMem+39
80627376 6a00            push    0
 
SYMBOL_STACK_INDEX:  4
 
FOLLOWUP_NAME:  MachineOwner
 
MODULE_NAME: acpi
 
IMAGE_NAME:  acpi.sys
 
DEBUG_FLR_IMAGE_TIMESTAMP:  47918b80
 
SYMBOL_NAME:  acpi!MapPhysMem+39
 
FAILURE_BUCKET_ID:  0xA5_acpi!MapPhysMem+39
 
BUCKET_ID:  0xA5_acpi!MapPhysMem+39
 
Followup: MachineOwner
---------
 
0: kd> lmvm acpi
start    end        module name
8060e000 80654000   acpi       (pdb symbols)          C:\WINNT\Symbols\sys\acpi.pdb
    Loaded symbol image file: acpi.sys
    Image path: \SystemRoot\system32\drivers\acpi.sys
    Image name: acpi.sys
    Timestamp:        Fri Jan 18 21:32:48 2008 (47918B80)
    CheckSum:         00041A1F
    ImageSize:        00046000
    File version:     6.0.6001.18000
    Product version:  6.0.6001.18000
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     ACPI.sys
    OriginalFilename: ACPI.sys
    ProductVersion:   6.0.6001.18000
    FileVersion:      6.0.6001.18000 (longhorn_rtm.080118-1840)
    FileDescription:  ACPI Driver for NT
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

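This time the information is much more detailed: it gives the call stack as well as details about the acpi.pdb symbol file. After analysis, I concluded that the main cause of the blue screen is as follows. The most immediate cause is the code at acpi!MapPhysMem+39. That code fails because the ACPI memory map in the BIOS is wrong: memory used by the OS is being claimed by the BIOS.

With the cause roughly clear, which piece of BIOS code was at fault? I calmed down and searched my memory. Since this is an ACPI error, it should be in the ASL code. I remembered from ASL code I had read before that when system resources are needed, you usually declare an OperationRegion. For example, to use some I/O resources I would write: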

OperationRegion( IO_, SystemIO, 0x62, 5 ) 
To use SystemMemory, write it like this:
OperationRegion(XPEX, SystemMemory, 0xE0020100, 0x100)
 
The bug analysis info reports the following error:
Arguments:
Arg1: 00001000, ACPI_BIOS_USING_OS_MEMORY
   ACPI had a fatal error when processing a memory 
operation region.The memory operation region tried to 
map memory that has been allocated for OS usage.
Arg2: 00000000, The high portion of the physical address 
 of the memory region.
Arg3: ffffff00, The low portion of the physical address 
 of the memory region.
Arg4: 00000105, The length of memory being mapped.

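Taken literally, I think a block of System Memory is declared in the ASL code and the acpi.sys driver failed while processing it. So which piece of code caused it? Arg3 and Arg4 give it away. It should be caused by code written like this: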

OperationRegion(???, SystemMemory, 0xffffff00, 0x105)
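The step from Arg2, Arg3, and Arg4 to this suspected declaration is plain arithmetic, and it can be checked in a few lines. The following is a minimal Python sketch, not part of the original debugging session; the helper name region_from_bugcheck is made up for illustration. It joins the high and low halves of the physical address and adds the length to find where the region ends.

```python
def region_from_bugcheck(arg2_high, arg3_low, arg4_length):
    """Return (start, end_exclusive) for a 0xA5 / 0x1000 bugcheck.

    Hypothetical helper for illustration: Arg2 is the high 32 bits of the
    physical address, Arg3 the low 32 bits, Arg4 the length in bytes.
    """
    start = (arg2_high << 32) | arg3_low
    return start, start + arg4_length


# Values taken from the !analyze -v output.
start, end = region_from_bugcheck(0x00000000, 0xFFFFFF00, 0x00000105)

# The declaration to search for in the ASL sources (region name unknown).
print(f"OperationRegion(????, SystemMemory, 0x{start:08X}, 0x{end - start:X})")
print(f"last byte mapped: 0x{end - 1:X}")

# Does the region start below 4 GB and end at or above it?
crosses_4g = start < (1 << 32) <= end - 1
print("crosses the 4 GB boundary:", crosses_4g)
```

With these values the region runs from 0xFFFFFF00 through 0x100000004, so its last five bytes lie above 4 GB. Whether that is what collided with OS memory on this board is only a guess; nothing in the debugger output confirms it.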
So I started searching the ASL code and eventually found the culprit's tracks. The following piece of ASL code is the prime suspect:
Scope(\)
{
   OperationRegion(ATFB,SystemMemory,0xFFFFFF00,0x105) // Relocatable operationRegion.
   Field(ATFB,AnyAcc,NoLock,Preserve)                  // Field
   {
        BCMD,8,
        DID,32,
        INFO,2048,
   }
}
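As a cross-check that this really is the region from the bugcheck, the Field widths should add up to the 0x105-byte length reported in Arg4. A small Python sketch of that sum, with the bit widths copied from the Field declaration:

```python
# Field widths in bits, copied from the Field(ATFB, ...) declaration.
fields = {"BCMD": 8, "DID": 32, "INFO": 2048}

total_bits = sum(fields.values())
total_bytes, leftover_bits = divmod(total_bits, 8)

print(f"{total_bits} bits = 0x{total_bytes:X} bytes, {leftover_bits} bits left over")
```

The three fields fill the region exactly, which matches the length in both the OperationRegion declaration and Arg4.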

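   After the BIOS team removed this code, the board worked normally and booted happily into the OS. The bug was solved, but why would declaring a region like this cause an error? The explanation from the BIOS team was that this code is not used, but its address conflicts with the resources the BIOS declares to the OS. Hence the blue screen :).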
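The kind of address conflict the BIOS team described can be pictured as a simple range-overlap test. The Python sketch below is only an illustration: the OS RAM ranges are hypothetical, since the board's real memory map does not appear anywhere in this write-up, and the function name overlaps is made up.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


# The region from the ASL code: base 0xFFFFFF00, length 0x105.
region = (0xFFFFFF00, 0xFFFFFF00 + 0x105)

# HYPOTHETICAL ranges reported to the OS as usable RAM; not from the real board.
os_ram = [
    (0x00100000, 0xC0000000),    # assumed RAM below the PCI hole
    (0x100000000, 0x140000000),  # assumed RAM remapped above 4 GB
]

conflicts = [r for r in os_ram if overlaps(*region, *r)]
for lo, hi in conflicts:
    print(f"conflict with OS range 0x{lo:X}-0x{hi - 1:X}")
```

Under these assumed ranges, only the range above 4 GB intersects the operation region. That would fit the symptom appearing once the full 4 GB became visible, but it remains a guess about this particular board.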

 

Peter