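I. How gdb hardware watchpoints work
1. A concrete example
The hardware watchpoint feature is an essential tool for finding out where memory gets overwritten. Like the other breakpoint features of a debugger, it lets you quickly understand one specific aspect of a system without having to understand the whole system. For a key variable, if we want to know where it is initialized or modified (or, with rwatch/awatch, read), we only need to set a data breakpoint on the expression and inspect the call chain when it is hit.
Hardware watchpoints need CPU support. On the familiar 386-family (x86) processors, for example, the CPU provides four debug address registers (DR0 to DR3), so the number of locations we can watch at once has a hardware limit. (gdb can also fall back to software watchpoints, which single-step the program and re-check the value after each step and are therefore far slower. Another conceivable software approach is to write-protect the whole page containing the watched address and restore it after the fault; that should work, but would probably be inefficient as well.)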
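Even though the CPU supports four breakpoints, that does not mean we can watch four expressions. The following example shows that sometimes not even a single expression can be watched: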
tsecer@harry #cat warden.c
#include <stdio.h>
struct S0
{
    int iHolder;
};
struct S1
{
    struct S0 * pS0;
};
struct S11
{
    struct S1 *pS1;
};
struct S2
{
    struct S11 *pS11;
};
struct S22
{
    struct S2 *pS2;
};
struct S3
{
    struct S22 *pS22;
};
int foo(struct S3 *);
struct S0 s0 = {1};
struct S1 s1 = {&s0};
struct S11 s11 = {&s1};
struct S2 s2 = {&s11};
struct S22 s22 = {&s2};
struct S3 s3 = {&s22};//, s33 = {&s2}, s333 = {&s2};
int main()
{
    return foo(&s3);
}
int foo(struct S3 * ps3)
{
    return 0;
}
tsecer@harry #gcc -g warden.c
tsecer@harry #gdb ./a.out
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) b main
Breakpoint 1 at 0x80483b5: file warden.c, line 43.
(gdb) r
Starting program: /data/harry/harrytest/watch/a.out
Breakpoint 1, main () at warden.c:43
43 return foo(&s3);
(gdb) watch s22->s2->pS11->pS1->pS0->iHolder
There is no member named s2.
(gdb) watch s22->pS2->pS11->pS1->pS0->iHolder
Hardware watchpoint 2: s22->pS2->pS11->pS1->pS0->iHolder
(gdb) c
Continuing.
warning: Could not remove hardware watchpoint 2.
Warning:
Could not insert hardware watchpoint 2.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
(gdb) watch s2->pS11->pS1->pS0->iHolder
Hardware watchpoint 3: s2->pS11->pS1->pS0->iHolder
(gdb) c
Continuing.
Program exited normally.
(gdb)
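This session is the result of several attempts, so it contains some redundancy (for example the first watch command with a mistyped member name); please bear with it.
As the session shows, because the watched expression dereferences a chain of pointers, gdb has to place a watchpoint on every pointer along the chain as well as on the final value, since a change to any of them changes the value of the expression. The chain here was made deliberately long. The first expression, starting at s22, needs five watched locations (pS2, pS11, pS1, pS0 and iHolder), one more than the four debug registers, so inserting the watchpoint fails. The second, starting at s2, needs exactly four and succeeds. Together the two cases also confirm the number of debug registers.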
2. Relevant gdb code
watch_command -> watch_command_1 -> can_use_hardware_watchpoint
  for (; v; v = v->next)
    {
      if (VALUE_LVAL (v) == lval_memory)
        {
          if (VALUE_LAZY (v))
            /* A lazy memory lvalue is one that GDB never needed to fetch;
               we either just used its address (e.g., `a' in `a.b') or
               we never needed it at all (e.g., `a' in `a,b').  */
            ;
          else
            {
              /* Ahh, memory we actually used!  Check if we can cover
                 it with hardware watchpoints.  */
              struct type *vtype = check_typedef (VALUE_TYPE (v));
              /* We only watch structs and arrays if user asked for it
                 explicitly, never if they just happen to appear in a
                 middle of some value chain.  */
              if (v == head
                  || (TYPE_CODE (vtype) != TYPE_CODE_STRUCT
                      && TYPE_CODE (vtype) != TYPE_CODE_ARRAY))
                {
                  CORE_ADDR vaddr = VALUE_ADDRESS (v) + VALUE_OFFSET (v);
                  int len = TYPE_LENGTH (VALUE_TYPE (v));
                  if (!TARGET_REGION_OK_FOR_HW_WATCHPOINT (vaddr, len))
                    return 0;
                  else
                    found_memory_cnt++;
                }
            }
        }
      else if (v->lval != not_lval && v->modifiable == 0)
        return 0;  /* ??? What does this represent? */
      else if (v->lval == lval_register)
        return 0;  /* cannot watch a register with a HW watchpoint */
    }
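II. Calling functions with call while debugging the kernel
Anyone who has debugged a kernel may have noticed that calling a kernel function from the debugger, for example sys_getpid, makes the kernel panic immediately. The panic message shows an EIP of 0x100000.
This looks strange at first, but it makes sense once you understand how gdb implements breakpoints. As I explained in an earlier post, when gdb calls a function from the command line (with call, or when evaluating an expression that contains a function call), it fabricates a function call whose return address is the program's entry point and sets a breakpoint at that address.
The kernel is linked with a special linker script. The entry defined in that script is the kernel's base load address (a physical address), while all other kernel symbols have the familiar addresses above 3G. When debugging the kernel, the symbol file vmlinux therefore reports an entry address of 0x100000 for the debuggee. Once the kernel has booted, that value is no longer a valid virtual address. When the called function returns, eip is set to it, and because the address is not mapped in the kernel's page tables, a kernel-mode access fault occurs, which is the panic we see.
Unlike in user space, the kernel does not send the debugger a signal such as SIGSEGV. For a user-space program, such a signal is generated by the kernel, which then notifies the debugger. When the kernel itself is being debugged there is nobody to do that, so the kernel just quietly panics.
The relevant symbols and the files that define them are listed below.
Linker script used to build the kernel: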
linux-2.6.21\arch\i386\kernel\vmlinux.lds.S
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(phys_startup_32)
jiffies = jiffies_64;
_proxy_pda = 1;
SECTIONS
{
. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
phys_startup_32 = startup_32 - LOAD_OFFSET;
linux-2.6.21\include\asm-i386\boot.h
/* Physical address where kenrel should be loaded. */
#define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START \
                            + (CONFIG_PHYSICAL_ALIGN - 1)) \
                            & ~(CONFIG_PHYSICAL_ALIGN - 1))
linux-2.6.21\arch\i386\Kconfig
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
default "0x100000"
help
This gives the physical address where the kernel is loaded.
If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
bzImage will decompress itself to above physical address and
run from there. Otherwise, bzImage will run from the address where
it has been loaded by the boot loader and will ignore above physical
address.
In normal kdump cases one does not have to set/change this option
as now bzImage can be compiled as a completely relocatable image
(CONFIG_RELOCATABLE=y) and be used to load and run from a different
address. This option is mainly useful for the folks who don't want
to use a bzImage for capturing the crash dump and want to use a
vmlinux instead. vmlinux is not relocatable hence a kernel needs
to be specifically compiled to run from a specific memory area
(normally a reserved region) and this option comes handy.
So if you are using bzImage for capturing the crash dump, leave
the value here unchanged to 0x100000 and set CONFIG_RELOCATABLE=y.
Otherwise if you plan to use vmlinux for capturing the crash dump
change this value to start of the reserved region (Typically 16MB
0x1000000). In other words, it can be set based on the "X" value as
specified in the "crashkernel=YM@XM" command line boot parameter
passed to the panic-ed kernel. Typically this parameter is set as
crashkernel=64M@16M. Please take a look at
Documentation/kdump/kdump.txt for more details about crash dumps.
Usage of bzImage for capturing the crash dump is recommended as
one does not have to build two kernels. Same kernel can be used
as production kernel and capture kernel. Above option should have
gone away after relocatable bzImage support is introduced. But it
is present because there are users out there who continue to use
vmlinux for dump capture. This option should go away down the
line.
Don't change this unless you know what you are doing.