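Using qemu together with a gdb graphical front end such as Eclipse or DDD makes it convenient to trace the protocol stack, file systems, memory management, and so on. Tracing code tied to hardware drivers is the one area where this approach may fall short.
Download the Linux kernel source and build it to produce the compressed kernel image (/bak/linux/linux-2.6/arch/x86_64/boot/bzImage) and the uncompressed kernel ELF file used by gdb (/bak/linux/linux-2.6/vmlinux, ELF object file, symbols included, including debug info). gdb needs the debug info, so enable CONFIG_DEBUG_INFO when configuring the kernel.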
cd /bak/linux/ && git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
For how to build the kernel, see: "Compiling the Linux kernel and making an initrd" ( by quqi99 )
sudo apt-get install libncurses5-dev
make menuconfig
make -j 8 bzImage
Create the initrd. A kernel that boots with an initrd must be built with CONFIG_BLK_DEV_INITRD=y.
sudo apt-get install build-essential initramfs-tools
sudo make modules_install # generates /lib/modules/4.5.0-rc2+
mkinitramfs -o initrd.img -v 4.5.0-rc2+
mkdir -p /bak/linux/initramfs/{bin,sbin,etc,proc,sys,newroot}
cd /bak/linux
touch initramfs/etc/mdev.conf
wget http://jootamam.net/initramfs-files/busybox-1.10.1-static.bz2 -O - | bunzip2 > initramfs/bin/busybox
chmod +x initramfs/bin/busybox
touch initramfs/init
chmod +x initramfs/init
The initramfs/init file is as follows:
#!/bin/sh
#Mount things needed by this script
mount -t proc proc /proc
mount -t sysfs sysfs /sys
#Disable kernel messages from popping onto the screen
echo 0 > /proc/sys/kernel/printk
#Clear the screen
clear
#Create all the symlinks to /bin/busybox
busybox --install -s
#Create device nodes
mknod /dev/null c 1 3
mknod /dev/tty c 5 0
mdev -s
#Function for parsing command line options with "=" in them
# get_opt("init=/sbin/init") will return "/sbin/init"
get_opt() {
echo "$@" | cut -d "=" -f 2
}
#Defaults
init="/sbin/init"
root="/dev/hda1"
#Process command line options
for i in $(cat /proc/cmdline); do
case $i in
root\=*)
root=$(get_opt $i)
;;
init\=*)
init=$(get_opt $i)
;;
esac
done
#Mount the root device
mount "${root}" /newroot
#Check if $init exists and is executable
if [[ -x "/newroot/${init}" ]] ; then
#Unmount all other mounts so that the ram used by
#the initramfs can be cleared after switch_root
umount /sys /proc
#Switch to the new root and execute init
exec switch_root /newroot "${init}"
fi
#This will only be run if the exec above failed
echo "Failed to switch_root, dropping to a shell"
exec sh
cd initramfs
find . | cpio -H newc -o > ../initramfs.cpio
cd ..
cat initramfs.cpio | gzip > initramfs.igz
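The get_opt helper in the init script keeps only the second "="-separated field, so a value that itself contains "=" (for example root=UUID=...) would be cut short. Below is a minimal sketch of the same parsing loop using POSIX parameter expansion instead. The name parse_cmdline is hypothetical and is not part of the original script, and the command line is passed in as an argument rather than read from /proc/cmdline so that the sketch can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original init script):
# extract root= and init= from a kernel command line string.
parse_cmdline() {
    root="/dev/hda1"     # same defaults as the init script
    init="/sbin/init"
    for i in $1; do      # unquoted on purpose: split on whitespace
        case $i in
            root=*) root=${i#*=} ;;   # drop everything up to the first "="
            init=*) init=${i#*=} ;;
        esac
    done
    echo "root=$root init=$init"
}

parse_cmdline "root=/dev/sda init=sbin/init console=ttyS0"
```

With the -append string used later in this article, the sketch reports root=/dev/sda and init=sbin/init; unknown options such as console=ttyS0 are ignored.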
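However, the busybox-1.10.1-static.bz2 binary above appears to lack ext2 support and cannot recognize the ext2-formatted disk passed in through qemu's -hda parameter, so in the end I built busybox-1.24.0 from source instead.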
wget https://busybox.net/downloads/busybox-1.24.0.tar.bz2
make menuconfig
CONFIG_MKFS_EXT2=y
Busybox Settings --->
Build Options --->
[*] Build BusyBox as a static binary (no shared libs) // build statically
make && make install
cp -avR /bak/linux/busybox-1.24.0/_install/* /bak/linux/initramfs/
Loading the kernel with qemu
wget http://www.nongnu.org/qemu/linux-0.2.img.bz2
sudo qemu-system-x86_64 -hda /bak/images/linux-0.2.img -hdb /bak/linux/disk.img -kernel /bak/linux/linux-2.6/arch/x86_64/boot/bzImage -initrd /bak/linux/initramfs.igz -append "root=/dev/sda init=sbin/init console=ttyS0" -nographic -smp 1,cores=1 -S -s
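The parameters are explained below:
- -s opens the GDB debug port 1234, and -S freezes QEMU at startup until GDB issues the (c)ontinue command.
- console=ttyS0 together with -nographic means no new graphical window is opened; the bash terminal where the command was typed is used directly.
- The values in -append "root=/dev/sda init=sbin/init" should be consistent with the init script in the initrd file.
- QEMU built with the --enable-debug option automatically includes a symbol table.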
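To keep the debugging switches separate from the rest of the command line, the invocation can be assembled by a small wrapper. The sketch below is a dry run that only prints the command. The helper name qemu_cmd is hypothetical, the paths are the ones used in this article, and the -hdb and -smp options are left out for brevity.

```shell
#!/bin/sh
# Dry run: print the qemu command instead of executing it.
# qemu_cmd is a hypothetical helper; paths are the ones used in this article.
qemu_cmd() {
    debug=$1
    cmd="qemu-system-x86_64 -hda /bak/images/linux-0.2.img"
    cmd="$cmd -kernel /bak/linux/linux-2.6/arch/x86_64/boot/bzImage"
    cmd="$cmd -initrd /bak/linux/initramfs.igz"
    cmd="$cmd -append \"root=/dev/sda init=sbin/init console=ttyS0\" -nographic"
    if [ "$debug" = "1" ]; then
        # -s: gdbserver on TCP port 1234, -S: do not start the CPU until gdb continues
        cmd="$cmd -S -s"
    fi
    echo "$cmd"
}

qemu_cmd 1
```

Calling qemu_cmd 0 prints the same command without -S -s, which boots the kernel normally without waiting for gdb.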
Debugging the kernel with gdb
qemu's -s parameter starts a gdbserver on port 1234 by default.
hua@node1:~$ sudo netstat -anp |grep 1234
tcp 0 0 0.0.0.0:1234 0.0.0.0:* LISTEN 24309/qemu-system-x
hua@node1:~$ /bak/java/gdb/bin/gdb /bak/linux/linux-2.6/vmlinux
...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000000000000000 in irq_stack_union ()
(gdb) b start_kernel
Breakpoint 1 at 0xffffffff81d66b09: file init/main.c, line 498.
(gdb) info registers
(gdb) bt
(gdb) c
(gdb) list
(gdb) set architecture
Requires an argument. Valid arguments are i386, i386:x86-64, i386:x64-32, i8086, i386:intel,i386:x86-64:intel, i386:x64-32:intel, auto.
# Inside VM, echo 'c' | sudo tee /proc/sysrq-trigger
(gdb) add-symbol-file vmlinux 0xffffffff81000000 #echo 0x$(sudo cat /proc/kallsyms | egrep -e "T _text$" | awk '{print $1}')
(gdb) b sysrq_handle_crash
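The address given to add-symbol-file is the address of the _text symbol taken from /proc/kallsyms inside the guest. A minimal sketch of that lookup follows. The helper name text_base is hypothetical, and the sample lines are made-up data in /proc/kallsyms format (address, type, name) so that the sketch runs without root access.

```shell
#!/bin/sh
# text_base: read /proc/kallsyms-style lines on stdin and print the address
# of the _text symbol with a 0x prefix. Hypothetical helper for illustration.
text_base() {
    awk '$3 == "_text" && ($2 == "T" || $2 == "t") { print "0x" $1; exit }'
}

# Made-up sample input in /proc/kallsyms format.
sample='ffffffff81000000 T startup_64
ffffffff81000000 T _text
ffffffff810000b0 T secondary_startup_64'

base=$(printf '%s\n' "$sample" | text_base)
echo "add-symbol-file vmlinux $base"
```

Inside the VM the same function would be fed with: sudo cat /proc/kallsyms | text_base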
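1, The Linux source tree is very large, so disable Eclipse's automatic build globally for the workspace. Indexing can still be left to Eclipse, which keeps searching and code navigation convenient inside Eclipse.
- Preferences -> General -> Workspace -> Build automatically (Disable)
2, Import the kernel source as an Eclipse project, with the toolchain set to Linux GCC.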
Import -> C/C++ -> Existing Code as Makefile Project
3, Create a debug launcher (Debug configurations -> C/C++ Remote Application)
Select GDB (DSF) Manual Remote Debugging Launcher
Main TAB -> C/C++ Application: point it at the actual uncompressed kernel: /bak/linux/linux-2.6/vmlinux
Main TAB -> -Disable auto build
Debugger TAB -> Stop on startup at 'start_kernel'
Debugger TAB -> connection -> Host Name or IP Address -> = localhost
Debugger TAB -> connection -> Port number = 1234
Building gdb to fix the error "Remote 'g' packet reply is too long"
cd /bak/java && wget http://ftp.gnu.org/gnu/gdb/gdb-7.7.tar.gz
Edit gdb/remote.c. In the process_g_packet function, find the following code:
if (buf_len > 2 * rsa->sizeof_g_packet)
error (_("Remote 'g' packet reply is too long: %s"), rs->buf);
Replace those two lines with the code below, or simply comment them out without adding anything:
if (buf_len > 2 * rsa->sizeof_g_packet) {
rsa->sizeof_g_packet = buf_len ;
for (i = 0; i < gdbarch_num_regs (gdbarch); i++) {
if (rsa->regs[i].pnum == -1)
continue;
if (rsa->regs[i].offset >= rsa->sizeof_g_packet)
rsa->regs[i].in_g_packet = 0;
else
rsa->regs[i].in_g_packet = 1;
}
}
./configure --prefix=/bak/java/gdb && make && make install
Next, reconfigure Eclipse: click the menu "Run" -> "Debug Configurations…", and in the dialog switch to the "Main" page under "Debugger", then change "GDB debugger:" to the freshly built GDB (/bak/java/gdb/bin/gdb) instead of the default gdb.
1, Create cscope.files
LNX=/bak/linux/linux-2.6
cd /
find $LNX \
-path "$LNX/arch/*" ! -path "$LNX/arch/i386*" -prune -o \
-path "$LNX/include/asm-*" ! -path "$LNX/include/asm-i386*" -prune -o \
-path "$LNX/tmp*" -prune -o \
-path "$LNX/Documentation*" -prune -o \
-path "$LNX/scripts*" -prune -o \
-path "$LNX/drivers*" -prune -o \
-name "*.[chxsS]" -print >/bak/linux/linux-2.6/cscope/cscope.files
2, Build the index database
cd /bak/linux/linux-2.6/cscope
cscope -b -q -k
3, Use the index database
cscope -d
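ELF (Executable and Linking Format) is a container format used to store executables and related data. Logically it is divided into many sections (viewable with objdump -h or readelf -S), including:
- executable code & data (.text, .data, .bss, etc) # .text holds executable code, .data holds initialized global data, .bss holds uninitialized data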
- symbol tables (.symtab)
- ELF string tables (.strtab, .shstrtab)
- debug information (.debug_info, .debug_line, .eh_frame, etc)
- metadata (.notes, .comment)
- dynamic linking information (.plt, .got, etc)
hua@node1:/bak/linux/linux-2.6$ readelf -n vmlinux
Displaying notes found at file offset 0x0094e5f8 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 8930dc42387f290d882a43eafffb3e6105dd4df0
hua@node1:/bak/linux/linux-2.6$ readelf -p .comment vmlinux
String dump of section '.comment':
[ 0] GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
# readelf -S lists all section headers in the ELF image
hua@node1:/bak/linux/linux-2.6$ readelf -S vmlinux
There are 44 section headers, starting at offset 0xa081ae0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS ffffffff81000000 00200000
000000000074e5f8 0000000000000000 AX 0 0 4096
[ 2] .notes NOTE ffffffff8174e5f8 0094e5f8
0000000000000024 0000000000000000 AX 0 0 4
[ 3] __ex_table PROGBITS ffffffff8174e620 0094e620
0000000000002158 0000000000000000 A 0 0 8
[ 4] .rodata PROGBITS ffffffff81800000 00a00000
000000000033f9fe 0000000000000000 A 0 0 64
[ 5] __bug_table PROGBITS ffffffff81b3fa00 00d3fa00
00000000000072fc 0000000000000000 A 0 0 1
[ 6] .pci_fixup PROGBITS ffffffff81b46d00 00d46d00
0000000000003270 0000000000000000 A 0 0 8
[ 7] .builtin_fw PROGBITS ffffffff81b49f70 00d49f70
0000000000000120 0000000000000000 A 0 0 8
[ 8] .tracedata PROGBITS ffffffff81b4a090 00d4a090
0000000000000078 0000000000000000 A 0 0 1
[ 9] __ksymtab PROGBITS ffffffff81b4a110 00d4a110
00000000000118a0 0000000000000000 A 0 0 16
[10] __ksymtab_gpl PROGBITS ffffffff81b5b9b0 00d5b9b0
000000000000ecc0 0000000000000000 A 0 0 16
[11] __kcrctab PROGBITS ffffffff81b6a670 00d6a670
0000000000008c50 0000000000000000 A 0 0 8
[12] __kcrctab_gpl PROGBITS ffffffff81b732c0 00d732c0
0000000000007660 0000000000000000 A 0 0 8
[13] __ksymtab_strings PROGBITS ffffffff81b7a920 00d7a920
00000000000268c3 0000000000000000 A 0 0 1
[14] __init_rodata PROGBITS ffffffff81ba1200 00da1200
0000000000000240 0000000000000000 A 0 0 32
[15] __param PROGBITS ffffffff81ba1440 00da1440
00000000000025d0 0000000000000000 A 0 0 8
[16] __modver PROGBITS ffffffff81ba3a10 00da3a10
00000000000005f0 0000000000000000 A 0 0 8
[17] .data PROGBITS ffffffff81c00000 00e00000
0000000000144140 0000000000000000 WA 0 0 4096
[18] .vvar PROGBITS ffffffff81d45000 00f45000
0000000000001000 0000000000000000 WA 0 0 16
[19] .data..percpu PROGBITS 0000000000000000 01000000
000000000001f918 0000000000000000 WA 0 0 4096
[20] .init.text PROGBITS ffffffff81d66000 01166000
0000000000060879 0000000000000000 AX 0 0 16
[21] .init.data PROGBITS ffffffff81dc7000 011c7000
00000000000c2e90 0000000000000000 WA 0 0 4096
[22] .x86_cpu_dev.init PROGBITS ffffffff81e89e90 01289e90
0000000000000018 0000000000000000 A 0 0 8
[23] .altinstructions PROGBITS ffffffff81e89ea8 01289ea8
0000000000005f44 0000000000000000 A 0 0 1
[24] .altinstr_replace PROGBITS ffffffff81e8fdec 0128fdec
00000000000017db 0000000000000000 AX 0 0 1
[25] .iommu_table PROGBITS ffffffff81e915c8 012915c8
00000000000000f0 0000000000000000 A 0 0 8
[26] .apicdrivers PROGBITS ffffffff81e916b8 012916b8
0000000000000030 0000000000000000 WA 0 0 8
[27] .exit.text PROGBITS ffffffff81e916e8 012916e8
0000000000001e26 0000000000000000 AX 0 0 1
[28] .smp_locks PROGBITS ffffffff81e94000 01294000
0000000000007000 0000000000000000 A 0 0 4
[29] .data_nosave PROGBITS ffffffff81e9b000 0129b000
0000000000001000 0000000000000000 WA 0 0 4
[30] .bss NOBITS ffffffff81e9c000 0129c000
0000000000142000 0000000000000000 WA 0 0 4096
[31] .brk NOBITS ffffffff81fde000 0129c000
0000000000026000 0000000000000000 WA 0 0 1
[32] .comment PROGBITS 0000000000000000 0129c000
0000000000000029 0000000000000001 MS 0 0 1
[33] .debug_aranges PROGBITS 0000000000000000 0129c030
0000000000023880 0000000000000000 0 0 16
[34] .debug_info PROGBITS 0000000000000000 012bf8b0
000000000724bc4f 0000000000000000 0 0 1
[35] .debug_abbrev PROGBITS 0000000000000000 0850b4ff
00000000002d7be9 0000000000000000 0 0 1
[36] .debug_line PROGBITS 0000000000000000 087e30e8
000000000072232c 0000000000000000 0 0 1
[37] .debug_frame PROGBITS 0000000000000000 08f05418
00000000001f5cd0 0000000000000000 0 0 8
[38] .debug_str PROGBITS 0000000000000000 090fb0e8
00000000002b5264 0000000000000001 MS 0 0 1
[39] .debug_loc PROGBITS 0000000000000000 093b034c
0000000000925080 0000000000000000 0 0 1
[40] .debug_ranges PROGBITS 0000000000000000 09cd53d0
00000000003ac530 0000000000000000 0 0 16
[41] .shstrtab STRTAB 0000000000000000 0a081900
00000000000001dd 0000000000000000 0 0 1
[42] .symtab SYMTAB 0000000000000000 0a0825e0
00000000002490c0 0000000000000018 43 63525 8
[43] .strtab STRTAB 0000000000000000 0a2cb6a0
0000000000219339 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
# View the debug sections
hua@node1:/bak/linux/linux-2.6$ readelf -S vmlinux |grep debug
[33] .debug_aranges PROGBITS 0000000000000000 0129c030
[34] .debug_info PROGBITS 0000000000000000 012bf8b0
[35] .debug_abbrev PROGBITS 0000000000000000 0850b4ff
[36] .debug_line PROGBITS 0000000000000000 087e30e8
[37] .debug_frame PROGBITS 0000000000000000 08f05418
[38] .debug_str PROGBITS 0000000000000000 090fb0e8
[39] .debug_loc PROGBITS 0000000000000000 093b034c
[40] .debug_ranges PROGBITS 0000000000000000 09cd53d0
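The grep above shows which debug sections exist but drops the second line of each entry, which is where readelf -S prints the section size. A minimal sketch that adds up the sizes of all .debug_* sections from the full two-line readelf -S format follows. The helper name debug_bytes is hypothetical, and the sample lines are copied from the section table earlier in this article.

```shell
#!/bin/sh
# debug_bytes: read "readelf -S" output on stdin and print the total size in
# bytes of all .debug_* sections. Hypothetical helper for illustration.
# In the default two-line format, the size is the first field of the line
# that follows the section name.
debug_bytes() {
    total=0
    want=0
    while read -r first rest; do
        if [ "$want" = "1" ]; then
            total=$((total + 0x$first))   # size is printed in hex without 0x
            want=0
        fi
        case "$first $rest" in
            *" .debug_"*) want=1 ;;       # next line carries this section's size
        esac
    done
    echo "$total"
}

sample='  [33] .debug_aranges    PROGBITS   0000000000000000  0129c030
       0000000000023880  0000000000000000           0     0     16
  [41] .shstrtab         STRTAB     0000000000000000  0a081900
       00000000000001dd  0000000000000000           0     0     1
  [38] .debug_str        PROGBITS   0000000000000000  090fb0e8
       00000000002b5264  0000000000000001  MS       0     0     1'

printf '%s\n' "$sample" | debug_bytes
```

Against a real kernel the function would be used as: readelf -S vmlinux | debug_bytes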
# readelf -e prints the ELF header, the section headers and the program headers (segments) of the ELF image
hua@node1:/bak/linux/linux-2.6$ readelf -e vmlinux
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1000000
Start of program headers: 64 (bytes into file)
Start of section headers: 168303328 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 5
Size of section headers: 64 (bytes)
Number of section headers: 44
Section header string table index: 41
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS ffffffff81000000 00200000
000000000074e5f8 0000000000000000 AX 0 0 4096
[ 2] .notes NOTE ffffffff8174e5f8 0094e5f8
0000000000000024 0000000000000000 AX 0 0 4
[ 3] __ex_table PROGBITS ffffffff8174e620 0094e620
0000000000002158 0000000000000000 A 0 0 8
[ 4] .rodata PROGBITS ffffffff81800000 00a00000
000000000033f9fe 0000000000000000 A 0 0 64
[ 5] __bug_table PROGBITS ffffffff81b3fa00 00d3fa00
00000000000072fc 0000000000000000 A 0 0 1
[ 6] .pci_fixup PROGBITS ffffffff81b46d00 00d46d00
0000000000003270 0000000000000000 A 0 0 8
[ 7] .builtin_fw PROGBITS ffffffff81b49f70 00d49f70
0000000000000120 0000000000000000 A 0 0 8
[ 8] .tracedata PROGBITS ffffffff81b4a090 00d4a090
0000000000000078 0000000000000000 A 0 0 1
[ 9] __ksymtab PROGBITS ffffffff81b4a110 00d4a110
00000000000118a0 0000000000000000 A 0 0 16
[10] __ksymtab_gpl PROGBITS ffffffff81b5b9b0 00d5b9b0
000000000000ecc0 0000000000000000 A 0 0 16
[11] __kcrctab PROGBITS ffffffff81b6a670 00d6a670
0000000000008c50 0000000000000000 A 0 0 8
[12] __kcrctab_gpl PROGBITS ffffffff81b732c0 00d732c0
0000000000007660 0000000000000000 A 0 0 8
[13] __ksymtab_strings PROGBITS ffffffff81b7a920 00d7a920
00000000000268c3 0000000000000000 A 0 0 1
[14] __init_rodata PROGBITS ffffffff81ba1200 00da1200
0000000000000240 0000000000000000 A 0 0 32
[15] __param PROGBITS ffffffff81ba1440 00da1440
00000000000025d0 0000000000000000 A 0 0 8
[16] __modver PROGBITS ffffffff81ba3a10 00da3a10
00000000000005f0 0000000000000000 A 0 0 8
[17] .data PROGBITS ffffffff81c00000 00e00000
0000000000144140 0000000000000000 WA 0 0 4096
[18] .vvar PROGBITS ffffffff81d45000 00f45000
0000000000001000 0000000000000000 WA 0 0 16
[19] .data..percpu PROGBITS 0000000000000000 01000000
000000000001f918 0000000000000000 WA 0 0 4096
[20] .init.text PROGBITS ffffffff81d66000 01166000
0000000000060879 0000000000000000 AX 0 0 16
[21] .init.data PROGBITS ffffffff81dc7000 011c7000
00000000000c2e90 0000000000000000 WA 0 0 4096
[22] .x86_cpu_dev.init PROGBITS ffffffff81e89e90 01289e90
0000000000000018 0000000000000000 A 0 0 8
[23] .altinstructions PROGBITS ffffffff81e89ea8 01289ea8
0000000000005f44 0000000000000000 A 0 0 1
[24] .altinstr_replace PROGBITS ffffffff81e8fdec 0128fdec
00000000000017db 0000000000000000 AX 0 0 1
[25] .iommu_table PROGBITS ffffffff81e915c8 012915c8
00000000000000f0 0000000000000000 A 0 0 8
[26] .apicdrivers PROGBITS ffffffff81e916b8 012916b8
0000000000000030 0000000000000000 WA 0 0 8
[27] .exit.text PROGBITS ffffffff81e916e8 012916e8
0000000000001e26 0000000000000000 AX 0 0 1
[28] .smp_locks PROGBITS ffffffff81e94000 01294000
0000000000007000 0000000000000000 A 0 0 4
[29] .data_nosave PROGBITS ffffffff81e9b000 0129b000
0000000000001000 0000000000000000 WA 0 0 4
[30] .bss NOBITS ffffffff81e9c000 0129c000
0000000000142000 0000000000000000 WA 0 0 4096
[31] .brk NOBITS ffffffff81fde000 0129c000
0000000000026000 0000000000000000 WA 0 0 1
[32] .comment PROGBITS 0000000000000000 0129c000
0000000000000029 0000000000000001 MS 0 0 1
[33] .debug_aranges PROGBITS 0000000000000000 0129c030
0000000000023880 0000000000000000 0 0 16
[34] .debug_info PROGBITS 0000000000000000 012bf8b0
000000000724bc4f 0000000000000000 0 0 1
[35] .debug_abbrev PROGBITS 0000000000000000 0850b4ff
00000000002d7be9 0000000000000000 0 0 1
[36] .debug_line PROGBITS 0000000000000000 087e30e8
000000000072232c 0000000000000000 0 0 1
[37] .debug_frame PROGBITS 0000000000000000 08f05418
00000000001f5cd0 0000000000000000 0 0 8
[38] .debug_str PROGBITS 0000000000000000 090fb0e8
00000000002b5264 0000000000000001 MS 0 0 1
[39] .debug_loc PROGBITS 0000000000000000 093b034c
0000000000925080 0000000000000000 0 0 1
[40] .debug_ranges PROGBITS 0000000000000000 09cd53d0
00000000003ac530 0000000000000000 0 0 16
[41] .shstrtab STRTAB 0000000000000000 0a081900
00000000000001dd 0000000000000000 0 0 1
[42] .symtab SYMTAB 0000000000000000 0a0825e0
00000000002490c0 0000000000000018 43 63525 8
[43] .strtab STRTAB 0000000000000000 0a2cb6a0
0000000000219339 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000ba4000 0x0000000000ba4000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000146000 0x0000000000146000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d46000
0x000000000001f918 0x000000000001f918 RW 200000
LOAD 0x0000000001166000 0xffffffff81d66000 0x0000000001d66000
0x0000000000136000 0x000000000029e000 RWE 200000
NOTE 0x000000000094e5f8 0xffffffff8174e5f8 0x000000000174e5f8
0x0000000000000024 0x0000000000000024 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw .tracedata __ksymtab __ksymtab_gpl __kcrctab __kcrctab_gpl __ksymtab_strings __init_rodata __param __modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .data_nosave .bss .brk
04 .notes
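DWARF (Debugging With Attributed Record Formats) is the debugging-information format that accompanies ELF; the name is a play on "ELF" rather than a synonym for it. Starting with gcc 4.8, DWARF version 4 is the default format (the Linux kernel switch is DEBUG_INFO_DWARF4).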
# Invalid memory access error. Similar messages include: unable to handle kernel paging request at xxx, unable to handle kernel NULL pointer dereference at
[158108.522856] general protection fault: 0000 [#1] SMP
# Module information
[158108.531877] Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt gpio_ich xfs x86_pkg_temp_thermal intel_powerclamp coretemp bridge kvm_intel stp kvm joydev llc mei_me mei shpchp lpc_ich ipmi_si acpi_power_meter acpi_pad mac_hid btrfs xor raid6_pq libcrc32c ses enclosure crct10dif_pclmul crc32_pclmul ixgbe igb aesni_intel aes_x86_64 hid_generic dca lrw gf128mul ptp glue_helper usbhid ablk_helper cryptd hid pps_core i2c_algo_bit megaraid_sas mdio wmi
# CPU is 20, PID is 0, command is swapper/20, followed by the kernel version and the hardware information
[158108.654066] CPU: 20 PID: 0 Comm: swapper/20 Not tainted 3.13.0-74-generic #118-Ubuntu
[158108.675000] Hardware name: Cisco Systems Inc UCSC-C240-M4SX/UCSC-C240-M4SX, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015
# task is the kernel address of the task_struct (the per-cpu variable current_task); ti is the kernel address of current_thread_info
[158108.699921] task: ffff883f2653b000 ti: ffff883f26536000 task.ti: ffff883f26536000
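# Register information. On x86, %cr2 holds the most recent page fault address. RAX holds an invalid value (on kernels of this era, dead000000200200 is LIST_POISON2, which list_del() writes into the prev pointer of a deleted entry)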
[158108.720992] RIP: 0010:[<ffffffff810756a4>] [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
[158108.744725] RSP: 0018:ffff887f7f083d10 EFLAGS: 00010002
[158108.757586] RAX: dead000000200200 RBX: ffffffffa012f040 RCX: 0000000000001896
[158108.779778] RDX: ffff887f25d00938 RSI: ffff887f25eb8000 RDI: ffffffffa012f040
[158108.802864] RBP: ffff887f7f083d30 R08: 0000000000000086 R09: ffff887f25d74000
[158108.826882] R10: 0000000000000002 R11: 0000000000000005 R12: ffffffffa012f040
[158108.851851] R13: ffff887f25eb8000 R14: 0000000000000001 R15: 0000000000000001
[158108.877347] FS: 0000000000000000(0000) GS:ffff887f7f080000(0000) knlGS:0000000000000000
[158108.903997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[158108.918882] CR2: 00000000006f0e58 CR3: 0000000001c0e000 CR4: 00000000001407e0
# Raw hexadecimal dump of the stack
[158108.943906] Stack:
[158108.954323] ffffffffa012f040 0000000000000000 ffff887f25eb8000 ffff883f22d7ea00
[158108.978987] ffff887f7f083d60 ffffffff81075766 0000000000000086 ffffffffa012f020
[158109.003697] ffff887f7f083d98 0000000000000100 ffff887f7f083d88 ffffffff81082369
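# Symbolic call stack backtrace; combined with %rip this is very useful. Each entry also gives the offset into the function and the function's size (offset/size). An entry prefixed with a question mark is an address found on the stack that the unwinder could not confirm as part of the current call chain.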
[158109.028467] Call Trace:
[158109.039181] <IRQ>
[158109.041507] [<ffffffff81075766>] del_timer+0x46/0x70
[158109.062562] [<ffffffff81082369>] try_to_grab_pending+0xa9/0x160
[158109.076953] [<ffffffff81082453>] mod_delayed_work_on+0x33/0x70
[158109.091233] [<ffffffffa012c3ba>] set_timeout+0x3a/0x40 [ib_addr]
[158109.105194] [<ffffffffa012c559>] netevent_callback+0x29/0x30 [ib_addr]
[158109.120083] [<ffffffff8173125c>] notifier_call_chain+0x4c/0x70
[158109.134153] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
[158109.148004] [<ffffffff817312ba>] atomic_notifier_call_chain+0x1a/0x20
[158109.162487] [<ffffffff8163100b>] call_netevent_notifiers+0x1b/0x20
[158109.176677] [<ffffffff81634b21>] neigh_timer_handler+0xc1/0x2c0
[158109.189976] [<ffffffff810745d6>] call_timer_fn+0x36/0x100
[158109.202723] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
[158109.216443] [<ffffffff8107556f>] run_timer_softirq+0x1ef/0x2f0
[158109.229444] [<ffffffff8106cd2c>] __do_softirq+0xec/0x2c0
[158109.241890] [<ffffffff8106d275>] irq_exit+0x105/0x110
[158109.253555] [<ffffffff81737b15>] smp_apic_timer_interrupt+0x45/0x60
[158109.266647] [<ffffffff8173649d>] apic_timer_interrupt+0x6d/0x80
[158109.279320] <EOI>
[158109.281647] [<ffffffff815d65b2>] ? cpuidle_enter_state+0x52/0xc0
[158109.300117] [<ffffffff815d66d9>] cpuidle_idle_call+0xb9/0x1f0
[158109.312100] [<ffffffff8101d3ee>] arch_cpu_idle+0xe/0x30
[158109.323777] [<ffffffff810bf475>] cpu_startup_entry+0xc5/0x290
[158109.335775] [<ffffffff810415ed>] start_secondary+0x21d/0x2d0
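# Raw bytes (instruction stream); only useful when disassembling. The byte in angle brackets marks the start of the instruction at RIP.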
[158109.347654] Code: 89 e5 41 56 41 89 d6 41 55 41 54 49 89 fc 53 48 8b 17 48 85 d2 74 55 49 89 f5 0f 1f 44 00 00 49 8b 44 24 08 45 84 f6 48 89 42 08 <48> 89 10 74 08 49 c7 04 24 00 00 00 00 41 f6 44 24 18 01 48 b8
#Reprint of instruction pointer, current function, and stack pointer
[158109.386072] RIP [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
[158109.398404] RSP <ffff887f7f083d10>
Use the kernel version and the RIP register value above to locate the relevant code:
addr2line -e ddeb/vmlinux-3.13.0-74-generic 0xffffffff810756a4
linux-3.13.0/include/linux/list.h:89
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
prev->next = next; <<=== HERE
}
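The RIP line in the oops gives the location as symbol+offset/size: detach_if_pending+0x34/0xb0 means 0x34 bytes into detach_if_pending, a function that is 0xb0 bytes long. A minimal sketch that splits such a string into its parts follows; the helper name split_symbol is hypothetical.

```shell
#!/bin/sh
# split_symbol: split an oops location such as "detach_if_pending+0x34/0xb0"
# into function name, offset and size (both printed in decimal).
# Hypothetical helper for illustration.
split_symbol() {
    loc=$1
    name=${loc%%+*}      # text before "+"
    offsize=${loc#*+}    # "0x34/0xb0"
    off=${offsize%%/*}   # "0x34"
    size=${offsize#*/}   # "0xb0"
    echo "$name offset=$(($off)) size=$(($size))"
}

split_symbol "detach_if_pending+0x34/0xb0"
# prints: detach_if_pending offset=52 size=176
```

Adding the offset to the function's start address in vmlinux gives the faulting address, which is the value that addr2line was given above.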
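The high-level C code above does not reveal much. Next, look up the corresponding assembly by symbol in the ELF file (vmlinux; System.map is not an ELF file and only lists symbol addresses):
% objdump -d -l --start-address=0xffffffff81075670 --stop-address=0xffffffff81075720 ddeb/vmlinux-3.13.0-74-generic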
[...]
ffffffff81075670 <detach_if_pending>:
[...]
detach_timer():
/build/linux-_xRakU/linux-3.13.0/kernel/timer.c:662
ffffffff81075698: 49 8b 44 24 08 mov 0x8(%r12),%rax
/build/linux-_xRakU/linux-3.13.0/kernel/timer.c:663
ffffffff8107569d: 45 84 f6 test %r14b,%r14b
__list_del():
/build/linux-_xRakU/linux-3.13.0/include/linux/list.h:88
ffffffff810756a0: 48 89 42 08 mov %rax,0x8(%rdx)
/build/linux-_xRakU/linux-3.13.0/include/linux/list.h:89
ffffffff810756a4: 48 89 10 mov %rdx,(%rax)
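The calls detach_if_pending -> detach_timer -> __list_del are nested (the objdump listing shows the latter two inlined into detach_if_pending), and this path is the root cause of the panic.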
static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
bool clear_pending)
{
if (!timer_pending(timer))
return 0;
detach_timer(timer, clear_pending); <== HERE
...
static inline int timer_pending(const struct timer_list * timer)
{
return timer->entry.next != NULL;
}
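The RAX value in the register dump, dead000000200200, is not a random address. A minimal sketch that checks a register value against the list poison constants follows. The constants are an assumption taken from include/linux/poison.h of 3.x-era x86_64 kernels (newer kernels use different values), and the helper name classify_ptr is hypothetical.

```shell
#!/bin/sh
# classify_ptr: report whether a 64-bit register value (hex, no 0x prefix)
# matches a list poison value. The constants are an assumption based on
# include/linux/poison.h of 3.x-era x86_64 kernels; newer kernels differ.
classify_ptr() {
    case $1 in
        dead000000100100) echo "LIST_POISON1 (next pointer of a deleted list entry)" ;;
        dead000000200200) echo "LIST_POISON2 (prev pointer of a deleted list entry)" ;;
        0000000000000000) echo "NULL" ;;
        *) echo "no known poison pattern" ;;
    esac
}

classify_ptr dead000000200200   # RAX from the oops above
```

A poisoned prev pointer suggests that the timer's list entry had already been removed from a list when __list_del ran on it again, which fits the faulting store prev->next = next.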
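Read the code alongside the stack trace, then search git log with the keywords gathered so far to see whether the bug has already been fixed.
# mode:0x4020 is GFP_ATOMIC (0x20) plus __GFP_COMP (0x4000). GFP_ATOMIC means: the caller cannot sleep and wait for memory to be made available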
[3387282.901263] ceph-osd: page allocation failure: order:2, mode:0x4020
[3387282.901271] Pid: 10125, comm: ceph-osd Tainted: G C 3.2.0-51-generic #77-Ubuntu
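# The stack shows that the failure was not caused by ceph-osd as first assumed; rather, a network device (ixgbe) was allocating receive buffers
# order:2 above means 2^2 = 4 contiguous pages were requested (16 KB in total with 4 KB pages), for jumbo frames with mtu=9000, but no contiguous 16 KB block could be found.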
[3387282.901274] Call Trace:
[3387282.901277] <IRQ> [<ffffffff8111e9a6>] warn_alloc_failed+0xf6/0x150
[3387282.901294] [<ffffffff815349ac>] ? sk_reset_timer+0x1c/0x30
[3387282.901301] [<ffffffff81599773>] ? tcp_send_delayed_ack+0xe3/0xf0
[3387282.901308] [<ffffffff8158d3c0>] ? __tcp_ack_snd_check+0x70/0xa0
[3387282.901314] [<ffffffff81122737>] __alloc_pages_nodemask+0x6d7/0x8f0
[3387282.901320] [<ffffffff8159d7bf>] ? tcp_v4_do_rcv+0xff/0x1d0
[3387282.901330] [<ffffffff8164bf15>] kmalloc_large_node+0x57/0x85
[3387282.901338] [<ffffffff81167bb5>] __kmalloc_node_track_caller+0x195/0x1e0
[3387282.901344] [<ffffffff81538a4b>] ? __alloc_skb+0x4b/0x240
[3387282.901349] [<ffffffff815390c4>] ? __netdev_alloc_skb+0x24/0x50
[3387282.901354] [<ffffffff81538a78>] __alloc_skb+0x78/0x240
[3387282.901359] [<ffffffff815390c4>] __netdev_alloc_skb+0x24/0x50
[3387282.901373] [<ffffffffa00a8909>] ixgbe_alloc_rx_buffers+0x289/0x350 [ixgbe]
[3387282.901380] [<ffffffff81546fc0>] ? napi_skb_finish+0x50/0x70
[3387282.901385] [<ffffffff815475f5>] ? napi_gro_receive+0xf5/0x140
[3387282.901393] [<ffffffffa00a91bb>] ixgbe_clean_rx_irq+0x7eb/0x8a0 [ixgbe]
[3387282.901401] [<ffffffffa00a99ee>] ixgbe_poll+0xae/0x1a0 [ixgbe]
[3387282.901406] [<ffffffff81547844>] net_rx_action+0x134/0x290
[3387282.901412] [<ffffffff8115d753>] ? isolate_migratepages+0x333/0x660
[3387282.901418] [<ffffffff8106f9e8>] __do_softirq+0xa8/0x210
[3387282.901425] [<ffffffff816606be>] ? _raw_spin_lock+0xe/0x20
[3387282.901432] [<ffffffff8166af6c>] call_softirq+0x1c/0x30
[3387282.901439] [<ffffffff810162f5>] do_softirq+0x65/0xa0
[3387282.901444] [<ffffffff8106fdce>] irq_exit+0x8e/0xb0
[3387282.901450] [<ffffffff8166b833>] do_IRQ+0x63/0xe0
[3387282.901455] [<ffffffff81660b6e>] common_interrupt+0x6e/0x6e
[3387282.901458] <EOI> [<ffffffff8115d753>] ? isolate_migratepages+0x333/0x660
[3387282.901467] [<ffffffff8115d74d>] ? isolate_migratepages+0x32d/0x660
[3387282.901472] [<ffffffff8115dadf>] compact_zone.part.14+0x5f/0x270
[3387282.901478] [<ffffffff8115ddd7>] compact_zone+0x37/0x50
[3387282.901482] [<ffffffff8115df63>] compact_zone_order+0x83/0xb0
[3387282.901488] [<ffffffff8115e05d>] try_to_compact_pages+0xcd/0x100
[3387282.901494] [<ffffffff8164b17e>] __alloc_pages_direct_compact+0xb2/0x178
[3387282.901500] [<ffffffff81122595>] __alloc_pages_nodemask+0x535/0x8f0
[3387282.901508] [<ffffffff8164bf15>] kmalloc_large_node+0x57/0x85
[3387282.901514] [<ffffffff81167bb5>] __kmalloc_node_track_caller+0x195/0x1e0
[3387282.901520] [<ffffffff81538a4b>] ? __alloc_skb+0x4b/0x240
[3387282.901526] [<ffffffff81589034>] ? sk_stream_alloc_skb+0x44/0x120
[3387282.901531] [<ffffffff81538a78>] __alloc_skb+0x78/0x240
[3387282.901536] [<ffffffff81589034>] sk_stream_alloc_skb+0x44/0x120
[3387282.901541] [<ffffffff81589518>] tcp_sendmsg+0x408/0xd90
[3387282.901548] [<ffffffff815af564>] inet_sendmsg+0x64/0xb0
[3387282.901554] [<ffffffff81057d15>] ? reweight_entity+0x165/0x180
[3387282.901562] [<ffffffff812d9837>] ? apparmor_socket_sendmsg+0x17/0x20
[3387282.901569] [<ffffffff8152e49e>] sock_sendmsg+0x10e/0x130
[3387282.901574] [<ffffffff8105725d>] ? set_next_entity+0xad/0xd0
[3387282.901580] [<ffffffff810573fa>] ? finish_task_switch+0x4a/0xf0
[3387282.901586] [<ffffffff8165e14c>] ? __schedule+0x3cc/0x6f0
[3387282.901591] [<ffffffff8165e79f>] ? schedule+0x3f/0x60
[3387282.901596] [<ffffffff8153c766>] ? verify_iovec+0x56/0xd0
[3387282.901602] [<ffffffff81530076>] ___sys_sendmsg+0x396/0x3b0
[3387282.901609] [<ffffffff8109fd16>] ? get_futex_key+0x166/0x2d0
[3387282.901614] [<ffffffff816606be>] ? _raw_spin_lock+0xe/0x20
[3387282.901619] [<ffffffff810a02f3>] ? futex_wake+0x113/0x130
[3387282.901624] [<ffffffff8109ff81>] ? futex_wait+0x1/0x210
[3387282.901630] [<ffffffff81532029>] __sys_sendmsg+0x49/0x90
[3387282.901636] [<ffffffff81532089>] sys_sendmsg+0x19/0x20
[3387282.901642] [<ffffffff81668d02>] system_call_fastpath+0x16/0x1b
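# Per-NUMA-node information; of limited use, and kernels >= 4.1 no longer print this part.
# DMA: reserved for ISA devices, physical addresses below 16 MB
# DMA32: reserved for 32-bit PCI devices, physical addresses below 4 GB
# Normal: on x86_64, all remaining memory; on i686 it covers 16 MB -> 896 MB
# HighMem: on i686, memory above 896 MB, which must be mapped through the MMU before it can be accessed
# Counters other than the four zones above, such as active_anon, are not discussed here.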
[3387282.901645] Mem-Info:
[3387282.901647] Node 0 DMA per-cpu:
[3387282.901651] CPU 0: hi: 0, btch: 1 usd: 0
[3387282.901654] CPU 1: hi: 0, btch: 1 usd: 0
[3387282.901657] CPU 2: hi: 0, btch: 1 usd: 0
[3387282.901660] CPU 3: hi: 0, btch: 1 usd: 0
[3387282.901663] CPU 4: hi: 0, btch: 1 usd: 0
[3387282.901666] CPU 5: hi: 0, btch: 1 usd: 0
[3387282.901669] CPU 6: hi: 0, btch: 1 usd: 0
[3387282.901672] CPU 7: hi: 0, btch: 1 usd: 0
[3387282.901675] CPU 8: hi: 0, btch: 1 usd: 0
[3387282.901677] CPU 9: hi: 0, btch: 1 usd: 0
[3387282.901680] CPU 10: hi: 0, btch: 1 usd: 0
[3387282.901683] CPU 11: hi: 0, btch: 1 usd: 0
[3387282.901686] CPU 12: hi: 0, btch: 1 usd: 0
[3387282.901689] CPU 13: hi: 0, btch: 1 usd: 0
[3387282.901692] CPU 14: hi: 0, btch: 1 usd: 0
[3387282.901695] CPU 15: hi: 0, btch: 1 usd: 0
[3387282.901697] Node 0 DMA32 per-cpu:
[3387282.901701] CPU 0: hi: 186, btch: 31 usd: 86
[3387282.901704] CPU 1: hi: 186, btch: 31 usd: 0
[3387282.901707] CPU 2: hi: 186, btch: 31 usd: 0
[3387282.901710] CPU 3: hi: 186, btch: 31 usd: 0
[3387282.901712] CPU 4: hi: 186, btch: 31 usd: 0
[3387282.901715] CPU 5: hi: 186, btch: 31 usd: 96
[3387282.901718] CPU 6: hi: 186, btch: 31 usd: 0
[3387282.901721] CPU 7: hi: 186, btch: 31 usd: 16
[3387282.901724] CPU 8: hi: 186, btch: 31 usd: 0
[3387282.901727] CPU 9: hi: 186, btch: 31 usd: 0
[3387282.901730] CPU 10: hi: 186, btch: 31 usd: 0
[3387282.901732] CPU 11: hi: 186, btch: 31 usd: 0
[3387282.901735] CPU 12: hi: 186, btch: 31 usd: 0
[3387282.901738] CPU 13: hi: 186, btch: 31 usd: 78
[3387282.901741] CPU 14: hi: 186, btch: 31 usd: 0
[3387282.901744] CPU 15: hi: 186, btch: 31 usd: 0
[3387282.901746] Node 0 Normal per-cpu:
[3387282.901750] CPU 0: hi: 186, btch: 31 usd: 162
[3387282.901753] CPU 1: hi: 186, btch: 31 usd: 29
[3387282.901756] CPU 2: hi: 186, btch: 31 usd: 40
[3387282.901759] CPU 3: hi: 186, btch: 31 usd: 42
[3387282.901762] CPU 4: hi: 186, btch: 31 usd: 42
[3387282.901765] CPU 5: hi: 186, btch: 31 usd: 221
[3387282.901768] CPU 6: hi: 186, btch: 31 usd: 37
[3387282.901771] CPU 7: hi: 186, btch: 31 usd: 182
[3387282.901774] CPU 8: hi: 186, btch: 31 usd: 0
[3387282.901777] CPU 9: hi: 186, btch: 31 usd: 0
[3387282.901780] CPU 10: hi: 186, btch: 31 usd: 29
[3387282.901783] CPU 11: hi: 186, btch: 31 usd: 22
[3387282.901786] CPU 12: hi: 186, btch: 31 usd: 0
[3387282.901789] CPU 13: hi: 186, btch: 31 usd: 156
[3387282.901792] CPU 14: hi: 186, btch: 31 usd: 6
[3387282.901795] CPU 15: hi: 186, btch: 31 usd: 0
[3387282.901802] active_anon:277242 inactive_anon:22700 isolated_anon:0
[3387282.901804] active_file:5468942 inactive_file:9468439 isolated_file:0
[3387282.901805] unevictable:0 dirty:95 writeback:0 unstable:0
[3387282.901807] free:103654 slab_reclaimable:700786 slab_unreclaimable:89064
[3387282.901808] mapped:3932 shmem:22 pagetables:3338 bounce:0
# System-wide statistics (see /proc/vmstat, /proc/zoneinfo). If free and slab_reclaimable are very low, physical memory is running out.
[3387282.901811] Node 0 DMA free:15896kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15640kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[3387282.901825] lowmem_reserve[]: 0 1936 64432 64432
# The free values are large, so the problem is not caused by a shortage of physical memory
[3387282.901830] Node 0 DMA32 free:250560kB min:2028kB low:2532kB high:3040kB active_anon:12kB inactive_anon:116kB active_file:31272kB inactive_file:276136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1982592kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:846404kB slab_unreclaimable:144884kB kernel_stack:3696kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[3387282.901845] lowmem_reserve[]: 0 0 62496 62496
[3387282.901850] Node 0 Normal free:148160kB min:65536kB low:81920kB high:98304kB active_anon:1108956kB inactive_anon:90684kB active_file:21844496kB inactive_file:37597620kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:63995904kB mlocked:0kB dirty:380kB writeback:0kB mapped:15724kB shmem:88kB slab_reclaimable:1956740kB slab_unreclaimable:211372kB kernel_stack:18960kB pagetables:13352kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[3387282.901864] lowmem_reserve[]: 0 0 0 0
[3387282.901869] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[3387282.901883] Node 0 DMA32: 2010*4kB 2207*8kB 4168*16kB 2405*32kB 949*64kB 124*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 250560kB
# The 0*16kB in the Normal zone line below means no free 16 KB blocks are left there, which is evidently where the problem comes from.
[3387282.901897] Node 0 Normal: 36611*4kB 16*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 150668kB
[3387282.901917] 14937512 total pagecache pages
[3387282.901920] 8 pages in swap cache
[3387282.901923] Swap cache stats: add 250, delete 242, find 465/466
[3387282.901925] Free swap = 3902516kB
[3387282.901927] Total swap = 3903484kB
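The two numbers that matter in the dump above are the request size implied by order:2 and the per-zone free block counts. A minimal sketch of both calculations follows, assuming 4 kB pages. The helper names order_kb and free_blocks_ge are hypothetical, and the real allocator also applies watermark checks that this simple count does not model.

```shell
#!/bin/sh
# order_kb: size in kB of an order-N allocation, assuming 4 kB pages.
order_kb() {
    echo $(( (1 << $1) * 4 ))
}

# free_blocks_ge: count the free blocks of at least MIN kB in one buddy
# allocator line such as "Node 0 Normal: 36611*4kB 16*8kB 0*16kB ...".
# Both helper names are hypothetical and used only for illustration.
free_blocks_ge() {
    min_kb=$1
    count=0
    set -f                          # fields contain "*": disable globbing
    for field in $2; do
        case $field in
            *'*'*kB)
                n=${field%%\**}     # block count before "*"
                kb=${field#*\*}     # e.g. "16kB"
                kb=${kb%kB}         # e.g. 16
                if [ "$kb" -ge "$min_kb" ]; then
                    count=$((count + n))
                fi
                ;;
        esac
    done
    set +f
    echo "$count"
}

normal="Node 0 Normal: 36611*4kB 16*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 150668kB"
need=$(order_kb 2)
echo "order 2 needs ${need}kB; Normal zone free blocks >= ${need}kB: $(free_blocks_ge "$need" "$normal")"
```

For the Normal zone line this counts a single block (the 4096 kB one), so almost all of the free memory there sits in 4 kB and 8 kB fragments that cannot satisfy a contiguous 16 kB request.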