Testing the swap partition. First, enable swap:

caozilong@caozilong-Vostro-3268:~/Workspace$ sudo swapon -a
caozilong@caozilong-Vostro-3268:~/Workspace$ sudo swapon -v
NAME TYPE SIZE USED PRIO
/dev/sdb7 partition 6.7G 711.5M -2
caozilong@caozilong-Vostro-3268:~/Workspace$
caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile$ free
总计 已用 空闲 共享 缓冲/缓存 可用
内存: 8058628 766840 6044272 277584 1247516 6721384
交换: 6972412 0 6972412
caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile$

Then add a debug print in __swap_writepage.

caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile/linux-5.4.129$ git diff
diff --git a/mm/page_io.c b/mm/page_io.c
index bcf27d057..85290948b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -276,6 +276,9 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
int ret;
struct swap_info_struct *sis = page_swap_info(page);

+ printk("%s line %d, comm %s.\n", __func__, __LINE__, current->comm);
+ dump_stack();
+
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
if (sis->flags & SWP_FS) {
struct kiocb kiocb;
caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile/linux-5.4.129$

Rebuild the kernel, reboot, and select the new kernel:

caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile$ uname -r
5.4.129+
caozilong@caozilong-Vostro-3268:~/Workspace/linux-compile$

Write a user-space program that models a memory leak:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char *p = NULL;
    int count = 1;

    while (1) {
        p = (char *)malloc(1024 * 1024 * 100);
        if (!p) {
            printf("malloc error!\n");
            return -1;
        }
        memset(p, 0, 1024 * 1024 * 100);
        printf("malloc %dM memory\n", 100 * count++);
        usleep(500000);
    }

    return 0;
}

Run the test program:

caozilong@caozilong-Vostro-3268:~/Workspace$ ./a.out 
malloc 100M memory
malloc 200M memory
malloc 300M memory
malloc 400M memory
malloc 500M memory
malloc 600M memory
malloc 700M memory
malloc 800M memory
malloc 900M memory
malloc 1000M memory
malloc 1100M memory
malloc 1200M memory
malloc 1300M memory
malloc 1400M memory
malloc 1500M memory
malloc 1600M memory
malloc 1700M memory
malloc 1800M memory
malloc 1900M memory
malloc 2000M memory
malloc 2100M memory
malloc 2200M memory
malloc 2300M memory
malloc 2400M memory

At this point our hook in __swap_writepage has not fired yet; the program has not run long enough to create real memory pressure. Run the test program again:

caozilong@caozilong-Vostro-3268:~/Workspace$ ./a.out 
malloc 100M memory
malloc 200M memory
malloc 300M memory
malloc 400M memory
malloc 500M memory
malloc 600M memory
malloc 700M memory
malloc 800M memory
malloc 900M memory
malloc 1000M memory
malloc 1100M memory
malloc 1200M memory
malloc 1300M memory
malloc 1400M memory
malloc 1500M memory
malloc 1600M memory
malloc 1700M memory
malloc 1800M memory
malloc 1900M memory
malloc 2000M memory
malloc 2100M memory
malloc 2200M memory
malloc 2300M memory
malloc 2400M memory
malloc 2500M memory
malloc 2600M memory
malloc 2700M memory
malloc 2800M memory
malloc 2900M memory
malloc 3000M memory
malloc 3100M memory
malloc 3200M memory
malloc 3300M memory
malloc 3400M memory
malloc 3500M memory
malloc 3600M memory
malloc 3700M memory
malloc 3800M memory
malloc 3900M memory
malloc 4000M memory
malloc 4100M memory
malloc 4200M memory
malloc 4300M memory
malloc 4400M memory
malloc 4500M memory
malloc 4600M memory
malloc 4700M memory
malloc 4800M memory
malloc 4900M memory
malloc 5000M memory
malloc 5100M memory
malloc 5200M memory
malloc 5300M memory
malloc 5400M memory
malloc 5500M memory
malloc 5600M memory
malloc 5700M memory
malloc 5800M memory
malloc 5900M memory
malloc 6000M memory
malloc 6100M memory
malloc 6200M memory
malloc 6300M memory
malloc 6400M memory
malloc 6500M memory
malloc 6600M memory
malloc 6700M memory
malloc 6800M memory
malloc 6900M memory
malloc 7000M memory
malloc 7100M memory
malloc 7200M memory
malloc 7300M memory
malloc 7400M memory
malloc 7500M memory
malloc 7600M memory
malloc 7700M memory
malloc 7800M memory
malloc 7900M memory
malloc 8000M memory
malloc 8100M memory
malloc 8200M memory
malloc 8300M memory
malloc 8400M memory
malloc 8500M memory
malloc 8600M memory
malloc 8700M memory
malloc 8800M memory
malloc 8900M memory
malloc 9000M memory
malloc 9100M memory
malloc 9200M memory
malloc 9300M memory
malloc 9400M memory
malloc 9500M memory
malloc 9600M memory
malloc 9700M memory
malloc 9800M memory
malloc 9900M memory
malloc 10000M memory
malloc 10100M memory
malloc 10200M memory
malloc 10300M memory
malloc 10400M memory
malloc 10500M memory
malloc 10600M memory
malloc 10700M memory
malloc 10800M memory
malloc 10900M memory
malloc 11000M memory
malloc 11100M memory
malloc 11200M memory
malloc 11300M memory
malloc 11400M memory
malloc 11500M memory
malloc 11600M memory
malloc 11700M memory
malloc 11800M memory
malloc 11900M memory
malloc 12000M memory
malloc 12100M memory
malloc 12200M memory
malloc 12300M memory
malloc 12400M memory
malloc 12500M memory
malloc 12600M memory
已杀死
caozilong@caozilong-Vostro-3268:~/Workspace$

As shown above, the OOM killer was successfully triggered.

So what happened inside the kernel?

The debug output shows that __swap_writepage is reached from two call sites. The first is the kswapd0 kernel thread (background reclaim):

[  460.690149] CPU: 3 PID: 102 Comm: kswapd0 Not tainted 5.4.129+ #25
[ 460.690149] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[ 460.690150] Call Trace:
[ 460.690156] dump_stack+0x6d/0x8b
[ 460.690159] __swap_writepage+0x61/0x450
[ 460.690162] ? smp_call_function_many+0x1de/0x270
[ 460.690164] ? cpumask_next_and+0x1e/0x20
[ 460.690166] ? smp_call_function_many+0x1de/0x270
[ 460.690168] ? __frontswap_store+0x73/0x100
[ 460.690170] swap_writepage+0x34/0x90
[ 460.690173] pageout.isra.58+0x11d/0x350
[ 460.690175] shrink_page_list+0x9eb/0xbb0
[ 460.690177] shrink_inactive_list+0x204/0x3d0
[ 460.690179] shrink_node_memcg+0x3b4/0x820
[ 460.690182] shrink_node+0xb5/0x410
[ 460.690183] ? shrink_node+0xb5/0x410
[ 460.690185] balance_pgdat+0x293/0x5f0
[ 460.690188] kswapd+0x156/0x3c0
[ 460.690190] ? wait_woken+0x80/0x80
[ 460.690192] kthread+0x121/0x140
[ 460.690194] ? balance_pgdat+0x5f0/0x5f0
[ 460.690195] ? kthread_park+0x90/0x90
[ 460.690197] ret_from_fork+0x35/0x40
[ 460.692259] __swap_writepage line 279, comm kswapd0.

The second is direct reclaim on the page-fault path:

[  460.721281] CPU: 0 PID: 273 Comm: systemd-journal Not tainted 5.4.129+ #25
[ 460.721282] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[ 460.721282] Call Trace:
[ 460.721285] dump_stack+0x6d/0x8b
[ 460.721287] __swap_writepage+0x61/0x450
[ 460.721289] ? __frontswap_store+0x73/0x100
[ 460.721290] swap_writepage+0x34/0x90
[ 460.721291] pageout.isra.58+0x11d/0x350
[ 460.721293] shrink_page_list+0x9eb/0xbb0
[ 460.721294] shrink_inactive_list+0x204/0x3d0
[ 460.721296] shrink_node_memcg+0x3b4/0x820
[ 460.721298] shrink_node+0xb5/0x410
[ 460.721299] ? shrink_node+0xb5/0x410
[ 460.721300] do_try_to_free_pages+0xcf/0x380
[ 460.721301] try_to_free_pages+0xee/0x1d0
[ 460.721303] __alloc_pages_slowpath+0x417/0xe50
[ 460.721305] __alloc_pages_nodemask+0x2cd/0x320
[ 460.721306] alloc_pages_current+0x6a/0xe0
[ 460.721307] __page_cache_alloc+0x6a/0xa0
[ 460.721309] __do_page_cache_readahead+0xa5/0x190
[ 460.721310] filemap_fault+0x65c/0xb80
[ 460.721312] ? __switch_to+0x2ce/0x490
[ 460.721313] ? __switch_to+0x2ce/0x490
[ 460.721314] ? devkmsg_poll+0x6b/0xa0
[ 460.721316] ? xas_load+0xc/0x80
[ 460.721317] ? xas_find+0x16f/0x1b0
[ 460.721318] ? filemap_map_pages+0x181/0x3b0
[ 460.721320] ext4_filemap_fault+0x31/0x50
[ 460.721322] __do_fault+0x57/0x110
[ 460.721323] __handle_mm_fault+0xdae/0x1290
[ 460.721325] handle_mm_fault+0xcb/0x210
[ 460.721327] __do_page_fault+0x2a1/0x4d0
[ 460.721328] do_page_fault+0x2c/0xe0
[ 460.721329] page_fault+0x34/0x40
[ 460.721330] RIP: 0033:0x7f5ad4dc7a47

As free memory shrinks, __swap_writepage is eventually called to push pages out to swap.

Conversely, swap_readpage is called from the page-fault handler to bring a swapped-out page back in:

[  679.923286] swap_readpage line356, comm gnome-terminal-.
[ 679.923288] CPU: 0 PID: 2007 Comm: gnome-terminal- Not tainted 5.4.129+ #26
[ 679.923288] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[ 679.923289] Call Trace:
[ 679.923293] dump_stack+0x6d/0x8b
[ 679.923295] swap_readpage+0x52/0x220
[ 679.923297] read_swap_cache_async+0x40/0x60
[ 679.923298] swap_cluster_readahead+0x211/0x2b0
[ 679.923300] ? wait_for_completion+0xc5/0x140
[ 679.923301] ? wake_up_q+0x80/0x80
[ 679.923302] swapin_readahead+0x60/0x4e0
[ 679.923304] ? swapin_readahead+0x60/0x4e0
[ 679.923305] ? pagecache_get_page+0x2c/0x2c0
[ 679.923307] do_swap_page+0x31b/0x960
[ 679.923309] ? do_swap_page+0x31b/0x960
[ 679.923310] ? poll_select_finish+0x210/0x210
[ 679.923312] __handle_mm_fault+0x77d/0x1290
[ 679.923314] handle_mm_fault+0xcb/0x210
[ 679.923315] __do_page_fault+0x2a1/0x4d0
[ 679.923317] do_page_fault+0x2c/0xe0
[ 679.923318] page_fault+0x34/0x40
[ 679.923319] RIP: 0033:0x7fd5bb47b0a0

The stack at the moment the process is OOM-killed:

[  460.720509] CPU: 1 PID: 2435 Comm: a.out Not tainted 5.4.129+ #25
[ 460.720509] Hardware name: Dell Inc. Vostro 3268/0TJYKK, BIOS 1.11.1 12/11/2018
[ 460.720510] Call Trace:
[ 460.720515] dump_stack+0x6d/0x8b
[ 460.720517] dump_header+0x4f/0x200
[ 460.720518] oom_kill_process+0xe6/0x120
[ 460.720520] out_of_memory+0x109/0x510
[ 460.720521] __alloc_pages_slowpath+0xa9b/0xe50
[ 460.720523] __alloc_pages_nodemask+0x2cd/0x320
[ 460.720525] alloc_pages_vma+0x88/0x210
[ 460.720527] __handle_mm_fault+0x87e/0x1290
[ 460.720529] handle_mm_fault+0xcb/0x210
[ 460.720531] __do_page_fault+0x2a1/0x4d0
[ 460.720532] do_page_fault+0x2c/0xe0
[ 460.720534] page_fault+0x34/0x40

The out-of-memory call path, as the stack above shows, is: page_fault → handle_mm_fault → __alloc_pages_nodemask → __alloc_pages_slowpath → out_of_memory → oom_kill_process.

With roughly 8 GB of RAM and 7 GB of swap, the OOM kill happened after a.out had leaked about 12.6 GB: by that point it had consumed nearly all of physical memory and swap space combined.

The victim process receives the kill signal and then exits via do_exit():

[   94.674292] do_exit line 728, comm woman.
[ 94.674292] CPU: 6 PID: 2235 Comm: woman Tainted: G W 5.4.128+ #2
[ 94.674293] Hardware name: TIMI RedmiBook 14/TM1814, BIOS RMRWL400P0503 11/13/2019
[ 94.674293] Call Trace:
[ 94.674294] dump_stack+0x6d/0x8b
[ 94.674296] do_exit+0xbd2/0xbe0
[ 94.674297] ? mem_cgroup_commit_charge+0x63/0x490
[ 94.674299] ? mem_cgroup_try_charge+0x75/0x190
[ 94.674300] ? mem_cgroup_throttle_swaprate+0x1d/0x140
[ 94.674301] do_group_exit+0x43/0xa0
[ 94.674303] get_signal+0x14f/0x860
[ 94.674305] do_signal+0x34/0x6d0
[ 94.674306] ? handle_mm_fault+0xcb/0x210
[ 94.674307] ? __do_page_fault+0x2be/0x4d0
[ 94.674309] exit_to_usermode_loop+0x90/0x130
[ 94.674310] prepare_exit_to_usermode+0x91/0xa0
[ 94.674311] retint_user+0x8/0x8
[ 94.674312] RIP: 0033:0x7f2cbacc4e6d
[ 94.674313] Code: Bad RIP value.
[ 94.674313] RSP: 002b:00007ffd927178c8 EFLAGS: 00010206
[ 94.674314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000002103010
[ 94.674314] RDX: 00007f2aa12e0010 RSI: 0000000000000000 RDI: 00007f2aa55dd000
[ 94.674315] RBP: 00007ffd927178e0 R08: 00000000ffffffff R09: 0000000000000000
[ 94.674315] R10: 0000000000000022 R11: 0000000000000246 R12: 0000562a0ff9f660
[ 94.674315] R13: 00007ffd927179c0 R14: 0000000000000000 R15: 0000000000000000
[ 94.790308] oom_reaper: reaped process 2235 (woman), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
czl@czl-RedmiBook-14:~/Workspace/oom$

In Tina Linux, the console output after an OOM is triggered:

[ 7559.251061] kthreadd invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
[ 7559.275050] COMPACTION is disabled!!!
[ 7559.279345] CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.9.191 #42
[ 7559.286253] Hardware name: sun8iw19
[ 7559.290442] [<c0016190>] (unwind_backtrace) from [<c0013110>] (show_stack+0x10/0x14)
[ 7559.299215] [<c0013110>] (show_stack) from [<c00cc384>] (dump_header.constprop.4+0x84/0x1d0)
[ 7559.308917] [<c00cc384>] (dump_header.constprop.4) from [<c008a240>] (oom_kill_process+0x334/0x554)
[ 7559.319245] [<c008a240>] (oom_kill_process) from [<c008a75c>] (out_of_memory+0x100/0x418)
[ 7559.328480] [<c008a75c>] (out_of_memory) from [<c008f450>] (__alloc_pages_nodemask+0xa28/0xa60)
[ 7559.338458] [<c008f450>] (__alloc_pages_nodemask) from [<c001fda0>] (copy_process.part.3+0x104/0x17c8)
[ 7559.349112] [<c001fda0>] (copy_process.part.3) from [<c00215ac>] (_do_fork+0xa4/0x394)
[ 7559.358122] [<c00215ac>] (_do_fork) from [<c00218e8>] (kernel_thread+0x2c/0x34)
[ 7559.366459] [<c00218e8>] (kernel_thread) from [<c003e848>] (kthreadd+0xe8/0x170)
[ 7559.374852] [<c003e848>] (kthreadd) from [<c000f6c8>] (ret_from_fork+0x14/0x2c)
[ 7559.383172] Mem-Info:
[ 7559.385765] active_anon:58403 inactive_anon:0 isolated_anon:0
[ 7559.385765] active_file:48 inactive_file:57 isolated_file:0
[ 7559.385765] unevictable:0 dirty:0 writeback:0 unstable:0
[ 7559.385765] slab_reclaimable:210 slab_unreclaimable:1178
[ 7559.385765] mapped:43 shmem:0 pagetables:135 bounce:0
[ 7559.385765] free:557 free_pcp:89 free_cma:310
[ 7559.421467] Node 0 active_anon:233612kB inactive_anon:0kB active_file:192kB inactive_file:228kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:172kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:2890 all_unreclaimable? yes
[ 7559.448324] Normal free:2228kB min:1988kB low:2484kB high:2980kB active_anon:233612kB inactive_anon:0kB active_file:192kB inactive_file:228kB unevictable:0kB writepending:0kB present:262144kB managed:251776kB mlocked:0kB slab_reclaimable:840kB slab_unreclaimable:4712kB kernel_stack:432kB pagetables:540kB bounce:0kB free_pcp:356kB local_pcp:356kB free_cma:1240kB
[ 7559.484288] lowmem_reserve[]: 0 0 0
[ 7559.488262] Normal: 11*4kB (M) 3*8kB (MC) 3*16kB (EC) 0*32kB 1*64kB (C) 2*128kB (MC) 1*256kB (M) 1*512kB (M) 1*1024kB (C) 0*2048kB 0*4096kB = 2228kB
[ 7559.503469] 105 total pagecache pages
[ 7559.507659] 0 pages in swap cache
[ 7559.511423] Swap cache stats: add 0, delete 0, find 0/0
[ 7559.517272] Free swap = 0kB
[ 7559.520659] Total swap = 0kB
[ 7559.523904] 65536 pages RAM
[ 7559.527029] 0 pages HighMem/MovableOnly
[ 7559.531442] 2592 pages reserved
[ 7559.534958] 1024 pages cma reserved
[ 7559.538863] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 7559.548570] [ 872] 0 872 286 19 4 0 0 0 adbd
[ 7559.558215] [ 882] 0 882 169 39 3 0 0 0 swupdate-progre
[ 7559.568940] [ 889] 0 889 230 11 4 0 0 0 sh
[ 7559.578368] [ 920] 0 920 229 10 4 0 0 0 sh
[ 7559.587817] [29127] 0 29127 58506 58382 117 0 0 0 a.out
[ 7559.597554] Out of memory: Kill process 29127 (a.out) score 901 or sacrifice child
[ 7559.606161] Killed process 29127 (a.out) total-vm:234024kB, anon-rss:233384kB, file-rss:144kB, shmem-rss:0kB
Killed
root@(none):/mnt/extsd#

Let's analyze this. Just before a.out was killed, its RSS (the number of physical pages actually allocated to it) was 58382 pages.

That is 58382 * 4 = 233528 KiB of physical memory, which matches the reported anon-rss:233384kB (plus file-rss:144kB) exactly.

Controlling the OOM killer:

The logic by which the kernel detects memory exhaustion, picks a victim, and kills it lives in linux/mm/oom_kill.c. When the system runs out of memory, out_of_memory() is triggered; it calls select_bad_process() to choose a "bad" process to kill. How is "bad" decided? Surely not at random: the choice is made by oom_badness(), and both the algorithm and the idea behind it are simple and down-to-earth: the most "bad" process is essentially the one occupying the most memory.

Every process exposes three files under /proc/$PID/ that show the OOM scoring logic and its state:

czl@czl-VirtualBox:~$ ls -l /proc/1/oom*
-rw-r--r-- 1 root root 0 8月 23 15:15 /proc/1/oom_adj
-r--r--r-- 1 root root 0 8月 23 15:15 /proc/1/oom_score
-rw-r--r-- 1 root root 0 8月 23 15:15 /proc/1/oom_score_adj
czl@czl-VirtualBox:~$

These files map onto members of signal_struct in the process descriptor (reached via task->signal), chiefly oom_score_adj and oom_score_adj_min.

Specifically:

/proc/$PID/oom_adj: the legacy per-process knob that decides how easily the OOM killer picks this process. Its range is [-17, 15]: 15 makes the process the most likely victim, and -17 disables OOM killing for it entirely. Why -17 rather than some other value (the default is 0)? These constants are defined by the Linux kernel; see the source at linux-xxxxx/include/uapi/linux/oom.h.


/proc/$PID/oom_score_adj: similar in purpose to oom_adj, but finer-grained; if oom_score_adj is the score itself, oom_adj is a coarse gear lever. Its range is [-1000, 1000]; at -1000, the process is exempt from OOM killing.
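A small demonstration (assumes a Linux /proc and that the shell's current oom_score_adj is at most 300): any process may raise its own oom_score_adj without privilege, while lowering it below the current value requires CAP_SYS_RESOURCE.

```shell
# make the current shell a more attractive OOM victim
echo 300 > /proc/$$/oom_score_adj
cat /proc/$$/oom_score_adj    # prints 300
```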

In summary:

  • Writing oom_score_adj stores the value in task->signal->oom_score_adj;
  • Reading oom_score_adj reads it back from task->signal->oom_score_adj;
  • Writing oom_adj also lands in task->signal->oom_score_adj, after a proportional conversion from the oom_adj scale to the oom_score_adj scale;
  • Reading oom_adj likewise reads task->signal->oom_score_adj, converted back to the oom_adj range for display.

Their relationship can be seen here:

root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
0
0
0
root@czl-VirtualBox:/home/czl# echo -17 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-17
0
-1000
root@czl-VirtualBox:/home/czl# echo -16 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-15
0
-941
root@czl-VirtualBox:/home/czl# echo -15 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-14
0
-882
root@czl-VirtualBox:/home/czl# echo -14 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-13
0
-823
root@czl-VirtualBox:/home/czl# echo -13 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-12
0
-764
root@czl-VirtualBox:/home/czl# echo -12 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-11
0
-705
root@czl-VirtualBox:/home/czl# echo -11 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-10
0
-647
root@czl-VirtualBox:/home/czl# echo -10 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-9
0
-588
root@czl-VirtualBox:/home/czl# echo -9 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-8
0
-529
root@czl-VirtualBox:/home/czl# echo -8 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-7
0
-470
root@czl-VirtualBox:/home/czl# echo -7 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-6
0
-411
root@czl-VirtualBox:/home/czl# echo -6 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-5
0
-352
root@czl-VirtualBox:/home/czl# echo -5 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-4
0
-294
root@czl-VirtualBox:/home/czl# echo -4 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-3
0
-235
root@czl-VirtualBox:/home/czl# echo -3 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-2
0
-176
root@czl-VirtualBox:/home/czl# echo -2 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
-1
0
-117
root@czl-VirtualBox:/home/czl# echo -1 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
0
0
-58
root@czl-VirtualBox:/home/czl# echo 0 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
0
0
0
root@czl-VirtualBox:/home/czl# echo 1 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
0
0
58
root@czl-VirtualBox:/home/czl# echo 2 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
1
0
117
root@czl-VirtualBox:/home/czl# echo 3 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
2
0
176
root@czl-VirtualBox:/home/czl# echo 4 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
3
0
235
root@czl-VirtualBox:/home/czl# echo 5 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
4
0
294
root@czl-VirtualBox:/home/czl# echo 6 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
5
0
352
root@czl-VirtualBox:/home/czl# echo 7 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
6
0
411
root@czl-VirtualBox:/home/czl# echo 8 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
7
0
470
root@czl-VirtualBox:/home/czl# echo 9 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
8
0
529
root@czl-VirtualBox:/home/czl# echo 10 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
9
0
588
root@czl-VirtualBox:/home/czl# echo 11 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
10
0
647
root@czl-VirtualBox:/home/czl# echo 12 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
11
0
705
root@czl-VirtualBox:/home/czl# echo 13 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
12
0
764
root@czl-VirtualBox:/home/czl# echo 14 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
13
0
823
root@czl-VirtualBox:/home/czl# echo 15 >/proc/1/oom_adj
root@czl-VirtualBox:/home/czl# cat /proc/1/oom*
15
0
1000
root@czl-VirtualBox:/home/czl#

/proc/$pid/oom_score: the real score. It is read-only from user space; only the kernel writes it. User space can influence it through the per-process oom_score_adj parameter. For example, for PID 981 an oom_score of 18 drops to 3 once oom_score_adj is set to -15:

# cat /proc/981/oom_score
18

# echo -15 > /proc/981/oom_score_adj
# cat /proc/981/oom_score
3

The write logic for oom_score_adj and oom_adj is implemented in fs/proc/base.c (oom_adj_write() and oom_score_adj_write()).

Testing on Tina Linux:

Test code: leak 1 MB per iteration, run on a 256 MB V833 board.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char *p = NULL;
    int count = 1;

    while (1) {
        p = (char *)malloc(1024 * 1024 * 1);
        if (!p) {
            printf("malloc error!\n");
            return -1;
        }
        memset(p, 0, 1024 * 1024 * 1);
        printf("malloc %dM memory\n", 1 * count++);
        usleep(500000);
    }

    return 0;
}

The following loop prints the OOM numbers once per second; when the score climbs to about 890, OOM fires and the application is killed.

root@(none):/# while true; do echo /proc/`pidof a.out`/oom*; cat /proc/`pidof a.out`/oom*; sleep 1; done
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
216
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
224
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
232
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
239
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
247
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
255
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
263
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
271
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
279
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
287
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
295
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
303
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
311
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
323
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
331
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
339
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
347
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
355
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
363
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
370
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
378
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
386
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
394
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
402
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
410
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
418
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
426
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
438
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
446
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
454
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
462
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
470
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
478
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
486
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
493
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
501
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
509
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
517
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
525
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
533
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
541
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
553
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
561
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
569
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
577
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
585
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
593
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
601
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
609
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
616
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
624
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
632
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
640
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
648
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
660
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
668
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
676
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
684
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
692
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
700
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
708
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
716
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
724
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
732
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
739
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
747
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
755
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
763
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
775
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
783
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
791
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
799
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
807
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
815
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
823
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
831
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
839
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
847
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
855
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
862
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
870
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
878
0
/proc/29152/oom_adj /proc/29152/oom_score /proc/29152/oom_score_adj
0
890
0

As for the connection to the swap daemon kswapd: before an OOM actually occurs, the kernel detects memory pressure and wakes kswapd to perform reclaim. Reclaim does not necessarily mean writing to a swap partition; small embedded systems usually run without swap, yet kswapd is woken all the same. In that case it behaves much like the flusher threads: it writes dirty pages back and frees page-cache pages for the system to reuse.

kswapd is woken from many places; the wakeup is wired into most memory-allocation call paths (typically via wakeup_kswapd() in the allocator slow path).


Initialization of kswapd:

A kswapd daemon thread is created for every NUMA node that has the N_MEMORY state: node 0 gets kswapd0, node 1 gets kswapd1, and so on. Each kswapd is started via kthread_run().


The key function on kswapd's swap-out path is pageout(), called from shrink_page_list() as the stacks above show.



Done!