Targets in systemd: the systemd target mechanism

Reposted

mob6454cc623087 2024-04-28 19:59:31


1. Introduction to SystemTap

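SystemTap was originally used for kernel analysis; after it was extended with the utrace/uprobe modules, it can also monitor user programs. SystemTap lets you set a probe point in kernel code or in a user-space program. When the kernel or the program reaches that point, a handler you wrote (compiled into kernel code) runs, reads the context at the probe point, and performs analysis and statistics. Common uses include generating call graphs, producing FlameGraphs for performance profiling, analyzing memory bottlenecks, and collecting network traffic statistics. For example, here is a FlameGraph drawn while profiling a Python script: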

[Figure: FlameGraph from profiling a Python script]

2. Installing SystemTap

To build and install on Debian 8, first download systemtap-3.1 and make sure gcc, make and the rest of the basic build toolchain are installed. Then run:

$ sudo apt-get install gettext elfutils libdw-dev
$ cd systemtap-3.1
$ ./configure
$ make && sudo make install

After the build, the systemtap-3.1 directory contains stap-prep. Run ./stap-prep and follow its prompts to install the missing Linux headers and debug packages.

3. Basic SystemTap usage

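Compared with ptrace, SystemTap is far easier to use. ptrace requires writing fairly involved C code that calls the ptrace API to implement tracing, whereas SystemTap only needs a high-level, awk-like script of the form: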

probe event {statements}

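Here probe is the script keyword that defines a probe point, and event is the name of the event to probe, for example syscall.open or process("procName").function("funcName"). A simple probe script takes only a few lines: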

probe syscall.open
{
    # pid() returns an integer, so it needs %d; filename is set by the syscall.open tapset alias
    printf("%d open %s\n", pid(), filename)
}

4. How SystemTap works

To implement probing, SystemTap processes a script in the following steps:

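1. The stap command parses the SystemTap script and finds all the library scripts (tapsets) it depends on under /usr/share/systemtap/tapset/.
2. It translates the script into C code and calls GCC to compile that code into a kernel module. Translation needs the symbol information of the probed program, so the target program must contain debug information, or the symbol table must be supplied to SystemTap with the -d option.
3. The staprun command loads the compiled module into the kernel and switches the probes on.
4. Whenever an event fires, the handler we wrote runs; once everything is finished, the kernel module is unloaded.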

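###Kernel modules###
Kernel modules exist to keep the Linux kernel lean. Functionality that is not always needed does not have to be loaded at boot; it can be loaded dynamically through interfaces the kernel provides when it is actually needed, so the kernel stays small without losing extensibility. A kernel module can be compiled separately but cannot run on its own: it only takes effect once the kernel loads it. Kernel modules have no main function, and there are two common ways to write one: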

#include <linux/module.h>       /* Needed by all modules */
#include <linux/kernel.h>       /* Needed for KERN_INFO */

MODULE_LICENSE("GPL");          /* Declare a license; modules without one taint the kernel */

int init_module(void)
{
        printk(KERN_INFO "Hello world 1.\n");

        /*
         * A non 0 return means init_module failed; module can't be loaded.
         */
        return 0;
}

void cleanup_module(void)
{
        printk(KERN_INFO "Goodbye world 1.\n");
}

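The form above requires implementing two entry points, init_module and cleanup_module, whose names cannot be changed. Since Linux 2.4 you can give the entry and exit functions your own names and register them with the module_init and module_exit macros: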

#include <linux/module.h>       /* Needed by all modules */
#include <linux/kernel.h>       /* Needed for KERN_INFO */
#include <linux/init.h>         /* Needed for the macros */

MODULE_LICENSE("GPL");          /* Declare a license; modules without one taint the kernel */

static int __init hello_2_init(void)
{
        printk(KERN_INFO "Hello, world 2\n");
        return 0;
}

static void __exit hello_2_exit(void)
{
        printk(KERN_INFO "Goodbye, world 2\n");
}

module_init(hello_2_init);      /* called when the module is loaded */
module_exit(hello_2_exit);      /* called when the module is unloaded */

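SystemTap uses the first form. You can confirm this by having stap show the C code it generates:
running stap -v test.stp -p 3 (stop after pass 3, the translation to C) shows that the generated code contains #include "runtime.h", which in turn includes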

#include <linux/runtime.h>

and linux/runtime.h implements the init_module and cleanup_module entry points.

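SystemTap uses the kprobe module to probe the kernel and the uprobe module to probe user-space programs. The two work in a similar way, and since everyday debugging mostly targets user programs, only the principle behind uprobes is outlined here; see the man pages for usage details.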

The entry-point API of uprobes is:

int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
@inode: the file in which the probe has to be placed.
@offset: offset from the start of the file.
@uc: information on howto handle the probe..

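This registers a probe point: inode is the inode of the target program's file, offset is the probe point's offset from the start of the ELF file, and uc is a user-defined event handler (consumer). Because uprobes work on a file plus an offset, every process started from that ELF file is monitored, and if the target process is multithreaded, every thread triggers the probe event.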
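The key to this interface is clearly finding that offset. How do we find it? Suppose a process a.out is already running and we want to probe its function reg_server. First, look at the memory layout of the a.out process: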

zt@debian:~$cat /proc/`pgrep a.out`/maps|grep a.out
08048000-080ac000 r-xp 00000000 08:01 791802     /home/zt/server/bin/a.out
080ac000-080ae000 rw-p 00063000 08:01 791802     /home/zt/server/bin/a.out

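The second column holds each mapping's permissions: the first mapping is readable and executable but not writable (r-xp), and the second is readable and writable (rw-p), so the first is the text segment (containing .text) and the second the data segment (containing .data). The reg_server function we want is in .text, which is mapped starting at 0x08048000. Next, look up the function's address in a.out: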

zt@debian:~/server$ readelf -Ws ./bin/a.out | c++filt | grep 'reg_server'
   153: 08090183    13 OBJECT  LOCAL  DEFAULT   15 unreg_server
   155: 08090d14    11 OBJECT  LOCAL  DEFAULT   15 reg_server
   848: 08060e80   980 FUNC    GLOBAL DEFAULT   13 unreg_server
  1581: 08061260  1503 FUNC    GLOBAL DEFAULT   13 reg_server

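So reg_server is at 0x08061260, and the offset is 0x08061260 - 0x08048000 = 0x19260 (the text mapping starts at file offset 0, so this is also the offset within the file). At that address, uprobes replaces the instruction with 0xCC (the int3 breakpoint instruction). When the program executes it, the CPU raises a breakpoint trap; instead of delivering SIGTRAP to the program, the kernel's uprobes code handles the trap, runs the handler given by the uprobe_consumer argument, then executes the original instruction and lets the program continue. That completes one event trigger and its handling.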
