在内核层面上分析事件有很多的工具:​​SystemTap​​​、​​ktap​​​、​​Sysdig​​​、​​LTTNG​​ 等等,你也可以在网络上找到关于这些工具的大量介绍文章和资料。

而对于使用 Linux 原生机制去跟踪系统事件以及检索/分析故障信息的方面的资料却很少找的到。这就是 ​​ftrace​​,它是添加到内核中的第一款跟踪工具,今天我们来看一下它都能做什么.

ftrace 是 Function Trace 的简写,但它能做的远不止这些:它可以跟踪上下文切换、测量进程阻塞时间、计算高优先级任务的活动时间等等。

ftrace 是由 Steven Rostedt 开发的,从 2008 年发布的内核 2.6.27 中开始就内置了。这是为记录数据提供的一个调试 Ring 缓冲区的框架。这些数据由集成到内核中的跟踪程序来采集。

ftrace 工作在 debugfs 文件系统上,在大多数现代 Linux 发行版中都默认挂载了。要开始使用 ftrace,你将进入到 ​​sys/kernel/debug/tracing​​ 目录(仅对 root 用户可用):

第一步:确认机器已经挂载debugfs

利用ftrace进行内核性能分析_数据

第二步: 查看系统当前支持的tracer

root@czl-VirtualBox:/sys/kernel/debug/tracing# cat available_tracers 
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
root@czl-VirtualBox:/sys/kernel/debug/tracing#

第三步:设置其中一个tracer:

echo wakeup_rt >current_tracer 
echo 1 > tracing_on
echo 0 > tracing_on

在echo 1和 echo 0中间停留片刻,等待ftrace将ringbuffer填充

之后执行如下命令查看结果:

cat trace
root@czl-VirtualBox:/sys/kernel/debug/tracing# cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 5.4.0-81-generic
# --------------------------------------------------------------------
# latency: 112 us, #154/154, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:1)
# -----------------
# | task: migration/0-12 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 0dNh. 1us+: 0:120:R + [000] 12: 0:R migration/0
<idle>-0 0dNh. 20us : <stack trace>
=> __trace_stack
=> probe_wakeup
=> ttwu_do_wakeup
=> ttwu_do_activate
=> try_to_wake_up
=> wake_up_q
=> cpu_stop_queue_work
=> stop_one_cpu_nowait
=> watchdog_timer_fn
=> __hrtimer_run_queues
=> hrtimer_interrupt
=> smp_apic_timer_interrupt
=> apic_timer_interrupt
=> mwait_idle
=> arch_cpu_idle
=> default_idle_call
=> do_idle
=> cpu_startup_entry
=> rest_init
=> arch_call_rest_init
=> start_kernel
=> x86_64_start_reservations
=> x86_64_start_kernel
=> secondary_startup_64
<idle>-0 0dNh. 21us : ttwu_do_activate <-try_to_wake_up
<idle>-0 0dNh. 21us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 0dNh. 22us : ktime_get <-watchdog_timer_fn
<idle>-0 0dNh. 22us : hrtimer_forward <-watchdog_timer_fn
<idle>-0 0dNh. 23us : _raw_spin_lock_irq <-__hrtimer_run_queues
<idle>-0 0dNh. 23us : enqueue_hrtimer <-__hrtimer_run_queues
<idle>-0 0dNh. 26us : __remove_hrtimer <-__hrtimer_run_queues
<idle>-0 0dNh. 27us : _raw_spin_unlock_irqrestore <-__hrtimer_run_queues
<idle>-0 0dNh. 27us : tick_sched_timer <-__hrtimer_run_queues
<idle>-0 0dNh. 28us : ktime_get <-tick_sched_timer
<idle>-0 0dNh. 28us : tick_sched_do_timer <-tick_sched_timer
<idle>-0 0dNh. 29us : tick_do_update_jiffies64.part.14 <-tick_sched_do_timer
<idle>-0 0dNh. 29us : _raw_spin_lock <-tick_do_update_jiffies64.part.14
<idle>-0 0dNh. 29us : do_timer <-tick_do_update_jiffies64.part.14
<idle>-0 0dNh. 30us : calc_global_load <-do_timer
<idle>-0 0dNh. 30us : update_wall_time <-tick_do_update_jiffies64.part.14
<idle>-0 0dNh. 31us : timekeeping_advance <-update_wall_time
<idle>-0 0dNh. 31us : _raw_spin_lock_irqsave <-timekeeping_advance
<idle>-0 0dNh. 32us : ntp_tick_length <-timekeeping_advance
<idle>-0 0dNh. 32us : ntp_tick_length <-timekeeping_advance
<idle>-0 0dNh. 33us : timekeeping_update <-timekeeping_advance
<idle>-0 0dNh. 33us : ntp_get_next_leap <-timekeeping_update
<idle>-0 0dNh. 33us : update_vsyscall <-timekeeping_update
<idle>-0 0dNh. 34us : raw_notifier_call_chain <-timekeeping_update
<idle>-0 0dNh. 34us : notifier_call_chain <-raw_notifier_call_chain
<idle>-0 0dNh. 35us : update_fast_timekeeper <-timekeeping_update
<idle>-0 0dNh. 35us : update_fast_timekeeper <-timekeeping_update
<idle>-0 0dNh. 36us : _raw_spin_unlock_irqrestore <-timekeeping_advance
<idle>-0 0dNh. 36us : tick_sched_handle <-tick_sched_timer
<idle>-0 0dNh. 36us : update_process_times <-tick_sched_handle
<idle>-0 0dNh. 37us : account_process_tick <-update_process_times
<idle>-0 0dNh. 37us : run_local_timers <-update_process_times
<idle>-0 0dNh. 38us : hrtimer_run_queues <-run_local_timers
<idle>-0 0dNh. 38us : raise_softirq <-run_local_timers
<idle>-0 0dNh. 38us : rcu_sched_clock_irq <-update_process_times
<idle>-0 0dNh. 39us : rcu_qs <-rcu_sched_clock_irq
<idle>-0 0dNh. 39us : invoke_rcu_core <-rcu_sched_clock_irq
<idle>-0 0dNh. 40us : raise_softirq <-invoke_rcu_core
<idle>-0 0dNh. 40us : scheduler_tick <-update_process_times
<idle>-0 0dNh. 41us : _raw_spin_lock <-scheduler_tick
<idle>-0 0dNh. 41us : update_rq_clock <-scheduler_tick
<idle>-0 0dNh. 41us : update_rq_clock.part.82 <-update_rq_clock
<idle>-0 0dNh. 42us : task_tick_idle <-scheduler_tick
<idle>-0 0dNh. 42us : calc_global_load_tick <-scheduler_tick
<idle>-0 0dNh. 43us : trigger_load_balance <-scheduler_tick
<idle>-0 0dNh. 43us : run_posix_cpu_timers <-update_process_times
<idle>-0 0dNh. 44us : profile_tick <-tick_sched_handle
<idle>-0 0dNh. 44us : hrtimer_forward <-tick_sched_timer
<idle>-0 0dNh. 45us : _raw_spin_lock_irq <-__hrtimer_run_queues
<idle>-0 0dNh. 45us : enqueue_hrtimer <-__hrtimer_run_queues
<idle>-0 0dNh. 46us : hrtimer_update_next_event <-hrtimer_interrupt
<idle>-0 0dNh. 46us : __hrtimer_get_next_event <-hrtimer_update_next_event
<idle>-0 0dNh. 47us : __hrtimer_next_event_base <-__hrtimer_get_next_event
<idle>-0 0dNh. 47us : __hrtimer_get_next_event <-hrtimer_update_next_event
<idle>-0 0dNh. 47us : __hrtimer_next_event_base <-__hrtimer_get_next_event
<idle>-0 0dNh. 48us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
<idle>-0 0dNh. 48us : tick_program_event <-hrtimer_interrupt
<idle>-0 0dNh. 49us : clockevents_program_event <-tick_program_event
<idle>-0 0dNh. 49us : ktime_get <-clockevents_program_event
<idle>-0 0dNh. 49us : lapic_next_event <-clockevents_program_event
<idle>-0 0dNh. 57us : irq_exit <-smp_apic_timer_interrupt
<idle>-0 0dN.. 57us : ksoftirqd_running <-irq_exit
<idle>-0 0dN.. 58us : __do_softirq <-irq_exit
<idle>-0 0.Ns. 58us : run_timer_softirq <-__do_softirq
<idle>-0 0.Ns. 59us : _raw_spin_lock_irq <-run_timer_softirq
<idle>-0 0.Ns. 59us : _raw_spin_lock_irq <-run_timer_softirq
<idle>-0 0.Ns. 60us : rcu_core_si <-__do_softirq
<idle>-0 0.Ns. 60us : rcu_core <-rcu_core_si
<idle>-0 0.Ns. 60us : note_gp_changes <-rcu_core
<idle>-0 0.Ns. 61us : _raw_spin_lock_irqsave <-rcu_core
<idle>-0 0dNs. 61us : rcu_accelerate_cbs <-rcu_core
<idle>-0 0dNs. 62us : rcu_segcblist_pend_cbs <-rcu_accelerate_cbs
<idle>-0 0dNs. 62us : rcu_report_qs_rnp <-rcu_core
<idle>-0 0dNs. 63us : _raw_spin_unlock_irqrestore <-rcu_report_qs_rnp
<idle>-0 0.Ns. 63us : rcu_gp_kthread_wake <-rcu_report_qs_rnp
<idle>-0 0.Ns. 64us : swake_up_one <-rcu_gp_kthread_wake
<idle>-0 0.Ns. 64us : _raw_spin_lock_irqsave <-swake_up_one
<idle>-0 0dNs. 64us : swake_up_locked.part.4 <-swake_up_one
<idle>-0 0dNs. 65us : wake_up_process <-swake_up_locked.part.4
<idle>-0 0dNs. 65us : try_to_wake_up <-wake_up_process
<idle>-0 0dNs. 66us : _raw_spin_lock_irqsave <-try_to_wake_up
<idle>-0 0dNs. 66us : _raw_spin_lock <-try_to_wake_up
<idle>-0 0dNs. 66us : update_rq_clock <-try_to_wake_up
<idle>-0 0dNs. 67us : update_rq_clock.part.82 <-update_rq_clock
<idle>-0 0dNs. 67us : ttwu_do_activate <-try_to_wake_up
<idle>-0 0dNs. 68us : activate_task <-ttwu_do_activate
<idle>-0 0dNs. 68us : psi_task_change <-activate_task
<idle>-0 0dNs. 68us : record_times <-psi_task_change
<idle>-0 0dNs. 69us : enqueue_task_fair <-activate_task
<idle>-0 0dNs. 70us : enqueue_entity <-enqueue_task_fair
<idle>-0 0dNs. 70us : update_curr <-enqueue_entity
<idle>-0 0dNs. 71us : __update_load_avg_se <-update_load_avg
<idle>-0 0dNs. 71us : __accumulate_pelt_segments <-__update_load_avg_se
<idle>-0 0dNs. 72us : __update_load_avg_cfs_rq <-update_load_avg
<idle>-0 0dNs. 72us : __accumulate_pelt_segments <-__update_load_avg_cfs_rq
<idle>-0 0dNs. 73us : update_cfs_group <-enqueue_entity
<idle>-0 0dNs. 73us : account_entity_enqueue <-enqueue_entity
<idle>-0 0dNs. 74us : __enqueue_entity <-enqueue_entity
<idle>-0 0dNs. 74us : hrtick_update <-enqueue_task_fair
<idle>-0 0dNs. 75us : ttwu_do_wakeup <-ttwu_do_activate
<idle>-0 0dNs. 75us : check_preempt_curr <-ttwu_do_wakeup
<idle>-0 0dNs. 76us : resched_curr <-check_preempt_curr
<idle>-0 0dNs. 78us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 0dNs. 78us : _raw_spin_unlock_irqrestore <-swake_up_one
<idle>-0 0.Ns. 79us : rcu_segcblist_ready_cbs <-rcu_core
<idle>-0 0.Ns. 79us : rcu_segcblist_ready_cbs <-rcu_core
<idle>-0 0dNs. 79us : rcu_segcblist_extract_done_cbs <-rcu_core
<idle>-0 0.Ns. 80us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 81us : kfree <-rcu_core
<idle>-0 0.Ns. 82us : __slab_free <-kfree
<idle>-0 0.Ns. 82us : put_cpu_partial <-__slab_free
<idle>-0 0.Ns. 83us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 83us : kfree <-rcu_core
<idle>-0 0.Ns. 84us : __slab_free <-kfree
<idle>-0 0.Ns. 84us : put_cpu_partial <-__slab_free
<idle>-0 0.Ns. 85us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 86us : kfree <-rcu_core
<idle>-0 0.Ns. 86us : __slab_free <-kfree
<idle>-0 0.Ns. 86us : put_cpu_partial <-__slab_free
<idle>-0 0.Ns. 87us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 87us : kfree <-rcu_core
<idle>-0 0.Ns. 87us : __slab_free <-kfree
<idle>-0 0.Ns. 88us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 88us : kfree <-rcu_core
<idle>-0 0.Ns. 89us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 89us : kfree <-rcu_core
<idle>-0 0.Ns. 90us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 90us : kfree <-rcu_core
<idle>-0 0.Ns. 90us : __slab_free <-kfree
<idle>-0 0.Ns. 91us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0.Ns. 91us : kfree <-rcu_core
<idle>-0 0.Ns. 91us : rcu_cblist_dequeue <-rcu_core
<idle>-0 0dNs. 92us : rcu_segcblist_insert_done_cbs <-rcu_core
<idle>-0 0dNs. 92us : rcu_segcblist_insert_count <-rcu_core
<idle>-0 0.Ns. 93us : rcu_segcblist_ready_cbs <-rcu_core
<idle>-0 0dN.. 93us : idle_cpu <-irq_exit
<idle>-0 0dN.. 94us : rcu_irq_exit <-irq_exit
<idle>-0 0.N.. 96us : rcu_idle_exit <-do_idle
<idle>-0 0dN.. 96us : rcu_dynticks_eqs_exit <-rcu_idle_exit
<idle>-0 0.N.. 97us : arch_cpu_idle_exit <-do_idle
<idle>-0 0.N.. 97us : tick_nohz_idle_exit <-do_idle
<idle>-0 0.N.. 97us : sched_ttwu_pending <-do_idle
<idle>-0 0.N.. 98us : schedule_idle <-do_idle
<idle>-0 0dN.. 98us : rcu_note_context_switch <-__schedule
<idle>-0 0dN.. 99us : rcu_qs <-rcu_note_context_switch
<idle>-0 0dN.. 99us : _raw_spin_lock <-__schedule
<idle>-0 0dN.. 100us : update_rq_clock <-__schedule
<idle>-0 0dN.. 100us : put_prev_task_idle <-__schedule
<idle>-0 0dN.. 101us : pick_next_task_stop <-__schedule
<idle>-0 0d... 102us : __schedule <-schedule_idle
<idle>-0 0d... 102us : 0:120:R ==> [000] 12: 0:R migration/0
<idle>-0 0d... 112us : <stack trace>
=> __trace_stack
=> probe_wakeup_sched_switch
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> rest_init
=> arch_call_rest_init
=> start_kernel
=> x86_64_start_reservations
=> x86_64_start_kernel
=> secondary_startup_64
root@czl-VirtualBox:/sys/kernel/debug/tracing#

2.functin_graph tracer

echo function_graph > current_tracer
echo 1 > tracing_on
echo 0 > tracing_on

之后cat trace查看:

0)               |  do_syscall_64() {
0) | __x64_sys_close() {
0) | __close_fd() {
0) 0.188 us | _raw_spin_lock();
0) | filp_close() {
0) 0.191 us | dnotify_flush();
0) 0.191 us | locks_remove_posix();
0) | fput() {
0) 0.188 us | fput_many();
0) 0.556 us | }
0) 1.675 us | }
0) 2.425 us | }
0) 2.802 us | }
0) 0.188 us | fpregs_assert_state_consistent();
0) 3.949 us | }
0) | do_syscall_64() {
0) | __x64_sys_write() {
0) | ksys_write() {
0) | __fdget_pos() {
0) 0.203 us | __fget_light();
0) 0.599 us | }
0) | vfs_write() {
0) | rw_verify_area() {
0) | security_file_permission() {
0) | apparmor_file_permission() {
0) | common_file_perm() {
0) 0.197 us | aa_file_perm();
0) 0.615 us | }
0) 0.997 us | }
0) 1.423 us | }
0) 1.813 us | }
0) | __sb_start_write() {
0) | _cond_resched() {
0) 0.188 us | rcu_all_qs();
0) 0.559 us | }
0) 0.930 us | }
0) | __vfs_write() {
0) | __check_object_size() {
0) 0.196 us | check_stack_object();
0) 0.593 us | }
0) | mutex_lock() {
0) | _cond_resched() {
0) 0.193 us | rcu_all_qs();
0) 0.591 us | }
0) 0.962 us | }
root@czl-VirtualBox:/sys/kernel/debug/tracing#

3.wakeup_dl

同样的操作,这个没有内容.

echo wakeup_dl >current_tracer
......
root@czl-VirtualBox:/sys/kernel/debug/tracing# cat trace
# tracer: wakeup_dl
#
root@czl-VirtualBox:/sys/kernel/debug/tracing#

同样的方式可以追踪其它几个,这里就不一一尝试了.


结束!