Linux Performance Monitoring and Troubleshooting (Concrete Methods)


wx63186321c235c 2022-09-08 10:14:14 Category: Performance Testing

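© Copyright belongs to the author. This is an original work by wx63186321c235c on the 51CTO blog. Contact the author for permission before reposting; unauthorized reposting may lead to legal action.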

I. High Load Average

Scenario 1: CPU-intensive process

In the first terminal, run the stress command to simulate 100% CPU usage:

stress --cpu 1 --timeout 600

[root@iz2ze2w3v37sit3vf71kuez ~]# stress --cpu 1 --timeout 600
stress: info: [20859] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
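For intuition about what this load looks like: the stress man page describes each --cpu worker as spinning on sqrt(). The following is a minimal Python sketch of that idea, not the code of stress itself. The function name cpu_hog and the single-worker, time-limited design are assumptions made for illustration.

```python
import math
import time


def cpu_hog(seconds):
    """Busy-loop on sqrt() for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 0.0
    while time.monotonic() < deadline:
        x = math.sqrt(x + 12345.678)  # pure computation: no sleeping, no I/O
        iterations += 1
    return iterations
```

Calling cpu_hog(600) would keep one core busy for about ten minutes, which roughly mirrors the --cpu 1 --timeout 600 run above.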

In the second terminal, run top to watch how the load average changes.

In the third terminal, run mpstat to watch how CPU usage changes:

mpstat -P ALL 5

-P ALL monitors all CPUs.

5 means one report is printed every 5 seconds.
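To make the interval idea concrete: tools of this kind derive their percentages from the cumulative tick counters on the "cpu" line of /proc/stat, sampled at the start and end of each interval. The sketch below shows that calculation under simplifying assumptions. The names FIELDS, parse_cpu_line, cpu_percentages and read_cpu_line are mine, and guest time is left folded into user/nice, so the numbers may differ slightly from what mpstat prints.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    # parts[0] is the label ('cpu' or 'cpu0'); the next eight values are
    # cumulative ticks. Guest columns after them are ignored because the
    # kernel already counts guest time inside user/nice.
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of each CPU state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Taking one sample with parse_cpu_line(read_cpu_line()), sleeping 5 seconds, taking a second sample and passing both to cpu_percentages approximates one line of mpstat 5 output for the "all" row.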

[Figure: mpstat output]

My machine has only one CPU. The figure below is a screenshot from a multi-CPU machine (source: the Internet).

[Figure: mpstat output on a multi-CPU machine]

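In terminal 2, the 1-minute load average slowly climbs to 1.00. In terminal 3, exactly one CPU shows 100% usage, while its iowait stays at 0.

This shows that the rise in load average is caused by CPU usage reaching 100%.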
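A load average is easier to judge once it is compared with the number of CPUs: 1.00 on a single-CPU machine means the CPU has no headroom left, while the same value on a 4-CPU machine does not. Below is a small sketch of that comparison, assuming the usual /proc/loadavg layout (three averages followed by other fields). The helper names parse_loadavg and load_per_cpu are illustrative.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_cpu(load, cpu_count):
    """Load average normalized by the number of logical CPUs."""
    return load / cpu_count


def current_load_per_cpu(path="/proc/loadavg"):
    """Read the 1-minute load (Linux only) and normalize it by os.cpu_count()."""
    with open(path) as f:
        one, _, _ = parse_loadavg(f.read())
    return load_per_cpu(one, os.cpu_count() or 1)
```

With the numbers from this scenario, load_per_cpu(1.00, 1) gives 1.0, which matches the single saturated CPU seen in mpstat.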

Use pidstat to find out which process is using the CPU:

[Figure: pidstat output]

Scenario 2: I/O-intensive process

1. Apply load: stress -i 1 --timeout 600

[root@iz2ze2w3v37sit3vf71kuez ~]# stress -i 1 --timeout 600
stress: info: [21497] dispatching hogs: 0 cpu, 1 io, 0 vm, 0 hdd
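According to the stress man page, each -i worker spins on sync(). The sketch below imitates that behaviour in Python to show the kind of work being generated. It is an illustration, not the code of stress. The name io_hog and the injectable sync parameter (handy for dry runs) are assumptions.

```python
import os
import time


def io_hog(seconds, sync=os.sync):
    """Call sync() in a loop for roughly `seconds`; return the number of calls."""
    deadline = time.monotonic() + seconds
    calls = 0
    while time.monotonic() < deadline:
        sync()  # ask the kernel to flush dirty buffers to disk
        calls += 1
    return calls
```

Because every iteration is a system call, most of the time spent in such a loop is accounted to the kernel rather than to user code.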

2. top

3. Use mpstat to watch how CPU usage changes

[root@iz2ze2w3v37sit3vf71kuez ~]# mpstat -P ALL 5
Linux 3.10.0-514.26.2.el7.x86_64 (iz2ze2w3v37sit3vf71kuez)  2021年08月17日   _x86_64_  (1 CPU)

13时41分08秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
13时41分13秒  all    1.00    0.00   96.79    2.21    0.00    0.00    0.00    0.00    0.00    0.00
13时41分13秒    0    1.00    0.00   96.79    2.21    0.00    0.00    0.00    0.00    0.00    0.00

13时41分13秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
13时41分18秒  all    0.80    0.00   96.38    2.82    0.00    0.00    0.00    0.00    0.00    0.00
13时41分18秒    0    0.80    0.00   96.38    2.82    0.00    0.00    0.00    0.00    0.00    0.00
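When reading many such reports, it helps to pull out the "all" row and see which non-idle column dominates. The sketch below does that on captured mpstat text. It assumes the column layout shown above and a decimal point (not a comma) in the numbers. The function names parse_mpstat_all and busiest_state are illustrative.

```python
def parse_mpstat_all(mpstat_output):
    """Return {column: value} for the first 'all' row of mpstat output."""
    columns = None
    for line in mpstat_output.splitlines():
        tokens = line.split()
        if "CPU" in tokens and "%usr" in tokens:
            # Header row: keep the column names that follow the CPU column.
            columns = tokens[tokens.index("CPU") + 1:]
        elif columns and len(tokens) > len(columns) and tokens[-len(columns) - 1] == "all":
            # Index from the right so the timestamp format does not matter.
            values = [float(v) for v in tokens[-len(columns):]]
            return dict(zip(columns, values))
    return None


def busiest_state(row):
    """Name of the largest column, ignoring %idle."""
    busy = {name: value for name, value in row.items() if name != "%idle"}
    return max(busy, key=busy.get)
```

Applied to the first report above, the largest non-idle column is %sys (96.79), with %iowait at only 2.21.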

4. Find out which process is using the CPU

[root@iz2ze2w3v37sit3vf71kuez ~]# pidstat -u 5 1
Linux 3.10.0-514.26.2.el7.x86_64 (iz2ze2w3v37sit3vf71kuez)  2021年08月17日   _x86_64_  (1 CPU)

13时42分36秒   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
13时42分41秒     0       253    0.00    0.80    0.00    0.00    0.80     0  kworker/0:1H
13时42分41秒     0      2270    0.20    0.20    0.00    0.00    0.40     0  AliSecGuard
13时42分41秒     0     13308    0.00    0.20    0.00    0.00    0.20     0  java
13时42分41秒     0     21498    0.20   91.00    0.00    1.60   91.20     0  stress
13时42分41秒     0     21578    0.00    0.20    0.00    0.00    0.20     0  pidstat
13时42分41秒     0     30350    0.40    0.40    0.00    0.80    0.80     0  AliYunDun

平均时间:   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
平均时间:     0       253    0.00    0.80    0.00    0.00    0.80     -  kworker/0:1H
平均时间:     0      2270    0.20    0.20    0.00    0.00    0.40     -  AliSecGuard
平均时间:     0     13308    0.00    0.20    0.00    0.00    0.20     -  java
平均时间:     0     21498    0.20   91.00    0.00    1.60   91.20     -  stress
平均时间:     0     21578    0.00    0.20    0.00    0.00    0.20     -  pidstat
平均时间:     0     30350    0.40    0.40    0.00    0.80    0.80     -  AliYunDun
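Picking the culprit by eye works for a handful of processes. For longer listings, the same check can be scripted. The sketch below scans the first report block of pidstat -u output and returns the process with the highest %CPU. It assumes command names contain no spaces (true without -l) and a decimal point in the numbers. The function name top_cpu_process is illustrative.

```python
def top_cpu_process(pidstat_output):
    """Return (pid, command, cpu_percent) for the busiest process, or None."""
    pid_off = cpu_off = None
    best = None
    for line in pidstat_output.splitlines():
        tokens = line.split()
        if "PID" in tokens and "%CPU" in tokens:
            if pid_off is not None:
                break  # a second header starts another block (e.g. averages)
            # Count columns from the right so the timestamp width is irrelevant.
            pid_off = len(tokens) - tokens.index("PID")
            cpu_off = len(tokens) - tokens.index("%CPU")
            continue
        if pid_off is None or len(tokens) < pid_off:
            continue  # banner line, blank line, or anything before the header
        pid = int(tokens[-pid_off])
        cpu = float(tokens[-cpu_off])
        command = tokens[-1]
        if best is None or cpu > best[2]:
            best = (pid, command, cpu)
    return best
```

For the output above, this returns PID 21498 (stress) at 91.20 %CPU, almost all of it in the %system column.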

Case Analysis