A Summary of System Monitoring Commands on Linux

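Linux monitors many kinds of system resources, and different commands are used for different resources. This post goes over several of the most commonly used ones.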

1. top
The key fields in the output of top have the following meanings:
Table 17-1. top column headers
Header    Meaning
PID       The task's process ID. This unique identifier allows you to manipulate the task.
USER      The user name of the task's owner, the account it runs as.
PR        The task's priority.
NI        The task's niceness, an indication of how willing this task is to yield CPU cycles to other tasks. A lower or negative niceness means a higher priority.
VIRT      The total amount of virtual memory used by the task, including shared and swapped-out memory.
RES       The total amount of physical memory used by the task, excluding swap.
SHR       The amount of shared memory used by the task. This memory is usually allocated by libraries and is also usable by other tasks.
S         Task status. This indicates whether a task is running (R), sleeping (S, or D for uninterruptible sleep), stopped (T), or zombie (Z).
%CPU      Percentage of available CPU cycles this task has used since the last screen update.
%MEM      Percentage of available RAM used by this task. Note that this does not include swap.
TIME+     Total CPU time the task has used since it started.
COMMAND   The name of the task being monitored.
----------------------------------------------------------------------------------------------

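One more thing to remember: the %CPU that top shows for a process is relative to a single CPU, not to the whole machine. On a machine with 8 CPUs, a pid showing 100% CPU has used up the capacity of one CPU, and the other CPUs may well be idle. For the same reason, a multithreaded process can show more than 100% in top's default mode.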
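To make the per-CPU percentage concrete, here is a minimal Python sketch that converts a %CPU value as shown by top into a share of the whole machine. The function name machine_share is hypothetical and introduced only for illustration; it assumes top is running in its default mode, where 100% means one full CPU.

```python
import os


def machine_share(top_cpu_percent, cpu_count=None):
    """Convert top's per-process %CPU (100 = one full CPU) into a
    percentage of the whole machine's CPU capacity.

    cpu_count defaults to the number of CPUs Python can see on this host.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return top_cpu_percent / cpu_count


# A process at 100% in top on an 8-CPU machine uses one eighth of the machine.
print(machine_share(100.0, 8))  # 12.5
```

Read this way, a process pinned at 100% on an 8-CPU host is a single-core bottleneck, not a sign that the machine is out of CPU.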

2. vmstat
The following description of the vmstat fields is excerpted from man vmstat:
Procs (processes)
    r: The number of processes waiting for run time.
    b: The number of processes in uninterruptible sleep.
Memory
    swpd: the amount of virtual memory used.
    free: the amount of idle memory.
    buff: the amount of memory used as buffers.
    cache: the amount of memory used as cache.
    inact: the amount of inactive memory. (-a option)
    active: the amount of active memory. (-a option)
Swap
    si: Amount of memory swapped in from disk (/s).
    so: Amount of memory swapped to disk (/s).
IO
    bi: Blocks received from a block device (blocks/s).
    bo: Blocks sent to a block device (blocks/s).
System
    in: The number of interrupts per second, including the clock.
    cs: The number of context switches per second.
CPU
    These are percentages of total CPU time.
    us: Time spent running non-kernel code. (user time, including nice time)
    sy: Time spent running kernel code. (system time)
    id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
    st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
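Here I want to focus on three groups of values: swap, IO, and CPU.
First, swap and the CPU wa column.
When the OS slows down and vmstat shows unusually high si and so values under swap, we should ask whether the OS is short of memory and is therefore paging in and out excessively. When si and so are high, the CPU wa value is usually high as well. wa is the time the CPU spends waiting for IO to complete, and in this situation much of that IO is swap traffic. So when all three indicators are high at once, check whether the applications are using too much memory and forcing constant page-ins and page-outs.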
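The check described above (si, so, and wa all high at the same time) can be sketched in Python as follows. The function names, the threshold values, and the sample numbers are all assumptions made up for illustration, not recommended limits or real measurements; the column names are the ones vmstat prints in its header line.

```python
def parse_vmstat(text):
    """Parse plain `vmstat` output into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":          # header line: r b swpd free ...
            columns = fields
            continue
        if columns is None or not fields[0].isdigit():
            continue                  # banner line: procs ---memory--- ...
        rows.append(dict(zip(columns, map(int, fields))))
    return rows


def swap_pressure(rows, swap_threshold=100, wa_threshold=20):
    """Return the samples where si, so, and wa are all above the thresholds.

    Both thresholds are hypothetical defaults; tune them for your system.
    """
    return [
        r for r in rows
        if r["si"] > swap_threshold
        and r["so"] > swap_threshold
        and r["wa"] > wa_threshold
    ]


# Illustrative sample only; these numbers were invented for the example.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  10240 204800    0    0    12    20  100  200  5  2 92  1  0
 2  3 524288  10240   1024  20480  900 1200  4000  5200  900 3000 10 15  5 70  0
"""

for row in swap_pressure(parse_vmstat(SAMPLE)):
    print(row["si"], row["so"], row["wa"])  # 900 1200 70
```

On a live system the text could come from running vmstat with an interval and a count (for example `vmstat 1 5`) and capturing its output.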
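Next, the IO columns.
bi is the data read in from block devices, and bo is the data written out to them.
These values track the system's IO state. From the vmstat sampling interval and the bi and bo values, we can estimate the IO load over that period and compare it with the disk's maximum read/write throughput to see whether there is an IO problem.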
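The estimate described above can be worked through in Python as a rough sketch. It assumes 1024-byte blocks (the vmstat man page states that Linux blocks are currently 1024 bytes, and that old kernels may differ). The function name io_load and the disk_max_mib_s parameter are hypothetical; the disk's maximum throughput has to come from its specification or a benchmark.

```python
BLOCK_SIZE_BYTES = 1024  # assumption: block size stated in the vmstat man page


def io_load(bi, bo, interval_s, disk_max_mib_s):
    """Estimate IO load from one vmstat sample.

    bi, bo         -- blocks/s read from and written to block devices
    interval_s     -- vmstat sampling interval in seconds
    disk_max_mib_s -- assumed maximum disk throughput in MiB/s (hypothetical input)
    """
    if disk_max_mib_s <= 0:
        raise ValueError("disk_max_mib_s must be positive")
    read_mib_s = bi * BLOCK_SIZE_BYTES / 2**20
    write_mib_s = bo * BLOCK_SIZE_BYTES / 2**20
    total_mib_s = read_mib_s + write_mib_s
    return {
        "read_mib_s": read_mib_s,
        "write_mib_s": write_mib_s,
        # bi/bo are per-second averages, so multiply by the interval
        # to get the amount of data moved during the sample.
        "moved_mib": total_mib_s * interval_s,
        # Fraction of the assumed disk maximum that was in use.
        "utilization": total_mib_s / disk_max_mib_s,
    }


# bi=4096 and bo=6144 blocks/s over a 5-second sample, disk assumed to peak at 100 MiB/s.
print(io_load(4096, 6144, 5, 100.0))
```

Adding reads and writes against a single maximum is a simplification, since real disks often have different read and write limits; treat the utilization figure as a first-order indicator only.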
3. iostat
This command is fairly simple. It reports IO statistics, and its output is mostly self-explanatory.
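4. mpstat
mpstat is short for Multiprocessor Statistics and is a real-time system monitoring tool. It reports CPU-related statistics, which come from the /proc/stat file. On multi-CPU systems it can show not only the average across all CPUs but also the figures for a specific CPU. Only the CPU-related mpstat fields are described here: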
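CPU      Processor ID.
user     User-mode CPU time (%) during the interval, excluding niced processes. Δusr/Δtotal*100
nice     CPU time (%) during the interval used by niced processes, that is, processes with a positive nice value. Δnice/Δtotal*100
system   Kernel-mode CPU time (%) during the interval. Δsystem/Δtotal*100
iowait   Time (%) during the interval spent waiting for disk IO. Δiowait/Δtotal*100
irq      Time (%) during the interval spent servicing hardware interrupts. Δirq/Δtotal*100
soft     Time (%) during the interval spent servicing software interrupts. Δsoftirq/Δtotal*100
idle     Time (%) during the interval in which the CPU was idle for any reason other than waiting for disk IO. Δidle/Δtotal*100
intr/s   Number of interrupts the CPU received per second during the interval. Δintr/interval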
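The Δfield/Δtotal*100 formulas can be illustrated with two snapshots of the aggregate cpu line from /proc/stat. This is a sketch of the arithmetic only, not mpstat's actual implementation. It assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal) and takes Δtotal to be the sum of those eight deltas. The function names and the sample counter values are invented for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Compute delta(field) / delta(total) * 100 for every field."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two invented snapshots, taken one sampling interval apart.
t0 = parse_cpu_line("cpu  1000 50 300 8000 100 10 20 0 0 0")
t1 = parse_cpu_line("cpu  1400 50 400 8400 180 10 40 0 0 0")

pct = cpu_percentages(t0, t1)
print(pct["user"], pct["system"], pct["iowait"], pct["idle"])  # 40.0 10.0 8.0 40.0
```

On a live system the two lines would be read from /proc/stat with a pause of one interval between the reads.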
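mpstat is invoked as follows:
$ mpstat           # show summary statistics for all CPUs combined
$ mpstat -P ALL    # show statistics for each CPU separately
$ mpstat -P 1      # show statistics for CPU 1 only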
5. pidstat
Reports the IO, CPU, memory, and other resource usage of an individual process.
6. netstat
Lists the state of network resources.
7. sar
Collects, reports and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, etc.)

sar is a very powerful tool. It collects historical statistics and saves them to a text or binary file, so they can be reviewed at any time later.