We often use the top command to check CPU utilization, for example:
root@ubuntu:~# top 
top - 09:16:29 up 6 min, 4 users, load average: 0.01, 0.22, 0.17
Tasks: 149 total, 1 running, 147 sleeping, 0 stopped, 1 zombie 
Cpu(s): 2.8%us, 6.7%sy, 0.2%ni, 89.9%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st 
Mem: 508000k total, 404092k used, 103908k free, 47764k buffers 
Swap: 522236k total, 0k used, 522236k free, 184992k cached 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
1 root 20 0 3040 1812 1252 S 0.0 0.4 0:01.81 init 
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 
3 root 20 0 0 0 0 S 0.0 0.0 0:00.06 ksoftirqd/0 
5 root 20 0 0 0 0 S 0.0 0.0 0:00.56 kworker/u:0 
6 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0 
 
  On Linux, CPU utilization is computed from the data in /proc/stat. The calculation works as follows:
root@ubuntu:~# cat /proc/stat
cpu 711 56 2092 7010 104 0 20 0 0 0

cpu0 711 56 2092 7010 104 0 20 0 0 0

intr 31161 94 64 0 1 75 0 3 0 0 0 0 0 1423 0 0 382 2825 4798 0 226 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

ctxt 101085

btime 1307117390
processes 2078

procs_running 1

procs_blocked 0

softirq 32534 0 7796 151 143 4225 0 81 0 12 20126

root@ubuntu:~#
 
  The first line, cpu, is the total over all CPUs; cpu0 ... cpuN are the individual CPUs.
cpu 711 56 2092 7010 104 0 20 0 0 0
 
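  The line holds 10 values (unit: jiffies; more precisely clock ticks of 1/USER_HZ second, because the kernel converts every value with cputime64_to_clock_t() before printing it). The first 8 are:
User time: 711
Nice time: 56
System time: 2092
Idle time: 7010
Waiting time: 104
Hard Irq time: 0
SoftIRQ time: 20
Steal time: 0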
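
  As a small illustration of the field layout just described, the sketch below splits a cpu line into named values. The function name parse_cpu_line and the field-name tuple are made up for this example; only the order of the eight fields comes from the text above.

```python
# Names for the first eight columns of a "cpu" line, in the order listed above.
CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Split one 'cpu...' line of /proc/stat into (label, {field: ticks})."""
    label, *values = line.split()
    if not label.startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    ticks = [int(v) for v in values]
    # zip() stops at the shorter sequence, so any trailing columns beyond
    # the eight named fields are simply ignored here.
    return label, dict(zip(CPU_FIELDS, ticks))


# The sample line from the article.
label, fields = parse_cpu_line("cpu 711 56 2092 7010 104 0 20 0 0 0")
print(label, fields)
```

  On a live system the same function could be fed the first line of /proc/stat instead of the hard-coded sample.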
 
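  CPU time = user + nice + system + idle + iowait + irq + softirq + steal
%us = (User time + Nice time) / CPU time * 100%
%sy = (System time + Hard Irq time + SoftIRQ time) / CPU time * 100%
%id = (Idle time) / CPU time * 100%
%ni = (Nice time) / CPU time * 100%
%wa = (Waiting time) / CPU time * 100%
%hi = (Hard Irq time) / CPU time * 100%
%si = (SoftIRQ time) / CPU time * 100%
%st = (Steal time) / CPU time * 100%
  Note: these %us and %sy definitions fold nice time and interrupt time into the user and system figures. top itself shows us and sy without them, since ni, hi and si have their own columns; that is why the eight percentages in the top output above add up to 100%. top also works on the difference between two successive readings of /proc/stat rather than on the since-boot totals.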
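
  The formulas can be tried out with a short sketch. It is only an illustration under stated assumptions: the function name cpu_percentages and the optional prev argument (an earlier sample, so that the result covers an interval instead of the time since boot) are not from the original text; the arithmetic follows the formulas exactly as written above.

```python
NAMES = ("user", "nice", "system", "idle",
         "iowait", "irq", "softirq", "steal")


def cpu_percentages(curr, prev=None):
    """Apply the article's formulas to the change between two samples.

    curr and prev map field names to tick counts. prev=None means
    "since boot", i.e. every counter is taken to start at zero.
    """
    prev = prev or dict.fromkeys(NAMES, 0)
    d = {n: curr[n] - prev[n] for n in NAMES}
    total = sum(d.values())  # "CPU time" in the formulas
    if total <= 0:
        raise ValueError("no CPU time elapsed between samples")

    def pct(ticks):
        return 100.0 * ticks / total

    return {
        "us": pct(d["user"] + d["nice"]),
        "sy": pct(d["system"] + d["irq"] + d["softirq"]),
        "id": pct(d["idle"]),
        "ni": pct(d["nice"]),
        "wa": pct(d["iowait"]),
        "hi": pct(d["irq"]),
        "si": pct(d["softirq"]),
        "st": pct(d["steal"]),
    }


# The sample cpu line: 711 56 2092 7010 104 0 20 0
sample = dict(user=711, nice=56, system=2092, idle=7010,
              iowait=104, irq=0, softirq=20, steal=0)
result = cpu_percentages(sample)
for name, value in result.items():
    print("%%%s = %.1f%%" % (name, value))
```

  Because %us already contains nice and %sy already contains irq and softirq, it is us + sy + id + wa + st that adds up to 100% under these definitions.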
 
  Starting from /proc/stat, we now look at how the kernel gathers these statistics.
 
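  Kernel implementation
The description below is based on kernel source version 2.6.32-71.29.1.el6 x86_64.
The /proc/stat file is created by proc_stat_init() in fs/proc/stat.c, which is called during kernel initialization. All functions related to /proc/stat are implemented in stat.c.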
 
  The file operations for /proc/stat are defined in proc_stat_operations.
00160: static const struct file_operations proc_stat_operations = {
00161:    .open    = stat_open,
00162:    .read    = seq_read,
00163:    .llseek    = seq_lseek,
00164:    .release    = single_release,
00165: };
The open function is stat_open(). It first allocates a buffer of size bytes to hold the temporary data (which is also the final data we see in stat).
00136: static int stat_open(struct inode *inode, struct file *file)
00137: {
00138:    unsigned size = 4096 * (1 + num_possible_cpus() / 32);
00139:    char *buf;
00140:    struct seq_file *m;
00141:    int res;
00142:
00143:    /* don't ask for more than the kmalloc() max size, currently 128 KB */
00144:    if (size > 128 * 1024)
00145:        size = 128 * 1024;
00146:    buf = kmalloc(size, GFP_KERNEL);
00147:    if (!buf)
00148:        return -ENOMEM;
00149:
00150:    res = single_open(file, show_stat, NULL);
00151:    if (!res) {
00152:        m = file->private_data;
00153:        m->buf = buf;
00154:        m->size = size;
00155:    } else
00156:        kfree(buf);
00157:    return res;
00158: }
00159:
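
The buffer sizing in stat_open() is plain integer arithmetic, so it can be tabulated outside the kernel. The sketch below mirrors lines 138 and 144-145 only; the names stat_buf_size and KMALLOC_CAP are invented for the illustration.

```python
KMALLOC_CAP = 128 * 1024  # the 128 KB limit applied at lines 144-145


def stat_buf_size(num_possible_cpus):
    """Mirror the size computation at line 138 (C integer division)."""
    size = 4096 * (1 + num_possible_cpus // 32)
    return min(size, KMALLOC_CAP)


# One extra 4 KB page is added per 32 possible CPUs until the cap is hit.
for n in (1, 31, 32, 64, 992, 1024):
    print(n, stat_buf_size(n))
```

With these numbers the cap is reached at 992 possible CPUs, where the formula yields exactly 128 KB.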
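The contents of /proc/stat are filled in by show_stat(). Note the for_each_possible_cpu(i) loop at line 43: it accumulates the data of all CPUs, which becomes the first cpu line of /proc/stat seen in the earlier example.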
00025: static int show_stat(struct seq_file *p, void *v)
00026: {
00027:    int i, j;
00028:    unsigned long jif;
00029:    cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
00030:    cputime64_t guest;
00031:    u64 sum = 0;
00032:    u64 sum_softirq = 0;
00033:    unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
00034:    struct timespec boottime;
00035:    unsigned int per_irq_sum;
00036:
00037:    user = nice = system = idle = iowait =
00038:        irq = softirq = steal = cputime64_zero;
00039:    guest = cputime64_zero;
00040:    getboottime(&boottime);
00041:    jif = boottime.tv_sec;
00042:
00043:    for_each_possible_cpu(i) {
00044:        user = cputime64_add(user, kstat_cpu(i).cpustat.user);
00045:        nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
00046:        system = cputime64_add(system, kstat_cpu(i).cpustat.system);
00047:        idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
00048:        idle = cputime64_add(idle, arch_idle_time(i));
00049:        iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
00050:        irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
00051:        softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
00052:        steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
00053:        guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
00054:        for_each_irq_nr(j) {
00055:            sum += kstat_irqs_cpu(j, i);
00056:        }
 
  Once the totals user, nice, system, idle, iowait, irq, softirq and steal have been accumulated over all CPUs, the usage of each individual CPU is output (lines 78-103).
00057:        sum += arch_irq_stat_cpu(i);
00058:
00059:        for (j = 0; j < NR_SOFTIRQS; j++) {
00060:            unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
00061:
00062:            per_softirq_sums[j] += softirq_stat;
00063:            sum_softirq += softirq_stat;
00064:        }
00065:    }
00066:    sum += arch_irq_stat();
00067:
00068:    seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
00069:        (unsigned long long)cputime64_to_clock_t(user),
00070:        (unsigned long long)cputime64_to_clock_t(nice),
00071:        (unsigned long long)cputime64_to_clock_t(system),
00072:        (unsigned long long)cputime64_to_clock_t(idle),
00073:        (unsigned long long)cputime64_to_clock_t(iowait),
00074:        (unsigned long long)cputime64_to_clock_t(irq),
00075:        (unsigned long long)cputime64_to_clock_t(softirq),
00076:        (unsigned long long)cputime64_to_clock_t(steal),
00077:        (unsigned long long)cputime64_to_clock_t(guest));
00078:    for_each_online_cpu(i) {
00079:
00080:        /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
00081:        user = kstat_cpu(i).cpustat.user;
00082:        nice = kstat_cpu(i).cpustat.nice;
00083:        system = kstat_cpu(i).cpustat.system;
00084:        idle = kstat_cpu(i).cpustat.idle;
00085:        idle = cputime64_add(idle, arch_idle_time(i));
00086:        iowait = kstat_cpu(i).cpustat.iowait;
00087:        irq = kstat_cpu(i).cpustat.irq;
00088:        softirq = kstat_cpu(i).cpustat.softirq;
00089:        steal = kstat_cpu(i).cpustat.steal;
00090:        guest = kstat_cpu(i).cpustat.guest;
00091:        seq_printf(p,
00092:            "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
00093:            i,
00094:            (unsigned long long)cputime64_to_clock_t(user),
00095:            (unsigned long long)cputime64_to_clock_t(nice),
00096:            (unsigned long long)cputime64_to_clock_t(system),
00097:            (unsigned long long)cputime64_to_clock_t(idle),
00098:            (unsigned long long)cputime64_to_clock_t(iowait),
00099:            (unsigned long long)cputime64_to_clock_t(irq),
00100:            (unsigned long long)cputime64_to_clock_t(softirq),
00101:            (unsigned long long)cputime64_to_clock_t(steal),
00102:            (unsigned long long)cputime64_to_clock_t(guest));
00103:    }
00104:    seq_printf(p, "intr %llu", (unsigned long long)sum);
00105:
00106:    /* sum again ? it could be updated? */
00107:    for_each_irq_nr(j) {
00108:        per_irq_sum = 0;
00109:        for_each_possible_cpu(i)
00110:            per_irq_sum += kstat_irqs_cpu(j, i);
00111:
00112:        seq_printf(p, " %u", per_irq_sum);
00113:    }
00114:
00115:    seq_printf(p,
00116:        "\nctxt %llu\n"
00117:        "btime %lu\n"
00118:        "processes %lu\n"
00119:        "procs_running %lu\n"
00120:        "procs_blocked %lu\n",
00121:        nr_context_switches(),
00122:        (unsigned long)jif,
00123:        total_forks,
00124:        nr_running(),
00125:        nr_iowait());
00126:
00127:    seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq);
00128:
00129:    for (i = 0; i < NR_SOFTIRQS; i++)
00130:        seq_printf(p, " %u", per_softirq_sums[i]);
00131:    seq_printf(p, "\n");
00132:
00133:    return 0;
00134: }
00135:
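
The relationship between the aggregate cpu line (lines 43-77) and the per-CPU lines (lines 78-103) can be checked from user space. The sketch below uses a made-up two-CPU sample, not real output, and the helper names are invented. On a real machine the match need not be exact: the aggregate loop walks all possible CPUs while the per-CPU lines cover only online ones, and each printed value is converted to clock ticks separately.

```python
# Hypothetical two-CPU sample in the nine-column format printed by show_stat().
SAMPLE = """\
cpu 30 4 50 900 16 0 2 0 0
cpu0 10 1 20 400 6 0 1 0 0
cpu1 20 3 30 500 10 0 1 0 0
"""


def split_cpu_lines(text):
    """Return (aggregate_values, {cpuN: values}) from /proc/stat text."""
    total, per_cpu = None, {}
    for line in text.splitlines():
        if not line.startswith("cpu"):
            continue  # intr, ctxt, btime, ... are not needed here
        label, *values = line.split()
        numbers = [int(v) for v in values]
        if label == "cpu":
            total = numbers
        else:
            per_cpu[label] = numbers
    return total, per_cpu


def column_sums(per_cpu):
    """Add the per-CPU rows column by column."""
    return [sum(column) for column in zip(*per_cpu.values())]


total, per_cpu = split_cpu_lines(SAMPLE)
print("aggregate:", total)
print("summed   :", column_sums(per_cpu))
```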
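Line 104 prints the total number of interrupts over all CPUs, and lines 107-113 print the count for each interrupt vector summed over the CPUs. Note: /proc/stat records a counter for every one of the possible NR_IRQS interrupt vectors, but a machine normally uses only a few of them, which is why most of the values after intr in /proc/stat are 0.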
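
The layout of the intr and softirq lines (a total followed by one counter per source) can be explored with a short sketch. The function name parse_counter_line is invented here, and the intr sample is cut down to the first 20 per-vector values of the output shown earlier.

```python
def parse_counter_line(line):
    """Return (name, total, per_source_counts) for an intr or softirq line."""
    name, total, *counts = line.split()
    return name, int(total), [int(c) for c in counts]


# First 20 per-vector values of the intr line shown earlier; the long run
# of trailing zeros is left out for brevity.
INTR = "intr 31161 94 64 0 1 75 0 3 0 0 0 0 0 1423 0 0 382 2825 4798 0 226"
SOFTIRQ = "softirq 32534 0 7796 151 143 4225 0 81 0 12 20126"

_, intr_total, vectors = parse_counter_line(INTR)
# Keep only the positions in the line whose counter is non-zero.
active = {pos: n for pos, n in enumerate(vectors) if n}
print("active vectors:", active)
print("intr total:", intr_total, "sum of vectors:", sum(vectors))

_, softirq_total, kinds = parse_counter_line(SOFTIRQ)
print("softirq total:", softirq_total, "sum of types:", sum(kinds))
```

In this sample the softirq total equals the sum of its per-type counters, as lines 59-64 accumulate both from the same values. The intr total is larger than the sum of the per-vector counters, which fits lines 57 and 66 adding arch_irq_stat_cpu() and arch_irq_stat() to sum without giving them a column of their own.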
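Finally, show_stat() prints the number of context switches (ctxt), the time the kernel booted in seconds since the Unix epoch (btime), the number of processes created since boot (processes), the number of processes currently running (procs_running) and the number of blocked processes (procs_blocked), which is the number of processes waiting on I/O as returned by nr_iowait(). The softirq totals follow on the last line.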
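
These trailing fields are simple "name value" pairs and are easy to read back. A minimal sketch, assuming the values from the sample output earlier in the article; parse_tail and TAIL_KEYS are names made up for the example.

```python
from datetime import datetime, timezone

# The single-value lines from the /proc/stat output shown earlier.
TAIL = """\
ctxt 101085
btime 1307117390
processes 2078
procs_running 1
procs_blocked 0
"""

TAIL_KEYS = ("ctxt", "btime", "processes", "procs_running", "procs_blocked")


def parse_tail(text):
    """Collect the single-value counters printed at lines 115-125."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in TAIL_KEYS:
            stats[parts[0]] = int(parts[1])
    return stats


stats = parse_tail(TAIL)
# btime is boottime.tv_sec, i.e. seconds since the Unix epoch.
boot = datetime.fromtimestamp(stats["btime"], tz=timezone.utc)
print("booted at", boot.isoformat())
print("context switches:", stats["ctxt"])
```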
 
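  Finally, the meaning of user, nice, system, idle, iowait, irq, softirq and steal:
• User time
Time the CPU spends executing user processes; in the %us figure above it includes nice time. In general, the larger the share of CPU time spent in user space, the better.
• System time
Time the CPU spends running in the kernel; in the %sy figure above it includes IRQ and softirq time. A high system share indicates a bottleneck somewhere in the system. Lower is generally better.
• Waiting time
Time the CPU spends waiting for I/O operations to complete. A system should not spend much time waiting for I/O; if it does, I/O is a bottleneck.
• Idle time
The system is idle, waiting for a process to run.
• Nice time
Time the CPU spends running user processes whose priority has been adjusted with nice.
• Hard Irq time
Time the system spends handling hardware interrupts.
• SoftIrq time
Time the system spends handling software interrupts.
• Steal time
Time a virtual CPU is forced to wait (involuntary wait) while the hypervisor services another virtual processor.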