Although there are many I/O performance metrics, and quite a few corresponding analysis tools, the metrics and tools are closely related.
Once an I/O bottleneck has been found, the next step is optimization: completing I/O operations as fast as possible, or, from another angle, reducing or even avoiding disk I/O altogether.
I/O Benchmarking
Before optimizing, first ask yourself: what is the goal of the I/O optimization? In other words, how high (or low) do the observed I/O metrics (IOPS, throughput, latency, and so on) need to be?
These metrics differ across application scenarios, file systems, and physical disks.
To evaluate optimization results objectively, you first need to benchmark the disk and the file system, to establish their maximum I/O performance.
fio (Flexible I/O Tester) is the most commonly used benchmark for file-system and disk I/O. It offers a large number of customizable options for testing raw-disk or file-system I/O in various scenarios, including different block sizes, different I/O engines, and with or without caching.
Installing fio:
yum install -y fio    # CentOS/RHEL; on Debian/Ubuntu: apt-get install -y fio
After installation, run man fio to see how to use it.
fio has a great many options; here are a few common scenarios: random read, random write, sequential read, and sequential write.
# Random read
fio -name=randread -direct=1 -iodepth=64 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
# Random write
fio -name=randwrite -direct=1 -iodepth=64 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
# Sequential read
fio -name=read -direct=1 -iodepth=64 -rw=read -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
# Sequential write
fio -name=write -direct=1 -iodepth=64 -rw=write -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/sdb
Several of these parameters deserve attention:
- direct: whether to bypass the system cache. In the examples above it is set to 1, meaning the cache is bypassed.
- iodepth: the maximum number of I/O requests in flight at once when using asynchronous I/O (AIO); 64 in the examples above. Note that with a synchronous engine only one request can be in flight, so iodepth is effectively 1.
- rw: the I/O pattern. read/write mean sequential read/write; randread/randwrite mean random read/write.
- ioengine: the I/O engine. fio supports synchronous (sync), asynchronous (libaio), memory-mapped (mmap), network (net), and various other engines.
- bs: the I/O block size, set to 4k in the examples above (which is also the default).
- filename: the target path. It can be a device path (to test raw-disk performance) or a file path (to test file-system performance); the examples above use the device /dev/sdb.
Be careful! Running a write test against a device path destroys the file system on that disk. If you have only one disk, do not run write tests against it!
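If no disk can be sacrificed, fio can target a regular file instead, which tests the file system rather than the raw device. The same options can also be written as an ini-style job file rather than CLI flags; a sketch, where the job name and the path /tmp/fio-test.bin are placeholders:

```ini
; randread-file.fio : the random-read example as a job file, against a file
[global]
direct=1            ; bypass the page cache
iodepth=64          ; in-flight async requests
ioengine=libaio
bs=4k
size=1G
numjobs=1
runtime=1000
group_reporting

[randread]
rw=randread
filename=/tmp/fio-test.bin   ; a regular file, so no disk is wiped
```

Run it with `fio randread-file.fio`; fio lays out the 1G test file automatically if it does not exist.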
Random read test (uncached)
fio -name=readread -direct=1 -iodepth=64 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/dev/vda5
readread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [34424K/0K/0K /s] [8606 /0 /0 iops] [eta 00m:00s]
readread: (groupid=0, jobs=1): err= 0: pid=63799: Wed Jan 30 13:11:57 2019
read : io=1024.0MB, bw=16877KB/s, iops=4219 , runt= 62131msec
slat (usec): min=1 , max=570 , avg=10.47, stdev= 5.24
clat (usec): min=1 , max=798566 , avg=15142.21, stdev=29041.26
lat (usec): min=26 , max=798571 , avg=15153.09, stdev=29041.29
clat percentiles (usec):
| 1.00th=[ 46], 5.00th=[ 116], 10.00th=[ 159], 20.00th=[ 548],
| 30.00th=[ 2736], 40.00th=[ 4384], 50.00th=[ 6496], 60.00th=[ 9408],
| 70.00th=[13760], 80.00th=[20608], 90.00th=[35584], 95.00th=[56064],
| 99.00th=[140288], 99.50th=[189440], 99.90th=[337920], 99.95th=[395264],
| 99.99th=[536576]
bw (KB/s) : min=13240, max=35384, per=99.88%, avg=16856.03, stdev=3536.84
lat (usec) : 2=0.01%, 20=0.01%, 50=1.44%, 100=2.81%, 250=11.94%
lat (usec) : 500=3.19%, 750=2.09%, 1000=0.79%
lat (msec) : 2=3.79%, 4=11.44%, 10=23.95%, 20=17.74%, 50=14.81%
lat (msec) : 100=4.14%, 250=1.63%, 500=0.23%, 750=0.01%, 1000=0.01%
cpu : usr=2.05%, sys=6.90%, ctx=222511, majf=0, minf=86
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1024.0MB, aggrb=16876KB/s, minb=16876KB/s, maxb=16876KB/s, mint=62131msec, maxt=62131msec
Disk stats (read/write):
vda: ios=260420/83, merge=41/77, ticks=3958195/5556, in_queue=3964522, util=99.85%
In this report, the lines to focus on are slat, clat, lat, plus bw and iops:
- slat (submission latency): the time from issuing an I/O until it is actually submitted for execution.
- clat (completion latency): the time from submission until the I/O completes.
- lat: the total time from fio creating the I/O until it completes. Note that for synchronous I/O, submission and completion are a single action, so slat is effectively the completion time and clat is close to 0; with asynchronous I/O, lat is approximately the sum of slat and clat.
- bw: throughput. The result above averages 16856.03 KB/s, i.e. 16856.03/1024, roughly 16 MB/s.
- iops: the number of I/O operations per second; the result above is iops=4219.
In practice, an application's I/O is usually a mix of reads and writes, and the request sizes vary, so the scenarios above cannot precisely model an application's I/O pattern.
Fortunately, fio supports I/O replay: with the help of blktrace, fio can run a benchmark that mimics the application's real I/O.
The idea is to first record the disk's I/O activity with blktrace, and then have fio replay that recording.
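A sketch of that record-and-replay workflow, assuming the same scratch device /dev/sdb as above (requires root; replaying onto a raw device carries the same data-destruction caveat as the write tests):

```shell
# 1. Record I/O on the device while the application runs; stop with Ctrl-C
blktrace -d /dev/sdb
# blktrace writes per-CPU trace files named sdb.blktrace.0, sdb.blktrace.1, ...
# 2. Merge the per-CPU traces into a single binary log that fio can consume
blkparse sdb -d sdb.bin
# 3. Replay the recorded I/O pattern with fio
fio --name=replay --filename=/dev/sdb --direct=1 --read_iolog=sdb.bin
```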
Random write test (uncached)
fio -name=randwrite -direct=1 -iodepth=64 -rw=randwrite -ioengine=sync -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/data0/test.log
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=64
fio-2.0.13
Starting 1 process
randwrite: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w] [100.0% done] [0K/20183K/0K /s] [0 /5045 /0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=9291: Wed Jan 30 14:24:36 2019
write: io=1024.0MB, bw=13599KB/s, iops=3399 , runt= 77106msec
clat (usec): min=72 , max=122468 , avg=289.68, stdev=1456.56
lat (usec): min=72 , max=122469 , avg=290.18, stdev=1456.56
clat percentiles (usec):
| 1.00th=[ 104], 5.00th=[ 131], 10.00th=[ 145], 20.00th=[ 163],
| 30.00th=[ 177], 40.00th=[ 189], 50.00th=[ 201], 60.00th=[ 213],
| 70.00th=[ 227], 80.00th=[ 245], 90.00th=[ 270], 95.00th=[ 298],
| 99.00th=[ 1368], 99.50th=[ 1688], 99.90th=[24192], 99.95th=[28544],
| 99.99th=[45312]
bw (KB/s) : min= 1595, max=21272, per=100.00%, avg=13609.72, stdev=2728.73
lat (usec) : 100=0.63%, 250=81.97%, 500=16.05%, 750=0.04%, 1000=0.05%
lat (msec) : 2=0.83%, 4=0.03%, 10=0.08%, 20=0.15%, 50=0.15%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=1.94%, sys=10.20%, ctx=262266, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=13599KB/s, minb=13599KB/s, maxb=13599KB/s, mint=77106msec, maxt=77106msec
Disk stats (read/write):
vda: ios=0/261817, merge=0/7986, ticks=0/74016, in_queue=73888, util=94.04%
Random write test (buffered)
fio -name=randwrite -iodepth=64 -rw=randwrite -ioengine=sync -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=/data0/test_2.log
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=64
fio-2.0.13
Starting 1 process
randwrite: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1)
randwrite: (groupid=0, jobs=1): err= 0: pid=10909: Wed Jan 30 14:33:48 2019
write: io=1024.0MB, bw=878940KB/s, iops=219735 , runt= 1193msec
clat (usec): min=1 , max=623 , avg= 3.58, stdev= 3.24
lat (usec): min=2 , max=623 , avg= 3.77, stdev= 3.33
clat percentiles (usec):
| 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 3], 60.00th=[ 3],
| 70.00th=[ 3], 80.00th=[ 4], 90.00th=[ 5], 95.00th=[ 9],
| 99.00th=[ 15], 99.50th=[ 17], 99.90th=[ 21], 99.95th=[ 23],
| 99.99th=[ 44]
bw (KB/s) : min=850264, max=917736, per=100.00%, avg=884000.00, stdev=47709.91
lat (usec) : 2=0.01%, 4=79.89%, 10=15.40%, 20=4.53%, 50=0.17%
lat (usec) : 100=0.01%, 250=0.01%, 750=0.01%
cpu : usr=17.37%, sys=82.38%, ctx=23, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=878940KB/s, minb=878940KB/s, maxb=878940KB/s, mint=1193msec, maxt=1193msec
Disk stats (read/write):
vda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Compare this with the uncached random write above: throughput jumps from roughly 13 MB/s to roughly 880 MB/s and average latency drops to a few microseconds, because the writes land in the page cache rather than on disk. The disk stats line (vda: ios=0/0, util=0.00%) confirms that no disk I/O occurred during the run.