Ceph Distributed File Storage Performance Tuning

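  • I. Hardware tuning
  • II. BIOS configuration
  • III. Network configuration
  • IV. OS configuration
  • V. Disk I/O schedulers
  • VI. Software-level tuning
  • VII. Ceph parameter tuning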


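I. Hardware tuning

1. NVMe SSD placement
● Purpose
Reduce the overhead of data crossing between CPU dies.
● Method
Install the NVMe SSD and the network card on the same riser card.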

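2. Memory population
● Purpose
Memory performs best when populated one DIMM per channel (1DPC), that is, with every DIMM0 slot filled; this gives the highest memory bandwidth.
● Method
Populate the DIMM0 slots first: DIMM000, 010, 020, 030, 040, 050, 100, 110,
120, 130, 140 and 150. In the three-digit slot number, the first digit is the CPU,
the second is the memory channel and the third is the DIMM position. Fill the slots
whose third digit is 0 first, in ascending order of memory channel.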

3. Balancing the public and cluster NICs
Install the public NIC and the cluster NIC under different CPUs.
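
The placement rules above can be checked from the OS by reading each device's NUMA node from sysfs. The following is a minimal sketch, assuming the usual Linux sysfs layout; the function names and the optional sysfs-root argument (there only so the functions can be pointed at a test tree) are illustrative, not part of any tool.

```shell
# Print the NUMA node of a NIC or NVMe controller, or "unknown" if sysfs has no entry.
# $1: class ("net" or "nvme"), $2: device name, $3: sysfs root (default /sys).
numa_node_of() (
    f="${3:-/sys}/class/$1/$2/device/numa_node"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
    fi
)

# Succeed when a NIC ($1) and an NVMe controller ($2) report the same known node.
same_numa_node() (
    a=$(numa_node_of net "$1" "${3:-/sys}")
    b=$(numa_node_of nvme "$2" "${3:-/sys}")
    [ "$a" != "unknown" ] && [ "$a" = "$b" ]
)
```

For example, `same_numa_node enp130s0f0 nvme0 && echo "same node"`; the device names are examples. A value of -1 means the platform exposes no NUMA information for the device.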

II. BIOS configuration

1. Power Policy: performance
2. Memory refresh rate: 64 ms
3. SMMU: disabled
4. CPU prefetching: enabled

III. Network configuration

1. Bond mode
Both public-bond and cluster-bond combine two 10GE ports into one bond, with the following parameters (mode=2 is balance-xor).

BONDING_OPTS="mode=2 miimon=1000 xmit_hash_policy=layer3+4"
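
Once the bond is up, the kernel reports the active settings in /proc/net/bonding/<bond>. The helper below is a small sketch for pulling one field out of that file; the function name `bond_field` is made up for illustration, and the field labels are assumed to match what the bonding driver prints on your kernel.

```shell
# Print the value of one "Label: value" line from a bonding status file.
# $1: status file (e.g. /proc/net/bonding/public-bond), $2: field label.
bond_field() {
    sed -n "s/^$2: //p" "$1" | head -n 1
}
```

For example, `bond_field /proc/net/bonding/public-bond "Transmit Hash Policy"` should mention layer3+4 if the options above took effect.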

2. NIC parameters
Set the MTU and the NIC ring-buffer (queue) sizes with the following script.

for i in `ifconfig | grep flags | awk '{print $1}' | sed "s/://g"`; do ifconfig $i  mtu 9000 up ; done;

for i in `ifconfig | grep flags | awk '{print $1}' | sed "s/://g"`
do 
	ethtool -G $i rx 4096
	ethtool -G $i tx 4096
done;
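
The loops above touch every interface that ifconfig lists, including lo and the bond devices. A possible variant, sketched below, restricts the change to interfaces that have a backing device in sysfs. The function names are illustrative, and `tune_nics` is only defined here, not run.

```shell
# List interfaces backed by a real (e.g. PCI) device; skips lo, bonds and VLANs.
# $1: sysfs root (default /sys).
physical_nics() (
    for d in "${1:-/sys}"/class/net/*; do
        if [ -e "$d/device" ]; then
            basename "$d"
        fi
    done
)

# Apply the MTU and ring sizes from the text to each physical NIC.
tune_nics() {
    for nic in $(physical_nics); do
        ip link set dev "$nic" mtu 9000 up
        ethtool -G "$nic" rx 4096 tx 4096
    done
}
```

Whether a ring size of 4096 is accepted depends on the NIC; `ethtool -g <nic>` shows the hardware maximum.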

3. Disable the IRQ balancing service

systemctl stop irqbalance
systemctl disable irqbalance

4. Enable LRO (large receive offload)

ethtool -K enp130s0f0 lro on

Check that it is enabled:

ethtool -k enp130s0f0 | grep large-receive-offload

5. Adjust the ring buffers

ethtool -G enp130s0f0 rx 4096 tx 4096

Check: ethtool -g enp130s0f0

6. Bind NIC interrupts to cores
①. Stop the irqbalance service.
②. Find the NUMA node the NIC belongs to.

cat /sys/class/net/enp130s0f0/device/numa_node

③. Find the CPU cores that belong to that NUMA node.

lscpu

④. Find the NIC's interrupt numbers.

ls /sys/class/net/enp130s0f0/device/msi_irqs/
cat /proc/interrupts | grep enp130s0f0 | awk -F ':' '{print $1}'

⑤. Bind each interrupt to a core on that NUMA node.

echo <core-id> > /proc/irq/<irq-number>/smp_affinity_list
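
Steps ② to ⑤ can be combined into one script. The sketch below spreads a NIC's MSI interrupts round-robin over the cores of its NUMA node. The function names and the optional sysfs/procfs root arguments are assumptions for illustration, and the round-robin policy is one reasonable choice rather than something the steps above prescribe.

```shell
# Expand a kernel CPU list such as "0-3,8" into "0 1 2 3 8".
expand_cpulist() (
    out=""
    IFS=,
    for part in $1; do
        case $part in
            *-*) i=${part%-*}; end=${part#*-}
                 while [ "$i" -le "$end" ]; do out="$out $i"; i=$((i + 1)); done ;;
            *)   out="$out $part" ;;
        esac
    done
    echo "${out# }"
)

# Bind each MSI interrupt of a NIC to the next core of the NIC's NUMA node.
# $1: interface, $2: sysfs root (default /sys), $3: procfs root (default /proc).
bind_nic_irqs() (
    nic=$1; sys=${2:-/sys}; proc=${3:-/proc}
    node=$(cat "$sys/class/net/$nic/device/numa_node")
    [ "$node" -ge 0 ] || node=0   # -1 means no NUMA information; fall back to node 0
    cpus=$(expand_cpulist "$(cat "$sys/devices/system/node/node$node/cpulist")")
    set -- $cpus                  # positional parameters now hold the core list
    for irq_path in "$sys/class/net/$nic/device/msi_irqs"/*; do
        [ -e "$irq_path" ] || continue
        irq=$(basename "$irq_path")
        echo "$1" > "$proc/irq/$irq/smp_affinity_list"
        # rotate: move the core just used to the end of the list
        core=$1; shift; set -- "$@" "$core"
    done
)
```

Run it as root after stopping irqbalance, for example `bind_nic_irqs enp130s0f0`.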

IV. OS configuration

Add the following settings to /etc/profile, then run source /etc/profile.

ulimit -u 1000000
ulimit -n 1000000
ulimit -d unlimited
ulimit -m unlimited
ulimit -s unlimited
ulimit -t unlimited
ulimit -v unlimited
ulimit -l 1024000
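
Limits set in /etc/profile apply to login shells and the processes they start; a daemon launched by a service manager typically does not inherit them. It is therefore worth checking what a running process actually got. The sketch below reads one soft limit from a /proc/<pid>/limits style file; the function name `soft_limit` is illustrative.

```shell
# Print the soft value of one limit from a /proc/<pid>/limits file.
# $1: limit name exactly as shown in the first column, $2: limits file.
soft_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)
            split(rest, f, " ")   # f[1] = soft limit, f[2] = hard limit
            print f[1]
            exit
        }
    ' "$2"
}
```

For example, `soft_limit "Max open files" /proc/$(pidof -s ceph-osd)/limits` shows the open-file limit of one running OSD, assuming a ceph-osd process exists on the host.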

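V. Disk I/O schedulers

1. Set the HDD scheduler to mq-deadline (the commands below write deadline, the name used on kernels without blk-mq; run cat /sys/block/sda/queue/scheduler to see the names your kernel accepts):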

echo deadline > /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sdb/queue/scheduler
echo deadline > /sys/block/sdc/queue/scheduler
echo deadline > /sys/block/sdd/queue/scheduler
echo deadline > /sys/block/sde/queue/scheduler
echo deadline > /sys/block/sdf/queue/scheduler
echo deadline > /sys/block/sdg/queue/scheduler
echo deadline > /sys/block/sdh/queue/scheduler
echo deadline > /sys/block/sdi/queue/scheduler
echo deadline > /sys/block/sdj/queue/scheduler
echo deadline > /sys/block/sdk/queue/scheduler
echo deadline > /sys/block/sdl/queue/scheduler
echo deadline > /sys/block/sdm/queue/scheduler
echo deadline > /sys/block/sdn/queue/scheduler
echo deadline > /sys/block/sdo/queue/scheduler
echo deadline > /sys/block/sdp/queue/scheduler
echo deadline > /sys/block/sdq/queue/scheduler
echo deadline > /sys/block/sdr/queue/scheduler
echo deadline > /sys/block/sds/queue/scheduler
echo deadline > /sys/block/sdt/queue/scheduler

2. Set the SSD scheduler to none:

echo none > /sys/block/nvme0n1/queue/scheduler
echo none > /sys/block/nvme1n1/queue/scheduler
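
Instead of listing every disk by name, the scheduler can be chosen from each device's rotational flag. This is a sketch under the assumption that HDDs report rotational=1 and SSD/NVMe devices report 0; the function name and its defaults are illustrative.

```shell
# Set the I/O scheduler per disk: rotational disks get $2, the rest get $3.
# $1: sysfs root (default /sys), $2: HDD scheduler, $3: SSD scheduler.
set_schedulers() (
    root=${1:-/sys}; hdd_sched=${2:-mq-deadline}; ssd_sched=${3:-none}
    for q in "$root"/block/sd*/queue "$root"/block/nvme*/queue; do
        [ -w "$q/scheduler" ] || continue
        if [ "$(cat "$q/rotational")" = "1" ]; then
            echo "$hdd_sched" > "$q/scheduler"
        else
            echo "$ssd_sched" > "$q/scheduler"
        fi
    done
)
```

Note that a SATA SSD appearing as sdX is treated as an SSD here, which differs from the by-name commands above. On a kernel without blk-mq, pass deadline and noop instead, for example `set_schedulers /sys deadline noop`.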

VI. Software-level tuning

1. kernel.pid_max: raise the kernel PID limit to its upper bound

echo 4194303 > /proc/sys/kernel/pid_max

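2. Set the MTU. Jumbo frames take effect only if the switch ports also support them.
Append MTU=9000 to the interface configuration file.

3. read_ahead: read data ahead into memory to speed up disk reads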

echo "8192" > /sys/block/sda/queue/read_ahead_kb
echo "8192" > /sys/block/sdb/queue/read_ahead_kb
echo "8192" > /sys/block/sdc/queue/read_ahead_kb
echo "8192" > /sys/block/sdd/queue/read_ahead_kb
echo "8192" > /sys/block/sde/queue/read_ahead_kb
echo "8192" > /sys/block/sdf/queue/read_ahead_kb
echo "8192" > /sys/block/sdg/queue/read_ahead_kb
echo "8192" > /sys/block/sdh/queue/read_ahead_kb
echo "8192" > /sys/block/sdi/queue/read_ahead_kb
echo "8192" > /sys/block/sdj/queue/read_ahead_kb
echo "8192" > /sys/block/sdk/queue/read_ahead_kb
echo "8192" > /sys/block/sdl/queue/read_ahead_kb
echo "8192" > /sys/block/sdm/queue/read_ahead_kb
echo "8192" > /sys/block/sdn/queue/read_ahead_kb
echo "8192" > /sys/block/sdo/queue/read_ahead_kb
echo "8192" > /sys/block/sdp/queue/read_ahead_kb
echo "8192" > /sys/block/sdq/queue/read_ahead_kb
echo "8192" > /sys/block/sdr/queue/read_ahead_kb
echo "8192" > /sys/block/sds/queue/read_ahead_kb
echo "8192" > /sys/block/sdt/queue/read_ahead_kb
echo "8192" > /sys/block/nvme0n1/queue/read_ahead_kb
echo "8192" > /sys/block/nvme1n1/queue/read_ahead_kb
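
The same setting can be made with blockdev, which takes the read-ahead in 512-byte sectors rather than KiB, so 8192 KiB corresponds to 16384 sectors. The sketch below shows the conversion and a loop over device names; both function names are illustrative, and `set_read_ahead` is only defined, not run.

```shell
# Convert a read-ahead size in KiB to 512-byte sectors (the unit of blockdev --setra).
ra_kb_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

# Apply one read-ahead size (KiB) to a list of whole-disk names.
# $1: size in KiB, remaining arguments: device names such as sda nvme0n1.
set_read_ahead() (
    kb=$1; shift
    for dev in "$@"; do
        blockdev --setra "$(ra_kb_to_sectors "$kb")" "/dev/$dev"
    done
)
```

For example, `set_read_ahead 8192 sda sdb nvme0n1`.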

4. swappiness: set to 0 so the kernel avoids swapping as far as possible

echo "vm.swappiness = 0" >> /etc/sysctl.conf; sysctl -p
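
Appending to /etc/sysctl.conf adds a new line on every run. A sketch of an idempotent alternative follows: it replaces an existing entry for the key instead of duplicating it. The function name and the drop-in file name used in the example are hypothetical.

```shell
# Set "key = value" in a sysctl-style file, replacing any existing line for that key.
# $1: file, $2: key, $3: value.
set_sysctl_entry() (
    file=$1; key=$2; value=$3
    touch "$file"
    tmp=$(mktemp)
    # escape dots so the key is matched literally
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    grep -v -E "^[[:space:]]*${pat}[[:space:]]*=" "$file" > "$tmp" || true
    echo "$key = $value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
)
```

For example, `set_sysctl_entry /etc/sysctl.d/90-ceph.conf vm.swappiness 0 && sysctl --system`. The same helper fits the other kernel parameters in this section, such as kernel.pid_max and vm.dirty_ratio.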

5. bcache sequential cutoff (0 disables the sequential-I/O bypass, so sequential I/O is cached too)

echo 0 > /sys/block/bcache0/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache1/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache2/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache3/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache4/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache5/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache6/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache7/bcache/sequential_cutoff
echo 0 > /sys/block/bcache8/bcache/sequential_cutoff
echo 0 > /sys/block/bcache9/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache10/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache11/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache12/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache13/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache14/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache15/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache16/bcache/sequential_cutoff
echo 0 > /sys/block/bcache17/bcache/sequential_cutoff
echo 0 > /sys/block/bcache18/bcache/sequential_cutoff 
echo 0 > /sys/block/bcache19/bcache/sequential_cutoff

6. bcache congestion thresholds (0 disables the latency-based cache bypass)

for var in `ls -d /sys/fs/bcache/*`
do
echo 0 >$var/congested_read_threshold_us 
echo 0 >$var/congested_write_threshold_us
done

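7. Raise the minimum writeback rate to 512 (default 8); the unit is sectors per second, so with 512-byte sectors this is about 256 KiB/s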

echo 512 > /sys/block/sda/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdb/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdc/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdd/bcache/writeback_rate_minimum
echo 512 > /sys/block/sde/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdf/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdg/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdh/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdi/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdj/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdk/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdl/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdm/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdn/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdo/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdp/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdq/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdr/bcache/writeback_rate_minimum
echo 512 > /sys/block/sds/bcache/writeback_rate_minimum
echo 512 > /sys/block/sdt/bcache/writeback_rate_minimum

8. Enable writeback mode on all block devices

echo writeback > /sys/block/sda/bcache/cache_mode
echo writeback > /sys/block/sdb/bcache/cache_mode
echo writeback > /sys/block/sdc/bcache/cache_mode
echo writeback > /sys/block/sdd/bcache/cache_mode
echo writeback > /sys/block/sde/bcache/cache_mode
echo writeback > /sys/block/sdf/bcache/cache_mode
echo writeback > /sys/block/sdg/bcache/cache_mode
echo writeback > /sys/block/sdh/bcache/cache_mode
echo writeback > /sys/block/sdi/bcache/cache_mode
echo writeback > /sys/block/sdj/bcache/cache_mode
echo writeback > /sys/block/sdk/bcache/cache_mode
echo writeback > /sys/block/sdl/bcache/cache_mode
echo writeback > /sys/block/sdm/bcache/cache_mode
echo writeback > /sys/block/sdn/bcache/cache_mode
echo writeback > /sys/block/sdo/bcache/cache_mode
echo writeback > /sys/block/sdp/bcache/cache_mode
echo writeback > /sys/block/sdq/bcache/cache_mode
echo writeback > /sys/block/sdr/bcache/cache_mode
echo writeback > /sys/block/sds/bcache/cache_mode
echo writeback > /sys/block/sdt/bcache/cache_mode

9. I/O path tracking (the loop below repeats the congestion-threshold settings from step 6)

for var in `ls -d /sys/fs/bcache/*`
do
echo 0 >$var/congested_read_threshold_us 
echo 0 >$var/congested_write_threshold_us
done

10. Dirty-page writeback ratios

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio

11. bcache writeback settings (defaults: writeback_percent 10, writeback_delay 30)

for f in `ls -d /sys/block/bcache*`
do
        echo writeback > $f/bcache/cache_mode
        echo 20 > $f/bcache/writeback_percent
        echo 80 > $f/bcache/writeback_delay
done
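
After applying the bcache settings in steps 5 to 11, it helps to read them back, since a write to a missing sysfs path fails without changing anything. The function below is a sketch that prints the main attributes for every bcache device; its name and the optional sysfs-root argument are illustrative.

```shell
# Print selected bcache attributes as "<device> <attribute>=<value>" lines.
# $1: sysfs root (default /sys).
show_bcache_settings() (
    root=${1:-/sys}
    for b in "$root"/block/bcache*/bcache; do
        [ -d "$b" ] || continue
        name=$(basename "$(dirname "$b")")
        for attr in cache_mode writeback_percent writeback_delay sequential_cutoff; do
            if [ -r "$b/$attr" ]; then
                printf '%s %s=%s\n' "$name" "$attr" "$(cat "$b/$attr")"
            fi
        done
    done
)
```

On a real system cache_mode lists all modes and marks the active one with square brackets.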

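12. file-max: system-wide limit on open files
Set it to the MemTotal value reported by: cat /proc/meminfo | grep MemTotal | awk '{print $2}'
Run: echo <file-max> > /proc/sys/fs/file-max, where <file-max> is the number printed by the command above. (Typed literally, ${file-max} is a shell default-value expansion and yields the string "max".)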
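
The two steps for file-max can be joined so that the value never has to be copied by hand. A minimal sketch, with an illustrative function name and an optional file argument used for testing:

```shell
# Print MemTotal (in kB) from a meminfo-style file.
# $1: file to read (default /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

As root, `mem_total_kb > /proc/sys/fs/file-max` then applies the value.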

13. nr_requests (default 256)
Check: cat /sys/block/sdb/queue/nr_requests
Set: echo 512 > /sys/block/sdb/queue/nr_requests

echo 512 > /sys/block/sda/queue/nr_requests
echo 512 > /sys/block/sdb/queue/nr_requests
echo 512 > /sys/block/sdc/queue/nr_requests
echo 512 > /sys/block/sdd/queue/nr_requests
echo 512 > /sys/block/sde/queue/nr_requests
echo 512 > /sys/block/sdf/queue/nr_requests
echo 512 > /sys/block/sdg/queue/nr_requests
echo 512 > /sys/block/sdh/queue/nr_requests
echo 512 > /sys/block/sdi/queue/nr_requests
echo 512 > /sys/block/sdj/queue/nr_requests
echo 512 > /sys/block/sdk/queue/nr_requests
echo 512 > /sys/block/sdl/queue/nr_requests
echo 512 > /sys/block/sdm/queue/nr_requests
echo 512 > /sys/block/sdn/queue/nr_requests
echo 512 > /sys/block/sdo/queue/nr_requests
echo 512 > /sys/block/sdp/queue/nr_requests
echo 512 > /sys/block/sdq/queue/nr_requests
echo 512 > /sys/block/sdr/queue/nr_requests
echo 512 > /sys/block/sds/queue/nr_requests
echo 512 > /sys/block/sdt/queue/nr_requests
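
Values written to /sys are lost at reboot. One way to reapply the queue settings from sections V and VI automatically is a udev rule file; the sketch below only prints such rules. The function name and the rule file name in the usage note are hypothetical, and the scheduler names assume a blk-mq kernel.

```shell
# Print udev rules that reapply scheduler, nr_requests and read_ahead_kb.
# The scheduler rule comes first, since switching scheduler can reset nr_requests.
print_queue_rules() {
    cat <<'EOF'
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/nr_requests}="512"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/read_ahead_kb}="8192"
ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none"
EOF
}
```

For example, `print_queue_rules > /etc/udev/rules.d/60-ceph-queue.rules`, followed by `udevadm control --reload` and `udevadm trigger --subsystem-match=block`.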

VII. Ceph parameter tuning

[global]
osd pool default size=3
osd memory target=4294967296
osd pool default min size=1
max open files=131072

[mon]
mon clock drift allowed=1
mon osd min down reporters=13
mon osd down out interval=600

[osd]
osd journal size=20000
osd max write size=512
osd client message size cap=2147483648
osd deep scrub stride=131072
osd op threads=16
osd disk threads=4
osd map cache size=1024
osd map cache bl size=128
osd recovery op priority=2
osd recovery max active=10
osd max backfills=4
osd min pg log entries=30000
osd max pg log entries=100000
osd mon heartbeat interval=40
ms dispatch throttle bytes=1048576000
objecter inflight ops=819200
osd op log threshold=50
osd crush chooseleaf type=0
journal max write bytes=1073741824
journal max write entries=10000
journal queue max ops=50000
journal queue max bytes=10485760000

[client]
rbd cache=True
rbd cache size=335544320
rbd cache max dirty=134217728
rbd cache max dirty age=30
rbd cache writethrough until flush=False
rbd cache max dirty object=2
rbd cache target dirty=235544320
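
The RBD cache options are interdependent: the Ceph documentation requires rbd cache max dirty to be smaller than rbd cache size, and rbd cache target dirty to be smaller than rbd cache max dirty. The sketch below reads the three values from a ceph.conf style file and checks that ordering. The function names are illustrative, and the parser is deliberately simple: it matches the option spelling exactly as given and returns the first occurrence. With the values listed above it should flag rbd cache target dirty, which is larger than rbd cache max dirty.

```shell
# Print the value of a key inside one section of an ini-style file.
# $1: file, $2: section name without brackets, $3: key.
conf_get() {
    awk -F= -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section) }
        in_section {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Check target dirty < max dirty < cache size in the [client] section.
# Prints one line per violation and returns non-zero if any is found.
check_rbd_cache() (
    size=$(conf_get "$1" client "rbd cache size")
    max=$(conf_get "$1" client "rbd cache max dirty")
    target=$(conf_get "$1" client "rbd cache target dirty")
    status=0
    [ "$max" -lt "$size" ] || { echo "rbd cache max dirty ($max) should be below rbd cache size ($size)"; status=1; }
    [ "$target" -lt "$max" ] || { echo "rbd cache target dirty ($target) should be below rbd cache max dirty ($max)"; status=1; }
    exit "$status"
)
```

For example, `check_rbd_cache /etc/ceph/ceph.conf`.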