[root@node141 ~]# ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 3.3e is active+clean+inconsistent, acting [11,17,4]
    pg 3.42 is active+clean+inconsistent, acting [17,6,0]
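The inconsistent PG ids can be pulled out of the `ceph health detail` output mechanically. A minimal sketch, assuming the output format shown above (the sample text is hard-coded here for illustration; on a live cluster you would pipe `ceph health detail` in directly):

```shell
# Sample 'ceph health detail' output, copied from the error above
health_output='HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 3.3e is active+clean+inconsistent, acting [11,17,4]
    pg 3.42 is active+clean+inconsistent, acting [17,6,0]'

# Extract the PG id (second field) from each "pg ... inconsistent" line
pgs=$(echo "$health_output" | awk '$1 == "pg" && /inconsistent/ {print $2}')
echo "$pgs"
```

On a live cluster, replace the hard-coded variable with `pgs=$(ceph health detail | awk '$1 == "pg" && /inconsistent/ {print $2}')`.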

Official troubleshooting guide: https://ceph.com/geen-categorie/ceph-manually-repair-object/

The steps are as follows:

(1) Identify the inconsistent PGs, find the corresponding OSDs, and perform the repair on the host that owns them.

[root@node140 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       8.71826 root default
-2       3.26935     host node140
 0   hdd 0.54489         osd.0        up  1.00000 1.00000
 1   hdd 0.54489         osd.1        up  1.00000 1.00000
 2   hdd 0.54489         osd.2        up  1.00000 1.00000
 3   hdd 0.54489         osd.3        up  1.00000 1.00000
 4   hdd 0.54489         osd.4        up  1.00000 1.00000
 5   hdd 0.54489         osd.5        up  1.00000 1.00000
-3       3.26935     host node141
12   hdd 0.54489         osd.12       up  1.00000 1.00000
13   hdd 0.54489         osd.13       up  1.00000 1.00000
14   hdd 0.54489         osd.14       up  1.00000 1.00000
15   hdd 0.54489         osd.15     down  1.00000 1.00000
16   hdd 0.54489         osd.16       up  1.00000 1.00000
17   hdd 0.54489         osd.17       up  1.00000 1.00000
-4       2.17957     host node142
 6   hdd 0.54489         osd.6        up  1.00000 1.00000
 9   hdd 0.54489         osd.9        up  1.00000 1.00000
10   hdd 0.54489         osd.10       up  1.00000 1.00000
11   hdd 0.54489         osd.11       up  1.00000 1.00000

## This command also works:
[root@node140 /]# ceph osd find 11
{
    "osd": 11,
    "addrs": {
        "addrvec": [
            {
                "type": "v2",
                "addr": "10.10.202.142:6820",
                "nonce": 24423
            },
            {
                "type": "v1",
                "addr": "10.10.202.142:6821",
                "nonce": 24423
            }
        ]
    },
    "osd_fsid": "1e977e5f-f514-4eef-bd88-c3632d03b2c3",
    "host": "node142",
    "crush_location": {
        "host": "node142",
        "root": "default"
    }
}
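In the acting set printed by `ceph health detail` (e.g. `[11,17,4]`), the first OSD id is the primary, which is why osd 11 and osd 17 are the ones to work on here. A minimal sketch extracting the primary from that bracketed string:

```shell
# Acting set string as printed by 'ceph health detail' (first entry = primary)
acting='[11,17,4]'

# Strip the brackets and take the first comma-separated field
primary=$(echo "$acting" | tr -d '[]' | cut -d, -f1)
echo "osd.$primary"   # prints "osd.11"
```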

(2) The affected OSDs are 11 and 17. Switch to the host that owns each one and stop the OSD:

[root@node142 ~]# systemctl stop ceph-osd@11

(3) Flush the journal to disk:

[root@node142 ~]# ceph-osd -i 11 --flush-journal

(4) Start the OSD:

[root@node142 ~]# systemctl start ceph-osd@11

(5) Repair the PG:

[root@node142 ~]# ceph pg repair 3.3e

### Repair osd 17 (PG 3.42) the same way ###
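When several PGs are inconsistent, the repair commands can be generated from the PG list instead of typed one by one. A dry-run sketch that only prints the commands (the PG list is hard-coded from the health output above; `ceph pg repair` triggers a repair and should be run deliberately, not blindly in a loop):

```shell
# PG ids taken from the 'ceph health detail' output above
pgs="3.3e 3.42"

# Print the repair command for each PG; remove the 'echo' to actually run them
for pg in $pgs; do
    echo "ceph pg repair $pg"
done
```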

(6) Check the status:

[root@node141 ~]# ceph health detail
HEALTH_OK