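Continuing from the previous steps: with DRBD deployed, this part combines DRBD with heartbeat to make the DRBD service highly available. The active node mounts the DRBD device automatically, and the service fails over automatically when that node goes down.
Given the earlier deployment, only the heartbeat resource definition needs to change, that is, the contents of /etc/ha.d/haresources.
1. Preparation
Note: before configuring high availability for DRBD, make sure the DRBD service is running on both nodes and that both sides are in the Secondary state, as shown below: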
[root@heartbeat01 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
DRBD therefore has to be started, and enabled at boot, on both DRBD nodes:
/etc/init.d/drbd start
chkconfig drbd on
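The Secondary/Secondary precondition can also be checked by a script instead of by eye. The following is only a minimal sketch and not part of the original setup: the function name drbd_both_secondary is hypothetical, and it assumes the single-resource /proc/drbd layout of DRBD 8.3 shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given /proc/drbd-style file shows
# a connected resource whose local and peer roles are both Secondary.
drbd_both_secondary() {
    # $1: file to inspect (defaults to /proc/drbd on a real node)
    grep -q 'cs:Connected ro:Secondary/Secondary' "${1:-/proc/drbd}"
}

# Usage on a real node:
#   drbd_both_secondary && echo "DRBD is ready for heartbeat"
```

Run it on each node before starting heartbeat; a non-zero exit status means the node is either disconnected or one side is already Primary.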
Once that is done, edit the haresources file so that it contains the following line:
[root@heartbeat01 ~]# tail -1 /etc/ha.d/haresources
heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1 drbddisk::test Filesystem::/dev/drbd0::/data::ext4
# heartbeat01 is shown here as the example; the haresources file on heartbeat02 must be identical.
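In this line the first field names the preferred node, and each remaining field is a resource written as script::arg1::arg2. Heartbeat starts the resources from left to right when a node takes over (bring up the VIP, promote DRBD resource test to Primary, mount /dev/drbd0 on /data) and stops them in the reverse order when it releases them. As a rough illustration, the sketch below expands a haresources line into the start commands heartbeat would run. The function name haresources_commands is hypothetical, and the sketch does not handle the shorthand where a bare IP address stands for IPaddr.

```shell
#!/bin/sh
# Print the start command for each resource in one haresources line.
# "::" separates the resource script name from its arguments.
haresources_commands() {
    # $1: one haresources line; its first field is the preferred node name
    set -f              # disable globbing while the line is word-split
    set -- $1           # split the line into positional parameters
    set +f
    shift               # drop the node name
    for res in "$@"; do
        # Turn "script::arg1::arg2" into "script arg1 arg2"
        printf '/etc/ha.d/resource.d/%s start\n' \
            "$(printf '%s' "$res" | sed 's/::/ /g')"
    done
}

# Usage:
#   haresources_commands "$(tail -1 /etc/ha.d/haresources)"
```

For the line above, this prints the IPaddr, drbddisk, and Filesystem start commands in that order.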
2. Start heartbeat
Next, start the heartbeat service on both nodes at the same time:
/etc/init.d/heartbeat start
3. Check the services on both nodes
1) Status on node 1 (heartbeat01):
[root@heartbeat01 ~]# ip a |grep 49.100
inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
Node 1 (heartbeat01) has acquired the VIP.
[root@heartbeat01 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:4 nr:0 dw:4 dr:709 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
heartbeat01 is also the Primary node in DRBD.
[root@heartbeat01 ~]# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/drbd0 on /data type ext4 (rw)
heartbeat01 has automatically mounted /dev/drbd0 on /data.
[root@heartbeat01 ~]# ls /data
10.txt 1.txt 29.txt 38.txt 47.txt 56.txt 65.txt 74.txt 83.txt 92.txt
11.txt 20.txt 2.txt 39.txt 48.txt 57.txt 66.txt 75.txt 84.txt 93.txt
12.txt 21.txt 30.txt 3.txt 49.txt 58.txt 67.txt 76.txt 85.txt 94.txt
13.txt 22.txt 31.txt 40.txt 4.txt 59.txt 68.txt 77.txt 86.txt 95.txt
14.txt 23.txt 32.txt 41.txt 50.txt 5.txt 69.txt 78.txt 87.txt 96.txt
15.txt 24.txt 33.txt 42.txt 51.txt 60.txt 6.txt 79.txt 88.txt 97.txt
16.txt 25.txt 34.txt 43.txt 52.txt 61.txt 70.txt 7.txt 89.txt 98.txt
17.txt 26.txt 35.txt 44.txt 53.txt 62.txt 71.txt 80.txt 8.txt 99.txt
18.txt 27.txt 36.txt 45.txt 54.txt 63.txt 72.txt 81.txt 90.txt 9.txt
19.txt 28.txt 37.txt 46.txt 55.txt 64.txt 73.txt 82.txt 91.txt lost+found
The files that DRBD synchronized earlier are all still there.
2) Status on node 2 (heartbeat02):
[root@heartbeat02 ~]# ip a |grep 49.100
The command prints nothing: node 2 does not hold the VIP.
[root@heartbeat02 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:4 dw:4 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Node 2 (heartbeat02) is in the Secondary state in DRBD.
[root@heartbeat02 ~]# mount -n
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
heartbeat02 has not mounted /dev/drbd0 either.
[root@heartbeat02 ~]# ll /data
total 0
As expected, /data on heartbeat02 is empty, because it is only the unmounted mount point.
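The three checks used in this step (VIP, DRBD role, mount) can be folded into one helper that reports whether a node is active or standby. This is a sketch under assumptions: the function name node_role is hypothetical, and the VIP, device, and mount point are hard-coded to the values used in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: classify a node from three captured command outputs.
#   $1: output of `ip a`   $2: content of /proc/drbd   $3: output of `mount`
node_role() {
    vip=0; primary=0; mounted=0
    printf '%s\n' "$1" | grep -q 'inet 172\.16\.49\.100/' && vip=1
    printf '%s\n' "$2" | grep -q 'ro:Primary/'            && primary=1
    printf '%s\n' "$3" | grep -q '^/dev/drbd0 on /data '  && mounted=1
    case "$vip$primary$mounted" in
        111) echo active ;;     # holds VIP, is DRBD Primary, has /data mounted
        000) echo standby ;;    # holds none of the three resources
        *)   echo "inconsistent (vip=$vip primary=$primary mounted=$mounted)" ;;
    esac
}

# Usage on a real node:
#   node_role "$(ip a)" "$(cat /proc/drbd)" "$(mount)"
```

With the outputs shown above, heartbeat01 would be reported as active and heartbeat02 as standby; any other combination suggests that a resource in the group failed to start or stop.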
4. Simulate a failover
Now stop the heartbeat service on heartbeat01 and check whether the DRBD device is automatically mounted on heartbeat02.
[root@heartbeat01 ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
1) Status on node 1 (heartbeat01):
[root@heartbeat01 ~]# ip a|grep 49.100
[root@heartbeat01 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:16 nr:4 dw:20 dr:1418 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@heartbeat01 ~]# ll /data
total 0
2) Status on node 2 (heartbeat02):
[root@heartbeat02 ~]# ip a |grep 49.100
inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
[root@heartbeat02 ~]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:4 nr:16 dw:20 dr:705 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@heartbeat02 ~]# ls /data
10.txt 1.txt 29.txt 38.txt 47.txt 56.txt 65.txt 74.txt 83.txt 92.txt
11.txt 20.txt 2.txt 39.txt 48.txt 57.txt 66.txt 75.txt 84.txt 93.txt
12.txt 21.txt 30.txt 3.txt 49.txt 58.txt 67.txt 76.txt 85.txt 94.txt
13.txt 22.txt 31.txt 40.txt 4.txt 59.txt 68.txt 77.txt 86.txt 95.txt
14.txt 23.txt 32.txt 41.txt 50.txt 5.txt 69.txt 78.txt 87.txt 96.txt
15.txt 24.txt 33.txt 42.txt 51.txt 60.txt 6.txt 79.txt 88.txt 97.txt
16.txt 25.txt 34.txt 43.txt 52.txt 61.txt 70.txt 7.txt 89.txt 98.txt
17.txt 26.txt 35.txt 44.txt 53.txt 62.txt 71.txt 80.txt 8.txt 99.txt
18.txt 27.txt 36.txt 45.txt 54.txt 63.txt 72.txt 81.txt 90.txt 9.txt
19.txt 28.txt 37.txt 46.txt 55.txt 64.txt 73.txt 82.txt 91.txt lost+found
3) Check the log on heartbeat02:
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4084]: info: Received shutdown notice from 'heartbeat01.contoso.com'.
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4084]: info: Resources being acquired from heartbeat01.contoso.com.
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4150]: info: acquire local HA resources (standby).
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4150]: info: local HA resource acquisition completed (standby).
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4084]: info: Standby resource acquisition done [all].
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4151]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys heartbeat02.contoso.com] to acquire.
harc(default)[4176]: 2016/09/26_00:32:04 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[4193]: 2016/09/26_00:32:04 info: Taking over resource group IPaddr::172.16.49.100/24/eth1
ResourceManager(default)[4220]: 2016/09/26_00:32:04 info: Acquiring resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1 drbddisk::test Filesystem::/dev/drbd0::/data::ext4
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[4248]: 2016/09/26_00:32:04 INFO: Resource is stopped
ResourceManager(default)[4220]: 2016/09/26_00:32:04 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 start
IPaddr(IPaddr_172.16.49.100)[4373]: 2016/09/26_00:32:04 INFO: Adding inet address 172.16.49.100/24 with broadcast address 172.16.49.255 to device eth1
IPaddr(IPaddr_172.16.49.100)[4373]: 2016/09/26_00:32:04 INFO: Bringing device eth1 up
IPaddr(IPaddr_172.16.49.100)[4373]: 2016/09/26_00:32:04 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.49.100 eth1 172.16.49.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[4347]: 2016/09/26_00:32:04 INFO: Success
ResourceManager(default)[4220]: 2016/09/26_00:32:04 info: Running /etc/ha.d/resource.d/drbddisk test start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[4505]: 2016/09/26_00:32:04 INFO: Resource is stopped
ResourceManager(default)[4220]: 2016/09/26_00:32:04 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext4 start
Filesystem(Filesystem_/dev/drbd0)[4595]: 2016/09/26_00:32:04 INFO: Running start for /dev/drbd0 on /data
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[4587]: 2016/09/26_00:32:04 INFO: Success
mach_down(default)[4193]: 2016/09/26_00:32:04 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[4193]: 2016/09/26_00:32:04 info: mach_down takeover complete for node heartbeat01.contoso.com.
Sep 26 00:32:04 heartbeat02.contoso.com heartbeat: [4084]: info: mach_down takeover complete.
Sep 26 00:32:36 heartbeat02.contoso.com heartbeat: [4084]: WARN: node heartbeat01.contoso.com: is dead
Sep 26 00:32:36 heartbeat02.contoso.com heartbeat: [4084]: info: Dead node heartbeat01.contoso.com gave up resources.
Sep 26 00:32:36 heartbeat02.contoso.com heartbeat: [4084]: info: Link heartbeat01.contoso.com:eth1 dead.
Sep 26 00:32:36 heartbeat02.contoso.com ipfail: [4110]: info: Status update: Node heartbeat01.contoso.com now has status dead
Sep 26 00:32:38 heartbeat02.contoso.com ipfail: [4110]: info: NS: We are dead. :<
Sep 26 00:32:38 heartbeat02.contoso.com ipfail: [4110]: info: Link Status update: Link heartbeat01.contoso.com/eth1 now has status dead
Sep 26 00:32:39 heartbeat02.contoso.com ipfail: [4110]: info: We are dead. :<
Sep 26 00:32:39 heartbeat02.contoso.com ipfail: [4110]: info: Asking other side for ping node count.
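The ResourceManager lines in this log show the takeover order: IPaddr first, then drbddisk, then Filesystem. To pull just those steps out of a long log, a small filter such as the following can help. It is a sketch: the function name takeover_steps is hypothetical, and the log path in the usage comment is an assumption, since it depends on the logfile setting in ha.cf.

```shell
#!/bin/sh
# Read heartbeat log text on stdin and print, in order, each resource script
# (with its arguments) that ResourceManager started.
takeover_steps() {
    sed -n 's|.*info: Running /etc/ha.d/resource.d/\(.*\) start$|\1|p'
}

# Usage (log path is an assumption):
#   takeover_steps < /var/log/ha-log
```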