DRBD + HeartBeat + NFS Architecture


缥缈孤鸿一pmhong, 2013-07-31 14:10:16. Category: Other

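© Copyright belongs to the author. This is an original work by 51CTO blog author 缥缈孤鸿一pmhong; contact the author for permission before reposting, otherwise legal action may be pursued.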

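Yesterday I set up DRBD on its own and was left with a few open questions. Building this DRBD + HeartBeat + NFS stack today answers them. DRBD only provides disk redundancy; it does not, as I had assumed, serve two identical copies of the data at the same time. To deliver a usable service it needs HeartBeat to bring up a virtual IP (VIP), and clients mount the data over NFS through that address.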

【Lab environment】

VirtualBox

CentOS 6.4, minimal install

【Lab topology】

【Procedure】

1. Configure the hostnames

[root@localhost ~]# vim /etc/sysconfig/network

HOSTNAME=node1

[root@localhost ~]# vim /etc/hosts

192.168.56.120 node1

192.168.56.121 node2

[root@localhost ~]# hostname node1

===============================

[root@localhost ~]# vim /etc/sysconfig/network

HOSTNAME=node2

[root@localhost ~]# vim /etc/hosts

192.168.56.120 node1

192.168.56.121 node2

[root@localhost ~]# hostname node2

2. Configure DRBD

Follow the configuration procedure from the previous article.

[root@node1 ~]# cat /proc/drbd

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@node1, 2013-07-29 08:48:43

1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Just make sure the two nodes end up as one primary and one secondary.
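That check can be scripted by pulling the ro: field out of the status text. The following is only a minimal sketch, assuming the 8.4-style /proc/drbd layout shown above; the function names drbd_roles and drbd_one_primary are made up for illustration and are not part of DRBD.

```shell
# Print the ro: (role) field of every DRBD device found in a status file.
# Assumption: the file uses the 8.4-style /proc/drbd layout.
drbd_roles() {
    status_file="${1:-/proc/drbd}"
    sed -n 's|.*ro:\([A-Za-z]*/[A-Za-z]*\).*|\1|p' "$status_file"
}

# Succeed only when every device reports one Primary and one Secondary.
drbd_one_primary() {
    roles=$(drbd_roles "$1")
    [ -n "$roles" ] || return 1          # no device lines found at all
    for r in $roles; do
        case "$r" in
            Primary/Secondary|Secondary/Primary) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

On a node, running drbd_one_primary with no argument reads /proc/drbd and returns a non-zero status if the roles are anything other than one primary and one secondary.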

3. Install and configure NFS (node2 is configured the same way as node1; the relevant files can simply be copied to node2 with scp)

[root@node1 ~]# yum -y install nfs nfs-utils rpcbind openssh-clients

[root@node1 ~]# vim /etc/exports

/db *(rw,sync,insecure,no_root_squash,no_wdelay)

Modify the nfs init script

[root@node1 ~]# vim /etc/init.d/nfs

Find the line containing killproc nfsd -2 and change it to: killproc nfsd -9

Start nfs

[root@node1 ~]# service rpcbind start

Starting rpcbind: [ OK ]

[root@node1 ~]# service nfs start

Starting NFS services: [ OK ]

Starting NFS mountd: [ OK ]

Starting NFS daemon: [ OK ]

[root@node1 ~]# scp /etc/exports 192.168.56.121:/etc/exports

[root@node1 ~]# scp /etc/init.d/nfs 192.168.56.121:/etc/init.d/nfs

4. Install and configure HeartBeat (node2 is configured the same as node1)

[root@node1 ~]# yum -y install heartbeat heartbeat-pils heartbeat-stonith libnet perl-MailTools

Note: some releases ship no heartbeat RPMs. If that is the case, either build from source or enable the EPEL repository and install with yum.

[root@node1 ~]# wget http://mirrors.hust.edu.cn/epel/6/i386/epel-release-6-8.noarch.rpm

[root@node1 ~]# rpm -ivh epel-release-6-8.noarch.rpm

[root@node1 ~]# yum clean all

[root@node1 ~]# yum list

Run the yum command again to install heartbeat

[root@node1 ~]# yum -y install heartbeat heartbeat-pils heartbeat-stonith libnet perl-MailTools

Confirm that heartbeat is installed

[root@node1 ~]# rpm -qa |grep heartbeat

heartbeat-libs-3.0.4-1.el6.i686

heartbeat-3.0.4-1.el6.i686

Copy the template files

[root@node1 ~]# cd /usr/share/doc/heartbeat-3.0.4/

[root@node1 heartbeat-3.0.4]# cp authkeys ha.cf haresources /etc/ha.d/

Configure the main heartbeat configuration file

[root@node1 heartbeat-3.0.4]# cd /etc/ha.d/

[root@node1 ha.d]# vim ha.cf

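Find the settings below, remove the leading # to uncomment them, and edit them to read as follows (the // notes are annotations for this article, not part of the file):

debugfile /var/log/ha-debug //debug log file

logfile /var/log/ha-log //heartbeat log file

logfacility local0

keepalive 2 //send a heartbeat every 2 seconds

deadtime 10 //declare the peer dead after 10 seconds without a heartbeat

warntime 6 //warn about a late heartbeat after 6 seconds (best kept between 2 and 10)

#initdead 120 //at startup, treat up to 120 seconds without contact as normal; heartbeat waits this long before starting any resources (should be at least twice deadtime)

udpport 694 //use UDP port 694

bcast eth0 //send heartbeats by broadcast

#ucast eth0 192.168.1.20 //unicast (each node lists the other node's IP)

auto_failback off //automatic failback (resources return to the primary once it recovers); leave this off

node node1 //declare the primary node

node node2 //declare the standby node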
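The timing values depend on each other: the warning should fire before the peer is declared dead, and initdead should be at least twice deadtime. As a rough sketch, the relationships can be checked before the file is copied around. The helper name check_ha_timers and the exact rules it enforces are my own assumptions for illustration, not something heartbeat provides.

```shell
# Sanity-check heartbeat timing values (all in whole seconds).
# Usage: check_ha_timers KEEPALIVE WARNTIME DEADTIME INITDEAD
# Hypothetical helper; the rules encode the advice given in the comments above.
check_ha_timers() {
    keepalive=$1; warntime=$2; deadtime=$3; initdead=$4
    rc=0
    if [ "$keepalive" -ge "$warntime" ]; then
        echo "warntime ($warntime) should be larger than keepalive ($keepalive)"
        rc=1
    fi
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime ($warntime) should be smaller than deadtime ($deadtime)"
        rc=1
    fi
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "initdead ($initdead) should be at least twice deadtime ($deadtime)"
        rc=1
    fi
    return $rc
}

# The values used in this article, with initdead taken from the commented-out line.
if check_ha_timers 2 6 10 120; then
    echo "timers look consistent"
fi
```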

Edit the resource file

[root@node1 ha.d]# vim haresources

node1 IPaddr::192.168.56.130/24/eth0 drbddisk::r0 Filesystem::/dev/drbd1::/db::ext3 killnfsd

Note: 192.168.56.130 here is the VIP.

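When editing the resource file on node2, I changed the node name to node2 (the only place where node2's configuration differs from node1's). Be aware, though, that the sample haresources file shipped with Heartbeat says the file must be identical on all nodes, with the first field naming the preferred node, so treat this per-node change with caution. What I used:

[root@node2 ha.d]# vim haresources

node2 IPaddr::192.168.56.130/24/eth0 drbddisk::r0 Filesystem::/dev/drbd1::/db::ext3 killnfsd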
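The order of the entries on a haresources line matters: heartbeat starts the resources from left to right and stops them from right to left. The sketch below just prints that order for a given line, which makes it easier to see that the VIP comes up first and the NFS restart comes last. haresources_order is a hypothetical helper written for this article, not a heartbeat command.

```shell
# Split one haresources line into its preferred node and resource list,
# then print the start order (left to right) and stop order (right to left).
# Hypothetical helper, for illustration only.
haresources_order() {
    set -- $1                 # word-split the line into positional parameters
    node=$1; shift            # first word is the preferred node
    echo "preferred node: $node"
    echo "start order:"
    for res in "$@"; do echo "  $res"; done
    echo "stop order:"
    rev=""
    for res in "$@"; do rev="$res $rev"; done
    for res in $rev; do echo "  $res"; done
}

haresources_order "node1 IPaddr::192.168.56.130/24/eth0 drbddisk::r0 Filesystem::/dev/drbd1::/db::ext3 killnfsd"
```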

[root@node1 ha.d]# vim authkeys

auth 1

1 crc
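The crc method only detects corrupted packets and gives no real authentication, which is acceptable on an isolated lab network. authkeys also accepts md5 and sha1 with a shared secret. As a hedged sketch, a sha1 key file could be generated like this; make_authkeys and the scratch path are assumptions made for the example, and the same file must end up on both nodes.

```shell
# Write an authkeys file that uses sha1 with a random shared secret.
# make_authkeys is a hypothetical helper; the target path is its only argument.
make_authkeys() {
    target=$1
    # 16 random bytes rendered as 32 hex characters
    secret=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    ( umask 077                       # a newly created file is owner-only
      printf 'auth 1\n1 sha1 %s\n' "$secret" > "$target" )
}

# Write to a scratch path first, review it, then copy it to /etc/ha.d/ on both nodes.
make_authkeys /tmp/authkeys.example
```

The chmod 600 step later in this article is still needed if the target file already existed with looser permissions.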

[root@node1 ha.d]# vim resource.d/killnfsd //add the following content

killall -9 nfsd ; /etc/init.d/nfs restart ; exit 0

Set the permissions on these files

[root@node1 ha.d]# chmod 600 /etc/ha.d/authkeys

[root@node1 ha.d]# chmod 755 /etc/ha.d/resource.d/killnfsd

[root@node1 ha.d]# service heartbeat start

Starting High-Availability services: INFO: Resource is stopped

Done.

The configuration on node2 is almost identical to node1's, so copy the files over from node1 and adjust them as needed

[root@node1 ha.d]# scp ha.cf authkeys haresources 192.168.56.121:/etc/ha.d/

[root@node1 ha.d]# scp resource.d/killnfsd 192.168.56.121:/etc/ha.d/resource.d

Start heartbeat on node1 and node2

[root@node1 ha.d]# service heartbeat start

Once heartbeat is running on both nodes, the VIP is up on node1.
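To confirm from the command line which node currently holds the VIP, the address list can be searched for it. This is a small sketch, assuming the ip tool from iproute is installed; has_vip is a made-up helper name.

```shell
# Succeed when the given IPv4 address appears in `ip -o -4 addr show` output
# read from standard input. has_vip is a hypothetical helper.
has_vip() {
    grep -qF "inet $1/"
}

# Only attempt the live check where the ip tool is available.
if command -v ip >/dev/null 2>&1; then
    if ip -o -4 addr show | has_vip 192.168.56.130; then
        echo "VIP is on $(uname -n)"
    else
        echo "VIP is not on $(uname -n)"
    fi
fi
```

Running it on both nodes before and after a failover shows the address moving from one to the other.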

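5. Testing

Test from a client:

[root@localhost ~]# yum install nfs nfs-utils

Make sure the client can ping the VIP: 192.168.56.130

Note: during my experiment, pings to the VIP succeeded only intermittently and heartbeat kept switching over, even though the network between node1 and node2 was fine. The log showed this error: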

ResourceManager(default)[4917]: 2013/07/29_16:27:13 ERROR: Cannot locate resource script drbddisk

ResourceManager(default)[4917]: 2013/07/29_16:27:14 info: Retrying failed stop operation [drbddisk::r0]

On closer inspection, the /etc/ha.d/resource.d/ directory had no drbddisk script at all...

[root@node1 ha.d]# vim /etc/ha.d/resource.d/drbddisk

The drbddisk script is available here: http://down.51cto.com/data/892360
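This kind of mistake can be caught before heartbeat is started. Heartbeat looks for resource scripts in /etc/ha.d/resource.d and /etc/init.d, so every name on the haresources line should exist in one of them. The following is only a sketch; missing_resource_scripts and its interface are assumptions made for this example.

```shell
# List resource scripts named on a haresources line that are missing or not
# executable in all of the given directories.
# Usage: missing_resource_scripts "LINE" DIR [DIR...]
# Hypothetical helper, for illustration only.
missing_resource_scripts() {
    line=$1; shift
    first=1
    for word in $line; do
        if [ "$first" = 1 ]; then first=0; continue; fi   # skip the node name
        script=${word%%::*}                                # strip the ::arguments
        found=0
        for dir in "$@"; do
            if [ -x "$dir/$script" ]; then found=1; fi
        done
        if [ "$found" = 0 ]; then echo "$script"; fi
    done
}

missing_resource_scripts \
    "node1 IPaddr::192.168.56.130/24/eth0 drbddisk::r0 Filesystem::/dev/drbd1::/db::ext3 killnfsd" \
    /etc/ha.d/resource.d /etc/init.d
```

In my case this would have printed drbddisk. An empty result means every script on the line was found.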

[root@localhost ~]# ping 192.168.56.130

PING 192.168.56.130 (192.168.56.130) 56(84) bytes of data.

64 bytes from 192.168.56.130: icmp_seq=1 ttl=64 time=2.37 ms

64 bytes from 192.168.56.130: icmp_seq=2 ttl=64 time=0.394 ms

64 bytes from 192.168.56.130: icmp_seq=3 ttl=64 time=0.342 ms

Mount the NFS export on the client

[root@localhost ~]# mount -t nfs 192.168.56.130:/db /mnt/nfs/

[root@localhost ~]# mount

/dev/mapper/VolGroup-lv_root on / type ext4 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")

/dev/sda1 on /boot type ext4 (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

192.168.56.130:/db on /mnt/nfs type nfs (rw,vers=4,addr=192.168.56.130,clientaddr=192.168.56.113)

[root@localhost ~]# ls /mnt/nfs/

hello.txt hw.txt lost+found

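As shown, the NFS share is now usable from the client.

As far as the client is concerned, 192.168.56.130 simply provides an NFS share; it has no idea that NFS actually runs on node1 and node2. If node1 goes down, node2 quickly takes over the service through heartbeat, and nothing has to change on the client side.

Next, simulate node1 going down:

Use a script to write to the NFS mount continuously, watch the result with tailf from another terminal, and stop the service on node1 in the meantime.

Many write-ups online use the script below, which keeps running touch x. I think there is a problem with that approach, though not with the script itself.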
while true
do
echo ---\> trying touch x:`date`
touch x
echo \<-----done touch x:`date`
echo
sleep 1
done

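Here is my analysis:

Many of the write-ups online go like this:

[root@localhost ~]# mount 192.168.10.1:/mnt/web /mnt/nfs # mount the NFS directory

[root@localhost ~]# vim /mnt/test.sh # create the script above

[root@localhost ~]# bash /mnt/test.sh # run the script

After these steps, they show output like the following, point out that it keeps going after heartbeat is stopped on node1, and declare the experiment a success.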

---> trying touch x:2012年 07月 25日星期三 05:29:40 CST
<-----done touch x:2012年 07月 25日星期三 05:29:40 CST

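In fact, as the commands above show, the NFS export is mounted at /mnt/nfs while the script lives at /mnt/test.sh, so the script is not inside the NFS directory. When it runs, touch x creates the file in the shell's current working directory (going by the prompt, root's home directory; the script's own location under /mnt does not matter), not in /mnt/nfs. Nothing is ever read from or written to the NFS mount. The loop just keeps touching a file in an unrelated local directory, so of course stopping heartbeat on node1 has no effect on it.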
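The point about relative paths is easy to reproduce locally without any NFS at all. The sketch below uses throwaway directories, where share merely stands in for the NFS mount point; all paths are created under a temporary directory and are assumptions of the example.

```shell
# Show that a relative path inside a script resolves against the caller's
# working directory, not the directory that holds the script.
demo=$(mktemp -d)
mkdir "$demo/scripts" "$demo/share"           # "share" stands in for the NFS mount
printf '#!/bin/sh\ntouch x\n' > "$demo/scripts/test.sh"

( cd "$demo/scripts" && sh ./test.sh )        # x lands in scripts/
ls "$demo/share"                               # prints nothing: share/ is untouched

( cd "$demo/share" && sh ../scripts/test.sh )  # same script, different working directory
ls "$demo/share"                               # prints: x
```

To exercise the NFS mount, the script either has to be run from inside the mounted directory or has to write to an absolute path under it.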

This is how I tested it:

[root@localhost ~]# vim test.sh

#!/bin/bash

# Append a timestamp to a file on the NFS mount once per second

while true

do

echo "$(date)" >> /mnt/nfs/date

sleep 1

done

[root@localhost ~]# chmod +x test.sh

[root@localhost ~]# ./test.sh

Stop heartbeat on node1

[root@node1 ha.d]# service heartbeat stop

Open another terminal on the client and watch the writes with tailf
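Besides watching with tailf, the length of the interruption can be measured afterwards. The sketch below assumes a variant of the writer that logs epoch seconds with date +%s instead of the default date format; max_gap is a hypothetical helper, and the sample timestamps are made up purely to demonstrate it.

```shell
# Report the largest gap, in seconds, between consecutive lines of a file that
# holds one epoch timestamp per line. max_gap is a hypothetical helper.
max_gap() {
    awk 'NR > 1 && $1 - prev > max { max = $1 - prev }
         { prev = $1 }
         END { print max + 0 }' "$1"
}

# Writer variant for the client (commented out because it needs the NFS mount):
# while true; do date +%s >> /mnt/nfs/date; sleep 1; done

# Made-up sample with one 8-second stall in the middle.
sample=$(mktemp)
printf '%s\n' 1375000000 1375000001 1375000002 1375000010 1375000011 > "$sample"
max_gap "$sample"    # prints 8
```

If the client stalls while the VIP moves to node2, the stall shows up as a gap larger than the one-second sleep, which gives a rough figure for the failover time.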