Heartbeat + DRBD + NFS: A Detailed Case Study

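Niu Xiaolin, 2012-10-18 18:36:45. Category: Clusters. Tags: heartbeat, nfs, drbd, high-availability server cluster.

Copyright belongs to the author. This is an original work by 51CTO blog author Niu Xiaolin; contact the author for permission before reposting.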

I: Background

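This lab deploys a DRBD + Heartbeat + NFS environment to build a highly available (HA) file server cluster. In this design, DRBD keeps the data on the two servers complete and consistent. DRBD behaves like RAID-1 over the network: when data is written to the local filesystem, it is also sent to a second host on the network and recorded there in the same form. Data on the primary node is therefore replicated to the standby node in real time. If the primary server fails, the standby server still holds an identical copy of the data and can carry on. In an HA setup, DRBD can take the place of a shared disk array, because the data exists on both the primary and the standby server at the same time. On failover, the remote host simply uses its own copy of the data to keep providing the same service, and clients are largely unaware that the primary server has failed.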

II: Topology Diagram

III: Configuration Steps

Node 1

[root@node1 ~]# hostname

node1.a.com

[root@node1 ~]# vim /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

192.168.1.3 node1.a.com

192.168.1.4 node2.a.com

Install DRBD

[root@node1 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm

[root@node1 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

Load the kernel module

[root@node1 ~]# modprobe drbd

[root@node1 ~]# lsmod |grep drbd

drbd 228528 0

Create a new partition

[root@node1 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n

Command action

e extended

p primary partition (1-4)

e

Selected partition 4

First cylinder (1418-2610, default 1418):

Using default value 1418

Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610):

Using default value 2610

Command (m for help): n

First cylinder (1418-2610, default 1418):

Using default value 1418

Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610): +1g

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes

255 heads, 63 sectors/track, 2610 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 15 120456 83 Linux

/dev/sda2 16 1290 10241437+ 83 Linux

/dev/sda3 1291 1417 1020127+ 82 Linux swap / Solaris

/dev/sda4 1418 2610 9582772+ 5 Extended

/dev/sda5 1418 1540 987966 83 Linux

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.

The kernel still uses the old table.

The new table will be used at the next reboot.

Syncing disks.

Re-read the partition table

[root@node1 ~]# partprobe /dev/sda

[root@node1 ~]# cat /proc/partitions

major minor #blocks name

8 0 20971520 sda

8 1 120456 sda1

8 2 10241437 sda2

8 3 1020127 sda3

8 4 0 sda4

8 5 987966 sda5
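The check above (looking for sda5 in /proc/partitions after partprobe) can be scripted. The following is a minimal sketch, not part of the original procedure; the function name partition_visible and its optional second argument are assumptions made for illustration.

```shell
# partition_visible NAME [TABLE]
# Succeeds if NAME (for example sda5) is listed in the kernel partition table.
# TABLE defaults to /proc/partitions; it is a parameter only so that the
# function can be tried against a saved copy of the table.
partition_visible() {
    pv_name=$1
    pv_table=${2:-/proc/partitions}
    # The device name is the fourth column of each entry.
    awk -v n="$pv_name" '$4 == n { found = 1 } END { exit !found }' "$pv_table"
}

# Hypothetical usage: re-read the table only if sda5 is not visible yet.
# partition_visible sda5 || partprobe /dev/sda
```

If the partition still does not appear after partprobe, the fdisk warning shown earlier applies and a reboot is needed before continuing.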

[root@node1 ~]# cd /usr/share/doc/drbd83-8.3.8/

[root@node1 drbd83-8.3.8]# ls

ChangeLog COPYING drbd.conf file.list README

[root@node1 drbd83-8.3.8]# cp drbd.conf /etc/

cp: overwrite `/etc/drbd.conf'? y

[root@node1 drbd83-8.3.8]# cd /etc/drbd.

drbd.conf drbd.d/

[root@node1 drbd83-8.3.8]# cd /etc/drbd.d/

[root@node1 drbd.d]# ls

global_common.conf

[root@node1 drbd.d]# cp -p global_common.conf global_common.conf.bak

[root@node1 drbd.d]# ll

total 8

-rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf

-rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf.bak

[root@node1 drbd.d]# vim global_common.conf

global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    protocol C;

    startup {
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
        fencing resource-only;
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "mydrbdlab";
    }

    syncer {
        rate 100M;
    }
}

[root@node1 drbd.d]# vim web.res

resource web {
    on node1.a.com {
        device /dev/drbd0;
        disk /dev/sda5;
        address 192.168.1.3:7789;
        meta-disk internal;
    }

    on node2.a.com {
        device /dev/drbd0;
        disk /dev/sda5;
        address 192.168.1.4:7789;
        meta-disk internal;
    }
}
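Nested blocks in DRBD configuration files are easy to leave unclosed when editing by hand. The following is a rough sanity check, offered as a sketch only: braces_balanced is a hypothetical helper that just counts braces outside "#" comments and knows nothing about DRBD syntax.

```shell
# braces_balanced FILE
# Succeeds if FILE contains as many "{" as "}" once "#" comments are removed.
# This is a heuristic: a "#" inside a quoted string would also be cut off.
braces_balanced() {
    sed 's/#.*//' "$1" | awk '
        { opens += gsub(/[{]/, "{"); closes += gsub(/[}]/, "}") }
        END { exit (opens != closes) }'
}

# Hypothetical usage:
# braces_balanced /etc/drbd.d/web.res || echo "unbalanced braces in web.res"
```

For a real syntax check, drbdadm dump web parses the configuration and prints it back, which fails loudly on a malformed file.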

Initialize the DRBD metadata

[root@node1 drbd.d]# drbdadm create-md web

Writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.

Node 2

[root@node2 ~]# hostname

node2.a.com

[root@node2 ~]# vim /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

192.168.1.3 node1.a.com

192.168.1.4 node2.a.com

[root@node2 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm

[root@node2 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

Load the kernel module

[root@node2 ~]# modprobe drbd

[root@node2 ~]# lsmod |grep drbd

drbd 228528 0

Create a new partition

[root@node2 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes

255 heads, 63 sectors/track, 2610 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 15 120456 83 Linux

/dev/sda2 16 1290 10241437+ 83 Linux

/dev/sda3 1291 1417 1020127+ 82 Linux swap / Solaris

Command (m for help): n

Command action

e extended

p primary partition (1-4)

e

Selected partition 4

First cylinder (1418-2610, default 1418):

Using default value 1418

Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610):

Using default value 2610

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes

255 heads, 63 sectors/track, 2610 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 15 120456 83 Linux

/dev/sda2 16 1290 10241437+ 83 Linux

/dev/sda3 1291 1417 1020127+ 82 Linux swap / Solaris

/dev/sda4 1418 2610 9582772+ 5 Extended

Command (m for help): n

First cylinder (1418-2610, default 1418):

Using default value 1418

Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610): +1g

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes

255 heads, 63 sectors/track, 2610 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 15 120456 83 Linux

/dev/sda2 16 1290 10241437+ 83 Linux

/dev/sda3 1291 1417 1020127+ 82 Linux swap / Solaris

/dev/sda4 1418 2610 9582772+ 5 Extended

/dev/sda5 1418 1540 987966 83 Linux

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.

The kernel still uses the old table.

The new table will be used at the next reboot.

Syncing disks.

Re-read the partition table

[root@node2 ~]# partprobe /dev/sda

[root@node2 ~]# cat /proc/partitions

major minor #blocks name

8 0 20971520 sda

8 1 120456 sda1

8 2 10241437 sda2

8 3 1020127 sda3

8 4 0 sda4

8 5 987966 sda5

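Copy the configuration to node2 (this transcript was captured with the peer reachable as node2.123.com at 192.168.1.7; with the /etc/hosts entries shown earlier, the target would be node2.a.com)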

[root@node1 drbd.d]# scp * node2.123.com:/etc/drbd.d/

The authenticity of host 'node2.123.com (192.168.1.7)' can't be established.

RSA key fingerprint is c7:11:f0:b8:8b:33:ba:66:9f:c3:d7:a0:5c:67:f0:e1.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'node2.123.com,192.168.1.7' (RSA) to the list of known hosts.

root@node2.123.com's password:

global_common.conf 100% 506 0.5KB/s 00:00

global_common.conf.bak 100% 1418 1.4KB/s 00:00

web.res 100% 349

Initialize the DRBD metadata

[root@node2 drbd.d]# drbdadm create-md web

Writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.

Start the DRBD service on node 1 and node 2

[root@node1 drbd.d]# service drbd start

Starting DRBD resources: [

web

Found valid meta data in the expected location, 1011671040 bytes into /dev/sda5.

d(web) s(web) n(web) ]outdated-wfc-timeout has to be shorter than degr-wfc-timeout

outdated-wfc-timeout implicitly set to degr-wfc-timeout (120s)

......

[root@node2 drbd.d]# service drbd start

Starting DRBD resources: [

web

Found valid meta data in the expected location, 1011671040 bytes into /dev/sda5.

d(web) s(web) n(web) ].

Check node 1

[root@node1 drbd.d]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----

ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896

Check node 2

[root@node2 drbd.d]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----

ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896

Set node 1 as the primary node

[root@node1 drbd.d]# drbdadm -- --overwrite-data-of-peer primary web

Check node 1

[root@node1 drbd.d]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

ns:987896 nr:0 dw:0 dr:987896 al:0 bm:61 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Check node 2

[root@node2 drbd.d]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----

ns:0 nr:987896 dw:987896 dr:0 al:0 bm:61 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Create the filesystem. Do this on node 1 only, and only after it has been promoted to primary: /dev/drbd0 cannot be written while a node is Secondary. The filesystem is replicated to node 2 by DRBD, so it is not created there a second time.

[root@node1 ~]# mkfs -t ext3 -L drbdweb /dev/drbd0

[root@node1 ~]# mkdir /mnt/web

[root@node1 ~]# mount /dev/drbd0 /mnt/web/

The mount point /mnt/web must also exist on node 2 (create it there with mkdir /mnt/web) so that the device can be mounted on that node after a failover.
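The /proc/drbd status line packs the connection state (cs), the local/peer roles (ro) and the local/peer disk states (ds) into key:value fields. As a sketch for scripting the checks above, the helpers below pull out one field and test for a fully synced pair; the names drbd_field and drbd_synced are hypothetical, and only the first device's status line is read.

```shell
# drbd_field KEY [FILE]
# Prints the value of KEY (cs, ro or ds) from the first device status line.
# FILE defaults to /proc/drbd; it is a parameter so a saved copy can be used.
drbd_field() {
    awk -v key="$1" '
        / cs:/ {
            for (i = 1; i <= NF; i++)
                if (index($i, key ":") == 1) {
                    print substr($i, length(key) + 2)
                    exit
                }
        }' "${2:-/proc/drbd}"
}

# drbd_synced [FILE]
# Succeeds once both the local and the peer disk are UpToDate.
drbd_synced() {
    [ "$(drbd_field ds "$@")" = "UpToDate/UpToDate" ]
}

# Hypothetical usage: wait for the initial sync to finish.
# until drbd_synced; do sleep 5; done
```

Against the node 1 output shown above, drbd_field ro would print Primary/Secondary, and drbd_synced would succeed because ds is UpToDate/UpToDate.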

Configure NFS

[root@node1 ~]# vim /etc/exports

/mnt/web *(rw,sync,insecure,no_root_squash,no_wdelay)

[root@node1 ~]# service portmap start

Starting portmap: [OK]

[root@node1 ~]# chkconfig portmap on

[root@node1 ~]# service nfs start

Starting NFS services: [OK]

Starting NFS quotas: [OK]

Starting NFS daemon: [OK]

Starting NFS mountd: [OK]

[root@node1 ~]# chkconfig nfs on

[root@node1 ~]# vim /etc/init.d/nfs

Line 122: killproc nfsd -9

Install Heartbeat

[root@node1 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

[root@node2 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

Copy the sample configuration files

[root@node1 ~]# cd /usr/share/doc/heartbeat-2.1.4/

[root@node1 heartbeat-2.1.4]# cp authkeys ha.cf haresources /etc/ha.d/

[root@node1 heartbeat-2.1.4]# cd /etc/ha.d/

[root@node1 ha.d]# vim ha.cf

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
udpport 694
bcast eth0
# ucast takes the address of the peer node (node2 here), not the local address
ucast eth0 192.168.1.4
auto_failback off
node node1.a.com
node node2.a.com

[root@node1 ha.d]# vim haresources

node1.a.com IPaddr::192.168.1.1/24/eth0/192.168.1.255 drbddisk::web Filesystem::/dev/drbd0::/mnt/web::ext3 killnfsd
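Each resource on a haresources line has the form script::arg1::arg2, which Heartbeat turns into a call to that script (looked up in /etc/ha.d/resource.d or /etc/init.d) with the arguments followed by start or stop. Resources are started left to right and stopped in reverse order. The sketch below only illustrates how an entry splits on "::"; split_resource is a hypothetical helper, not a Heartbeat tool.

```shell
# split_resource ENTRY
# Prints the resource script name on the first line, then one argument per line.
split_resource() {
    printf '%s\n' "$1" | awk -F'::' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Hypothetical usage:
# split_resource 'Filesystem::/dev/drbd0::/mnt/web::ext3'
```

For the Filesystem entry this yields the script name Filesystem followed by the device, the mount point and the filesystem type, which is why a stray space inside the "::" separators breaks the entry.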

[root@node1 ha.d]# echo "killall -9 nfsd ; /etc/init.d/nfs restart ; exit 0" >>resource.d/killnfsd

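Set permissions (Heartbeat refuses to start unless authkeys is mode 600; authkeys must also define an authentication method, which the original transcript does not show)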

[root@node1 ha.d]# chmod 600 /etc/ha.d/authkeys

[root@node1 ha.d]# chmod 755 /etc/ha.d/resource.d/killnfsd

Copy the configuration files to node2 (shown as node2.123.com in the transcript)

[root@node1 ha.d]# scp ha.cf authkeys haresources node2.123.com:/etc/ha.d/

root@node2.123.com's password:

ha.cf 100% 10KB 10.4KB/s 00:00

authkeys 100% 659 0.6KB/s 00:00

haresources 100% 6009 5.9KB/s 00:00

[root@node1 ha.d]# scp resource.d/killnfsd node2.123.com:/etc/ha.d/resource.d/

root@node2.123.com's password:

killnfsd 100% 51 0.1KB/s 00:00

[root@node2 ha.d]# vim ha.cf

ucast eth0 192.168.1.3

Restart the service

[root@node2 ~]# service heartbeat restart

Stopping High-Availability services:

[OK]

Waiting to allow resource takeover to complete:

[OK]

Starting High-Availability services:

2012/10/17_16:41:28 INFO: Resource is stopped

[OK]

IV: Testing

Check the DRBD state on both nodes

[root@node1 ~]# drbd-overview

0:web Connected Primary/Secondary UpToDate/UpToDate C r---- /mnt/web ext3 950M 18M 885M 2%

[root@node2 ~]# drbd-overview

0:web Connected Secondary/Primary UpToDate/UpToDate C r----

[root@node1 ~]# ifconfig

Start the client test

Mount the export on the client and verify it

Write the test script

[root@localhost ~]# vim /mnt/test.sh

while true
do
    # Touch a file on the NFS mount every 2 seconds and log before/after times
    echo "--> trying touch x:$(date)"
    touch x
    echo "<-----done touch x:$(date)"
    echo
    sleep 2
done

Run the script

[root@localhost ~]# cd /mnt/1

[root@localhost 1]# bash /mnt/test.sh

--> trying touch x:Thu Oct 18 17:48:27 CST 2012

<-----done touch x:Thu Oct 18 17:48:27 CST 2012

--> trying touch x:Thu Oct 18 17:48:29 CST 2012

<-----done touch x:Thu Oct 18 17:48:29 CST 2012

--> trying touch x:Thu Oct 18 17:48:31 CST 2012

<-----done touch x:Thu Oct 18 17:48:31 CST 2012

--> trying touch x:Thu Oct 18 17:48:33 CST 2012

<-----done touch x:Thu Oct 18 17:48:33 CST 2012

Stop the heartbeat service on server node node1

[root@node1 ~]# service heartbeat stop

Stopping High-Availability services:

[OK]

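On the client, the test script stalls and some operations are dropped while the resources move to node2; after that, the filesystem works normally again.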
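To put a number on the interruption seen by the client, one option is to log a timestamp after every successful touch and look for the largest gap. This is a sketch under assumptions: max_gap and the log path /tmp/touch.log are hypothetical and not part of the original test.

```shell
# max_gap
# Reads one epoch timestamp (seconds) per line on stdin and prints the
# largest gap, in seconds, between two consecutive lines.
max_gap() {
    awk 'NR > 1 && $1 - prev > max { max = $1 - prev }
         { prev = $1 }
         END { print max + 0 }'
}

# Hypothetical usage: in the test loop, record each successful touch with
#     touch x && date +%s >> /tmp/touch.log
# and after the failover run
#     max_gap < /tmp/touch.log
```

With the 2-second sleep in the test loop, a result close to 2 means no visible interruption; a noticeably larger value approximates how long the client was blocked during the switch.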
