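I. Background
This lab deploys a DRBD + Heartbeat + NFS environment to build a highly available (HA) file server cluster. In this design, DRBD keeps the server data complete and consistent. DRBD works like RAID-1 over the network: when data is written to the local file system, it is also sent to a second host on the network and recorded there in the same form, so the primary and standby nodes stay synchronized in real time. If the primary server fails, the standby server still holds an identical copy of the data and can continue serving it. In an HA setup DRBD can therefore replace a shared disk array, because the data exists on both the primary and the standby server. On failover, the standby host uses its own copy to provide the same service as the primary, and clients are largely unaware of the primary's failure.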
II. Topology

[Figure: topology diagram of the Heartbeat + DRBD + NFS cluster]

III. Configuration steps
Node 1
[root@node1 ~]# hostname
node1.a.com
[root@node1 ~]# vim /etc/hosts
 
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.1.3 node1.a.com
192.168.1.4 node2.a.com
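
Both nodes need identical name-to-address entries, since DRBD and Heartbeat refer to the peers by host name. A minimal sketch of a check follows; the function name has_host_entry is hypothetical and not part of any package.

```shell
# has_host_entry FILE IP NAME   (hypothetical helper)
# Succeeds if FILE has a non-comment line that maps IP to NAME.
has_host_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }        # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

It could be run on each node as: has_host_entry /etc/hosts 192.168.1.4 node2.a.com || echo "node2 entry missing"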
Install DRBD
[root@node1 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm
[root@node1 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
Load the module
[root@node1 ~]# modprobe drbd
[root@node1 ~]# lsmod |grep drbd
drbd                  228528 0
Create a new partition
[root@node1 ~]# fdisk /dev/sda
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
 
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Selected partition 4
First cylinder (1418-2610, default 1418):
Using default value 1418
Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610):
Using default value 2610
 
Command (m for help): n
First cylinder (1418-2610, default 1418):
Using default value 1418
Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610): +1g
 
Command (m for help): p
 
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id System
/dev/sda1   *           1          15      120456   83 Linux
/dev/sda2              16        1290    10241437+ 83 Linux
/dev/sda3            1291        1417     1020127+ 82 Linux swap / Solaris
/dev/sda4            1418        2610     9582772+   5 Extended
/dev/sda5            1418        1540      987966   83 Linux
 
Command (m for help): w
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
Re-read the partition table
[root@node1 ~]# partprobe /dev/sda
[root@node1 ~]# cat /proc/partitions
major minor #blocks  name
 
   8     0   20971520 sda
   8     1     120456 sda1
   8     2   10241437 sda2
   8     3    1020127 sda3
   8     4          0 sda4
   8     5     987966 sda5
 
[root@node1 ~]# cd /usr/share/doc/drbd83-8.3.8/
[root@node1 drbd83-8.3.8]# ls
ChangeLog COPYING drbd.conf file.list README
[root@node1 drbd83-8.3.8]# cp drbd.conf /etc/
cp: overwrite `/etc/drbd.conf'? y
[root@node1 drbd83-8.3.8]# cd /etc/drbd.d/
[root@node1 drbd.d]# ls
global_common.conf
[root@node1 drbd.d]# cp -p global_common.conf global_common.conf.bak
[root@node1 drbd.d]# ll
total 8
-rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf
-rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf.bak
[root@node1 drbd.d]# vim global_common.conf
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}
 
common {
        protocol C;
 
        startup {
                wfc-timeout 120;
                degr-wfc-timeout 120;
        }
        disk {
                on-io-error detach;
                fencing resource-only;
        }
        net {
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }
        syncer {
                rate 100M;
        }
 
}
 
[root@node1 drbd.d]# vim web.res
 
resource web {
        on node1.a.com {
        device   /dev/drbd0;
        disk    /dev/sda5;
        address 192.168.1.3:7789;
        meta-disk       internal;
        }
 
        on node2.a.com {
        device   /dev/drbd0;
        disk    /dev/sda5;
        address 192.168.1.4:7789;
        meta-disk       internal;
        }
}
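
DRBD matches each "on" section against the node's own host name as reported by uname -n, which is why the hostname output at the start of each node's steps matters. A small sketch of a sanity check follows; the helper name res_defines_host is hypothetical.

```shell
# res_defines_host FILE HOSTNAME   (hypothetical helper)
# Succeeds if the resource file contains an "on HOSTNAME" section.
res_defines_host() {
    awk -v host="$2" '
        $1 == "on" && $2 == host { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

On either node it might be used as: res_defines_host /etc/drbd.d/web.res "$(uname -n)" || echo "no section for this host"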
 
Initialize the metadata
[root@node1 drbd.d]# drbdadm   create-md web
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
Node 2
 
[root@node2 ~]# hostname
node2.a.com
[root@node2 ~]# vim /etc/hosts
 
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.1.3 node1.a.com
192.168.1.4 node2.a.com
 
[root@node2 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm
[root@node2 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
Load the module
[root@node2 ~]# modprobe drbd
[root@node2 ~]# lsmod |grep drbd
drbd                  228528 0
Create a new partition
[root@node2 ~]# fdisk /dev/sda
 
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
 
Command (m for help): p
 
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id System
/dev/sda1   *           1          15      120456   83 Linux
/dev/sda2              16        1290    10241437+ 83 Linux
/dev/sda3            1291        1417     1020127+ 82 Linux swap / Solaris
 
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Selected partition 4
First cylinder (1418-2610, default 1418):
Using default value 1418
Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610):
Using default value 2610
 
Command (m for help): p
 
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id System
/dev/sda1   *           1          15      120456   83 Linux
/dev/sda2              16        1290    10241437+ 83 Linux
/dev/sda3            1291        1417     1020127+ 82 Linux swap / Solaris
/dev/sda4            1418        2610     9582772+   5 Extended
 
Command (m for help): n
First cylinder (1418-2610, default 1418):
Using default value 1418
Last cylinder or +size or +sizeM or +sizeK (1418-2610, default 2610): +1g
 
Command (m for help): p
 
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id System
/dev/sda1   *           1          15      120456   83 Linux
/dev/sda2              16        1290    10241437+ 83 Linux
/dev/sda3            1291        1417     1020127+ 82 Linux swap / Solaris
/dev/sda4            1418        2610     9582772+   5 Extended
/dev/sda5            1418        1540      987966   83 Linux
 
Command (m for help): w
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
Re-read the partition table
[root@node2 ~]# partprobe /dev/sda
[root@node2 ~]# cat /proc/partitions
major minor #blocks name
 
   8     0   20971520 sda
   8     1     120456 sda1
   8     2   10241437 sda2
   8     3    1020127 sda3
   8     4          0 sda4
   8     5     987966 sda5
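Copy the configuration to node2 (the transcript names the host node2.123.com; with the /etc/hosts entries above, use node2.a.com)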
[root@node1 drbd.d]# scp * node2.123.com:/etc/drbd.d/
The authenticity of host 'node2.123.com (192.168.1.7)' can't be established.
RSA key fingerprint is c7:11:f0:b8:8b:33:ba:66:9f:c3:d7:a0:5c:67:f0:e1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2.123.com,192.168.1.7' (RSA) to the list of known hosts.
root@node2.123.com's password:
global_common.conf                                                       100% 506     0.5KB/s   00:00   
global_common.conf.bak                                                   100% 1418     1.4KB/s  00:00   
web.res                                                                  100% 349  
Initialize the metadata
[root@node2 drbd.d]# drbdadm   create-md web
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
 
 
Start the DRBD service on node 1 and node 2
[root@node1 drbd.d]# service drbd start
Starting DRBD resources: [
web
Found valid meta data in the expected location, 1011671040 bytes into /dev/sda5.
d(web) s(web) n(web) ]outdated-wfc-timeout has to be shorter than degr-wfc-timeout
outdated-wfc-timeout implicitly set to degr-wfc-timeout (120s)
......
[root@node2 drbd.d]# service drbd start
Starting DRBD resources: [
web
Found valid meta data in the expected location, 1011671040 bytes into /dev/sda5.
d(web) s(web) n(web) ].
 
Check node 1
[root@node1 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896
Check node 2
[root@node2 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896
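
The status line packs the connection state (cs), the roles (ro) and the disk states (ds) into key:value tokens. A minimal sketch for pulling out one of them in a script follows; drbd_field is a hypothetical helper name, and it only looks at the first matching token, which is enough for a single resource like this one.

```shell
# drbd_field KEY   (hypothetical helper)
# Reads /proc/drbd-style text on stdin and prints the value of the
# first KEY:value token found (for example cs, ro or ds).
drbd_field() {
    awk -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n = index($i, ":")
                if (n > 0 && substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }
    '
}
```

For example, drbd_field ds < /proc/drbd would print Inconsistent/Inconsistent at this point, before the first synchronization.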
Set node 1 as the primary node
[root@node1 drbd.d]# drbdadm   -- --overwrite-data-of-peer primary web
Check node 1
[root@node1 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:987896 nr:0 dw:0 dr:987896 al:0 bm:61 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
Check node 2
[root@node2 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:987896 dw:987896 dr:0 al:0 bm:61 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
Create the file system (on the primary node only; /dev/drbd0 cannot be written while a node is Secondary)
[root@node1 ~]# mkfs -t ext3 -L drbdweb /dev/drbd0
[root@node1 ~]# mkdir /mnt/web
[root@node1 ~]# mount /dev/drbd0 /mnt/web/
Create the mount point on node 2 as well, so the file system can be mounted there after a failover
[root@node2 ~]# mkdir /mnt/web
 
Configure NFS
[root@node1 ~]# vim /etc/exports
/mnt/web *(rw,sync,insecure,no_root_squash,no_wdelay)
[root@node1 ~]# service portmap start
Starting portmap:                                          [  OK  ]
[root@node1 ~]# chkconfig portmap on
[root@node1 ~]# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
[root@node1 ~]# chkconfig nfs on
[root@node1 ~]# vim /etc/init.d/nfs
122         killproc nfsd -9
 
Install Heartbeat
[root@node1 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck
[root@node2 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck
Copy the sample configuration files
[root@node1 ~]# cd /usr/share/doc/heartbeat-2.1.4/
[root@node1 heartbeat-2.1.4]# cp authkeys ha.cf haresources /etc/ha.d/
[root@node1 heartbeat-2.1.4]# cd /etc/ha.d/
[root@node1 ha.d]# vim ha.cf
24 debugfile /var/log/ha-debug
29 logfile /var/log/ha-log
34 logfacility     local0
48 keepalive 2
56 deadtime 10
76 udpport 694
122 bcast eth0
ucast eth0 192.168.1.4
158 auto_failback off
214 node    node1.a.com
215 node    node2.a.com
[root@node1 ha.d]# vim haresources
45 node1.a.com IPaddr::192.168.1.1/24/eth0/192.168.1.255 drbddisk::web Filesystem::/dev/drbd0::/mnt/web::ext3 killnfsd
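
A haresources entry is the preferred node followed by the resources Heartbeat starts from left to right, with each resource's arguments separated by "::". The sketch below only assembles such a line so the pieces are easier to see. The helper name haresources_line is hypothetical, and it assumes the DRBD resource is the one named web in web.res.

```shell
# haresources_line NODE VIP_SPEC DRBD_RES DEVICE MOUNTPOINT FSTYPE [EXTRA...]
# (hypothetical helper; needs at least six arguments)
# Prints one haresources entry: virtual IP, then DRBD promotion,
# then the file system mount, then any extra resource scripts.
haresources_line() {
    printf '%s IPaddr::%s drbddisk::%s Filesystem::%s::%s::%s' \
        "$1" "$2" "$3" "$4" "$5" "$6"
    shift 6
    for extra in "$@"; do
        printf ' %s' "$extra"
    done
    printf '\n'
}
```

Called as haresources_line node1.a.com 192.168.1.1/24/eth0/192.168.1.255 web /dev/drbd0 /mnt/web ext3 killnfsd, it prints a single entry using the values from this setup.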
 
[root@node1 ha.d]# echo "killall -9 nfsd ; /etc/init.d/nfs restart ; exit 0" >>resource.d/killnfsd
Set permissions
[root@node1 ha.d]# chmod 600 /etc/ha.d/authkeys
[root@node1 ha.d]# chmod 755 /etc/ha.d/resource.d/killnfsd
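
The transcript copies the sample authkeys but does not show its contents. The sample typically ships with every directive commented out, and Heartbeat needs an "auth N" line plus a matching "N method key" line before it will start. The sketch below shows one way to produce those two lines; make_authkeys is a hypothetical helper, and the choice of sha1 is an assumption, not something the original states.

```shell
# make_authkeys KEY   (hypothetical helper)
# Prints a two-line authkeys body that selects key 1 with sha1.
make_authkeys() {
    printf 'auth 1\n1 sha1 %s\n' "$1"
}

# Possible usage on node1 (commented out so nothing is overwritten by accident):
# key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
# make_authkeys "$key" > /etc/ha.d/authkeys && chmod 600 /etc/ha.d/authkeys
```

The same file has to be present on both nodes, which the scp step that follows takes care of.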
Copy the configuration files to node2 (node2.123.com in this transcript)
[root@node1 ha.d]# scp ha.cf authkeys haresources node2.123.com:/etc/ha.d/
root@node2.123.com's password:
ha.cf                                  100%   10KB 10.4KB/s   00:00   
authkeys                               100% 659     0.6KB/s   00:00   
haresources                            100% 6009     5.9KB/s   00:00   
[root@node1 ha.d]# scp resource.d/killnfsd node2.123.com:/etc/ha.d/resource.d/
root@node2.123.com's password:
killnfsd                               100%   51     0.1KB/s   00:00 
[root@node2 ha.d]# vim ha.cf
ucast eth0 192.168.1.3
Restart the service
[root@node2 ~]# service heartbeat restart
Stopping High-Availability services:
                                                           [  OK  ]
Waiting to allow resource takeover to complete:
                                                           [  OK  ]
Starting High-Availability services:
2012/10/17_16:41:28 INFO: Resource is stopped
                                                           [  OK  ]
IV. Testing

Check the status on both nodes
[root@node1 ~]# drbd-overview
 0:web Connected Primary/Secondary UpToDate/UpToDate C r---- /mnt/web ext3 950M 18M 885M 2%
[root@node2 ~]# drbd-overview
 0:web Connected Secondary/Primary UpToDate/UpToDate C r----
 
[root@node1 ~]# ifconfig

[Figure: screenshot of the ifconfig output on node1]

Test from a client

[Figure: screenshot of the client test]

Mount the export and check

[Figure: screenshot of the mount and listing on the client]
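
Since the mount step survives only as a screenshot, here is a hedged sketch of what the client command probably looks like. It assumes the client mounts through the virtual IP 192.168.1.1 from haresources rather than a node's own address, and that the mount point is /mnt/1 as used later in the test. The helper nfs_mount_cmd is hypothetical and only prints the command.

```shell
# nfs_mount_cmd SERVER EXPORT MOUNTPOINT   (hypothetical helper)
# Prints the mount command instead of running it, so it can be reviewed first.
nfs_mount_cmd() {
    printf 'mount -t nfs %s:%s %s\n' "$1" "$2" "$3"
}

# Values assumed from this setup: virtual IP, exported path, client mount point.
nfs_mount_cmd 192.168.1.1 /mnt/web /mnt/1
```

Running showmount -e 192.168.1.1 on the client first is a quick way to confirm the export is visible through the virtual IP.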

Write a test script
[root@localhost ~]# vim /mnt/test.sh
while true
do
echo --\> trying touch x:`date`
touch x
echo \<-----done touch x:`date`
echo
sleep 2
done
Run the script
[root@localhost ~]# cd /mnt/1
[root@localhost 1]# bash /mnt/test.sh
--> trying touch x:Thu Oct 18 17:48:27 CST 2012
<-----done touch x:Thu Oct 18 17:48:27 CST 2012
 
--> trying touch x:Thu Oct 18 17:48:29 CST 2012
<-----done touch x:Thu Oct 18 17:48:29 CST 2012
 
--> trying touch x:Thu Oct 18 17:48:31 CST 2012
<-----done touch x:Thu Oct 18 17:48:31 CST 2012
 
--> trying touch x:Thu Oct 18 17:48:33 CST 2012
<-----done touch x:Thu Oct 18 17:48:33 CST 2012
 
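Stop the heartbeat service on server node node1
[root@node1 ~]# service heartbeat stop
Stopping High-Availability services:
                                                           [  OK  ]
 
The client sees a brief interruption during the takeover, after which the file system works normally again.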
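
To put a number on the interruption, the client can log a timestamp after every successful touch and then look for the largest gap. This is only a sketch: max_gap is a hypothetical helper and /tmp/touch.log is an assumed log path.

```shell
# max_gap   (hypothetical helper)
# Reads one epoch timestamp per line on stdin and prints the largest
# difference between consecutive lines, in seconds (0 if fewer than two).
max_gap() {
    awk '
        NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
        { prev = $1 }
        END { print max + 0 }
    '
}

# Possible client-side usage (commented out; run inside the NFS mount):
# while true; do touch x && date +%s >> /tmp/touch.log; sleep 2; done
# max_gap < /tmp/touch.log
```

With the 2-second sleep from the test script, a result well above 2 roughly reflects how long the takeover stalled the client.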
 

[Figure: screenshot of the client output during the failover]