I) How heartbeat works

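How heartbeat (Linux-HA) works: heartbeat has two core parts, heartbeat monitoring and resource takeover. Heartbeat monitoring can run over network links and serial ports, and it supports redundant links. The nodes send each other packets reporting their current state. If a node receives no packet from its peer within the configured time, it considers the peer failed and starts the resource takeover module, which takes over the resources or services that were running on the peer.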
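The timeout rule described above can be sketched in a few lines of shell. This only illustrates the idea and is not heartbeat's actual implementation; the function name `peer_state` and its arguments are made up for the example.

```shell
#!/bin/sh
# Illustrative sketch only: decide whether a peer should be considered failed,
# based on when its last heartbeat packet arrived (hypothetical helper).
# Usage: peer_state LAST_SEEN NOW DEADTIME   (all values in seconds)
peer_state() {
    silence=$(( $2 - $1 ))
    if [ "$silence" -gt "$3" ]; then
        echo "dead"     # nothing heard within the allowed window: take over resources
    else
        echo "alive"    # a packet arrived recently enough: nothing to do
    fi
}

peer_state 100 105 8    # 5 seconds of silence with an 8-second limit
peer_state 100 110 8    # 10 seconds of silence with an 8-second limit
```

The first call reports the peer as alive and the second as dead, which is the point at which resource takeover would begin.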

II) Configuring heartbeat

Goal: when node 1 goes down, node 2 takes over the service immediately.

heartbeat1:192.168.1.122

heartbeat2:192.168.1.114

nfs:192.168.1.122

 

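Prerequisites:

1) Give every node a name that all other nodes can resolve; the names can be defined in /etc/hosts
2) With many nodes, keep the clocks synchronized, ideally against an NTP time server

3) Every node can reach the others over SSH using keys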
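As an illustration of prerequisite 1, the host entries might look like the ones below. The sketch writes them to a scratch file rather than the real /etc/hosts; the address of each node comes from the list above, and the helper name `lookup` is made up for the example.

```shell
#!/bin/sh
# Sketch: build the host entries in a scratch file (not the real /etc/hosts).
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.122 node1.shunzi.com node1
192.168.1.114 node2.shunzi.com node2
EOF

# Print the address recorded for a host name or alias (hypothetical helper).
lookup() {
    awk -v name="$1" '{ for (i = 2; i <= NF; i++) if ($i == name) print $1 }' "$hosts_file"
}

lookup node1.shunzi.com
lookup node2
```

Once the same two lines are present in /etc/hosts on both machines, each node can resolve the other by full name or by short alias.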

Set the hostname on each node:

[root@node1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1.shunzi.com
[root@node1 ~]# uname -n
node1.shunzi.com
[root@node2 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2.shunzi.com
[root@node2 ~]# uname -n
node2.shunzi.com

Synchronize the time on each node

ntpdate 0.centos.pool.ntp.org

Set up key-based communication between the nodes, so that no password is needed

# on node1
ssh-keygen -t rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub root@node2.shunzi.com

# on node2
ssh-keygen -t rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub root@node1.shunzi.com

Install the required package groups and packages in advance

yum groupinstall "Development tools"
yum groupinstall "Server Platform Development"

yum -y install libnet PyXML perl-TimeDate net-snmp-libs

Install the heartbeat packages; this must be done on both nodes

heartbeat-2.1.4-12.el6.x86_64.rpm (core package)
heartbeat-debuginfo-2.1.4-12.el6.x86_64.rpm
heartbeat-devel-2.1.4-12.el6.x86_64.rpm (development components)
heartbeat-gui-2.1.4-12.el6.x86_64.rpm (graphical interface)
heartbeat-ldirectord-2.1.4-12.el6.x86_64.rp
heartbeat-pils-2.1.4-12.el6.x86_64.rpm
heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm

After installation, copy the sample configuration files and the authentication file into place:

cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d

Edit the authentication file: authkeys

A strong key can be generated with openssl

[root@node1 ha.d]# openssl rand -hex 8    # generates a 16-character random hex string
7dd04dabdfd104cd

vim /etc/ha.d/authkeys

[Screenshot: the edited authkeys file]

The permissions must be 600

chmod 600 authkeys
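For reference, an authkeys file generally consists of an `auth` line that selects a key index, followed by the numbered key line itself. The exact contents used here are not reproduced in the text, so the sketch below is an assumption: it picks the sha1 method, generates a fresh key, and writes the file to a scratch directory instead of /etc/ha.d.

```shell
#!/bin/sh
# Sketch: generate an authkeys file in a scratch directory (not /etc/ha.d).
workdir=$(mktemp -d)

# 8 random bytes = 16 hex characters; fall back to /dev/urandom if openssl is absent.
key=$(openssl rand -hex 8 2>/dev/null || head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')

# "auth 1" selects key number 1; sha1 is an assumed choice of method.
cat > "$workdir/authkeys" <<EOF
auth 1
1 sha1 $key
EOF

chmod 600 "$workdir/authkeys"    # the mode the text above requires
ls -l "$workdir/authkeys"
```

The same file has to be present on every node, since the key is what lets the nodes authenticate each other's heartbeat packets.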

Configure the main configuration file

[root@node1 ha.d]# vim ha.cf 

[root@node1 ha.d]# egrep -v "^$|^#" ha.cf
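logfile /var/log/ha-log --> log file location
deadtime 8 --> seconds of silence before the peer is declared dead
warntime 4 --> seconds of silence before a late-heartbeat warning is logged
udpport 694 --> UDP port used for heartbeats
mcast eth0 225.0.0.1 694 1 0 --> send heartbeats by multicast (interface, group address, port, TTL, loop)
auto_failback on --> resources fail back automatically to the preferred node
node node1.shunzi.com --> cluster node
node node2.shunzi.com --> cluster node
ping 192.168.1.253 --> ping node used to test the network; here it is the gateway router
compression bz2 --> compress heartbeat messages with bz2
compression_threshold 2 --> only compress messages larger than 2 KB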
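A quick sanity check on these values can be scripted. The sketch below copies the directives shown above into a scratch file, then confirms that warntime is smaller than deadtime and that at least two nodes are declared. The helper name `directive` is made up for the example, and the check is not part of heartbeat itself.

```shell
#!/bin/sh
# Sketch: sanity-check a copy of ha.cf kept in a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
logfile /var/log/ha-log
deadtime 8
warntime 4
udpport 694
mcast eth0 225.0.0.1 694 1 0
auto_failback on
node node1.shunzi.com
node node2.shunzi.com
ping 192.168.1.253
EOF

# Print the second field of every line whose first field is the given
# directive name (hypothetical helper).
directive() {
    awk -v key="$1" '$1 == key { print $2 }' "$conf"
}

deadtime=$(directive deadtime)
warntime=$(directive warntime)
node_count=$(directive node | wc -l)

if [ "$warntime" -lt "$deadtime" ] && [ "$node_count" -ge 2 ]; then
    echo "ha.cf looks consistent"
else
    echo "check warntime/deadtime and the node entries" >&2
fi
```

A warning threshold that is not below the dead threshold would never fire before failover, which is why the two values are compared.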

 

 

Since this is a web service, install httpd for testing;

node1 configuration

echo "192.168.1.122:node1" >> index.html

Start httpd

service httpd start

node2 configuration

[root@node2 htdocs]# echo "192.168.1.114:node2" >> index.html
[root@node2 htdocs]# curl http://192.168.1.114/index.html
192.168.1.114:node2

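Once the test succeeds, stop httpd and make sure it does not start at boot. In a cluster, the service on every node must be started and stopped only by the cluster's resource agent.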

node1

service httpd stop
chkconfig httpd off

node2

service httpd stop
chkconfig httpd off

Configure the cluster resources

Since I am using two virtual machines, I configure an alias IP on node1 for the resource record to use

ifconfig eth0:0 192.168.1.100/24 up

 vim haresources 

node1.shunzi.com 192.168.1.100/24/eth0 httpd

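The line above defines two resources: an IP address and the web service.

It means that requests to 192.168.1.100 are served by node1 by preference. When node1 goes down, node2 takes over, and when node1 recovers, it immediately takes the service back from node2.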
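To make the layout of that haresources line explicit, the sketch below splits it into its parts: preferred node, virtual IP with prefix length and interface, and the service script. The variable names are chosen for the example only.

```shell
#!/bin/sh
# Sketch: split the haresources line used in this post into its fields.
line='node1.shunzi.com 192.168.1.100/24/eth0 httpd'

# Whitespace separates the preferred node, the IP resource and the service.
set -- $line
preferred_node=$1
ip_spec=$2
service=$3

# The IP resource has the form address/prefix/interface.
vip=${ip_spec%%/*}
rest=${ip_spec#*/}
prefix=${rest%%/*}
iface=${rest#*/}

echo "preferred node: $preferred_node"
echo "virtual IP:     $vip (/$prefix on $iface)"
echo "service script: $service"
```

The log further down shows the same split in practice: the IP field is passed to the IPaddr resource script, and the service name is run as /etc/init.d/httpd.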

 

Copy the edited configuration file, authentication file, and resource file to every node

 scp -p authkeys haresources ha.cf node2:/etc/ha.d/

Start the heartbeat service on node1

[root@node1 ha.d]# service heartbeat restart

Start it on node2 as well

ssh node2.shunzi.com 'service heartbeat start'

Check the log

[root@node1 ha.d]# tail -f /var/log/ha-log
heartbeat[39001]: 2014/04/25_00:25:38 info: Status update for node node2.shunzi.com: status active
harc[39008]:    2014/04/25_00:25:38 info: Running /etc/ha.d/rc.d/status status
heartbeat[39001]: 2014/04/25_00:25:38 info: Link 192.168.1.253:192.168.1.253 up.
heartbeat[39001]: 2014/04/25_00:25:38 info: Status update for node 192.168.1.253: status ping
heartbeat[39001]: 2014/04/25_00:25:39 info: Comm_now_up(): updating status to active
heartbeat[39001]: 2014/04/25_00:25:39 info: Local status now set to: 'active'
heartbeat[39001]: 2014/04/25_00:25:39 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:39 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:39 info: Local Resource acquisition completed. (none)
heartbeat[39001]: 2014/04/25_00:25:40 info: node2.shunzi.com wants to go standby [foreign]
heartbeat[39001]: 2014/04/25_00:25:51 info: standby: acquire [foreign] resources from node2.shunzi.com
heartbeat[39029]: 2014/04/25_00:25:51 info: acquire local HA resources (standby).
ResourceManager[39042]: 2014/04/25_00:25:51 info: Acquiring resource group: node1.shunzi.com 192.168.1.100/24/eth0 httpd
IPaddr[39068]:  2014/04/25_00:25:51 INFO:  Resource is stopped
ResourceManager[39042]: 2014/04/25_00:25:51 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.100/24/eth0 start
IPaddr[39165]:  2014/04/25_00:25:51 INFO: Using calculated netmask for 192.168.1.100: 255.255.255.0
IPaddr[39165]:  2014/04/25_00:25:51 INFO: eval ifconfig eth0:0 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[39136]:  2014/04/25_00:25:51 INFO:  Success
ResourceManager[39042]: 2014/04/25_00:25:51 info: Running /etc/init.d/httpd  start
heartbeat[39029]: 2014/04/25_00:25:51 info: local HA resource acquisition completed (standby).
heartbeat[39001]: 2014/04/25_00:25:51 info: Standby resource acquisition done [foreign].
heartbeat[39001]: 2014/04/25_00:25:51 info: Initial resource acquisition complete (auto_failback)
heartbeat[39001]: 2014/04/25_00:25:52 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:59 info: node2.shunzi.com wants to go standby [foreign]
heartbeat[39001]: 2014/04/25_00:26:10 info: standby: acquire [foreign] resources from node2.shunzi.com
heartbeat[39296]: 2014/04/25_00:26:10 info: acquire local HA resources (standby).
ResourceManager[39309]: 2014/04/25_00:26:10 info: Acquiring resource group: node1.shunzi.com 192.168.1.100/24/eth0 httpd
IPaddr[39335]:  2014/04/25_00:26:10 INFO:  Running OK
heartbeat[39296]: 2014/04/25_00:26:10 info: local HA resource acquisition completed (standby).
heartbeat[39001]: 2014/04/25_00:26:10 info: Standby resource acquisition done [foreign].
heartbeat[39001]: 2014/04/25_00:26:10 info: remote resource transition completed.

Access test

[Screenshot: access test result]

Stop heartbeat on node1 to test whether the service fails over to node2 automatically

service heartbeat stop

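After node1 is stopped, checking the IP addresses on node2 shows that it has automatically taken over the VIP.

Access to the site is taken over automatically as well;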
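One way to check for the VIP from a script is to search the output of `ip -o -4 addr show`. The sketch below is a minimal version of that check; the helper name `holds_vip` is made up for the example, and on a machine without the `ip` command the check simply reports the address as absent.

```shell
#!/bin/sh
# Sketch: succeed if the given address appears in `ip -o -4 addr show`
# output supplied on stdin (hypothetical helper).
holds_vip() {
    grep -F -q "inet $1/"
}

if ip -o -4 addr show 2>/dev/null | holds_vip 192.168.1.100; then
    echo "VIP 192.168.1.100 is on this node"
else
    echo "VIP 192.168.1.100 is not on this node"
fi
```

Run on node2 after heartbeat is stopped on node1, a check like this would be expected to report the VIP as present.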

[Screenshot: access test result after failover]

Once node1 has recovered and comes back online, it takes the resources back from node2, because node1 was defined earlier as the preferred node.
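The ownership rule that results from the preferred node in haresources together with auto_failback can be summarized as a small decision function. This is a simplified sketch for the two-node setup in this post, not heartbeat's real logic; the function name `resource_owner` and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: which node should hold the resources in this two-node setup.
# Usage: resource_owner PREFERRED_UP CURRENT_OWNER AUTO_FAILBACK
#   PREFERRED_UP   yes/no  whether node1 (the preferred node) is running
#   CURRENT_OWNER  the node holding the resources right now
#   AUTO_FAILBACK  on/off  as set in ha.cf
resource_owner() {
    preferred=node1
    backup=node2
    if [ "$1" = no ]; then
        echo "$backup"       # preferred node is down: the backup takes over
    elif [ "$3" = on ]; then
        echo "$preferred"    # preferred node is up and failback is enabled
    else
        echo "$2"            # failback disabled: resources stay where they are
    fi
}

resource_owner no  node1 on     # node1 has just gone down
resource_owner yes node2 on     # node1 is back, auto_failback on
resource_owner yes node2 off    # node1 is back, auto_failback off
```

With the `auto_failback on` setting used in this post, the second case applies, which is why node1 reclaims the VIP and httpd as soon as it rejoins.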

 

PS:

    heartbeat v1 is driven entirely by configuration files. With the setup above it provides high availability for the web service.