Configuring a highly available web service with the heartbeat v2 GUI

jinlinger 2014-04-23 13:34:15 Category: Clustering

Tags: constraints, crm, heartbeat, high availability. Category: digital transformation

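© Copyright belongs to the author. This is an original work by jinlinger on the 51CTO blog; contact the author for permission to repost, otherwise legal action may be taken.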

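-- Outline

Prerequisites
Introduction
Lab topology
Installing the software
Configuring heartbeat (crm resource manager)

The crm GUI in detail
Configuring resources with crm (group resource)
crm resource constraints
Summary of crm resource configuration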

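I. Prerequisites (setup was covered in the previous article)

1. Set the hostname on each node. It must match the output of uname -n and must be resolvable by name.

Change the hostname in /etc/sysconfig/network.
Also add an entry for every node to /etc/hosts so that each name resolves to its address.

2. The clocks on all nodes must be in sync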

ntpdate

3. Passwordless SSH login between the nodes

#ssh-keygen -t rsa -P ''
#ssh-copy-id -i .ssh/id_rsa.pub node_name
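
The hostname prerequisite can be checked with a small script before heartbeat is started. This is only a sketch: the helper name has_hosts_entry is made up for illustration, and it looks at a hosts-format file only, not at DNS.

```shell
#!/usr/bin/env bash
# has_hosts_entry FILE NAME  (hypothetical helper, not part of heartbeat)
# Succeeds if FILE (in /etc/hosts format) maps some address to NAME.
# Comment lines and trailing comments are ignored.
has_hosts_entry() {
  awk -v name="$2" '
    { sub(/#.*/, "") }                                   # strip comments
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Typical use on each node: the name heartbeat relies on is `uname -n`.
if has_hosts_entry /etc/hosts "$(uname -n)"; then
  echo "OK: $(uname -n) is listed in /etc/hosts"
else
  echo "WARNING: $(uname -n) is not listed in /etc/hosts"
fi
```

Running the same check for the peer node's name on each machine covers both directions of name resolution.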

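II. Introduction

Unlike the 1.x style, the v2.x style uses the CRM to manage the whole cluster. When the cluster starts, the nodes elect one of themselves as the DC (Designated Coordinator); configuration should be done on the DC, which distributes it to the other nodes.

haresources is the resource manager built into heartbeat v1. It is fairly basic and has no graphical management. heartbeat v2 adds the much more capable crm resource manager. heartbeat v2 still ships haresources for compatibility with v1, but once crm is enabled the haresources configuration file is no longer used, so resources defined in /etc/ha.d/haresources do not take effect and have to be configured again. To enable crm, add crm on to the main configuration file. crm listens on 5560/tcp through the mgmtd process.

Note: on the host where hb_gui will be started, set a password for the hacluster user and log in to hb_gui as that user.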

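III. Lab topology

IV. Installing the software

1. heartbeat packages

heartbeat: core components *
heartbeat-devel: development package
heartbeat-gui: graphical management interface * (the GUI is what this article uses to configure cluster resources)
heartbeat-ldirectord: generates LVS rules automatically and health-checks the back-end real servers, for highly available LVS
heartbeat-pils: plugin and interface loading library *
heartbeat-stonith: STONITH ("shoot the other node in the head") interface *

2. Xmanager Enterprise 4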

V. Configuring heartbeat (crm resource manager)

Step 1: Install the packages

#yum install -y perl-TimeDate net-snmp-libs libnet PyXML

Note: libnet comes from the EPEL repository, so download and install the EPEL release package first:

http://download.fedoraproject.org/pub/epel/6/i386/repoview/epel-release.html

Download the version that matches your system.

#rpm -ivh epel-release-6-8.noarch.rpm

After that, packages from EPEL can be installed with yum:

#yum install -y libnet
#rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm  heartbeat-devel-2.1.4-12.el6.x86_64.rpm  heartbeat-gui-2.1.4-12.el6.x86_64.rpm

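Note:

The steps above must also be carried out on node4.

Now configure node2

Once the installation is finished, locate the sample configuration files and copy them to /etc/ha.d: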

[root@essun download]# rpm -ql heartbeat-2.1.4 |grep 
 /usr/share/doc/heartbeat-2.1.4/

Change into this directory and copy authkeys from it to /etc/ha.d/:

[root@essun heartbeat-2.1.4]# cp authkeys  /etc/ha.

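Notes:

authkeys

The authentication file for cluster nodes; it keeps hosts that are not cluster members from joining the cluster.

The main heartbeat configuration file

Edit authkeys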

vim authkeys
#auth 1
#1 crc
#2 sha1 
# #3 md5 Hello!
auth 1
1 md5 $1$Oukjg1$aZS0Qb.PBg1Isv0cSJcxL/

Set the permissions of this file to 600:

#chmod 600 authkeys
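
One common way to avoid a hand-typed key is to derive it from random data. The sketch below is an assumption about how such a file could be produced, not the method used in this article; make_authkeys is a hypothetical helper name, and it keeps the md5 method and key index 1 shown above.

```shell
#!/usr/bin/env bash
# make_authkeys FILE  (hypothetical helper)
# Writes an authkeys-style file with a single md5 key derived from random
# data, and makes it readable and writable by the owner only.
make_authkeys() {
  local file=$1 key
  key=$(head -c 512 /dev/urandom | md5sum | awk '{print $1}')
  ( umask 077; printf 'auth 1\n1 md5 %s\n' "$key" > "$file" )
  chmod 600 "$file"
}

# Example: write to a scratch file first and inspect it before copying it
# into place on every node.
out=$(mktemp)
make_authkeys "$out"
ls -l "$out"
```

Whatever key is used, the same file has to be present on every node, which is why the article copies it to node4 afterwards.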

Edit the main configuration file

[root@essun ha.d]# grep -v "#"  |grep -v "^$"
logfile    /var/log/ha-log
keepalive 100ms
deadtime 10
warntime 4
udpport    694
mcast eth0 225.0.0.1 694 1 0
auto_failback on
node 
node 
crm on
ping 172.16.0.1
compression    bz2
compression_threshold 10

Notes:

crm on

Enables the crm resource manager.

See the previous article for the other parameters.
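
Before the file is copied to the other node, a quick sanity check can catch a missing crm line or a missing node entry. A minimal sketch, assuming the one-directive-per-line layout shown above; check_hacf is a hypothetical helper, and it only recognizes the values on, yes and true for crm.

```shell
#!/usr/bin/env bash
# check_hacf FILE  (hypothetical helper)
# Reports whether FILE enables the CRM and how many node names it declares.
# Exit status is 0 only if crm is enabled and at least two nodes are listed.
check_hacf() {
  awk '
    { sub(/#.*/, "") }                                   # strip comments
    $1 == "crm" && ($2 == "on" || $2 == "yes" || $2 == "true") { crm = 1 }
    $1 == "node" { nodes += NF - 1 }                     # names after "node"
    END {
      printf "crm enabled: %s, nodes declared: %d\n", (crm ? "yes" : "no"), nodes
      exit (crm && nodes >= 2) ? 0 : 1
    }
  ' "$1"
}

# Usage: check_hacf <path to the main configuration file>
```

The check is deliberately narrow; it says nothing about timing values or the communication settings.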

When editing is done, copy both configuration files to the same directory on node4:

[root@essun ha.d]# scp authkeys  :/etc/ha.d/
The authenticity of host ' (192.168.1.110)' can't be established.
RSA key fingerprint is b8:9d:cb:7b:4d:ad:c2:fb:a4:00:23:b0:f2:6b:3f:ad.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ',192.168.1.110' (RSA) to the list of known hosts.
root@'s password:
authkeys                                     100%  699     0.7KB/s   00:00 
                                        100%   10KB  10.3KB/s   00:00

Step 2: Install the http service on node2 and node4

#yum install -y httpd

Test that the service works

node2

[root@essun ha.d]#echo "" >/var/www/html/index.html
[root@essun ha.d]# service httpd start
Starting httpd:                                            [  OK  ]
[root@essun ha.d]# curl http://172.16.32.20

node4

[root@essun download]# echo "" > /var/www/html/index.html
[root@essun download]# service httpd restart
Stopping httpd: [FAILED]
Starting httpd: [ OK ]
[root@essun download]# curl http://172.16.32.4

Once node2 and node4 have both been tested, stop the http service and disable it at boot; crm will start and stop it from now on.

[root@essun ha.d]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@essun ~]# chkconfig httpd off

Step 3: Start heartbeat on both nodes

Start heartbeat on node2

[root@essun ha.d]# service heartbeat start
Starting High-Availability services:
Done.
[root@essun ha.d]# ss -tnlp |grep 5560
LISTEN     0      10                        *:5560                     *:*      users:(("mgmtd",16231,10))

Start heartbeat on node4

[root@essun ha.d]# ssh  "service heartbeat start"
 Starting High-Availability services:
Done.
[root@essun ~]# ss -tnlp |grep 5560
LISTEN     0      10                        *:5560                     *:*      users:(("mgmtd",1776,10))

Check the cluster status

#crm_mon
============
Last updated: Tue Apr 22 08:41:26 2014
Current DC:  (7188d63f-6350-4ab4-8c3e-831110e2b642)
2 Nodes configured.
0 Resources configured.
============
Node:  (7188d63f-6350-4ab4-8c3e-831110e2b642): online
Node:  (0df746b3-fdab-4625-99d2-659e9a5ef1c6): online

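The status output shows two nodes in the cluster, node2 and node4, both online, with node2 as the DC. The resource count is 0, which means no resources have been defined in this cluster yet. The next steps cover that in detail.

Step 4: Ways to configure crm

Command line: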

[root@essun ~]# crm
crmadmin       crm_failcount  crm_resource   crm_uuid    
crm_attribute  crm_master     crm_sh         crm_verify  
crm_diff       crm_mon        crm_standby

Graphical:

[root@essun ~]# hb_gui

Note: the GUI is the focus of this article.

The CIB came up earlier, so let's look at the format of the CIB file:

[root@essun crm]# pwd
/var/lib/heartbeat/crm
[root@essun crm]# cat cib.xml
 <cib generated="true" admin_epoch="0" epoch="6" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" ccm_transition="2" dc_uuid="7188d63f-6350-4ab4-8c3e-831110e2b642" num_updates="7" cib-last-written="Tue Apr 22 08:28:06 2014">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.4-node: aa909246edb386137b986c5773344b98c6969999"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="7188d63f-6350-4ab4-8c3e-831110e2b642" uname="" type="normal"/>
       <node id="0df746b3-fdab-4625-99d2-659e9a5ef1c6" uname="" type="normal"/>
     </nodes>
     <resources/>
     <constraints/>
   </configuration>
 </cib>

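As you can see, the raw XML is hard to read; fortunately there are command-line and graphical tools for configuring it.

Step 5: Configure the crm resource manager

To use the GUI, the hacluster user (heartbeat's default user) must have a password (hacluster is used here).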

[root@essun crm]# grep hacluster /etc/passwd
hacluster:x:496:493:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
[root@essun crm]# echo hacluster | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.

Note:

Create the password on whichever node you want to configure the cluster from.

VI. The crm GUI in detail

Start hb_gui

#hb_gui &

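The GUI is now running; connect to it and look at the result.

This is the window shown after logging in.

Explanation:

Resource types:

primitive, native: a primary resource; it can run on only one node

group: a group resource

clone: a clone resource (it must first be a primitive)

Parameters:

the total number of clones;

the maximum number of clones that may run on each node

Examples: stonith (resource fencing), cluster filesystem (relies on distributed locking: other processes can see a held lock and will not write)

master/slave: a master/slave resource such as DRBD (only two copies: the master can be read and written, the slave can be neither read nor written)

RA types (resource class):

heartbeat legacy: the traditional type

LSB: all scripts under /etc/rc.d/init.d/*

OCF:

STONITH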

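VII. Adding resources (group resource)

Note:

The highly available web cluster in this article uses three resources:

VIP
httpd
shared storage (NFS, added later in this section)

Add a resource

Add a native resource

When the settings are complete click Add; the result is as follows

Under Resources a group named webservice is defined; it shows the one resource (service) in the group and its current state

Next add the httpd service

The result after adding it:

Start the service and test it

The check confirms that all resources are running on node2. Next I simulate a failure: will all the resources move to node4?

Here is the result

The resources switched straight over to node4. Next I set up NFS shared storage on node3 with a simple test page, to see how a filesystem resource is used in a high-availability setup.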

Install NFS on the node

#yum install -y nfs-utils
#mkdir -p /www/share
#vim /etc/exports
/www/share *(rw)
#exportfs -r
# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
# echo "cluster server nfs" > /www/share/index.html
[root@essun ~]# service nfs stop
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Shutting down RPC idmapd:                                  [  OK  ]

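Now add the resource

Start the nfs resource and check the result

Now simulate a failure of node4 so that the resources switch back to node2

There is the result.

Explanation:

A group resource bundles several resources together to provide a service. If no group is defined, the resources are spread evenly across the nodes, which clearly does not fit the service logic. So how can several resources be made to run on the same node without a group? That is what resource constraints are for; read on.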

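VIII. crm resource constraints

There are three types of resource constraint:

Location constraints (locations): which node a resource prefers, expressed as a score;

inf: positive infinity

n: a numeric score; the scores of the nodes are compared to decide where the resource ends up (n is positive here)

-n: as above; if the location scores of all nodes are negative, the resource runs on the node whose score is closest to positive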

-inf: negative infinity (the resource never runs on this node)

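Colocation constraints (colocations): how strongly resources prefer to run on the same node;

inf: the resources must always run together

-inf: the resources must never run together

Order constraints (orders): the order in which resources are started and stopped;

The following walks through how the three kinds of constraint are used in the high-availability setup. All resources have been cleared at this point.

Add the three resources ip, httpd and nfs using the steps described earlier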

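Adding resources (constraints)

Colocation constraints express how strongly one resource must run together with another. Because the resources defined earlier have been cleared, they have to be created again.

Create the colocation constraints

Define how strongly the webip resource and the webserver resource stay together

Explanation:

ID: a name for this constraint record

from, to: which resource is tied to which (enter the resource names defined above)

score: how strongly they stay together (a number may also be used)

INFINITY: webip must be on the same node as webserver

Define how strongly webserver and webnfs stay together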
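
For reference, each colocation entered in the GUI ends up as an rsc_colocation element in the CIB (the full dump appears later in this article). The sketch below only prints those elements; colocation_xml is a hypothetical helper, and the cibadmin invocation in the closing comment is an assumption that should be checked against the installed version before use.

```shell
#!/usr/bin/env bash
# colocation_xml ID FROM TO [SCORE]  (hypothetical helper)
# Prints an <rsc_colocation> element in the attribute layout used by the
# CIB dump in this article. SCORE defaults to INFINITY.
colocation_xml() {
  printf '<rsc_colocation id="%s" from="%s" to="%s" score="%s"/>\n' \
    "$1" "$2" "$3" "${4:-INFINITY}"
}

# The two colocations defined in the GUI:
colocation_xml webip_with_webserver webip webserver
colocation_xml webserver_with_webnfs webserver webnfs

# On a cluster node, an element could then be loaded with something like
# (assumption, verify with cibadmin's help output first):
#   cibadmin -C -o constraints -X "$(colocation_xml webip_with_webserver webip webserver)"
```

Printing the XML first makes it easy to compare against what the GUI writes before anything is loaded into a live cluster.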

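This alone is not enough: it only guarantees that the three resources run on the same node. For a web service to work, the service address must be up first; if there is web storage, it must then be mounted on the right directory on that node; only then is the service itself started. So the start order has to be defined as well.

Define the order constraints

Order constraint between webip and webnfs

Order constraint between webnfs and webserver

Order constraint between webip and webserver

Explanation:

ID, from and to have the same meaning as before

Create the location constraint (which node is preferred)

Note:

Only the resource that starts first needs a location constraint; all the other resources will then run on the same node.

All the constraints together

Start the resources and test access

Simulate a failure of node2

When node2 comes back online, all the resources move back to node2

Back on the command line: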

#crm_mon
============
Last updated: Wed Apr 23 12:15:05 2014
Current DC:  (7188d63f-6350-4ab4-8c3e-831110e2b642)
2 Nodes configured.
3 Resources configured.
============
Node:  (7188d63f-6350-4ab4-8c3e-831110e2b642): online
Node:  (0df746b3-fdab-4625-99d2-659e9a5ef1c6): online
webip   (ocf::heartbeat:IPaddr):        Started 
webnfs  (ocf::heartbeat:Filesystem):    Started 
webserver    (lsb:httpd):    Started

Another look at the CIB

[root@essun crm]# cat cib.xml
 <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" ccm_transition="4" dc_uuid="7188d63f-6350-4ab4-8c3e-831110e2b642" epoch="116" num_updates="1" cib-last-written="Wed Apr 23 12:05:43 2014">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.4-node: aa909246edb386137b986c5773344b98c6969999"/>
           <nvpair name="last-lrm-refresh" id="cib-bootstrap-options-last-lrm-refresh" value="1398221371"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node uname="" type="normal" id="7188d63f-6350-4ab4-8c3e-831110e2b642">
         <instance_attributes id="nodes-7188d63f-6350-4ab4-8c3e-831110e2b642">
           <attributes>
             <nvpair name="standby" id="standby-7188d63f-6350-4ab4-8c3e-831110e2b642" value="off"/>
           </attributes>
         </instance_attributes>
       </node>
       <node id="0df746b3-fdab-4625-99d2-659e9a5ef1c6" uname="" type="normal">
         <instance_attributes id="nodes-0df746b3-fdab-4625-99d2-659e9a5ef1c6">
           <attributes>
             <nvpair id="standby-0df746b3-fdab-4625-99d2-659e9a5ef1c6" name="standby" value="off"/>
           </attributes>
         </instance_attributes>
       </node>
     </nodes>
     <resources>
       <primitive id="webip" class="ocf" type="IPaddr" provider="heartbeat">
         <meta_attributes id="webip_meta_attrs">
           <attributes>
             <nvpair id="webip_metaattr_target_role" name="target_role" value="started"/>
           </attributes>
         </meta_attributes>
         <instance_attributes id="webip_instance_attrs">
           <attributes>
             <nvpair id="2e4cd4a7-222b-4db9-93a8-7277ca287e92" name="ip" value="192.168.1.100"/>
             <nvpair id="f72cf044-813a-4dee-b679-fdc20b12f09a" name="nic" value="eth0"/>
             <nvpair id="f79817da-3169-4a4a-b64c-30ce668e0c5d" name="cidr_netmask" value="24"/>
           </attributes>
         </instance_attributes>
       </primitive>
       <primitive id="webnfs" class="ocf" type="Filesystem" provider="heartbeat">
         <meta_attributes id="webnfs_meta_attrs">
           <attributes>
             <nvpair id="webnfs_metaattr_target_role" name="target_role" value="started"/>
           </attributes>
         </meta_attributes>
         <instance_attributes id="webnfs_instance_attrs">
           <attributes>
             <nvpair id="76c25557-53ac-47dc-bd5f-3712a990928e" name="device" value="192.168.1.107:/www/share"/>
             <nvpair id="57dbdea6-3541-45cd-b076-d52f4b078196" name="directory" value="/var/www/html"/>
             <nvpair id="6ebe586c-cdc8-4519-a592-d0f5d6cb59a5" name="fstype" value="nfs"/>
           </attributes>
         </instance_attributes>
       </primitive>
       <primitive id="webserver" class="lsb" type="httpd" provider="heartbeat">
         <meta_attributes id="webserver_meta_attrs">
           <attributes>
             <nvpair id="webserver_metaattr_target_role" name="target_role" value="started"/>
           </attributes>
         </meta_attributes>
         <instance_attributes id="webserver_instance_attrs">
           <attributes>
             <nvpair id="097dd5e7-36ae-4aeb-ab9f-ad06485c5886" name="httpd" value="/etc/rc.d/init.d/httpd"/>
           </attributes>
         </instance_attributes>
       </primitive>
     </resources>
     <constraints>
       <rsc_colocation id="webip_with_webserver" from="webip" to="webserver" score="INFINITY"/>
       <rsc_colocation id="webserver_with_webnfs" from="webserver" to="webnfs" score="INFINITY"/>
       <rsc_order id="webip_before_webnfs" to="webnfs" from="webip"/>
       <rsc_order id="webnfs_before_webserver" to="webserver" from="webnfs"/>
       <rsc_order id="webip_before_webserver" to="webserver" from="webip"/>
       <rsc_location id="webip_on_node2" rsc="webip">
         <rule id="prefered_webip_on_node2" score="0"/>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
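
Since the raw CIB is tedious to read, a few lines of shell can summarize it. This is a rough sketch that uses text matching rather than a real XML parser; the helper names are made up, and it assumes each start tag sits on a single line, as in the dump above.

```shell
#!/usr/bin/env bash
# list_primitives FILE  (hypothetical helper)
# Prints "id class type" for every <primitive ...> start tag in FILE.
list_primitives() {
  grep -o '<primitive [^>]*>' "$1" | while IFS= read -r tag; do
    id=$(printf '%s\n' "$tag"     | sed -n 's/.* id="\([^"]*\)".*/\1/p')
    class=$(printf '%s\n' "$tag"  | sed -n 's/.* class="\([^"]*\)".*/\1/p')
    rtype=$(printf '%s\n' "$tag"  | sed -n 's/.* type="\([^"]*\)".*/\1/p')
    printf '%s %s %s\n' "$id" "$class" "$rtype"
  done
}

# list_constraints FILE  (hypothetical helper)
# Prints "element id" for every colocation, order and location constraint.
list_constraints() {
  grep -E -o '<rsc_(colocation|order|location) [^>]*>' "$1" |
    sed -n 's/^<\(rsc_[a-z]*\)\(.*\) id="\([^"]*\)".*/\1 \3/p'
}

# Usage on a cluster node:
#   list_primitives  /var/lib/heartbeat/crm/cib.xml
#   list_constraints /var/lib/heartbeat/crm/cib.xml
```

Run against the cib.xml shown above, list_primitives prints webip ocf IPaddr, webnfs ocf Filesystem and webserver lsb httpd, one per line.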

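IX. Summary

In a high-availability cluster there are two ways to keep resources on the same node. The first is a group resource: each member is a primitive (native) resource, and the order in which the members are defined in the group determines their start order. The second is to define every resource as a standalone primitive (native) and use constraints to set the node preference, the colocation, and the start order. In this lab the start order of the highly available web cluster's resources is webip ----> [webnfs] ---> webserver.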