III. Implementing HA (high availability) with the corosync component of RHCS

1. Running pacemaker as a corosync plugin

Environment:

ms.dtedu.com: management host for the HA cluster (ansible)

node5.dtedu.com: HA node 1

node6.dtedu.com: HA node 2

Resources: vip + web + filesystem

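Prerequisites:

1. Time synchronization

2. DNS (hostname) resolution

3. Mutual SSH trust

4. iptables stopped

5. SELinux disabled

Note: a node that is running the NetworkManager service cannot be added to the cluster, so the service is stopped in step 1.5.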
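The prerequisites above can be spot-checked from the management host. The following is a minimal sketch, assuming the ansible inventory group "all" contains node5 and node6 as in this setup. The run and check_prerequisites helpers are hypothetical names introduced here for illustration. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# run CMD...: print the command (dry run) unless RUN=1 is set.
# Hypothetical helper, not part of ansible.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

check_prerequisites() {
    run ansible all -a "date"                          # 1. clocks should agree
    run ansible all -a "getent hosts node5.dtedu.com node6.dtedu.com"  # 2. name resolution
    run ansible all -m ping                            # 3. SSH trust from the management host
    run ansible all -a "service iptables status"       # 4. iptables should be stopped
    run ansible all -a "getenforce"                    # 5. SELinux should not be Enforcing
    run ansible all -a "service NetworkManager status" # NetworkManager should be stopped
}

check_prerequisites
```

Comparing the output of each command across node5 and node6 is left to the reader; the sketch only collects the raw information.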

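1.1 Install ansible on the management host, then install corosync and pacemaker on the nodes. Do not install heartbeat together with pacemaker. If the installation fails, check which repositories are enabled.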

[root@ms.dtedu.com~]$ansible all -a "yum install -y corosync pacemaker"

node5.dtedu.com | SUCCESS | rc=0 >>

Loaded plugins: fastestmirror, refresh-packagekit

Setting up Install Process

Loading mirror speeds from cached hostfile

 * base: mirror.bit.edu.cn

 * epel: mirrors.tuna.tsinghua.edu.cn

 * extras: mirrors.tuna.tsinghua.edu.cn

 * updates: mirrors.tuna.tsinghua.edu.cn

Dependencies Resolved


================================================================================

 Package                     Arch        Version                Repository Size

================================================================================

Installing:

 pacemaker                   x86_64      1.1.15-5.el6           base      443 k

Installing for dependencies:

 cifs-utils                  x86_64      4.8.1-20.el6           base       65 k

 clusterlib                  x86_64      3.0.12.1-84.el6        base      109 k

 cman                        x86_64      3.0.12.1-84.el6        base      454 k

 cyrus-sasl-md5              x86_64      2.1.23-15.el6_6.2      base       47 k

 fence-agents                x86_64      4.0.15-13.el6          base      193 k

 fence-virt                  x86_64      0.2.3-24.el6           base       39 k

 gnutls-utils                x86_64      2.12.23-21.el6         base      109 k

 ipmitool                    x86_64      1.8.15-2.el6           base      465 k

 libtasn1-devel              x86_64      2.3-6.el6_5            base       61 k

 libvirt-client              x86_64      0.10.2-62.el6          base      4.1 M

 modcluster                  x86_64      0.16.2-35.el6          base      210 k

 nc                          x86_64      1.84-24.el6            base       57 k

 net-snmp-utils              x86_64      1:5.5-60.el6           base      177 k

 numactl                     x86_64      2.0.9-2.el6            base       74 k

 oddjob                      x86_64      0.30-6.el6             base       60 k

 openais                     x86_64      1.1.1-7.el6            base      192 k

 openaislib                  x86_64      1.1.1-7.el6            base       82 k

 pacemaker-cli               x86_64      1.1.15-5.el6           base      291 k

 pacemaker-cluster-libs      x86_64      1.1.15-5.el6           base       85 k

 pacemaker-libs              x86_64      1.1.15-5.el6           base      483 k

 perl-Net-Telnet             noarch      3.03-11.el6            base       56 k

 pexpect                     noarch      2.3-6.el6              base      147 k

 pyOpenSSL                   x86_64      0.13.1-2.el6           base      263 k

 python-suds                 noarch      0.4.1-3.el6            base      218 k

 quota                       x86_64      1:3.17-23.el6          base      202 k

 resource-agents             x86_64      3.9.5-46.el6           base      389 k

 ricci                       x86_64      0.16.2-87.el6          base      633 k

 sg3_utils                   x86_64      1.28-12.el6            base      498 k

 tcp_wrappers                x86_64      7.6-58.el6             base       70 k

 yajl                        x86_64      1.0.7-3.el6            base       27 k

Updating for dependencies:

 gnutls                      x86_64      2.12.23-21.el6         base      389 k

 gnutls-devel                x86_64      2.12.23-21.el6         base      1.2 M

 net-snmp-devel              x86_64      1:5.5-60.el6           base      307 k

 net-snmp-libs               x86_64      1:5.5-60.el6           base      1.5 M

 nspr                        x86_64      4.13.1-1.el6           base      114 k

 nss                         x86_64      3.27.1-13.el6          base      873 k

 nss-sysinit                 x86_64      3.27.1-13.el6          base       50 k

 nss-tools                   x86_64      3.27.1-13.el6          base      443 k

 nss-util                    x86_64      3.27.1-3.el6           base       68 k


 Package              Arch            Version               Repository     Size

================================================================================

Installing:

 corosync             x86_64          1.4.7-5.el6           base          216 k

Installing for dependencies:

 corosynclib          x86_64          1.4.7-5.el6           base          194 k



1.2 Install the crmsh and pssh packages. crmsh depends on pssh.

[root@ms.dtedu.com~]$ansible all -a "chdir=/etc/yum.repos.d wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo"

node6.dtedu.com | SUCCESS | rc=0 >>

--2017-04-10 06:31:50--  http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo

Resolving download.opensuse.org... 195.135.221.134, 2001:67c:2178:8::13

Connecting to download.opensuse.org|195.135.221.134|:80... connected.

HTTP request sent, awaiting response... 301 Moved Permanently

Location: http://download.opensuse.org/repositories/network:ha-clustering:/Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo [following]

--2017-04-10 06:31:51--  http://download.opensuse.org/repositories/network:ha-clustering:/Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo

Reusing existing connection to download.opensuse.org:80.

HTTP request sent, awaiting response... 301 Moved Permanently

Location: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo [following]

--2017-04-10 06:31:51--  http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo

Reusing existing connection to download.opensuse.org:80.

HTTP request sent, awaiting response... 200 OK

Length: 345 [text/plain]

Saving to: “network:ha-clustering:Stable.repo”


     0K                                                       100% 28.4M=0s




[root@ms.dtedu.com~]$ansible all -a "yum -y install crmsh"


1.3 Configuration file walkthrough (/etc/corosync/corosync.conf.example)


sync]# cat corosync.conf |grep -v ^# |grep -v ^$

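compatibility: whitetank // whether to stay compatible with whitetank (the 0.8 series); when enabled, newer features cannot be used

totem { // configuration section for the heartbeat (cluster messaging) layer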

version: 2

# secauth: Enable mutual node authentication. If you choose to

# enable this ("on"), then do remember to create a shared

# secret with "corosync-keygen".

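secauth: off // whether to enable node authentication; set to on to use the authkey generated in step 1.4

threads: 0 // number of threads used to encrypt and send messages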

# interface: define at least one interface to communicate

# over. If you define more than one interface stanza, you must

# also set rrp_mode.

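interface { // defines the interface that carries heartbeat messages

                # Rings must be consecutively numbered, starting at 0.

ringnumber: 0 // ring number of this interface, starting at 0 (an identifier, not a loop count)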

# This is normally the *network* address of the

# interface to bind to. This ensures that you can use

# identical instances of this configuration file

# across all your cluster nodes, without having to

# modify this option.

bindnetaddr: 192.168.1.0 // network address to bind to: the network of the NIC that carries heartbeat traffic, not the host IP

# However, if you have multiple physical network

# interfaces configured for the same subnet, then the

# network address alone is not sufficient to identify

# the interface Corosync should bind to. In that case,

# configure the *host* address of the interface

# instead:

# bindnetaddr: 192.168.1.1

# When selecting a multicast address, consider RFC

# 2365 (which, among other things, specifies that

# 239.255.x.x addresses are left to the discretion of

# the network administrator). Do not reuse multicast

# addresses across multiple Corosync clusters sharing

# the same network.

mcastaddr: 224.5.5.5 // multicast address

# Corosync uses the port you specify here for UDP

# messaging, and also the immediately preceding

# port. Thus if you set this to 5405, Corosync sends

# messages over UDP ports 5405 and 5404.

mcastport: 5405 // multicast port

# Time-to-live for cluster communication packets. The

# number of hops (routers) that this ring will allow

# itself to pass. Note that multicast routing must be

# specifically enabled on most network routers.

ttl: 1

}

}

logging { // logging configuration

# Log the source file and line where messages are being

# generated. When in doubt, leave off. Potentially useful for

# debugging.

fileline: off

# Log to standard error. When in doubt, set to no. Useful when

# running in the foreground (when invoking "corosync -f")

to_stderr: no

# Log to a log file. When set to "no", the "logfile" option

# must not be set.

to_logfile: yes

logfile: /var/log/cluster/corosync.log

# Log to the system log daemon. When in doubt, set to yes.

to_syslog: yes // whether to also write log messages to /var/log/messages; no is recommended since a log file is already configured

# Log debug messages (very verbose). When in doubt, leave off.

debug: off

# Log messages with time stamps. When in doubt, set to on

# (unless you are only logging to syslog, where double

# timestamps can be annoying).

timestamp: on // whether to add time stamps to log messages; can be turned off

logger_subsys {

subsys: AMF

debug: off

}

}

service { // run pacemaker as a corosync plugin

ver: 0

name: pacemaker

}
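Since bindnetaddr expects the network address rather than the host address, it can help to derive it from a node's IP and netmask. The following is a minimal sketch; network_address is a hypothetical helper introduced here, the host address is the one that appears in the TOTEM log of this setup, and the /24 netmask is an assumption.

```shell
#!/bin/sh
# network_address IP NETMASK
# Prints the IPv4 network address by AND-ing each octet of IP with NETMASK.
# Hypothetical helper for illustration; not part of corosync.
network_address() {
    na_old_ifs=$IFS
    IFS=.
    # Word splitting on "." turns the two dotted quads into eight octets:
    # $1..$4 are the IP octets, $5..$8 are the netmask octets.
    set -- $1 $2
    IFS=$na_old_ifs
    printf '%d.%d.%d.%d\n' \
        "$(($1 & $5))" "$(($2 & $6))" "$(($3 & $7))" "$(($4 & $8))"
}

# Host address from this setup; the 255.255.255.0 netmask is assumed.
network_address 192.168.1.23 255.255.255.0    # prints 192.168.1.0
```

The printed value is what goes into bindnetaddr, which lets the same corosync.conf be copied unchanged to every node on that subnet.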



1.4 Generate the authentication key used for corosync communication, then copy authkey and corosync.conf to the other node.

[root@node5.dtedu.com /etc/corosync]# corosync-keygen 

Corosync Cluster Engine Authentication key generator.

Gathering 1024 bits for key from /dev/random.

Press keys on your keyboard to generate entropy.

Writing corosync key to /etc/corosync/authkey.


[root@node5.dtedu.com /etc/corosync]# scp authkey corosync.conf node6:/etc/corosync/

authkey                                       100%  128     0.1KB/s   00:00    

corosync.conf                                 100% 2663     2.6KB/s   00:00    

[root@node5.dtedu.com /etc/corosync]# 
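After the copy, it is worth confirming that both nodes hold the same key and that the key is readable only by its owner. The following is a minimal sketch with hypothetical helper names (file_sha256, keys_match, key_mode_ok); it assumes GNU coreutils (sha256sum, stat -c) and demonstrates the comparison on temporary files rather than on the real /etc/corosync/authkey.

```shell
#!/bin/sh
# file_sha256 FILE: print only the SHA-256 digest of FILE.
file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

# keys_match FILE_A FILE_B: succeed if both files have the same digest.
keys_match() {
    [ "$(file_sha256 "$1")" = "$(file_sha256 "$2")" ]
}

# key_mode_ok FILE: succeed if FILE has mode 0400 (owner read-only).
key_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "400" ]
}

# Demonstration on throwaway files standing in for the two nodes' keys.
demo_dir=$(mktemp -d)
head -c 128 /dev/urandom > "$demo_dir/authkey.node5"
cp "$demo_dir/authkey.node5" "$demo_dir/authkey.node6"
chmod 400 "$demo_dir/authkey.node5" "$demo_dir/authkey.node6"

if keys_match "$demo_dir/authkey.node5" "$demo_dir/authkey.node6" &&
   key_mode_ok "$demo_dir/authkey.node6"; then
    echo "authkey: identical and mode 0400"
else
    echo "authkey: mismatch or wrong mode"
fi
rm -rf "$demo_dir"
```

On the real cluster, the digest of the remote copy could be collected with something like "ssh node6 sha256sum /etc/corosync/authkey" and compared with the local one by hand.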


1.5 Stop and disable the NetworkManager service on the nodes

[root@ms.dtedu.com~]$ansible all -a "chkconfig NetworkManager off"

node5.dtedu.com | SUCCESS | rc=0 >>



node6.dtedu.com | SUCCESS | rc=0 >>



[root@ms.dtedu.com~]$ansible all -a "service NetworkManager stop"

node5.dtedu.com | SUCCESS | rc=0 >>

Stopping NetworkManager daemon: [FAILED]


node6.dtedu.com | SUCCESS | rc=0 >>

Stopping NetworkManager daemon: [  OK  ]


1.6 Start the corosync service

[root@ms.dtedu.com~]$ansible all -a "service corosync start"

node6.dtedu.com | SUCCESS | rc=0 >>

Starting Corosync Cluster Engine (corosync): [  OK  ]


node5.dtedu.com | SUCCESS | rc=0 >>

Starting Corosync Cluster Engine (corosync): [  OK  ]



1.7 Verify that the services started correctly.

Check that the corosync engine started normally

[root@node5.dtedu.com /etc/corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log 

Apr 10 10:14:23 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Apr 10 10:14:23 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.


Check that the initial membership notifications were sent

[root@node5.dtedu.com /etc/corosync]# grep TOTEM /var/log/cluster/corosync.log 

Apr 10 10:14:23 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Apr 10 10:14:23 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Apr 10 10:14:24 corosync [TOTEM ] The network interface [192.168.1.23] is now up.

Apr 10 10:14:24 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Apr 10 10:14:24 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.


Check for errors raised during startup, filtering out the resource-related (unpack_resources) messages



[root@node5.dtedu.com /etc/corosync]# grep ERROR: /var/log/cluster/corosync.log  |grep -v unpack_resources


Check that pacemaker started normally

[root@node5.dtedu.com /etc/yum.repos.d]# grep pcmk_startup /var/log/cluster/corosync.log 

Apr 10 13:17:19 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized

Apr 10 13:17:19 corosync [pcmk  ] Logging: Initialized pcmk_startup

Apr 10 13:17:19 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Apr 10 13:17:19 corosync [pcmk  ] info: pcmk_startup: Service: 9

Apr 10 13:17:19 corosync [pcmk  ] info: pcmk_startup: Local hostname: node5.dtedu.com
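The four log checks above can be bundled into one pass over the log file. The following is a minimal sketch; check_corosync_log is a hypothetical helper introduced here, and the demonstration runs against a small sample built from log lines shown in this step rather than the real /var/log/cluster/corosync.log.

```shell
#!/bin/sh
# check_corosync_log LOGFILE
# Runs the four log checks from this step and prints one line per check.
# Returns non-zero if any check fails. Hypothetical helper for illustration.
check_corosync_log() {
    ccl_log=$1
    ccl_rc=0

    if grep -q "Corosync Cluster Engine" "$ccl_log"; then
        echo "engine: started"
    else
        echo "engine: no start message"; ccl_rc=1
    fi

    if grep -q "TOTEM.*new membership was formed" "$ccl_log"; then
        echo "membership: formed"
    else
        echo "membership: no notification"; ccl_rc=1
    fi

    # Ignore unpack_resources errors, as the grep in this step does.
    if grep "ERROR:" "$ccl_log" | grep -v unpack_resources | grep -q .; then
        echo "errors: found"; ccl_rc=1
    else
        echo "errors: none"
    fi

    if grep -q "pcmk_startup: CRM: Initialized" "$ccl_log"; then
        echo "pacemaker: initialized"
    else
        echo "pacemaker: not initialized"; ccl_rc=1
    fi

    return "$ccl_rc"
}

# Demonstration on a sample log made of lines quoted in this step.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Apr 10 10:14:23 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Apr 10 10:14:24 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 10 13:17:19 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
EOF
check_corosync_log "$sample_log"
rm -f "$sample_log"
```

On a node, the function would be called with the log path configured in the logging section, for example check_corosync_log /var/log/cluster/corosync.log.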



Check the status of the HA nodes

[root@node6.dtedu.com /etc/yum.repos.d]# service corosync start

Starting Corosync Cluster Engine (corosync):               [  OK  ]

[root@node6.dtedu.com /etc/yum.repos.d]# crm status

Stack: classic openais (with plugin)

Current DC: node5.dtedu.com (version 1.1.15-5.el6-e174ec8) - partition with quorum

Last updated: Mon Apr 10 13:47:46 2017          Last change: Mon Apr 10 13:47:38 2017 by hacluster via crmd on node5.dtedu.com

, 2 expected votes

2 nodes and 0 resources configured


Online: [ node5.dtedu.com node6.dtedu.com ]


No resources
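For scripted checks, the Online line of the status output can be parsed to confirm that every expected node has joined. The following is a minimal sketch; online_nodes and all_online are hypothetical helpers introduced here, and they assume the "Online: [ ... ]" line format shown above.

```shell
#!/bin/sh
# online_nodes: read "crm status" output on stdin and print one online
# node name per line. Relies on the "Online: [ a b ]" format shown above.
online_nodes() {
    awk '/^Online:/ { for (i = 3; i < NF; i++) print $i }'
}

# all_online EXPECTED_COUNT: read "crm status" output on stdin and
# succeed only if exactly EXPECTED_COUNT nodes are listed as online.
all_online() {
    [ "$(online_nodes | grep -c .)" -eq "$1" ]
}

# Demonstration on the Online line from the status output in this step.
printf '%s\n' 'Online: [ node5.dtedu.com node6.dtedu.com ]' | online_nodes
```

On a cluster node this would be used as "crm status | all_online 2", which succeeds only when both node5 and node6 are online.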




Check the live cluster configuration for errors with crm_verify

[root@node5.dtedu.com /etc/yum.repos.d]# crm_verify -LV

   error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

   error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

   error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid
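The errors above say that no STONITH resource is defined and point to the stonith-enabled option. In a test environment without fencing devices, one way to clear them is to disable STONITH through crmsh and then verify again. The following is a minimal sketch; the apply and disable_stonith helpers are hypothetical names introduced here, and by default the script only prints what it would run (set APPLY=1 on a cluster node to execute). As the error text itself notes, clusters with shared data need STONITH, so this is not a setting for production.

```shell
#!/bin/sh
# apply CMD...: print the command unless APPLY=1 is set, then run it.
# Hypothetical helper for illustration.
apply() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

disable_stonith() {
    # Turn off STONITH cluster-wide (test setups only).
    apply crm configure property stonith-enabled=false
    # Re-check the live configuration; no errors are expected afterwards.
    apply crm_verify -L -V
}

disable_stonith
```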