Introduction to High Availability

When people think of high availability, the relatively simple Keepalived may come to mind, or the older Heartbeat, and Corosync+Pacemaker is also commonly used. So what are the differences between them?

Starting with version 3, Heartbeat was split into several sub-projects: Heartbeat, cluster-glue, Resource Agents, and Pacemaker.

Heartbeat: only maintains membership information for the cluster nodes and handles the communication between them.

cluster-glue: acts as an intermediate layer that connects Heartbeat and the CRM (Pacemaker); it consists mainly of two parts, the LRM and STONITH.

Resource Agents: a collection of scripts used to start, stop, and monitor services. These scripts are invoked by the LRM to start, stop, and monitor the various resources.

Pacemaker: the resource manager split out of the original Heartbeat, acting as the control center of the whole HA stack; clients use Pacemaker to configure, manage, and monitor the entire cluster. It does not provide the underlying heartbeat messaging itself; to communicate with peer nodes it relies on a lower-level messaging layer (the stripped-down Heartbeat, or Corosync) to deliver its messages.

Introduction to Pacemaker

What is Pacemaker

Pacemaker is a cluster resource manager. It achieves maximum availability for cluster services (i.e. failover) by detecting and recovering from node- and resource-level failures, using the messaging and membership capabilities provided by your preferred cluster infrastructure (Corosync or Heartbeat).

Architecture

(figure: Pacemaker architecture)

Internal Components

(figure: Pacemaker internal components)

  • CIB (Cluster Information Base)
  • The CIB uses XML to represent the cluster's configuration and the current state of all resources in the cluster
  • The contents of the CIB are automatically kept in sync across the entire cluster
  • The PEngine uses it to compute the ideal state of the cluster and how to bring it about
  • The resulting list of instructions is fed to the DC (Designated Co-ordinator)
  • CRMd (Cluster Resource Management daemon)
  • Pacemaker centralizes all cluster decision-making by electing one of the CRMd instances to act as the master
  • Should the elected CRMd process, or the node it is on, fail, a new one is quickly established
  • DC (Designated Co-ordinator)
  • The DC carries out the PEngine's instructions in the required order
  • It passes them, via the cluster messaging infrastructure, to the LRMd (Local Resource Management daemon) on other nodes, or to its CRMd peers (which in turn relay them to their local LRMd)
  • The peer nodes report the results of all operations back to the DC; based on the expected and actual results, it either executes any actions that were waiting, or aborts processing and asks the PEngine to recompute the ideal cluster state from the unexpected results
  • PEngine (aka. PE, the Policy Engine)
  • STONITHd
  • In some situations it may be necessary to power off a node in order to protect shared data integrity or to fully recover a resource. For this, Pacemaker provides STONITHd.
  • STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and is usually implemented with a remote power switch
  • N-to-N architecture
    (figure: N-to-N redundancy architecture)

Update:

  • In the HA stack, corosync sits at the messaging layer and is used to detect whether communication between the nodes is healthy, while pacemaker manages the cluster resources. When using corosync and pacemaker we usually manage both through a single unified tool, such as the older crmsh or the newer pcs.
  • The benefit of managing the cluster with crmsh or pcs is that we do not have to work directly against the configuration files; we manage the cluster nodes from the command line, which reduces the errors caused by hand-editing configuration files. Another benefit is the lower learning curve: instead of learning the corosync and pacemaker configuration details, we only need to learn how to use crmsh or pcs (see the short comparison below).
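
For example, the same day-to-day checks can be done with either tool. crmsh is not installed or used elsewhere in this walkthrough; its commands are shown here only for comparison and assume the crmsh package is available:

# with pcs
pcs status
pcs property list

# with crmsh
crm status
crm configure show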

Environment Preparation and Mutual Trust

Environment:

  • OS version:
[root@node0 corosync]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
  • IP addresses:
node0 192.168.0.70
node1 192.168.0.71
node2 192.168.0.72

Permanently disable the firewall (including its auto-start) and SELinux

[ALL] (run on every node)

systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service

setenforce 0
sed -i '/^SELINUX=/c\SELINUX=disabled' /etc/selinux/config

Configure mutual SSH trust

  • node0
ssh-keygen  -t   rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
  • node1
ssh-keygen  -t   rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
  • node2
ssh-keygen  -t   rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
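
To confirm that passwordless SSH works in every direction, a quick check like the following can be run on each node (a simple illustrative loop, not part of the original steps):

for h in node0 node1 node2; do ssh root@$h hostname; done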

Install corosync and pacemaker/pcs

  • [ALL] Run the install command on every node
yum -y install corosync pacemaker  pcs resource-agents
  • [ALL] Start the pcsd service and enable it at boot
systemctl start pcsd.service
systemctl enable pcsd.service
  • [ALL] Set the hacluster password

The hacluster user is created when the packages are installed and is used by pcs to talk to the local pcsd process, so we need to set its password; use the same password on every node.

echo hacluster | passwd --stdin hacluster
  • [ONE] View the files installed by the pacemaker package (run on a single node)
[root@node0 ~]# rpm -ql pacemaker
  • [ONE] View the files installed by the corosync package
[root@node0 ~]# rpm -ql corosync
  • [ONE] Authenticate the nodes (run on one node only)

In this example it is run on node0.

[root@node0 corosync]# pcs  cluster auth node0 node1 node2 -u hacluster -p hacluster --force
  • [ONE] Generate the corosync configuration file (this can be run on any one node)
[root@node2 corosync]#
[root@node2 corosync]# pcs cluster setup --name cluster_test01 node0 node1 node2
Destroying cluster on nodes: node0, node1, node2...
node2: Stopping Cluster (pacemaker)...
node0: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node0: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node0', 'node1', 'node2'
node0: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
node1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node0: Succeeded
node1: Succeeded
node2: Succeeded

Synchronizing pcsd certificates on nodes node0, node1, node2...
node1: Success
node0: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node0: Success
node2: Success

[root@node0 corosync]# ll /etc/corosync/corosync.conf
-rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
[root@node0 corosync]#
[root@node1 corosync]# ll /etc/corosync/corosync.conf
-rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
[root@node2 corosync]# ll /etc/corosync/corosync.conf
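
For reference, a corosync.conf generated by pcs on CentOS 7 typically looks roughly like the following; the exact contents depend on the pcs version and options used, so treat this only as an illustrative sketch rather than the literal file from this cluster:

totem {
    version: 2
    cluster_name: cluster_test01
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node0
        nodeid: 1
    }
    node {
        ring0_addr: node1
        nodeid: 2
    }
    node {
        ring0_addr: node2
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}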
  • Start nodes in the cluster
  • Start node1 only
[root@node0 corosync]# pcs cluster start node1
node1: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
[root@node0 corosync]#
[root@node1 corosync]# ps -ef |grep coro
root 10691 1 8 23:42 ? 00:00:00 corosync
root 10716 9461 0 23:42 pts/1 00:00:00 grep --color=auto coro
[root@node1 corosync]# ps -ef |grep pace
root 10706 1 1 23:42 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 10707 10706 1 23:42 ? 00:00:00 /usr/libexec/pacemaker/cib
root 10708 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 10709 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/lrmd
haclust+ 10710 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/attrd
haclust+ 10711 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/pengine
haclust+ 10712 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/crmd
root 10718 9461 0 23:42 pts/1 00:00:00 grep --color=auto pace
[root@node1 corosync]#
  • Check the node status
[root@node0 corosync]# pcs cluster status
Error: cluster is not currently running on this node

[root@node1 corosync]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43)
  • Start all nodes
[root@node0 corosync]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node0: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node0: Starting Cluster (pacemaker)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...

[root@node0 corosync]# pcs status
Cluster name: cluster_test01

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Jul 19 23:55:09 2020
Last change: Sun Jul 19 23:47:47 2020 by hacluster via crmd on node1

3 nodes configured
0 resources configured

Online: [ node0 node2 ]
OFFLINE: [ node1 ]
  • Resolve the warning
WARNINGS:
No stonith devices and stonith-enabled is not false
[root@node0 corosync]# pcs property set stonith-enabled=false
[root@node0 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Jul 19 23:59:13 2020
Last change: Sun Jul 19 23:57:13 2020 by root via cibadmin on node0

3 nodes configured
0 resources configured

Online: [ node0 node1 node2 ]
  • Check corosync status
[root@node0 corosync]# pcs status corosync

Membership information
----------------------
Nodeid Votes Name
1 1 node0 (local)
  • Check the pacemaker processes
[root@node0 corosync]# ps axf |grep pacemaker
5003 pts/2 S+ 0:00 \_ grep --color=auto pacemaker
4792 ? Ss 0:00 /usr/sbin/pacemakerd -f
4793 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
4794 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
4795 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
4796 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
4797 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
4798 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
  • Verify the configuration
[root@node0 corosync]# pcs property set stonith-enabled=true
[root@node0 corosync]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@node0 corosync]# pcs property set stonith-enabled=false
[root@node0 corosync]#
[root@node0 corosync]# crm_verify -L -V
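
Disabling STONITH is fine for a lab environment, but for production clusters with shared data a fence device should normally be configured instead. A rough sketch only; the fence agent, BMC address, and credentials below are placeholders for illustration:

pcs stonith list                                  # list available fence agents
pcs stonith describe fence_ipmilan                # show the parameters a given agent accepts
pcs stonith create fence_node1 fence_ipmilan pcmk_host_list=node1 ipaddr=192.168.0.201 login=admin passwd=admin lanplus=1 op monitor interval=60s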
  • Create the VIP resource
[root@node0 corosync]# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.0.75 cidr_netmask=32 op monitor interval=30s
[root@node0 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 00:22:48 2020
Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 corosync]#
[root@node0 corosync]#
[root@node0 corosync]# ip ad list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:37:1b:18 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.70/24 brd 192.168.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 192.168.0.75/32 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::4427:bd05:1cf9:1f4f/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::19de:291a:ae81:cfd7/64 scope link

View the resource types Pacemaker supports by default

  • List the resource standards (classes)
[root@node0 /]# pcs resource standards
lsb
ocf
service
  • List the available OCF resource providers
[root@node0 /]# pcs resource providers
  • List the agents available under a given standard, for example the scripts under ocf:heartbeat
[root@node0 /]# pcs resource agents ocf:heartbeat
  • Put a node into standby
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:07:41 2020
Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]# pcs cluster standby node2
[root@node0 /]#
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:07:53 2020
Last change: Mon Jul 20 02:07:50 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Node node2: standby
Online: [ node0 node1 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]# pcs cluster unstandby node2
[root@node0 /]#
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:08:04 2020
Last change: Mon Jul 20 02:08:02 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]#
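
Besides standby/unstandby, an individual resource can also be moved to a specific node and released again; the target node below is just an example:

pcs resource move VIP node1    # adds a temporary location constraint and moves VIP to node1
pcs status
pcs resource clear VIP         # removes the constraint created by "move"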
  • Restart a resource
[root@node0 /]# pcs resource restart  VIP
  • Clean up cluster failure records
[root@node0 /]# pcs resource cleanup
  • Ignore loss of quorum (when quorum cannot be reached, keep the resources running)
[root@node0 /]# pcs property set no-quorum-policy=ignore
[root@node0 /]# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_test01
dc-version: 1.1.21-4.el7-f14e36fd43
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
[root@node0 /]# pcs property show
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_test01
dc-version: 1.1.21-4.el7-f14e36fd43
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
  • Configure the cluster to start automatically at boot
  • Before the change
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:19:22 2020
Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
  • Apply the setting
[root@node0 /]#
[root@node0 /]# pcs cluster enable --all
  • After the change
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:22:06 2020
Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0

3 nodes configured
1 resource configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#
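
The change can also be confirmed directly with systemd on any node:

systemctl is-enabled corosync pacemaker pcsd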

Example 1:

High availability for the httpd service on CentOS 7 with Corosync + Pacemaker + pcs

Install and start httpd

[ALL] Install on every node

yum -y install httpd
service httpd start

[root@node0 /]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@node0 /]#
[root@node0 /]# service httpd status
Redirecting to /bin/systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2020-07-20 02:27:21 CST; 1min 16s ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 19333 (httpd)
Status: "Total requests: 10; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─19333 /usr/sbin/httpd -DFOREGROUND
├─19334 /usr/sbin/httpd -DFOREGROUND
├─19335 /usr/sbin/httpd -DFOREGROUND
├─19337 /usr/sbin/httpd -DFOREGROUND
├─19338 /usr/sbin/httpd -DFOREGROUND
├─19387 /usr/sbin/httpd -DFOREGROUND
├─19388 /usr/sbin/httpd -DFOREGROUND
├─19389 /usr/sbin/httpd -DFOREGROUND
├─19390 /usr/sbin/httpd -DFOREGROUND
├─19391 /usr/sbin/httpd -DFOREGROUND
└─19392 /usr/sbin/httpd -DFOREGROUND

Jul 20 02:27:21 node0 systemd[1]: Starting The Apache HTTP Server...
Jul 20 02:27:21 node0 httpd[19333]: AH00558: httpd: Could not reliably determine the server's fully qualif...ssage
Jul 20 02:27:21 node0 systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@node0 /]#

Verify that the HTTP service works

(figure: browser test showing the default Apache page is reachable)

[ALL] Enable the Apache server-status page

vim /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from all
</Location>
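
While httpd is still running, you can quickly confirm that the status page responds locally, for example:

curl -s http://localhost/server-status | head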

[ALL] Stop the httpd service. When the httpd resource is added, the cluster will start the service itself; if httpd is left running, adding the resource will fail.

systemctl stop httpd
systemctl status httpd

Add the WebSite resource

Note: this time the command is run on node1.

[root@node1 corosync]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s
[root@node1 corosync]#
[root@node1 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:33:32 2020
Last change: Mon Jul 20 04:33:25 2020 by root via cibadmin on node1

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
WebSite (ocf::heartbeat:apache): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 corosync]#

This creates an httpd cluster resource named WebSite, currently running on node1. The status URL is http://localhost/server-status, checked every 30s. But there is a new problem: the virtual IP is on node0 while the httpd resource is on node1, so clients cannot reach the site through the VIP. Also, if the VIP is not running on any node, WebSite should not run either.

Set the default resource operation timeout

[root@node1 corosync]# pcs resource op defaults timeout=120s
Warning: Defaults do not apply to resources which override them with their own defined values
[root@node1 corosync]# pcs resource op defaults
timeout=120s
[root@node1 corosync]#

Bind the service resource and the VIP resource so they always stay on the same node

[root@node1 corosync]#  pcs constraint colocation add WebSite with VIP INFINITY
[root@node1 corosync]#
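
Optionally, an order constraint can also be added so that the VIP is always brought up before Apache starts; it is not required for the colocation above, just a common companion setting:

pcs constraint order VIP then WebSite
pcs constraint show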

(figure: cluster status after adding the colocation constraint)

Test access from a browser

(figure: browser access test through the VIP)

Example 2

Business-level high availability on CentOS 7 with Corosync + Pacemaker + pcs + HAProxy

  • 1. Delete the existing WebSite resource [ONE]
[root@node1 corosync]# pcs resource delete WebSite
Attempting to stop: WebSite... Stopped
[root@node1 corosync]#
  • 2. Install the haproxy service [ALL]
yum -y install haproxy
  • 3. Configure httpd to listen on each node's own NIC IP on port 80 [ALL]

Change Listen 80 to Listen <NIC IP>:80

  • node0
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.70:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
  • node1
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.71:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
  • node2
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.72:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
  • 4. Configure haproxy [ALL]
vim /etc/haproxy/haproxy.cfg
Append the following:

#---------------------------------------------------------------------
# listen httpd server
#---------------------------------------------------------------------
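
The original listen section is not reproduced here. A minimal sketch of what it might look like, assuming HAProxy listens on the VIP 192.168.0.75 and balances across the three Apache instances (the section name and balance settings are illustrative):

listen httpd_cluster
    bind 192.168.0.75:80
    mode http
    balance roundrobin
    server node0 192.168.0.70:80 check
    server node1 192.168.0.71:80 check
    server node2 192.168.0.72:80 check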
  • 5. Create the haproxy resource
[root@node0 /]# pcs resource create haproxy systemd:haproxy op monitor interval="5s"
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:57:31 2020
Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): FAILED node1

Failed Resource Actions:
* haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:58:39 2020
Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Stopped

Failed Resource Actions:
* haproxy_start_0 on node2 'unknown error' (1): call=27, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:33 2020', queued=0ms, exec=2242ms
* haproxy_start_0 on node0 'unknown error' (1): call=51, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:37 2020', queued=0ms, exec=2252ms
* haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#

The resource has been created and started, but there are errors. This is because on the other nodes the haproxy configuration binds to the virtual IP, and the VIP is not present on those nodes, so haproxy fails to start there.
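
As in Example 1, the haproxy resource can also be colocated with the VIP so that it only ever runs on the node currently holding 192.168.0.75. This step is not part of the original run, but it avoids the start failures shown above:

pcs constraint colocation add haproxy with VIP INFINITY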

  • Clear the cluster errors
[root@node0 ~]# pcs resource cleanup
Cleaned up all resources on all nodes
[root@node0 ~]#
  • Restart the haproxy resource
[root@node0 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:46:40 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Started node0

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
  • Stop node0 to simulate a node0 failure
[root@node0 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:46:40 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Started node0

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled



[root@node1 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:47:28 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Stopping node0

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node2 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:47:35 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0

3 nodes configured
2 resources configured

Online: [ node1 node2 ]
OFFLINE: [ node0 ]

Full list of resources:

VIP (ocf::heartbeat:IPaddr2): Started node1
haproxy (systemd:haproxy): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
  • Access the web service
    (figure: browser access through the VIP after failover)

(figure: second browser test after failover)

Related Configuration and Log Files

  • Configuration files
/etc/corosync/corosync.conf -  membership and quorum configuration 
/var/lib/pacemaker/crm/cib.xml - cluster node and resource configuration.
  • Log files
/var/log/pacemaker.log

/var/log/cluster/corosync.log

/var/log/pcsd/pcsd.log

/var/log/messages (look for pengine, crmd, ...)

Related Concepts

  • Primitive (single resource): a single resource managed by the cluster; it is only ever started once in the cluster (for example a VIP address)
  • Clone (cloned resource): a resource that should run on several nodes at the same time; in the master/slave (multi-state) variant one instance is the master and the others are slaves, with the cluster managing the master/slave state
  • Group (resource group): a convenient way to keep resources together; resources in a group always stay on the same node and are started in the order in which they are listed in the group
  • Resource classes:
  • OCF (Open Cluster Framework): an extension of the LSB conventions for init scripts, and the preferred resource class for use in the cluster
  • LSB (Linux Standard Base): standard Linux init scripts
  • Systemd: systemd units
  • Fencing: fencing-related resources
  • Service: mixed clusters where nodes use systemd, upstart, and lsb commands
  • Nagios: Nagios plugins
  • Constraints: a constraint is a set of rules that defines how resources (or groups) are placed and started.
  • Constraint types:
  • location: a location constraint defines on which server a resource should run (or, with a negative score, should never run)
  • colocation: a colocation constraint defines which resources must be started together, or must never be started together
  • order:
  • an order constraint is used to define a specific start order
  • order constraints are implicit within a resource group
  • explicit order constraints can be more convenient because they can be defined between different kinds of resources; for example, a resource group can be required to start only after a particular single resource has started
  • Scores: used to decide the priority of a constraint
  • Scores range from -1,000,000 (-INFINITY = may never happen) to 1,000,000 (INFINITY = must happen)
  • To express that you never want an action to be carried out, use a negative score; any score lower than 0 will ban the resource from that node
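
Using the resources from this article, the three constraint types map onto pcs commands roughly as follows (the scores and node choice are illustrative):

# location: prefer to run VIP on node0 with a score of 100
pcs constraint location VIP prefers node0=100

# colocation: always keep WebSite on the same node as VIP
pcs constraint colocation add WebSite with VIP INFINITY

# order: start VIP before starting WebSite
pcs constraint order VIP then WebSite

# review all configured constraints
pcs constraint show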

Common Cluster Operations

  • Check cluster status
# pcs status
# pcs config

# pcs cluster status
# pcs quorum status
# pcs resource show
# crm_verify -L -V

# crm_mon
  • Destroy the cluster
# pcs cluster destroy <cluster_name>
  • Start/stop the whole cluster
# pcs cluster start --all
# pcs cluster stop --all
  • Start/stop a single node
# pcs cluster start <node>
# pcs cluster stop <node>
  • Forcefully stop the cluster services on the local node
# pcs cluster kill
  • Put a node into standby
# pcs cluster standby <node1>
  • Take a node out of standby
# pcs cluster unstandby <node1>
  • Set a cluster property
# pcs property set <property>=<value>
  • Disable fencing
# pcs property set stonith-enabled=false

Detailed Operations

  • [show [] | --full | --groups | --hide-inactive]
    Show all currently configured resources or if a resource is specified
    show the options for the configured resource. If --full is specified,
    all configured resource options will be displayed. If --groups is
    specified, only show groups (and their resources). If --hide-inactive
    is specified, only show active resources.
[root@node0 ~]# pcs resource show
VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Started node0
  • List all resources that can be created and filter the output
[root@node1 ~]#  pcs resource list |grep Ipadd -i
ocf:heartbeat:IPaddr - Manages virtual IPv4 and IPv6 addresses (Linux specific
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific
  • Describe a specific resource agent
[root@node1 ~]#  pcs resource describe IPaddr2
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific version)

This Linux-specific resource manages IP alias IP addresses.
It can add an IP alias, or remove one.
In addition, it can implement Cluster Alias IP functionality
if invoked as a clone resource.

If used as a clone, you should explicitly set clone-node-max >= 2,
and/or clone-max < number of nodes. In case of node failure,
clone instances need to be re-allocated on surviving nodes.
This would not be possible if there is already an instance on those nodes,
and clone-node-max=1 (which is the default).

Resource options:
ip (required) (unique): The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation) example IPv4 "192.168.1.1". example IPv6
"2001:db8:DC28:0:0:FC57:D4C8:1FFF".
nic: The base network interface on which the IP address will be brought online. If left empty, the script will try and determine this from the
routing table. Do NOT specify an alias interface in the form eth0:1 or anything here; rather, specify the base interface only. If you want a
label, see the iflabel parameter. Prerequisite: There must be at least one static IP address, which is not managed by the cluster, assigned
to the network interface. If you can not assign any static IP address on the interface, modify this kernel parameter: sysctl -w
net.ipv4.conf.all.promote_secondaries=1 # (or per device)
cidr_netmask: The netmask for the interface in CIDR format (e.g., 24 and not 255.255.255.0) If unspecified, the script will also try to determine
this from the routing table.
broadcast: Broadcast address associated with the IP. It is possible to use the special symbols '+' and '-' instead of the broadcast address. In
this case, the broadcast address is derived by setting/resetting the host bits of the interface prefix.
iflabel: You can specify an additional label for your IP address here. This label is appended to your interface name. The kernel allows
alphanumeric labels up to a maximum length of 15 characters including the interface name and colon (e.g. eth0:foobar1234) A label can be
specified in nic parameter but it is deprecated. If a label is specified in nic name, this parameter has no effect.
lvs_support: Enable support for LVS Direct Routing configurations. In case a IP address is stopped, only move it to the loopback device to allow
the local node to continue to service requests, but no longer advertise it on the network. Notes for IPv6: It is not necessary to
enable this option on IPv6. Instead, enable 'lvs_ipv6_addrlabel' option for LVS-DR usage on IPv6.
lvs_ipv6_addrlabel: Enable adding IPv6 address label so IPv6 traffic originating from the address's interface does not use this address as the
source. This is necessary for LVS-DR health checks to realservers to work. Without it, the most recently added IPv6 address
(probably the address added by IPaddr2) will be used as the source address for IPv6 traffic from that interface and since that
address exists on loopback on the realservers, the realserver response to pings/connections will never leave its loopback. See
RFC3484 for the detail of the source address selection. See also 'lvs_ipv6_addrlabel_value' parameter.
lvs_ipv6_addrlabel_value: Specify IPv6 address label value used when 'lvs_ipv6_addrlabel' is enabled. The value should be an unused label in the
policy table which is shown by 'ip addrlabel list' command. You would rarely need to change this parameter.
mac: Set the interface MAC address explicitly. Currently only used in case of the Cluster IP Alias. Leave empty to chose automatically.
clusterip_hash: Specify the hashing algorithm used for the Cluster IP functionality.
unique_clone_address: If true, add the clone ID to the supplied value of IP to create a unique address to manage
arp_interval: Specify the interval between unsolicited ARP packets in milliseconds. This parameter is deprecated and used for the backward
compatibility only. It is effective only for the send_arp binary which is built with libnet, and send_ua for IPv6. It has no effect
for other arp_sender.
arp_count: Number of unsolicited ARP packets to send at resource initialization.
arp_count_refresh: Number of unsolicited ARP packets to send during resource monitoring. Doing so helps mitigate issues of stuck ARP caches
resulting from split-brain situations.
arp_bg: Whether or not to send the ARP packets in the background.
arp_sender: The program to send ARP packets with on start. Available options are: - send_arp: default - ipoibarping: default for infiniband
interfaces if ipoibarping is available - iputils_arping: use arping in iputils package - libnet_arping: use another variant of arping
based on libnet
send_arp_opts: Extra options to pass to the arp_sender program. Available options are vary depending on which arp_sender is used. A typical use
case is specifying '-A' for iputils_arping to use ARP REPLY instead of ARP REQUEST as Gratuitous ARPs.
flush_routes: Flush the routing table on stop. This is for applications which use the cluster IP address and which run on the same physical host
that the IP address lives on. The Linux kernel may force that application to take a shortcut to the local loopback interface,
instead of the interface the address is really bound to. Under those circumstances, an application may, somewhat unexpectedly,
continue to use connections for some time even after the IP address is deconfigured. Set this parameter in order to immediately
disable said shortcut when the IP address goes away.
run_arping: Whether or not to run arping for IPv4 collision detection check.
preferred_lft: For IPv6, set the preferred lifetime of the IP address. This can be used to ensure that the created IP address will not be used as
a source address for routing. Expects a value as specified in section 5.5.4 of RFC 4862.
monitor_retries: Set number of retries to find interface in monitor-action. ONLY INCREASE IF THE AGENT HAS ISSUES FINDING YOUR NIC DURING THE
MONITOR-ACTION. A HIGHER SETTING MAY LEAD TO DELAYS IN DETECTING A FAILURE.

Default operations:
start: interval=0s timeout=20s
stop: interval=0s timeout=20s
monitor: interval=10s timeout=20s

References:
https://www.freenetst.it/tech/rh7cluster/ ("Red Hat Enterprise Linux 7 High Availability Add-On Reference")