1. Basic operating system environment configuration

1.1 Configure a local yum repository (run on every node)

[root@pacemaker-node1 ~]# mount /mnt/rhel-server-7.4-x86_64-dvd.iso /mnt/yum
[root@pacemaker-node1 ~]# cat /etc/yum.repos.d/yum.repo
[base]
name=base
enabled=1
gpgcheck=0
baseurl=file:///mnt/yum

[HA]
name=HA
enabled=1
gpgcheck=0
baseurl=file:///mnt/yum/addons/HighAvailability
[root@pacemaker-node1 ~]# yum repolist
Loaded plugins: langpacks
repo id repo name status
HA HA 35
base base 4,986
repolist: 5,021

1.2 Configure host name resolution between nodes (run on every node)

[root@pacemaker-node1 ~]# cat /etc/hosts
192.168.0.100 pacemaker-node1
192.168.0.101 pacemaker-node2

1.3 Configure passwordless SSH trust between nodes (run on every node)

# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
# ssh-copy-id pacemaker-node1
# ssh-copy-id pacemaker-node2
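
A quick way to confirm the trust works (a sketch; the hostnames assume the /etc/hosts entries above):

# ssh pacemaker-node2 hostname        (should print pacemaker-node2 without prompting for a password)
# ssh pacemaker-node1 hostname        (run the same check from the other node)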

2. Install and configure the DRBD environment

2.1 Prepare the disk and the DRBD packages (every node needs the same physical setup)

[root@pacemaker-node1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 136.8G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 118G 0 part
│ ├─rhel-root 253:0 0 50G 0 lvm /
│ ├─rhel-swap 253:1 0 8G 0 lvm [SWAP]
│ └─rhel-home 253:2 0 60G 0 lvm /home
└─sda3 8:3 0 5G 0 part
└─drbd0 147:0 0 5G 0 disk

2.2 Install and configure the DRBD base environment (run on every node)

[root@pacemaker-node1 ~]# ls /opt/
drbd84-utils-8.9.1-1.el7.elrepo.x86_64.rpm kmod-drbd84-8.4.10-1_2.el7_4.elrepo.x86_64.rpm
[root@pacemaker-node1 ~]# rpm -ivh /opt/drbd84-utils-8.9.1-1.el7.elrepo.x86_64.rpm /opt/kmod-drbd84-8.4.10-1_2.el7_4.elrepo.x86_64.rpm
[root@pacemaker-node1 ~]# cp /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.conf.bak
[root@pacemaker-node1 ~]# vi /etc/drbd.d/global_common.conf
[root@pacemaker-node1 ~]# cat /etc/drbd.d/global_common.conf
global {
    usage-count yes;
}
common {
    protocol C;
}
resource mysql {
    on pacemaker-node1 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.0.100:7789;
        meta-disk internal;
    }
    on pacemaker-node2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.0.101:7789;
        meta-disk internal;
    }
}
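
Before creating the metadata, it can be worth checking that the configuration parses cleanly on both nodes; a minimal sketch, assuming the resource name mysql defined above:

[root@pacemaker-node1 ~]# drbdadm dump mysql        (prints the parsed resource definition; an error here usually means a typo in the configuration)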

2.3 Initialize the DRBD metadata and start the service (run on every node)

[root@pacemaker-node1 ~]# dd if=/dev/zero bs=1M count=1 of=/dev/sda3; sync        (this only wipes stale signatures from the backing disk so create-md does not complain about an unclean disk)
[root@pacemaker-node1 ~]# drbdadm create-md mysql
[root@pacemaker-node1 ~]# systemctl enable drbd;systemctl start drbd

2.4 Confirm the DRBD status on one of the nodes

[root@pacemaker-node1 ~]# cat /proc/drbd
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20970844

2.5 Promote node1 to primary; background data synchronization starts and must be allowed to finish

[root@pacemaker-node1 ~]# drbdsetup /dev/drbd0 primary --o            (promote this node to primary, overwriting the peer's data; on DRBD 8.4 "drbdadm primary --force mysql" is the equivalent)
[root@pacemaker-node1 ~]# drbd-overview
0:mysql/0 SyncSource Primary/Secondary UpToDate/Inconsistent
[>....................] sync'ed: 2.4% (19996/20476)M

[root@pacemaker-node1 ~]# drbd-overview (data synchronization complete)
0:mysql/0 Connected Primary/Secondary UpToDate/UpToDate

2.6 Create a filesystem on the DRBD device on the primary node and run a mount test

[root@pacemaker-node1 ~]# mkfs.xfs /dev/drbd0
[root@pacemaker-node1 ~]# mkdir /mysql (run on both nodes)
[root@pacemaker-node1 ~]# mount /dev/drbd0 /mysql/
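
A brief verification of the mount, as a sketch (the test file name is arbitrary):

[root@pacemaker-node1 ~]# df -h /mysql
[root@pacemaker-node1 ~]# touch /mysql/drbd_write_test && rm -f /mysql/drbd_write_test        (confirms the filesystem is writable)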

3. Install and configure the MariaDB database

3.1 Install and configure the MariaDB database (run on both nodes)

[root@pacemaker-node1 ~]# yum install mariadb mariadb-server MySQL-python -y

3.2 Change the MariaDB default data directory to the DRBD-backed /mysql directory (run on both nodes)

[root@pacemaker-node1 ~]# vi /etc/my.cnf
[root@pacemaker-node1 ~]# grep -vE "^#|^$" /etc/my.cnf
[mysqld]
datadir=/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
!includedir /etc/my.cnf.d

[root@pacemaker-node1 ~]# chown -R mysql.mysql /mysql/ (set the directory ownership on both nodes)
[root@pacemaker-node1 ~]# systemctl disable mariadb ; systemctl status mariadb (make sure MariaDB does not start at boot on either node)
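
Optionally, on the node where /dev/drbd0 is currently mounted on /mysql, MariaDB can be started once by hand to confirm it initializes and uses the new data directory (a sanity-check sketch, not part of the original procedure; Pacemaker will manage the service later):

[root@pacemaker-node1 ~]# systemctl start mariadb
[root@pacemaker-node1 ~]# mysql -uroot -e "select @@datadir;"        (should report /mysql/)
[root@pacemaker-node1 ~]# systemctl stop mariadb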

4. Install the Pacemaker cluster software

4.1 Install the Pacemaker cluster software and create the cluster (install on every node)

[root@pacemaker-node1 ~]# yum install pcs pacemaker fence-agents-all -y

Set the cluster user's password (run on every node, using the same password)

[root@pacemaker-node1 ~]# passwd hacluster        (for example, redhat)

Start the pcsd service (run on every node)

[root@pacemaker-node1 ~]# systemctl start pcsd ;systemctl enable pcsd

Authenticate pcs (run on one node only)

[root@pacemaker-node1 ~]# pcs cluster auth pacemaker-node1 pacemaker-node2 -u hacluster -p redhat --force

Create the cluster and start it (run on one node only)

[root@pacemaker-node1 ~]# pcs cluster setup --force --name pmcluster pacemaker-node1 pacemaker-node2            # replace pmcluster with the cluster name you want to use in your environment
[root@pacemaker-node1 ~]# pcs cluster start --all
[root@pacemaker-node1 ~]# pcs cluster enable --all

Check the various cluster states and confirm the nodes have joined the cluster

[root@node1 ~]# pcs status 
Cluster name: pmcluster
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 05:03:22 2018
Last change: Wed Mar 21 02:00:54 2018 by root via cibadmin on node1

2 nodes configured
0 resources configured

Online: [ pacemaker-node1 pacemaker-node2 ]

No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.100)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.101)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@node1 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 node1 (local)
2 1 node2
[root@node1 ~]# ps -axf |grep pacemaker
9487 pts/0 S+ 0:00 \_ grep --color=auto pacemaker
1178 ? Ss 0:00 /usr/sbin/pacemakerd -f
1234 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
1235 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
1236 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
1238 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
1241 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
1243 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd

4.2 Common Pacemaker cluster parameter settings

[root@node1 ~]# pcs resource defaults resource-stickiness=100    (keep resources where they are, so they do not bounce back and forth between nodes)
[root@node1 ~]# pcs resource defaults migration-threshold=1 (migrate a resource away after a single failure)

[root@node1 ~]# pcs resource defaults
migration-threshold: 1
resource-stickiness: 100

[root@node1 ~]# pcs property set no-quorum-policy=ignore (ignore loss of quorum, as is typical for a two-node cluster)
[root@node1 ~]# pcs property set stonith-enabled=false (this disables fencing; the fence configuration method is listed in section 5 below)
[root@node1 ~]# pcs property
Cluster Properties:
cluster-infrastructure: corosync ----------------------->(RHEL 7 clusters use corosync)
cluster-name: MY_pacemaker
dc-version: 1.1.16-12.el7-94ff4df
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false

[root@node1 ~]# pcs resource op defaults timeout=30s (set the default operation timeout for resources)
[root@node1 ~]# pcs resource op defaults
timeout: 30s

[root@node1 ~]# crm_verify --live-check (check the current configuration for errors)

5. Configure Pacemaker resources

Appended content: adding fence resources.
Fencing is not configured for now (only the configuration method is listed here).

[root@node1 ~]# pcs property set stonith-enabled=true        (re-enable fencing once the fence resources below have been configured)

Fence (IPMI) address for node 192.168.0.100 (pacemaker-node1): 192.168.0.211
Fence (IPMI) address for node 192.168.0.101 (pacemaker-node2): 192.168.0.210
[root@pacemaker-node1 ~]# pcs stonith create node1_fence fence_ipmilan ipaddr="192.168.0.211" passwd="redhat" login="root" action="reboot" pcmk_host_list=pacemaker-node1
[root@pacemaker-node1 ~]# pcs stonith create node2_fence fence_ipmilan ipaddr="192.168.0.210" passwd="redhat" login="root" action="reboot" pcmk_host_list=pacemaker-node2

Supplement: configuring a virtual fence device
pcs stonith create pace1_fence fence_virt pcmk_host_list=rhcs_pacemaker_1
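
Once stonith resources exist, they can be listed and, if desired, tested; a hedged sketch (note that the test command really reboots the target node):

[root@pacemaker-node1 ~]# pcs stonith show
[root@pacemaker-node1 ~]# pcs stonith fence pacemaker-node2        (optional: forcibly fences node2 to prove the fence device works)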

5.1 Configure the floating IP resource

[root@node1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 \
ip=192.168.0.223 cidr_netmask=24 op monitor interval=30s
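
Once the resource is started, the address should be bound on the active node; a quick check, as a sketch:

[root@node1 ~]# pcs resource show vip
[root@node1 ~]# ip addr show | grep 192.168.0.223        (the VIP should appear on the node where the resource is Started)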

5.2 Configure the DRBD resource
Create the configuration file drbd_cfg (a working copy of the cluster CIB)

[root@node1 ~]# pcs cluster cib drbd_cfg

Create the DRBD resource in the configuration file

[root@node1 ~]# pcs -f drbd_cfg resource create DRBD \
ocf:linbit:drbd drbd_resource=mysql op monitor interval=60s

Create the master/slave clone resource in the configuration file

[root@node1 ~]# pcs -f drbd_cfg resource master DRBDClone DRBD \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true

View the DRBD resources defined in the drbd_cfg file
[root@node1 ~]# pcs -f drbd_cfg resource show
vip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: DRBDClone [DRBD]
Stopped: [ node1 node2 ]

Push the drbd_cfg configuration to the cluster and verify the cluster status
[root@node1 ~]# pcs cluster cib-push drbd_cfg
[root@node1 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 06:19:12 2018
Last change: Wed Mar 21 06:18:47 2018 by root via cibadmin on node1

2 nodes configured
3 resources configured

Online: [ node1 node2 ]

Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: DRBDClone [DRBD]
Masters: [ node1 ]
Slaves: [ node2 ]

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Make sure the drbd kernel module is loaded at boot (configure on both nodes)
[root@node1 ~]# echo drbd > /etc/modules-load.d/drbd.conf
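
To confirm the module loads as configured without waiting for a reboot, a simple check (sketch):

[root@node1 ~]# modprobe drbd
[root@node1 ~]# lsmod | grep drbd        (the drbd module should be listed; modules-load.d takes care of loading it at boot)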

6. Configure filesystem high availability

6.1 Create the filesystem resource configuration file fs_cfg

[root@node1 ~]# pcs cluster cib fs_cfg

6.2 Create the database filesystem resource in the configuration file

[root@node1 ~]# pcs -f fs_cfg resource create dbFS \
ocf:heartbeat:Filesystem device='/dev/drbd0' \
directory='/mysql' fstype='xfs'

6.3 In the configuration file, colocate the database filesystem with the DRBDClone master

[root@node1 ~]# pcs -f fs_cfg constraint colocation add \
dbFS with DRBDClone INFINITY with-rsc-role=Master

6.4 In the configuration file, order the DRBDClone promotion before the filesystem start

[root@node1 ~]# pcs -f fs_cfg constraint order promote \
DRBDClone then start dbFS
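
Before pushing fs_cfg, the constraints staged in the file can be reviewed; a sketch against the same shadow CIB:

[root@node1 ~]# pcs -f fs_cfg constraint        (should list the colocation and ordering rules defined above)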

6.5 Push the fs_cfg filesystem resources to the cluster and check the resource status

[root@node1 ~]# pcs cluster cib-push fs_cfg
CIB updated
[root@node1 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 06:31:57 2018
Last change: Wed Mar 21 06:31:52 2018 by root via cibadmin on node1

2 nodes configured
4 resources configured

Online: [ node1 node2 ]

Full list of resources:

vip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: DRBDClone [DRBD]
Masters: [ node1 ]
Slaves: [ node2 ]
dbFS (ocf::heartbeat:Filesystem): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

6.6 Create the MySQL (MariaDB) service resource

[root@node1 ~]# pcs resource create Mysql systemd:mariadb op \
start timeout=180s op stop timeout=180s op monitor interval=20s \
timeout=60s
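
The operation settings can be confirmed afterwards; a quick check, as a sketch:

[root@node1 ~]# pcs resource show Mysql        (displays the systemd:mariadb resource with its start/stop/monitor operation timeouts)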

6.7 Add all resources to a resource group so the whole group starts on a single node, and verify that all cluster resources start normally

[root@node1 ~]# pcs resource group add Mysql_service vip dbFS Mysql        (add them to the custom resource group Mysql_service; the listed order is the start order)
[root@node1 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 08:35:44 2018
Last change: Wed Mar 21 08:35:40 2018 by root via cibadmin on node1

2 nodes configured
5 resources configured

Online: [ node1 node2 ]

Full list of resources:

Master/Slave Set: DRBDClone [DRBD]
Masters: [ node1 ]
Slaves: [ node2 ]

Resource Group: Mysql_service
vip (ocf::heartbeat:IPaddr2): Started node1
dbFS (ocf::heartbeat:Filesystem): Started node1
Mysql (systemd:mariadb): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

6.8 Verify that the resource group fails over correctly

6.8.1 Manually create a test database

[root@node1 ~]# mysql -uroot
MariaDB [(none)]> create database helloworld; (create a test database by hand; it is used later to verify data consistency)

6.8.2 Manually take node1 offline, verify the resource group fails over to the standby node, and verify data consistency
[root@node1 ~]# pcs cluster standby node1 (this temporarily takes node1 offline, including demoting its DRBD primary role)
[root@node1 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 14:02:59 2018
Last change: Wed Mar 21 14:02:40 2018 by root via cibadmin on node1

2 nodes configured
5 resources configured

Node node1: standby
Online: [ node2 ]

Full list of resources:

Master/Slave Set: DRBDClone [DRBD]
Masters: [ node2 ]
Stopped: [ node1 ]
Resource Group: Mysql_service -----------------> every resource in the group has switched to node2 successfully
vip (ocf::heartbeat:IPaddr2): Started node2
dbFS (ocf::heartbeat:Filesystem): Started node2
Mysql (systemd:mariadb): Started node2

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

[root@node1 ~]# drbd-overview (the DRBD primary role on node1 is now offline)
0:mysql/0 Unconfigured . .

[root@node2 ~]# drbd-overview (the DRBD primary role has moved to node2)
0:mysql/0 WFConnection Primary/Unknown UpToDate/DUnknown /mysql xfs 20G 62M 20G 1%
[root@node2 ~]# mysql -uroot
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| helloworld | --------------> this confirms data consistency
| mysql |
| performance_schema |
| test |
+--------------------+
MariaDB [(none)]> create database mytestbase; --------------->(create a second test database, then switch the cluster again to verify)

6.8.3 Manually bring node1 back online and verify the cluster status
[root@node1 ~]# pcs cluster unstandby node1
[root@node1 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 14:08:47 2018
Last change: Wed Mar 21 14:08:44 2018 by root via cibadmin on node1

2 nodes configured
5 resources configured

Online: [ node1 node2 ]

Full list of resources:

Master/Slave Set: DRBDClone [DRBD]
Masters: [ node2 ]
Slaves: [ node1 ]
Resource Group: Mysql_service
vip (ocf::heartbeat:IPaddr2): Started node2
dbFS (ocf::heartbeat:Filesystem): Started node2
Mysql (systemd:mariadb): Started node2

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

6.8.4 Manually take node2 offline, verify the service switches to node1, and verify the DRBD status and data consistency

[root@node2 ~]# pcs cluster standby node2
[root@node2 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 14:12:46 2018
Last change: Wed Mar 21 14:12:29 2018 by root via cibadmin on node2

2 nodes configured
5 resources configured

Node node2: standby
Online: [ node1 ]

Full list of resources:

Master/Slave Set: DRBDClone [DRBD]
Masters: [ node1 ]
Stopped: [ node2 ]
Resource Group: Mysql_service
vip (ocf::heartbeat:IPaddr2): Started node1
dbFS (ocf::heartbeat:Filesystem): Started node1
Mysql (systemd:mariadb): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

[root@node1 ~]# drbd-overview
0:mysql/0 WFConnection Primary/Unknown UpToDate/DUnknown /mysql xfs 20G 62M 20G 1%
[root@node1 ~]# mysql -uroot
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| helloworld |
| mysql |
| mytestbase | -----------------------> (the database created before the second switchover; the data is consistent)
| performance_schema |
| test |
+--------------------+
6 rows in set (0.13 sec)

6.8.5 After switchover testing is complete, make sure all cluster nodes are back online
[root@node2 ~]# pcs cluster unstandby node2
[root@node2 ~]# pcs status
Cluster name: MY_pacemaker
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Mar 21 14:16:43 2018
Last change: Wed Mar 21 14:16:39 2018 by root via cibadmin on node2

2 nodes configured
5 resources configured

Online: [ node1 node2 ]

Full list of resources:

Master/Slave Set: DRBDClone [DRBD]
Masters: [ node1 ]
Slaves: [ node2 ]
Resource Group: Mysql_service
vip (ocf::heartbeat:IPaddr2): Started node1
dbFS (ocf::heartbeat:Filesystem): Started node1
Mysql (systemd:mariadb): Started node1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

============================================================
Supplement: repairing a DRBD split-brain
1. To avoid losing data during recovery, back up the data first.

2. Stop the cluster services on both nodes

# pcs cluster stop    (the other node usually needs pcs cluster stop --force)

3. Manually restart the DRBD service on both nodes

# systemctl restart drbd

4. On the primary node, force promotion to DRBD primary

# drbdadm primary all

5. On the standby node, force demotion to DRBD secondary

# drbdadm secondary all

6. On the standby node, discard the local data and resynchronize

# drbdadm --discard-my-data connect all

7. On the primary node, run the connect operation

# drbdadm connect all

8. Start the cluster on both nodes

# pcs cluster start
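
After the repair, the connection state can be checked on either node; a sketch assuming the resource name mysql used throughout this document:

# drbdadm cstate mysql        (should report Connected; StandAlone or WFConnection means the nodes are still not talking)
# cat /proc/drbd              (ds: should return to UpToDate/UpToDate once resynchronization finishes)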

==========================================================
Supplement 2: two-node cluster emergency procedure
When the cluster gets into an abnormal state, restart the cluster stack manually to bring it back to a normal state.
1. Restart the cluster stack on both nodes

# pcs cluster stop
# pcs cluster start

2. Start the cluster service after rebooting the server
If some services fail to recover after restarting only the cluster service, reboot the server first and then start the cluster service:

# reboot
# pcs cluster start

=======================================================

Summary / open question:
1. Why is the fence resource not migrated automatically?

Switch the entire resource group to the other node (in this architecture, the DRBD clone and the resource group are switched together)

[root@pacemaker-node1 ~]# crm_resource --resource Mysql_service --move --node pacemaker-node1 ; crm_resource --resource DRBDClone --move --node pacemaker-node1
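
Note that crm_resource --move works by inserting a location constraint that pins the resource to the target node; once the switchover is done, that constraint normally has to be removed so the cluster can place resources freely again. A sketch, assuming the pcs 0.9 tooling used in this document:

[root@pacemaker-node1 ~]# pcs resource clear Mysql_service ; pcs resource clear DRBDClone
[root@pacemaker-node1 ~]# pcs constraint        (verify that no leftover location constraints remain)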