This article uses OpenGauss DB 2.1.0 Enterprise Edition to walk through deploying a one-primary, two-standby cluster and managing it day to day.

I. Environment Preparation

1.1 Environment Information

[Figure: environment information]

1.2 Installing Software Packages

On every node, install the required packages with yum. python3 in particular must be present.

[root@gsdb01 ~]# yum -y install libaio-devel flex bison ncurses-devel glibc-devel patch redhat-lsb-core readline-devel python3

1.3 Creating the User and Group

On every node, run the following commands to create the omm user and the dbgrp group.

[root@gsdb01 ~]# groupadd dbgrp
[root@gsdb01 ~]# useradd -g dbgrp -m omm
[root@gsdb01 ~]# echo redhat|passwd --stdin omm

1.4 Configuring SSH Mutual Trust

Configure SSH mutual trust for both the root and omm users across all nodes. The details are omitted here.

1.5 Disabling SELinux and the Firewall

Perform these steps on all nodes.

[root@gsdb01 ~]# systemctl disable firewalld
[root@gsdb01 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

After completing the preparation above, reboot all nodes so the changes take effect.
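Both changes above only take effect at the next boot, so it is worth confirming them on every node after the reboot. The function below is a minimal sketch of that check. The name check_os_prereqs is hypothetical, and the function relies only on the standard getenforce and systemctl is-active commands.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that the SELinux and firewalld changes from section 1.5
# are in effect after the reboot. check_os_prereqs is a hypothetical helper name.
check_os_prereqs() {
  local rc=0 se fw
  # getenforce prints Enforcing, Permissive or Disabled ("unknown" if missing).
  se=$(getenforce 2>/dev/null || echo unknown)
  if [ "$se" != "Disabled" ]; then
    echo "SELinux is $se (expected Disabled)" >&2
    rc=1
  fi
  # systemctl is-active exits non-zero for inactive units, so ignore its
  # exit status and look only at the state it prints.
  fw=$(systemctl is-active firewalld 2>/dev/null || true)
  if [ "$fw" = "active" ]; then
    echo "firewalld is still active" >&2
    rc=1
  fi
  return "$rc"
}
```

Run it as root on each node, for example `check_os_prereqs && echo ok`. A non-zero return means at least one setting has not taken effect yet.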

II. Installing OpenGauss

gsdb01 serves as the installation node. Unless noted otherwise, all of the following operations are performed on this node only.

2.1 Creating the Package Directory

[root@gsdb01 ~]# mkdir -p /opt/software/openGauss
[root@gsdb01 ~]# chmod 755 -R /opt/software
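Section 2.3 runs script/gs_preinstall from /opt/software/openGauss, so the installation bundle must first be uploaded to that directory and unpacked. The following is a minimal sketch of the unpacking step. It assumes the enterprise bundle matches *-all.tar.gz and contains an *-om.tar.gz archive that holds the script/ directory. Those file-name patterns and the function name unpack_opengauss are assumptions, so adjust them to the package you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch: unpack the openGauss bundle so that script/gs_preinstall exists.
# ASSUMPTION: the bundle matches *-all.tar.gz and contains an *-om.tar.gz
# archive holding the script/ directory. unpack_opengauss is a hypothetical name.
unpack_opengauss() (
  # The body runs in a subshell, so "cd" and "shopt" do not leak to the caller.
  dir=${1:-/opt/software/openGauss}
  cd "$dir" || exit 1
  shopt -s nullglob
  bundles=(*-all.tar.gz)
  if [ "${#bundles[@]}" -eq 0 ]; then
    echo "no *-all.tar.gz bundle in $dir" >&2
    exit 1
  fi
  tar -zxf "${bundles[0]}" || exit 1
  om=(*-om.tar.gz)
  if [ "${#om[@]}" -eq 0 ]; then
    echo "no *-om.tar.gz found after unpacking ${bundles[0]}" >&2
    exit 1
  fi
  tar -zxf "${om[0]}" || exit 1
  # gs_preinstall is the script that section 2.3 runs next.
  test -x script/gs_preinstall
)
```

Run it as root, for example `unpack_opengauss /opt/software/openGauss`. It returns non-zero if either archive is missing or gs_preinstall did not appear.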

2.2 Creating the XML Configuration File

[root@gsdb01 ~]# vi /opt/software/openGauss/cluster_config.xml 
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<!-- openGauss cluster-wide settings -->
<CLUSTER>
<!-- Cluster name -->
<PARAM name="clusterName" value="Cluster01" />
<!-- Database node names (hostnames) -->
<PARAM name="nodeNames" value="gsdb01,gsdb02,gsdb03" />
<!-- Database installation directory -->
<PARAM name="gaussdbAppPath" value="/opt/huawei/install/app" />
<!-- Log directory -->
<PARAM name="gaussdbLogPath" value="/var/log/omm" />
<!-- Temporary file directory -->
<PARAM name="tmpMppdbPath" value="/opt/huawei/tmp"/>
<!-- Database tool directory -->
<PARAM name="gaussdbToolPath" value="/opt/huawei/install/om" />
<!-- Database core file directory -->
<PARAM name="corePath" value="/opt/huawei/corefile"/>
<!-- Node IPs, in the same order as nodeNames -->
<PARAM name="backIp1s" value="192.168.190.11,192.168.190.12,192.168.190.13"/>
</CLUSTER>
<!-- Per-server deployment information -->
<DEVICELIST>
<!-- Deployment information for node 1 -->
<DEVICE sn="gsdb01">
<!-- Hostname of node 1 -->
<PARAM name="name" value="gsdb01"/>
<!-- AZ of node 1 and its AZ priority -->
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<!-- IPs of node 1; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
<PARAM name="backIp1" value="192.168.190.11"/>
<PARAM name="sshIp1" value="172.16.255.11"/>

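<!-- DN settings: dataNode1 lists the primary's data directory, then a hostname,data-directory pair for each standby -->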
<PARAM name="dataNum" value="1"/>
<PARAM name="dataPortBase" value="26000"/>
<PARAM name="dataNode1" value="/opt/huawei/install/data/dn,gsdb02,/opt/huawei/install/data/dn,gsdb03,/opt/huawei/install/data/dn"/>
<PARAM name="dataNode1_syncNum" value="0"/>
</DEVICE>

<!-- Deployment information for node 2 -->
<DEVICE sn="gsdb02">
<!-- Hostname of node 2 -->
<PARAM name="name" value="gsdb02"/>
<!-- AZ of node 2 and its AZ priority -->
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<!-- IPs of node 2; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
<PARAM name="backIp1" value="192.168.190.12"/>
<PARAM name="sshIp1" value="172.16.255.12"/>
</DEVICE>

<!-- Deployment information for node 3 -->
<DEVICE sn="gsdb03">
<!-- Hostname of node 3 -->
<PARAM name="name" value="gsdb03"/>
<!-- AZ of node 3 and its AZ priority -->
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<!-- IPs of node 3; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
<PARAM name="backIp1" value="192.168.190.13"/>
<PARAM name="sshIp1" value="172.16.255.13"/>
</DEVICE>
</DEVICELIST>
</ROOT>
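Before running gs_preinstall, a quick sanity check can catch two common mistakes in this file: nodeNames and backIp1s listing different numbers of entries, or a host without its own DEVICE block. The sketch below uses sed and awk line matching rather than a real XML parser, so it assumes one PARAM per line as in the file above. The names xml_param and check_cluster_xml are hypothetical, and gs_preinstall still performs the authoritative validation.

```shell
#!/usr/bin/env bash
# Sketch: quick consistency checks on cluster_config.xml before gs_preinstall.
# Line matching only (not a real XML parser); assumes one PARAM per line.
# Function names are hypothetical.

# Print the value="..." of the first PARAM with the given name.
xml_param() {
  sed -n "s/.*<PARAM name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

check_cluster_xml() {
  local file=$1 names ips n_names n_ips host
  names=$(xml_param "$file" nodeNames)
  ips=$(xml_param "$file" backIp1s)
  # Count comma-separated entries (an empty value counts as 0).
  n_names=$(awk -F, '{ print NF }' <<<"$names")
  n_ips=$(awk -F, '{ print NF }' <<<"$ips")
  if [ "$n_names" -eq 0 ] || [ "$n_names" -ne "$n_ips" ]; then
    echo "nodeNames has $n_names entries but backIp1s has $n_ips" >&2
    return 1
  fi
  # Every host listed in nodeNames needs its own <DEVICE sn="..."> block.
  for host in ${names//,/ }; do
    if ! grep -q "<DEVICE sn=\"$host\">" "$file"; then
      echo "missing <DEVICE sn=\"$host\"> block" >&2
      return 1
    fi
  done
  echo "OK: $n_names nodes"
}
```

For the file above, `check_cluster_xml /opt/software/openGauss/cluster_config.xml` should print `OK: 3 nodes`.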

2.3 Initializing the Installation Environment

Run gs_preinstall as root:

[root@gsdb01 ~]# cd /opt/software/openGauss/script
[root@gsdb01 script]# ./gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml --non-interactive
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Setting pssh path
Successfully set core path.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h gsdb01,gsdb02,gsdb03 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
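As the run above shows, gs_preinstall can report success while still flagging warnings. If the output is captured to a file, for example by appending `2>&1 | tee /tmp/preinstall.log` to the command, a small check like the following can confirm success and surface the warning lines, including the suggested gs_checkos command. The log path and the function name check_preinstall_log are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check a saved gs_preinstall log. Assumes the output was captured with
# something like:  ./gs_preinstall ... 2>&1 | tee /tmp/preinstall.log
# check_preinstall_log is a hypothetical helper name.
check_preinstall_log() {
  local log=$1
  if ! grep -q '^Preinstallation succeeded\.' "$log"; then
    echo "gs_preinstall did not report success; see $log" >&2
    return 1
  fi
  # Success can still carry warnings; print them together with the
  # gs_checkos command that gs_preinstall suggests for details.
  grep -E '^(Warning:|Please get more details)' "$log" || true
}
```

If warnings are printed, run the gs_checkos command they point to and review its detailed report before installing.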

2.4 Running the Installation

After the environment has been initialized, switch to the omm user to run the installation.

[root@gsdb01 script]# su - omm
[omm@gsdb01 ~]$ gs_install -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
Please enter password for database:
Please repeat for database:
begin to create CA cert files
The sslcert will be generated in /opt/huawei/install/app/share/sslcert/om
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successful check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
Successfully started cluster.
Successfully installed application.
end deploy..
# Alternatively, to initialize the database with a specific locale such as en_US.utf8, pass --locale=en_US.utf8 through --gsinit-parameter:
[omm@gsdb01 ~]$ gs_install -X /opt/software/openGauss/cluster_config.xml --gsinit-parameter="--locale=en_US.utf8"

2.5 Creating a Database

Log in as omm and create the mydb database:

[omm@gsdb01 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 2.1.0 build 590b0f8e) compiled at 2021-09-30 14:29:04 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+-----------+---------+-------+-------------------
postgres | omm | SQL_ASCII | C | C |
template0 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
template1 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
(3 rows)
openGauss=# create database mydb with encoding 'UTF8' template=template0;
CREATE DATABASE
openGauss=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+-----------+---------+-------+-------------------
mydb | omm | UTF8 | C | C |
postgres | omm | SQL_ASCII | C | C |
template0 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
template1 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
(4 rows)
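The same database can also be created non-interactively with gsql -c, which is handy in scripts. The sketch below wraps that call. create_db and its DRY_RUN switch are hypothetical. It keeps template=template0 because, as in PostgreSQL (from which openGauss is derived), a database whose encoding (UTF8) differs from template1's (SQL_ASCII in the \l output above) must be created from template0.

```shell
#!/usr/bin/env bash
# Sketch: create a database non-interactively with gsql -c instead of the
# interactive session above. create_db and DRY_RUN are hypothetical;
# port 26000 matches dataPortBase in cluster_config.xml.
create_db() {
  local name=$1 port=${2:-26000} sql
  # Only accept simple lower-case identifiers so the name can be embedded as-is.
  if [[ ! $name =~ ^[a-z_][a-z0-9_]*$ ]]; then
    echo "invalid database name: $name" >&2
    return 1
  fi
  # template0 is needed because UTF8 differs from template1's SQL_ASCII encoding.
  sql="create database $name with encoding 'UTF8' template=template0;"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    printf 'gsql -d postgres -p %s -c "%s"\n' "$port" "$sql"
  else
    gsql -d postgres -p "$port" -c "$sql"
  fi
}
```

As omm, `create_db mydb` runs the statement above. `DRY_RUN=1 create_db mydb` only prints the gsql command it would run.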

2.6 Verifying the Installation

As omm, check the cluster status with gs_om:

[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Primary Normal
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Standby Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Normal

As shown above, gsdb01 is the primary and gsdb02 and gsdb03 are standbys. All three are in the Normal state.
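To make this check repeatable, the status output can be parsed by a script instead of read by eye. The awk-based sketch below assumes the datanode row layout shown above: node, host, IP, port, instance ID, data directory, P/S, role, then a state that may span several words. check_cluster_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: summarize `gs_om -t status --detail` output read from stdin and fail
# unless cluster_state is Normal, every datanode is Normal and there is exactly
# one Primary. Assumes the row layout shown above; the name is hypothetical.
check_cluster_status() {
  awk '
    /^cluster_state/ { state = $3 }
    # Datanode rows: node, host, ip, port, instance id, data dir, P/S, role, state...
    $1 ~ /^[0-9]+$/ && NF >= 9 {
      st = $9
      for (i = 10; i <= NF; i++) st = st " " $i   # states may be several words
      printf "%s %s %s\n", $2, $8, st
      if (st != "Normal") bad++
      if ($8 == "Primary") primaries++
    }
    END { if (state != "Normal" || bad > 0 || primaries != 1) exit 1 }'
}
```

`gs_om -t status --detail | check_cluster_status` prints one "host role state" line per datanode, such as `gsdb01 Primary Normal`. It returns non-zero when the cluster is not healthy.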

III. OpenGauss Routine Management and Maintenance

3.1 Starting and Stopping OpenGauss

Start and stop the cluster on the primary node as the omm OS user:


[omm@gsdb01 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@gsdb01 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] gsdb01
[SUCCESS] gsdb02
[SUCCESS] gsdb03
=========================================
Successfully started.
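gs_om -t start returns once the instances have started, but the cluster may take a while to report Normal again, for instance while the standbys catch up. A polling sketch follows. wait_cluster_normal is a hypothetical name, and the sketch assumes the plain gs_om -t status output contains a cluster_state line, as the outputs in this article do.

```shell
#!/usr/bin/env bash
# Sketch: after `gs_om -t start`, poll until cluster_state is Normal.
# ASSUMPTION: plain `gs_om -t status` prints a "cluster_state : ..." line.
# wait_cluster_normal is a hypothetical name.
wait_cluster_normal() {
  local timeout=${1:-300} interval=${2:-5} waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state=$(gs_om -t status 2>/dev/null | awk '/^cluster_state/ { print $3; exit }')
    if [ "$state" = "Normal" ]; then
      echo "cluster_state: Normal after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "cluster_state is still '${state:-unknown}' after ${timeout}s" >&2
  return 1
}
```

For example, `gs_om -t start && wait_cluster_normal 300 5` waits up to five minutes, checking every five seconds.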

3.2 Querying Cluster Status

OpenGauss can report the status of the entire cluster. Use the result to confirm whether the cluster as a whole, or an individual host, is running normally.

[omm@gsdb01 dn]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Primary Normal
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Standby Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Normal
# Check the status of a single host
[omm@gsdb01 dn]$ gs_om -t status -h gsdb01
-----------------------------------------------------------------------

cluster_state : Normal
redistributing : No

-----------------------------------------------------------------------

node : 1
node_name : gsdb01
instance_id : 6001
node_ip : 192.168.190.11
data_path : /opt/huawei/install/data/dn
instance_port : 26000
type : Datanode
instance_state : Normal
az_name : AZ1
static_connections : 2
HA_state : Normal
instance_role : Primary

-----------------------------------------------------------------------

[omm@gsdb01 dn]$ gs_om -t status -h gsdb02
-----------------------------------------------------------------------

cluster_state : Normal
redistributing : No

-----------------------------------------------------------------------

node : 2
node_name : gsdb02
instance_id : 6002
node_ip : 192.168.190.12
data_path : /opt/huawei/install/data/dn
instance_port : 26000
type : Datanode
instance_state : Normal
az_name : AZ1
instance_role : Standby
HA_state : Streaming
sender_sent_location : 0/502E958
sender_write_location : 0/502E958
sender_flush_location : 0/502E958
sender_replay_location : 0/502E958
receiver_received_location: 0/502E958
receiver_write_location : 0/502E958
receiver_flush_location : 0/502E958
receiver_replay_location : 0/502E958
sync_percent : 100%
sync_state : Async

-----------------------------------------------------------------------
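Monitoring scripts can extract fields such as instance_role, sync_state and the WAL locations above. The sketch below reads gs_om -t status -h output from stdin. The function names are hypothetical. The lag calculation assumes the PostgreSQL-style high/low hexadecimal LSN format seen above (0/502E958) and converts each location to a byte position as high × 2^32 + low.

```shell
#!/usr/bin/env bash
# Sketch: pull fields out of `gs_om -t status -h <host>` output and estimate
# replication lag. Function names are hypothetical; LSNs are assumed to use the
# "high/low" hexadecimal format shown above (e.g. 0/502E958).

# Print the value of one "key : value" line read from stdin.
status_field() {
  awk -v key="$1" '{
    pos = index($0, ":")
    if (pos == 0) next
    k = substr($0, 1, pos - 1); v = substr($0, pos + 1)
    gsub(/^[ \t]+|[ \t]+$/, "", k)
    gsub(/^[ \t]+|[ \t]+$/, "", v)
    if (k == key) { print v; exit }
  }'
}

# Convert an LSN such as 0/502E958 into an absolute byte position.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Bytes sent by the primary but not yet replayed on this standby.
replay_lag_bytes() {
  local out sent replayed
  out=$(cat)
  sent=$(status_field sender_sent_location <<<"$out")
  replayed=$(status_field receiver_replay_location <<<"$out")
  if [ -z "$sent" ] || [ -z "$replayed" ]; then
    echo "no sender/receiver locations (is this a standby?)" >&2
    return 1
  fi
  echo $(( $(lsn_to_bytes "$sent") - $(lsn_to_bytes "$replayed") ))
}
```

For the gsdb02 output above, `gs_om -t status -h gsdb02 | replay_lag_bytes` would print 0, since all sender and receiver locations are 0/502E958.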

3.3 Primary/Standby Switchover

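While openGauss is running, the database administrator may need to switch primary and standby roles manually. Typical cases are restoring the original roles after a failover, or moving the primary off a node with a suspected hardware fault. A cascaded standby cannot be promoted to primary directly: it must first become a standby through switchover or failover, and only then can it be switched to primary.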

  • As the omm OS user, log in to any database node and run the following to check the current primary/standby roles.
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Primary Normal
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Standby Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Normal
  • As the omm OS user, log in to the standby that is to become the new primary and run the following command.
[omm@gsdb02 ~]$ gs_ctl switchover -D /opt/huawei/install/data/dn
[2021-11-24 17:14:39.561][47754][][gs_ctl]: gs_ctl switchover ,datadir is /opt/huawei/install/data/dn
[2021-11-24 17:14:39.561][47754][][gs_ctl]: switchover term (1)
[2021-11-24 17:14:39.570][47754][][gs_ctl]: waiting for server to switchover.........
[2021-11-24 17:14:45.666][47754][][gs_ctl]: done
[2021-11-24 17:14:45.666][47754][][gs_ctl]: switchover completed (/opt/huawei/install/data/dn)


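If the primary has failed, run gs_ctl failover -D /opt/huawei/install/data/dn on a standby instead of switchover to promote it to primary.

  • After the switchover (or failover) succeeds, run the following as omm to record the current primary/standby roles in the cluster's dynamic configuration file.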

[omm@gsdb02 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
  • Check the cluster status after the switchover.
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Standby Normal
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Primary Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Normal
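The switchover steps above can be wrapped in a single function: confirm that the local node is a standby, switch over, then record the new layout with gs_om -t refreshconf. This is a sketch. safe_switchover is a hypothetical name, it must be run as omm on the standby being promoted, and its only safety check is the instance_role that gs_om -t status -h reports for the local hostname.

```shell
#!/usr/bin/env bash
# Sketch: run as omm on the standby that should become the primary.
# safe_switchover is a hypothetical helper; the default data directory matches
# dataNode1 in cluster_config.xml.
safe_switchover() {
  local datadir=${1:-/opt/huawei/install/data/dn} host role
  host=$(hostname)
  # Read instance_role from the per-host status output (see section 3.2).
  role=$(gs_om -t status -h "$host" | awk -F':' '/^instance_role/ { gsub(/[ \t]/, "", $2); print $2; exit }')
  if [ "$role" != "Standby" ]; then
    echo "refusing switchover: $host is '${role:-unknown}', expected Standby" >&2
    return 1
  fi
  gs_ctl switchover -D "$datadir" || return 1
  # Save the new primary/standby layout, as done with refreshconf above.
  gs_om -t refreshconf
}
```

On gsdb02 in this cluster, `safe_switchover` performs the same switchover and refreshconf steps shown above. On the current primary it refuses to run.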

3.4 Rebuilding a Standby Instance

Because of an operating mistake on my part, the standby instances were left in the "Standby Need repair(WAL)" state after a cluster restart:

[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Degraded
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Standby Need repair(WAL)
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Primary Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Need repair(WAL)

To recover, rebuild each affected standby instance by running the following as omm on that node (gsdb01 and gsdb03 here):

[omm@gsdb01 ~]$ gs_ctl build -b auto -D /opt/huawei/install/data/dn 
......
[2021-11-25 08:58:39.209][21770][dn_6001_6002_6003][gs_ctl]: done
[2021-11-25 08:58:39.209][21770][dn_6001_6002_6003][gs_ctl]: server started (/opt/huawei/install/data/dn)
[2021-11-25 08:58:39.209][21770][dn_6001_6002_6003][gs_ctl]: fopen build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
[2021-11-25 08:58:39.209][21770][dn_6001_6002_6003][gs_ctl]: fprintf build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
[2021-11-25 08:58:39.212][21770][dn_6001_6002_6003][gs_ctl]: fsync build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
[omm@gsdb03 ~]$ gs_ctl build -b auto -D /opt/huawei/install/data/dn
......
[2021-11-25 09:01:09.495][18550][dn_6001_6002_6003][gs_ctl]: done
[2021-11-25 09:01:09.495][18550][dn_6001_6002_6003][gs_ctl]: server started (/opt/huawei/install/data/dn)
[2021-11-25 09:01:09.495][18550][dn_6001_6002_6003][gs_ctl]: fopen build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
[2021-11-25 09:01:09.495][18550][dn_6001_6002_6003][gs_ctl]: fprintf build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
[2021-11-25 09:01:09.497][18550][dn_6001_6002_6003][gs_ctl]: fsync build pid file "/opt/huawei/install/data/dn/gs_build.pid" success
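When more than one standby is degraded, the status output can be turned into a rebuild checklist. The sketch below prints, for every instance whose state contains "Need repair", the gs_ctl build command used above, prefixed by the host on which omm must run it. repair_plan is a hypothetical name, and the sketch assumes the same status row layout as in section 2.6.

```shell
#!/usr/bin/env bash
# Sketch: read `gs_om -t status --detail` output on stdin and print one
# rebuild command per instance in a "Need repair" state. The command must be
# run as omm on the listed host. repair_plan is a hypothetical name.
repair_plan() {
  # Row layout: node, host, ip, port, instance id, data dir, P/S, role, state...
  awk '$1 ~ /^[0-9]+$/ && NF >= 9 && /Need repair/ {
    printf "%s: gs_ctl build -b auto -D %s\n", $2, $6
  }'
}
```

For the Degraded output above, `gs_om -t status --detail | repair_plan` lists gsdb01 and gsdb03, matching the two rebuilds performed here.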

After the rebuild completes, check the cluster status again:

[omm@gsdb03 ~]$ gs_om -t status --detail
[ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
---------------------------------------------------------------------------------------------
1 gsdb01 192.168.190.11 26000 6001 /opt/huawei/install/data/dn P Standby Normal
2 gsdb02 192.168.190.12 26000 6002 /opt/huawei/install/data/dn S Primary Normal
3 gsdb03 192.168.190.13 26000 6003 /opt/huawei/install/data/dn S Standby Normal