基于PGPool的双机集群如下图所示:pg主节点和备节点实现流复制热备,pgpool1,pgpool2作为中间件,将主备pg节点加入集群,实现读写分离,负载均衡和HA故障自动切换。两pgpool节点可以委托一个虚拟ip节点作为应用程序访问的地址,两节点之间通过watchdog进行监控,当pgpool1宕机时,pgpool2会自动接管虚拟ip继续对外提供不间断服务。
1. 主机规划
192.168.20.201 redis01
192.168.20.202 redis02
192.168.20.203 vip
2. 配置主机ssh互信
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub redis02
3. 安装pgpool
wget http://www.pgpool.net/mediawiki/images/pgpool-II-4.1.1.tar.gz
[postgres@redis01 ~]$ tar -zxf pgpool-II-4.1.0.tar.gz
./configure --prefix=/u01/pgpool --with-pgsql=/u01/pgsql
make && make install
4. 配置pgpool
4.1配置环境变量
pgpool装在了postgres账户下,在该账户中添加环境变量,master,slave节点都执行。
[postgres@redis01 ~]$ cat .bash_profile
# .bash_profile
export PGHOME=/u01/pgsql
export PGDATA=/u01/pgsql/data
export PGPOOLHOME=/u01/pgpool
export PGPASSFILE=/u01/pgsql/data/.pgpass
export PATH=$PATH:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin
export LD_LIBRARY_PATH=$PGHOME/lib:$PGPOOLHOME/lib
4.2配置pool_hba.conf
pool_hba.conf是对登录用户进行验证的,要和pg的pg_hba.conf保持一致,要么都是trust,要么都是md5验证方式,这里采用了md5验证方式如下设置:
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host postgres postgres 192.168.20.0/24 md5
host postgres rep 192.168.20.0/24 md5
host all all 0.0.0.0/0 md5
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication rep 192.168.20.0/24 md5
host replication all ::1/128 trust
4.3配置pcp.conf
pcp.conf配置用于pgpool自己登陆管理使用的,一些操作pgpool的工具会要求提供密码等,配置如下:
pg_md5 posgres
[postgres@redis01 etc]$ cat pcp.conf
# USERID:MD5PASSWD
postgres:e8a48653851e28c69d0506508fb27fc5
#pcp.conf是pgpool管理器自己的用户名和密码,用于管理集群
pg_md5 -p -m -u postgres pool_passwd
#数据库登录用户是postgres,这里输入登录密码,不能出错
#输入密码后,在pgpool/etc目录下会生成一个pool_passwd文件
4.4配置系统命令权限
配置 ifconfig, arping 执行权限 ,执行failover_stream.sh需要用到,可以让其他普通用户执行。
chmod u+s /sbin/ip
chmod u+s /usr/sbin/arping
4.5配置pgpool.conf
后台连接设置
# - Backend Connection Settings -
backend_hostname0 = 'redis01'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/u01/pgsql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server1'
backend_hostname1 = 'redis02'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/u01/pgsql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server2'
负载均衡
load_balance_mode = on
postgres流复制
master_slave_mode = on
master_slave_sub_mode = 'stream'
健康检查设置
#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------
health_check_period = 10
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = 'postgres'
health_check_database = ''
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
灾备切换设置
failover_command = '/u01/pgpool/failover_stream.sh %H %h'
# Executes this command at failover
# Special values:
# %d = failed node id
# %h = failed node host name
# %p = failed node port number
# %D = failed node database cluster path
# %m = new master node id
# %H = new master node hostname
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %N = old primary node hostname
# %S = old primary node port number
# %% = '%' character
开启看门狗
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
# -Connection to up stream servers -
trusted_servers = '192.168.20.201,192.168.20.202'
ping_path = '/bin'
# - Watchdog communication Settings -
wd_hostname = 'redis01'
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
配置vip
# - Virtual IP control Setting -
delegate_IP = '192.168.20.205'
if_cmd_path = '/sbin'
if_up_cmd = ' /sbin/ip addr add $_IP_$/24 dev eth2 label eth2:0'
if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth2'
arping_path = '/usr/sbin'
arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth2'
其他pgpool连接设置
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'redis02'
other_pgpool_port0 =9999
other_wd_port0 = 9000
4.6配置failover_stream.sh脚本
[postgres@redis01 pgpool]$ cat failover_stream.sh
#! /bin/sh
# Failover command for streaming replication.
# Arguments: $1: new master hostname.
new_master=$1
old_master=$2
RECOVERYCONF=$PGDATA/recovery.conf
touch_command="touch $RECOVERYCONF"
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"
delrecovery_command="rm -f $PGDATA/recovery.done && rm -f $PGDATA/recovery.conf"
echo1_cmd="echo -e \" primary_conninfo = 'host=$new_master port=5432 user=rep password=rep passfile=''/u01/pgsql/data/.pgpass''' \" >> $PGDATA/recovery.conf"
echo2_cmd="echo -e \" recovery_target_timeline = 'latest' \" >> $PGDATA/recovery.conf"
echo3_cmd="echo -e \" standby_mode = 'on' \" >> $PGDATA/recovery.conf"
# Prompte standby database.
/usr/bin/ssh -T $new_master $trigger_command
# create recovery.conf
/usr/bin/ssh -T $old_master $delrecovery_command
/usr/bin/ssh -T $old_master $touch_command
/usr/bin/ssh -T $old_master $echo3_cmd
/usr/bin/ssh -T $old_master $echo1_cmd
/usr/bin/ssh -T $old_master $echo2_cmd
exit 0;
检测到postgresql down机以后把standby转成primary模式可读写,在down机的主机生成recovery.conf,down机的postgresql启动起来以后自动转成standby。
4.7启动pgpool
启动pgpool以前先启动postgresql, postgresql已配置好流复制
启动postgressql
pg_ctl start
查看流复制
[postgres@redis01 etc]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
client_addr | application_name | sync_state
----------------+------------------+------------
192.168.20.202 | walreceiver | sync
启动pgpool
pgpool -n -D > /u01/pgpool/log/pgpool.log 2>&1 &
查看pgpool节点
[postgres@redis01 etc]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | redis01 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:04:41
1 | redis02 | 5432 | up | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:04:41
查看vip
[postgres@redis01 ~]$ ifconfig -a
eth1 Link encap:Ethernet HWaddr 08:00:27:E9:92:7B
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fee9:927b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11 errors:0 dropped:0 overruns:0 frame:0
TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1921 (1.8 KiB) TX bytes:1968 (1.9 KiB)
eth2 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB
inet addr:192.168.20.201 Bcast:192.168.20.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fecf:77eb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32488 errors:0 dropped:0 overruns:0 frame:0
TX packets:31045 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3737971 (3.5 MiB) TX bytes:3848610 (3.6 MiB)
eth2:0 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB
inet addr:192.168.20.205 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
5. 演示postgresql down机
[postgres@redis01 ~]$ pg_ctl stop -m fast
waiting for server to shut down..... done
server stopped
redis01 postgressql日志
2020-02-23 22:45:21.402 CST [2577] LOG: received fast shutdown request
2020-02-23 22:45:21.682 CST [2577] LOG: aborting any active transactions
2020-02-23 22:45:21.682 CST [2577] LOG: background worker "logical replication launcher" (PID 2586) exited with exit code 1
2020-02-23 22:45:21.683 CST [2580] LOG: shutting down
2020-02-23 22:45:23.174 CST [2577] LOG: database system is shut down
redis02 postgressql日志
2020-02-23 22:45:23.174 CST [12119] FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "redis01" (192.168.20.201) and accepting
TCP/IP connections on port 5432?
2020-02-23 22:45:28.180 CST [12126] FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "redis01" (192.168.20.201) and accepting
TCP/IP connections on port 5432?
2020-02-23 22:45:31.395 CST [3584] LOG: received promote request
2020-02-23 22:45:31.395 CST [3584] LOG: redo done at 0/26000028
2020-02-23 22:45:31.405 CST [3584] LOG: selected new timeline ID: 12
2020-02-23 22:45:32.389 CST [3584] LOG: archive recovery complete
2020-02-23 22:45:32.402 CST [3582] LOG: database system is ready to accept connections
查看主备
[postgres@redis02 ~]$ pg_controldata|grep cluster
Database cluster state: in production
[postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
client_addr | application_name | sync_state
-------------+------------------+------------
(0 rows)
Redis02主机上的postgressql数据库直接从standby转成primary
查看pgpool集群
[postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | redis01 | 5432 | down | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:45:33
1 | redis02 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:45:33
(2 rows)
从pgpool 来看也是把Redis02主机上的postgressql数据库直接从standby转成primary
启动down机的主节点;启动起来以后转成standby
[postgres@redis01 pgpool]$ pg_ctl start
waiting for server to start....2020-02-23 22:58:00.891 CST [9600] LOG: listening on IPv4 address "192.168.20.201", port 5432
2020-02-23 22:58:00.897 CST [9600] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-02-23 22:58:00.912 CST [9600] LOG: redirecting log output to logging collector process
2020-02-23 22:58:00.912 CST [9600] HINT: Future log output will appear in directory "log".
done
server started
[postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
client_addr | application_name | sync_state
----------------+------------------+------------
192.168.20.201 | walreceiver | sync
(1 row)
down机的节点加入pgpool集群
[postgres@redis01 pgpool]$ pcp_attach_node -d -U postgres -h 192.168.20.205 -p 9898 -n 0
Password:
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="C", len=6
DEBUG: recv: tos="c", len=20
pcp_attach_node -- Command Successful
DEBUG: send: tos="X", len=4
[postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | redis01 | 5432 | up | 0.500000 | standby | 0 | true | 0 | | | 2020-02-23 23:00:11
1 | redis02 | 5432 | up | 0.500000 | primary | 0 | false | 0 | | | 2020-02-23 22:45:33
(2 rows)
到此灾备演练完毕。