基于PGPool的双机集群如下图所示:pg主节点和备节点实现流复制热备,pgpool1,pgpool2作为中间件,将主备pg节点加入集群,实现读写分离,负载均衡和HA故障自动切换。两pgpool节点可以委托一个虚拟ip节点作为应用程序访问的地址,两节点之间通过watchdog进行监控,当pgpool1宕机时,pgpool2会自动接管虚拟ip继续对外提供不间断服务。

1.主机规划

192.168.20.201 redis01

192.168.20.202 redis02

192.168.20.203 vip

2.配置主机ssh互信

ssh-keygen -t rsa

ssh-copy-id -i ~/.ssh/id_rsa.pub redis02

3.安装pgpool

wget http://www.pgpool.net/mediawiki/images/pgpool-II-4.1.1.tar.gz

[postgres@redis01 ~]$ tar -zxf pgpool-II-4.1.0.tar.gz

./configure --prefix=/u01/pgpool --with-pgsql=/u01/pgsql

make && make install

4.配置pgpool

4.1配置环境变量

pgpool装在了postgres账户下,在该账户中添加环境变量,master,slave节点都执行。

[postgres@redis01 ~]$ cat .bash_profile

# .bash_profile

export PGHOME=/u01/pgsql

export PGDATA=/u01/pgsql/data

export PGPOOLHOME=/u01/pgpool

export PGPASSFILE=/u01/pgsql/data/.pgpass

export PATH=$PATH:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin

export LD_LIBRARY_PATH=$PGHOME/lib:$PGPOOLHOME/lib

4.2配置pool_hba.conf

pool_hba.conf是对登录用户进行验证的,要和pg的pg_hba.conf保持一致,要么都是trust,要么都是md5验证方式,这里采用了md5验证方式如下设置:

# IPv4 local connections:

host all all 127.0.0.1/32 trust

host postgres postgres 192.168.20.0/24 md5

host postgres rep 192.168.20.0/24 md5

host all all 0.0.0.0/0 md5

# IPv6 local connections:

host all all ::1/128 trust

# Allow replication connections from localhost, by a user with the

# replication privilege.

local replication all trust

host replication all 127.0.0.1/32 trust

host replication rep 192.168.20.0/24 md5

host replication all ::1/128 trust

4.3配置pcp.conf

pcp.conf配置用于pgpool自己登陆管理使用的,一些操作pgpool的工具会要求提供密码等,配置如下:

pg_md5 posgres

[postgres@redis01 etc]$ cat pcp.conf

# USERID:MD5PASSWD

postgres:e8a48653851e28c69d0506508fb27fc5

#pcp.conf是pgpool管理器自己的用户名和密码,用于管理集群

pg_md5 -p -m -u postgres pool_passwd

#数据库登录用户是postgres,这里输入登录密码,不能出错

#输入密码后,在pgpool/etc目录下会生成一个pool_passwd文件

4.4配置系统命令权限

配置 ifconfig, arping 执行权限 ,执行failover_stream.sh需要用到,可以让其他普通用户执行。

chmod u+s /sbin/ip

chmod u+s /usr/sbin/arping

4.5配置pgpool.conf

后台连接设置

# - Backend Connection Settings -

backend_hostname0 = 'redis01'

backend_port0 = 5432

backend_weight0 = 1

backend_data_directory0 = '/u01/pgsql/data'

backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_application_name0 = 'server1'

backend_hostname1 = 'redis02'

backend_port1 = 5432

backend_weight1 = 1

backend_data_directory1 = '/u01/pgsql/data'

backend_flag1 = 'ALLOW_TO_FAILOVER'

backend_application_name1 = 'server2'

负载均衡

load_balance_mode = on

postgres流复制

master_slave_mode = on

master_slave_sub_mode = 'stream'

健康检查设置

#------------------------------------------------------------------------------

# HEALTH CHECK GLOBAL PARAMETERS

#------------------------------------------------------------------------------

health_check_period = 10

health_check_timeout = 20

health_check_user = 'postgres'

health_check_password = 'postgres'

health_check_database = ''

health_check_max_retries = 0

health_check_retry_delay = 1

connect_timeout = 10000

灾备切换设置

failover_command = '/u01/pgpool/failover_stream.sh %H %h'

# Executes this command at failover

# Special values:

# %d = failed node id

# %h = failed node host name

# %p = failed node port number

# %D = failed node database cluster path

# %m = new master node id

# %H = new master node hostname

# %M = old master node id

# %P = old primary node id

# %r = new master port number

# %R = new master database cluster path

# %N = old primary node hostname

# %S = old primary node port number

# %% = '%' character

开启看门狗

#------------------------------------------------------------------------------

# WATCHDOG

#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on

# -Connection to up stream servers -

trusted_servers = '192.168.20.201,192.168.20.202'

ping_path = '/bin'

# - Watchdog communication Settings -

wd_hostname = 'redis01'

wd_port = 9000

wd_priority = 1

wd_authkey = ''

wd_ipc_socket_dir = '/tmp'

配置vip

# - Virtual IP control Setting -

delegate_IP = '192.168.20.205'

if_cmd_path = '/sbin'

if_up_cmd = ' /sbin/ip addr add $_IP_$/24 dev eth2 label eth2:0'

if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth2'

arping_path = '/usr/sbin'

arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth2'

其他pgpool连接设置

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'redis02'

other_pgpool_port0 =9999

other_wd_port0 = 9000

4.6配置failover_stream.sh脚本

[postgres@redis01 pgpool]$ cat failover_stream.sh

#! /bin/sh

# Failover command for streaming replication.

# Arguments: $1: new master hostname.

new_master=$1

old_master=$2

RECOVERYCONF=$PGDATA/recovery.conf

touch_command="touch $RECOVERYCONF"

trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"

delrecovery_command="rm -f $PGDATA/recovery.done && rm -f $PGDATA/recovery.conf"

echo1_cmd="echo -e \" primary_conninfo = 'host=$new_master port=5432 user=rep password=rep passfile=''/u01/pgsql/data/.pgpass''' \" >> $PGDATA/recovery.conf"

echo2_cmd="echo -e \" recovery_target_timeline = 'latest' \" >> $PGDATA/recovery.conf"

echo3_cmd="echo -e \" standby_mode = 'on' \" >> $PGDATA/recovery.conf"

# Prompte standby database.

/usr/bin/ssh -T $new_master $trigger_command

# create recovery.conf

/usr/bin/ssh -T $old_master $delrecovery_command

/usr/bin/ssh -T $old_master $touch_command

/usr/bin/ssh -T $old_master $echo3_cmd

/usr/bin/ssh -T $old_master $echo1_cmd

/usr/bin/ssh -T $old_master $echo2_cmd

exit 0;

检测到postgresql down机以后把standby转成primary模式可读写,在down机的主机生成recovery.conf,down机的postgresql启动起来以后自动转成standby。

4.7启动pgpool

启动pgpool以前先启动postgresql, postgresql已配置好流复制

启动postgressql

pg_ctl start

查看流复制

[postgres@redis01 etc]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'

client_addr | application_name | sync_state

----------------+------------------+------------

192.168.20.202 | walreceiver | sync

启动pgpool

pgpool -n -D > /u01/pgpool/log/pgpool.log 2>&1 &

查看pgpool节点

[postgres@redis01 etc]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'

node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------

0 | redis01 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:04:41

1 | redis02 | 5432 | up | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:04:41

查看vip

[postgres@redis01 ~]$ ifconfig -a

eth1 Link encap:Ethernet HWaddr 08:00:27:E9:92:7B

inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0

inet6 addr: fe80::a00:27ff:fee9:927b/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:11 errors:0 dropped:0 overruns:0 frame:0

TX packets:19 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:1921 (1.8 KiB) TX bytes:1968 (1.9 KiB)

eth2 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB

inet addr:192.168.20.201 Bcast:192.168.20.255 Mask:255.255.255.0

inet6 addr: fe80::a00:27ff:fecf:77eb/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:32488 errors:0 dropped:0 overruns:0 frame:0

TX packets:31045 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:3737971 (3.5 MiB) TX bytes:3848610 (3.6 MiB)

eth2:0 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB

inet addr:192.168.20.205 Bcast:0.0.0.0 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

5.演示postgresql down机

[postgres@redis01 ~]$ pg_ctl stop -m fast

waiting for server to shut down..... done

server stopped

redis01 postgressql日志

2020-02-23 22:45:21.402 CST [2577] LOG: received fast shutdown request

2020-02-23 22:45:21.682 CST [2577] LOG: aborting any active transactions

2020-02-23 22:45:21.682 CST [2577] LOG: background worker "logical replication launcher" (PID 2586) exited with exit code 1

2020-02-23 22:45:21.683 CST [2580] LOG: shutting down

2020-02-23 22:45:23.174 CST [2577] LOG: database system is shut down

redis02 postgressql日志

2020-02-23 22:45:23.174 CST [12119] FATAL: could not connect to the primary server: could not connect to server: Connection refused

Is the server running on host "redis01" (192.168.20.201) and accepting

TCP/IP connections on port 5432?

2020-02-23 22:45:28.180 CST [12126] FATAL: could not connect to the primary server: could not connect to server: Connection refused

Is the server running on host "redis01" (192.168.20.201) and accepting

TCP/IP connections on port 5432?

2020-02-23 22:45:31.395 CST [3584] LOG: received promote request

2020-02-23 22:45:31.395 CST [3584] LOG: redo done at 0/26000028

2020-02-23 22:45:31.405 CST [3584] LOG: selected new timeline ID: 12

2020-02-23 22:45:32.389 CST [3584] LOG: archive recovery complete

2020-02-23 22:45:32.402 CST [3582] LOG: database system is ready to accept connections

查看主备

[postgres@redis02 ~]$ pg_controldata|grep cluster

Database cluster state: in production

[postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'

client_addr | application_name | sync_state

-------------+------------------+------------

(0 rows)

Redis02主机上的postgressql数据库直接从standby转成primary

查看pgpool集群

[postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'

node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------

0 | redis01 | 5432 | down | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:45:33

1 | redis02 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:45:33

(2 rows)

从pgpool 来看也是把Redis02主机上的postgressql数据库直接从standby转成primary

启动down机的主节点;启动起来以后转成standby

[postgres@redis01 pgpool]$ pg_ctl start

waiting for server to start....2020-02-23 22:58:00.891 CST [9600] LOG: listening on IPv4 address "192.168.20.201", port 5432

2020-02-23 22:58:00.897 CST [9600] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"

2020-02-23 22:58:00.912 CST [9600] LOG: redirecting log output to logging collector process

2020-02-23 22:58:00.912 CST [9600] HINT: Future log output will appear in directory "log".

done

server started

[postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'

client_addr | application_name | sync_state

----------------+------------------+------------

192.168.20.201 | walreceiver | sync

(1 row)

down机的节点加入pgpool集群

[postgres@redis01 pgpool]$ pcp_attach_node -d -U postgres -h 192.168.20.205 -p 9898 -n 0

Password:

DEBUG: recv: tos="m", len=8

DEBUG: recv: tos="r", len=21

DEBUG: send: tos="C", len=6

DEBUG: recv: tos="c", len=20

pcp_attach_node -- Command Successful

DEBUG: send: tos="X", len=4

[postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'

node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------

0 | redis01 | 5432 | up | 0.500000 | standby | 0 | true | 0 | | | 2020-02-23 23:00:11

1 | redis02 | 5432 | up | 0.500000 | primary | 0 | false | 0 | | | 2020-02-23 22:45:33

(2 rows)

到此灾备演练完毕。