说明:
本文主要测试PG-POOL的安装,PG-POOL的功能,failover功能和load balance功能.
PG-POOL的复制和在线恢复功能不做测试( recovery_1st_stage等脚本),目前PG的逻辑备库的流复制可以满足复制、pg_rewind命令可以进行主备切换后的备库重建,想对比PG-POOL复杂的恢复脚本方便了很多(个人愚见,初步接触PG-POOL)。

正常情况下,使用PG_POOL,分配一个VIP。外部应用连接到VIP。当其中一个节点down后,VIP会漂移。对应用来说无影响。目前暂时不测试这个功能。

PG-POOL failover后,新的主备关系会不会自动转换,这个没有详细测试。

参考文档:

https://www.pgpool.net/docs/pgpool-II-3.5.4/doc/pgpool-zh_cn.html https://www.pgpool.net/docs/pgpool-II-3.5.4/doc/tutorial-zh_cn.html#start
https://www.pgpool.net/docs/latest/en/html/tutorial-testing-failover.html
https://www.pgpool.net/docs/latest/en/html/tutorial-testing-load-balance.html

PG版本:10.15 
pg-pool 版本: 4.2.4 

-- 主备的安装(略)

-- 安装PG POOL 

tar -zxvf pgpool-II-4.2.4.tar.gz
./configure --prefix=/opt/pgpool/ --with-pgsql=/opt/PostgreSQL/10/lib

报错configure: error: libpq is not installed or libpq is old
需要安装postgresql-devel,yum install postgresql-devel安装即可

-- 安装pgpool_regclass 

cd /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass
psql -f pgpool-regclass.sql template1

[root@test /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass]$psql -f pgpool-regclass.sql template1
CREATE FUNCTION
[root@test /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass]$

-- 建立insert_lock表 

psql -f insert_lock.sql template1

[root@test /postgres/pgpool-II-4.2.4/src/sql]$psql -f insert_lock.sql template1
psql.bin:insert_lock.sql:3: ERROR:  schema "pgpool_catalog" does not exist
CREATE SCHEMA
CREATE TABLE
INSERT 0 1
GRANT
GRANT
GRANT
GRANT
[root@test /postgres/pgpool-II-4.2.4/src/sql]$

-- 配置两个节点ssh互信

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

ssh test1 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys test1:~/.ssh/

-- 配置密码,注意文件pool_passwd中的格式

[postgres@test1 /home/postgres]$pg_md5 -u postgres -p 
password: 
a189c633d9995e11bf8607170ec9a4b8
[postgres@test1 /home/postgres]$

-- 配置pgpool.conf,该文件在/opt/pgpool/etc/pgpool.conf.sample下,重新复制一个(大部分参数,其他需要了再调整)

listen_addresses = '*'
port = 9999
socket_dir = '/tmp'
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'

backend_hostname0 = 'test'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/opt/PostgreSQL/10/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'

backend_hostname1 = 'test1'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/PostgreSQL/10/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server1'

enable_pool_hba = on
pool_passwd = 'pool_passwd'

pid_file_name = '/opt/pgpool/etc/pgpool.pid'

-- 配置pgpool_hba.conf(略)

-- 创建pgpool_node_id ,这个文件记录节点id 

vi  /opt/pgpool/pgpool.pid     -- 0 
vi  /opt/pgpool/pgpool.pid     -- 1

-- 启动

pgpool -n >/tmp/pgpool.log 2>&1 &

-- 查看pgpool的节点,登陆到pgpool上查看,可以看到节点0和节点1 。

psql -p 9999 -h 192.168.2.80 -Upostgres
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 1.000000  | primary | 1          | true              | 0                 |                   |  
                      | 2021-08-26 13:20:05
 1       | test1    | 5432 | up     | 0.000000  | standby | 0          | false             | 0                 |                   |  
                      | 2021-08-26 13:20:05
(2 rows)

postgres=#

-- 关闭掉primary库,stanby库被提升为primary库 

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | down   | 0.500000  | standby | 373069     | false             | 0                 |                   |  
                      | 2021-08-27 15:13:43
 1       | test1    | 5432 | up     | 0.500000  | primary | 46646      | true              | 0                 |                   |  
                      | 2021-08-27 15:13:43
(2 rows)

postgres=#

再次启动关闭的库,原来的primary库,pgpool中并没有自动切换。感觉只能起到一个failover的作用,节点重新上线后,并不能自动添加到pg-pool中。节点再次启动后,发现两个节点都是in production。也就是failover了。需要重建了。 

-- 测试在线recovery (不测试了,使用pg_basebackup修复就很方便,或者pg_rewind也很方便)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | r
eplication_state | replication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+--
-----------------+------------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |  
                 |                        | 2021-08-27 07:47:40
 1       | test1    | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |  
                 |                        | 2021-08-27 07:47:40
(2 rows)

postgres=#

-- 使用命令修复备库(先rm 掉data目录,否则失败),建议使用pg_rewind,这个只是copy变化过的数据块,类似增量
pg_basebackup -h 192.168.2.80 -U repl -p 5432 -F p -X s -v -P -R -D /opt/PostgreSQL/10/data/ -l pg_init

-- 使用pgbench进行测试负载均衡

-- 创建测试用的数据库pgbench

[postgres@test /opt/PostgreSQL/10/data]$psql -h 192.168.2.80 -p9999 -c "create database pgbench"
Password: 
CREATE DATABASE

-- 初始化测试用的数据库pgbench

[postgres@test /opt/PostgreSQL/10/data]$pgbench -i pgbench
Password: 
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.38 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
[postgres@test /opt/PostgreSQL/10/data]$

-- 进行pgbench测试

[postgres@test /opt/PostgreSQL/10/data]$pgbench -p 9999 -c 10 -j 10 -S -T 60 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
duration: 60 s
number of transactions actually processed: 421004
latency average = 1.425 ms
tps = 7016.464624 (including connections establishing)
tps = 7028.458906 (excluding connections establishing)
[postgres@test /opt/PostgreSQL/10/data]$

-- 在测试过程中,观察节点的状态,可以看到备库上有连接 

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 29256      | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 3651       | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 42489      | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 5245       | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 50300      | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 6381       | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 55895      | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 7044       | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 81584      | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 10379      | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 323013     | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 40473      | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 0.500000  | primary | 373069     | true              | 0                 |                   |  
                      | 2021-08-27 12:58:25
 1       | test1    | 5432 | up     | 0.500000  | standby | 46646      | false             | 0                 |                   |  
                      | 2021-08-27 12:58:25
(2 rows)

-- 问题1

[postgres@test /opt/pgpool/etc]$psql -p 9999 -h 192.168.2.80 -Upostgres
psql.bin: FATAL:  md5 authentication failed
DETAIL:  pool_passwd file does not contain an entry for "postgres"

解决方法 :

pg_md5 -m -p -u postgres pool_passwd    -- 直接将密码md5后写入到pool_passwd文件了 
pgpool -F pool_passwd reload

-- 问题2  
停掉primary后 (但是查看primary和standby的pg_controldata是正常的,primary和standby上同步是正常的,问题可能出在pgpool,第二天检查发现是pgpool_status这个文件的缘故.这个文件保存集群的状态,比如两行up,表示两个节点都是up)
 日志 :

find_primary_node: standby node is 1

2021-08-26 14:39:12: pid 8894: LOG:  find_primary_node: standby node is 1
2021-08-26 14:39:13: pid 8894: LOG:  find_primary_node: standby node is 1
2021-08-26 14:39:14: pid 9208: LOG:  PCP process: 9208 started
2021-08-26 14:39:14: pid 9210: LOG:  process started
2021-08-26 14:39:14: pid 9209: LOG:  process started
2021-08-26 14:39:14: pid 8894: LOG:  pgpool-II successfully started. version 4.2.4 (chichiriboshi)
2021-08-26 14:39:14: pid 8894: LOG:  node status[0]: 0
2021-08-26 14:39:14: pid 8894: LOG:  node status[1]: 2
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | down   | 1.000000  | standby | 0          | false             | 0                 |                   |  
                      | 2021-08-26 14:34:10
 1       | test1    | 5432 | up     | 0.000000  | standby | 0          | true              | 0                 |                   |  
                      | 2021-08-26 14:34:10
(2 rows)

postgres=#

解决方法,将节点加进来 ,要在postgres账号下 ,期间状态一直为waiting,解决方法,将pgpool_status文件删除,重新启动pgpool

[postgres@test /tmp]$pcp_attach_node -h 192.168.2.80 -p 9898 -U postgres -n 0 -v
Password: 
pcp_attach_node -- Command Successful
[postgres@test /tmp]$
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change  
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
 0       | test     | 5432 | up     | 1.000000  | primary | 0          | true              | 0                 |                   |  
                      | 2021-08-26 15:20:35
 1       | test1    | 5432 | up     | 0.000000  | standby | 0          | false             | 0                 |                   |  
                      | 2021-08-26 15:07:16
(2 rows)

postgres=# 
postgres=# show pool_nodes;
 node_id | hostname | port | status  | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | 
replication_sync_state | last_status_change  
---------+----------+------+---------+-----------+---------+------------+-------------------+-------------------+-------------------+-
-----------------------+---------------------
 0       | test     | 5432 | up      | 0.500000  | primary | 0          | true              | 0                 |                   | 
                       | 2021-08-27 12:44:03
 1       | test1    | 5432 | waiting | 0.500000  | standby | 0          | false             | 0                 |                   | 
                       | 2021-08-27 12:50:25
(2 rows)

postgres=#

其他事项,测试后再补充。

end