说明:
本文主要测试PG-POOL的安装,PG-POOL的功能,failover功能和load balance功能.
PG-POOL的复制和在线恢复功能不做测试( recovery_1st_stage等脚本),目前PG的逻辑备库的流复制可以满足复制、pg_rewind命令可以进行主备切换后的备库重建,想对比PG-POOL复杂的恢复脚本方便了很多(个人愚见,初步接触PG-POOL)。
正常情况下,使用PG_POOL,分配一个VIP。外部应用连接到VIP。当其中一个节点down后,VIP会漂移。对应用来说无影响。目前暂时不测试这个功能。
PG-POOL failover后,新的主备关系会不会自动转换,这个没有详细测试。
参考文档:
https://www.pgpool.net/docs/pgpool-II-3.5.4/doc/pgpool-zh_cn.html https://www.pgpool.net/docs/pgpool-II-3.5.4/doc/tutorial-zh_cn.html#start
https://www.pgpool.net/docs/latest/en/html/tutorial-testing-failover.html
https://www.pgpool.net/docs/latest/en/html/tutorial-testing-load-balance.html
PG版本:10.15
pg-pool 版本: 4.2.4
-- 主备的安装(略)
-- 安装PG POOL
tar -zxvf pgpool-II-4.2.4.tar.gz
./configure --prefix=/opt/pgpool/ --with-pgsql=/opt/PostgreSQL/10/lib
报错configure: error: libpq is not installed or libpq is old
需要安装postgresql-devel,yum install postgresql-devel安装即可
-- 安装pgpool_regclass
cd /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass
psql -f pgpool-regclass.sql template1
[root@test /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass]$psql -f pgpool-regclass.sql template1
CREATE FUNCTION
[root@test /postgres/pgpool-II-4.2.4/src/sql/pgpool-regclass]$
-- 建立insert_lock表
psql -f insert_lock.sql template1
[root@test /postgres/pgpool-II-4.2.4/src/sql]$psql -f insert_lock.sql template1
psql.bin:insert_lock.sql:3: ERROR: schema "pgpool_catalog" does not exist
CREATE SCHEMA
CREATE TABLE
INSERT 0 1
GRANT
GRANT
GRANT
GRANT
[root@test /postgres/pgpool-II-4.2.4/src/sql]$
-- 配置两个节点ssh互信
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh test1 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys test1:~/.ssh/
-- 配置密码,注意文件pool_passwd中的格式
[postgres@test1 /home/postgres]$pg_md5 -u postgres -p
password:
a189c633d9995e11bf8607170ec9a4b8
[postgres@test1 /home/postgres]$
-- 配置pgpool.conf,该文件在/opt/pgpool/etc/pgpool.conf.sample下,重新复制一个(大部分参数,其他需要了再调整)
listen_addresses = '*'
port = 9999
socket_dir = '/tmp'
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = 'test'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/opt/PostgreSQL/10/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'
backend_hostname1 = 'test1'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/PostgreSQL/10/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server1'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
pid_file_name = '/opt/pgpool/etc/pgpool.pid'
-- 配置pgpool_hba.conf(略)
-- 创建pgpool_node_id ,这个文件记录节点id
vi /opt/pgpool/pgpool.pid -- 0
vi /opt/pgpool/pgpool.pid -- 1
-- 启动
pgpool -n >/tmp/pgpool.log 2>&1 &
-- 查看pgpool的节点,登陆到pgpool上查看,可以看到节点0和节点1 。
psql -p 9999 -h 192.168.2.80 -Upostgres
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 1.000000 | primary | 1 | true | 0 | |
| 2021-08-26 13:20:05
1 | test1 | 5432 | up | 0.000000 | standby | 0 | false | 0 | |
| 2021-08-26 13:20:05
(2 rows)
postgres=#
-- 关闭掉primary库,stanby库被提升为primary库
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | down | 0.500000 | standby | 373069 | false | 0 | |
| 2021-08-27 15:13:43
1 | test1 | 5432 | up | 0.500000 | primary | 46646 | true | 0 | |
| 2021-08-27 15:13:43
(2 rows)
postgres=#
再次启动关闭的库,原来的primary库,pgpool中并没有自动切换。感觉只能起到一个failover的作用,节点重新上线后,并不能自动添加到pg-pool中。节点再次启动后,发现两个节点都是in production。也就是failover了。需要重建了。
-- 测试在线recovery (不测试了,使用pg_basebackup修复就很方便,或者pg_rewind也很方便)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | r
eplication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+--
-----------------+------------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 0 | true | 0 |
| | 2021-08-27 07:47:40
1 | test1 | 5432 | down | 0.500000 | standby | 0 | false | 0 |
| | 2021-08-27 07:47:40
(2 rows)
postgres=#
-- 使用命令修复备库(先rm 掉data目录,否则失败),建议使用pg_rewind,这个只是copy变化过的数据块,类似增量
pg_basebackup -h 192.168.2.80 -U repl -p 5432 -F p -X s -v -P -R -D /opt/PostgreSQL/10/data/ -l pg_init
-- 使用pgbench进行测试负载均衡
-- 创建测试用的数据库pgbench
[postgres@test /opt/PostgreSQL/10/data]$psql -h 192.168.2.80 -p9999 -c "create database pgbench"
Password:
CREATE DATABASE
-- 初始化测试用的数据库pgbench
[postgres@test /opt/PostgreSQL/10/data]$pgbench -i pgbench
Password:
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.38 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
[postgres@test /opt/PostgreSQL/10/data]$
-- 进行pgbench测试
[postgres@test /opt/PostgreSQL/10/data]$pgbench -p 9999 -c 10 -j 10 -S -T 60 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
duration: 60 s
number of transactions actually processed: 421004
latency average = 1.425 ms
tps = 7016.464624 (including connections establishing)
tps = 7028.458906 (excluding connections establishing)
[postgres@test /opt/PostgreSQL/10/data]$
-- 在测试过程中,观察节点的状态,可以看到备库上有连接
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 0 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 0 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 29256 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 3651 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 42489 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 5245 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 50300 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 6381 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 55895 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 7044 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 81584 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 10379 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 323013 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 40473 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 373069 | true | 0 | |
| 2021-08-27 12:58:25
1 | test1 | 5432 | up | 0.500000 | standby | 46646 | false | 0 | |
| 2021-08-27 12:58:25
(2 rows)
-- 问题1
[postgres@test /opt/pgpool/etc]$psql -p 9999 -h 192.168.2.80 -Upostgres
psql.bin: FATAL: md5 authentication failed
DETAIL: pool_passwd file does not contain an entry for "postgres"
解决方法 :
pg_md5 -m -p -u postgres pool_passwd -- 直接将密码md5后写入到pool_passwd文件了
pgpool -F pool_passwd reload
-- 问题2
停掉primary后 (但是查看primary和standby的pg_controldata是正常的,primary和standby上同步是正常的,问题可能出在pgpool,第二天检查发现是pgpool_status这个文件的缘故.这个文件保存集群的状态,比如两行up,表示两个节点都是up)
日志 :
find_primary_node: standby node is 1
2021-08-26 14:39:12: pid 8894: LOG: find_primary_node: standby node is 1
2021-08-26 14:39:13: pid 8894: LOG: find_primary_node: standby node is 1
2021-08-26 14:39:14: pid 9208: LOG: PCP process: 9208 started
2021-08-26 14:39:14: pid 9210: LOG: process started
2021-08-26 14:39:14: pid 9209: LOG: process started
2021-08-26 14:39:14: pid 8894: LOG: pgpool-II successfully started. version 4.2.4 (chichiriboshi)
2021-08-26 14:39:14: pid 8894: LOG: node status[0]: 0
2021-08-26 14:39:14: pid 8894: LOG: node status[1]: 2
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | down | 1.000000 | standby | 0 | false | 0 | |
| 2021-08-26 14:34:10
1 | test1 | 5432 | up | 0.000000 | standby | 0 | true | 0 | |
| 2021-08-26 14:34:10
(2 rows)
postgres=#
解决方法,将节点加进来 ,要在postgres账号下 ,期间状态一直为waiting,解决方法,将pgpool_status文件删除,重新启动pgpool
[postgres@test /tmp]$pcp_attach_node -h 192.168.2.80 -p 9898 -U postgres -n 0 -v
Password:
pcp_attach_node -- Command Successful
[postgres@test /tmp]$
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | r
eplication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+--
----------------------+---------------------
0 | test | 5432 | up | 1.000000 | primary | 0 | true | 0 | |
| 2021-08-26 15:20:35
1 | test1 | 5432 | up | 0.000000 | standby | 0 | false | 0 | |
| 2021-08-26 15:07:16
(2 rows)
postgres=#
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state |
replication_sync_state | last_status_change
---------+----------+------+---------+-----------+---------+------------+-------------------+-------------------+-------------------+-
-----------------------+---------------------
0 | test | 5432 | up | 0.500000 | primary | 0 | true | 0 | |
| 2021-08-27 12:44:03
1 | test1 | 5432 | waiting | 0.500000 | standby | 0 | false | 0 | |
| 2021-08-27 12:50:25
(2 rows)
postgres=#
其他事项,测试后再补充。
end