PgSQL流复制切换,通过函数pg_promote和standby.signal来实现

服务器信息
master 10.15.6.115
standby 10.15.6.116
数据库版本
$ psql -V
psql (PostgreSQL) 12.3
$

一、关闭主库

$ pg_ctl -D /data/pgsql_5431/pgdata/ stop -m fast
waiting for server to shut down.... done
server stopped
$
主库日志:
2021-09-10 11:26:48.834 CST,,,11754,,613ac3b0.2dea,3,,2021-09-10 10:32:16 CST,,0,LOG,00000,"received fast shutdown request",,,,,,,,,""
2021-09-10 11:26:48.835 CST,,,11754,,613ac3b0.2dea,4,,2021-09-10 10:32:16 CST,,0,LOG,00000,"aborting any active transactions",,,,,,,,,""
2021-09-10 11:26:48.836 CST,,,11754,,613ac3b0.2dea,5,,2021-09-10 10:32:16 CST,,0,LOG,00000,"background worker ""logical replication launcher"" (PID 11763) exited with exit code 1",,,,,,,,,""
2021-09-10 11:26:48.837 CST,,,11757,,613ac3b0.2ded,1,,2021-09-10 10:32:16 CST,,0,LOG,00000,"shutting down",,,,,,,,,""
2021-09-10 11:26:49.492 CST,,,11754,,613ac3b0.2dea,6,,2021-09-10 10:32:16 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,""
备库日志:
2021-09-10 11:29:19.538 CST,,,25644,,613ad10f.642c,1,,2021-09-10 11:29:19 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host ""10.15.6.115"" and accepting
TCP/IP connections on port 5431?",,,,,,,,,""
2021-09-10 11:29:24.544 CST,,,25655,,613ad114.6437,1,,2021-09-10 11:29:24 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host ""10.15.6.115"" and accepting
TCP/IP connections on port 5431?",,,,,,,,,""
2021-09-10 11:29:29.550 CST,,,25657,,613ad119.6439,1,,2021-09-10 11:29:29 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host ""10.15.6.115"" and accepting
TCP/IP connections on port 5431?",,,,,,,,,""
二、备库通过pg_promote()函数,将角色提升为主库
$ psql -h 127.0.0.1 -p 5431
psql (12.3)
Type "help" for help.
postgres=# SELECT pg_promote(true,60);
pg_promote
------------
t
(1 row)

wait: 表示是否等待备库的 promotion 完成或者 wait_seconds 秒之后返回成功,默认值为 true。
wait_seconds: 等待时间,单位秒,默认 60
#也可通过su - postgres -c "pg_ctl promote"命令完成
日志信息:
2021-09-10 11:29:30.698 CST,,,25036,,613acf22.61cc,6,,2021-09-10 11:21:06 CST,1/0,0,LOG,00000,"received promote request",,,,,,,,,""
2021-09-10 11:29:30.698 CST,,,25036,,613acf22.61cc,7,,2021-09-10 11:21:06 CST,1/0,0,LOG,00000,"redo done at 0/B000028",,,,,,,,,""
2021-09-10 11:29:30.701 CST,,,25036,,613acf22.61cc,8,,2021-09-10 11:21:06 CST,1/0,0,LOG,00000,"selected new timeline ID: 2",,,,,,,,,""
2021-09-10 11:29:30.886 CST,,,25036,,613acf22.61cc,9,,2021-09-10 11:21:06 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
2021-09-10 11:29:30.891 CST,,,25034,,613acf22.61ca,3,,2021-09-10 11:21:06 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""

之后,$PGDATA目录下standby.signal消失

三、新备库创建手工standby.signal 并编辑内容
$ touch standby.signal
vi standby.signal
standby_mode = 'on'

检查新主库的pg_hba.conf 的replication权限
host replication all 10.15.6.115/24 trust

启动新备库
$ pg_ctl -D /data/pgsql_5431/pgdata/ start
2021-09-10 11:31:37.022 CST,,,17484,,613ad198.444c,1,,2021-09-10 11:31:36 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2021-09-10 11:31:37.024 CST,,,17486,,613ad199.444e,1,,2021-09-10 11:31:37 CST,,0,LOG,00000,"database system was shut down at 2021-09-10 11:26:49 CST",,,,,,,,,""
2021-09-10 11:31:37.025 CST,,,17486,,613ad199.444e,2,,2021-09-10 11:31:37 CST,,0,WARNING,01000,"specified neither primary_conninfo nor restore_command",,"The database server will regularly poll the pg_wal subdirectory to check for files placed there.",,,,,,,""
2021-09-10 11:31:37.025 CST,,,17486,,613ad199.444e,3,,2021-09-10 11:31:37 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2021-09-10 11:31:37.029 CST,,,17486,,613ad199.444e,4,,2021-09-10 11:31:37 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/B0000A0",,,,,,,,,""
2021-09-10 11:31:37.029 CST,,,17486,,613ad199.444e,5,,2021-09-10 11:31:37 CST,1/0,0,LOG,00000,"invalid record length at 0/B0000A0: wanted 24, got 0",,,,,,,,,""
2021-09-10 11:31:37.029 CST,,,17484,,613ad198.444c,2,,2021-09-10 11:31:36 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,


2021-09-10 14:47:58.136 CST,,,3780,,613aff9d.ec4,1,,2021-09-10 14:47:57 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2021-09-10 14:47:58.138 CST,,,3782,,613aff9e.ec6,1,,2021-09-10 14:47:58 CST,,0,LOG,00000,"database system was shut down in recovery at 2021-09-10 14:47:57 CST",,,,,,,,,""
2021-09-10 14:47:58.138 CST,,,3782,,613aff9e.ec6,2,,2021-09-10 14:47:58 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2021-09-10 14:47:58.140 CST,,,3782,,613aff9e.ec6,3,,2021-09-10 14:47:58 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/70000A0",,,,,,,,,""
2021-09-10 14:47:58.140 CST,,,3782,,613aff9e.ec6,4,,2021-09-10 14:47:58 CST,1/0,0,LOG,00000,"invalid record length at 0/70000A0: wanted 24, got 0",,,,,,,,,""
2021-09-10 14:47:58.140 CST,,,3780,,613aff9d.ec4,2,,2021-09-10 14:47:57 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2021-09-10 14:47:58.144 CST,,,3786,,613aff9e.eca,1,,2021-09-10 14:47:58 CST,,0,LOG,00000,"started streaming WAL from primary at 0/7000000 on timeline 2",,,,,,,,,""
2021-09-10 14:48:09.832 CST,,,3782,,613aff9e.ec6,5,,2021-09-10 14:47:58 CST,1/0,0,LOG,00000,"redo starts at 0/70000A0",,,,,,,,,""

四:最后验证流复制新的主备关系

新主库:
postgres=# select client_addr,sync_state from pg_stat_replication;
client_addr | sync_state
--------------+------------
10.15.6.115 | async
(1 row)
可以观察到walsender进程
$ ps -ef |grep [w]al
postgres 5750 3780 0 15:05 ? 00:00:00 postgres: walwriter
postgres 5944 3780 0 15:07 ? 00:00:00 postgres: walsender repuser 10.15.6.116(46798) streaming 0/9000000

备库:
可以观察到walreceiver进程
$ ps -ef |grep [w]al
postgres 5489 5483 0 Apr21 ? 00:47:12 postgres: walwriter
postgres 10333 10327 0 15:07 ? 00:00:01 postgres: walreceiver streaming 0/9000000