Pg10搭建了流复制主备切换

配置 主库10.10.10.13 pghost4 备库10.10.10.14 pghost5

测试主库出现问题,停止后,备库自动切换成为新主库。老主库现在应该是备库了,启动它验证状态。发现备库启动不起来。通过以下方法,备库启动了。

  1. 新备库通过pg_ctl start启动,出现告警日志。

[root@pghost4 ~]# su - pg10 pg10@pghost4->pg_ctl start waiting for server to start....2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv4 address "0.0.0.0", port 1921 2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv6 address "::", port 1921 2019-12-20 21:44:10.006 CST [23416] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921" 2019-12-20 21:44:10.014 CST [23417] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST 2019-12-20 21:44:10.014 CST [23417] LOG: entering standby mode 2019-12-20 21:44:10.014 CST [23417] FATAL: requested timeline 2 is not a child of this server's history 2019-12-20 21:44:10.014 CST [23417] DETAIL: Latest checkpoint is at 0/11000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/C000098. 2019-12-20 21:44:10.015 CST [23416] LOG: startup process (PID 23417) exited with exit code 1 2019-12-20 21:44:10.015 CST [23416] LOG: aborting startup due to startup process failure 2019-12-20 21:44:10.016 CST [23416] LOG: database system is shut down stopped waiting pg_ctl: could not start server Examine the log output.

从上边的意思可以看到,新备库pg_wal目录下的00000002.history 这个跟新主库不一致。

pg10@pghost4->ll total 273M -rw------- 1 pg10 pg10 16M Dec 18 22:40 000000010000000000000001 -rw------- 1 pg10 pg10 16M Dec 18 23:01 000000010000000000000002 -rw------- 1 pg10 pg10 16M Dec 18 23:02 000000010000000000000003 -rw------- 1 pg10 pg10 16M Dec 18 23:06 000000010000000000000004 -rw------- 1 pg10 pg10 16M Dec 18 23:39 000000010000000000000005 -rw------- 1 pg10 pg10 16M Dec 18 23:51 000000010000000000000006 -rw------- 1 pg10 pg10 16M Dec 18 23:55 000000010000000000000007 -rw------- 1 pg10 pg10 16M Dec 18 23:58 000000010000000000000008 -rw------- 1 pg10 pg10 16M Dec 19 00:15 000000010000000000000009 -rw------- 1 pg10 pg10 16M Dec 19 00:15 00000001000000000000000A -rw------- 1 pg10 pg10 16M Dec 19 00:27 00000001000000000000000B -rw------- 1 pg10 pg10 16M Dec 19 00:29 00000001000000000000000C -rw------- 1 pg10 pg10 16M Dec 19 00:34 00000001000000000000000D -rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000E -rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F -rw------- 1 pg10 pg10 302 Dec 19 00:36 00000001000000000000000F.00000028.backup -rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010 -rw------- 1 pg10 pg10 16M Dec 20 21:33 000000010000000000000011 -rw------- 1 pg10 pg10 41 Dec 19 00:30 00000002.history drwx------ 2 pg10 pg10 4.0K Dec 20 21:31 archive_status pg10@pghost4->more 00000002.history 1 0/C000098 no recovery target specified pg10@pghost4->

然后看新主库该路径下的目录

pg10@pghost4->ll total 132K -rw------- 1 pg10 pg10 206 Dec 19 00:36 backup_label.old drwx------ 5 pg10 pg10 4.0K Dec 19 00:36 base drwx------ 2 pg10 pg10 4.0K Dec 20 21:27 global drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_commit_ts drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_dynshmem -rw------- 1 pg10 pg10 4.6K Dec 19 00:36 pg_hba.conf -rw------- 1 pg10 pg10 1.6K Dec 19 00:36 pg_ident.conf drwx------ 4 pg10 pg10 4.0K Dec 19 00:38 pg_logical drwx------ 4 pg10 pg10 4.0K Dec 19 00:36 pg_multixact drwx------ 2 pg10 pg10 4.0K Dec 19 00:37 pg_notify drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_replslot drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_serial drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_snapshots drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_stat drwx------ 2 pg10 pg10 4.0K Dec 20 21:48 pg_stat_tmp drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 pg_subtrans drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_tblspc drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_twophase -rw------- 1 pg10 pg10 3 Dec 19 00:36 PG_VERSION drwx------ 3 pg10 pg10 4.0K Dec 19 00:38 pg_wal drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_xact -rw------- 1 pg10 pg10 88 Dec 19 00:36 postgresql.auto.conf -rw------- 1 pg10 pg10 23K Dec 19 00:36 postgresql.conf -rw------- 1 pg10 pg10 31 Dec 19 00:37 postmaster.opts -rw------- 1 pg10 pg10 74 Dec 19 00:38 postmaster.pid -rw-r--r-- 1 pg10 pg10 5.8K Dec 19 00:37 recovery.done pg10@pghost4->cd pg_wal/ pg10@pghost4->ll total 65M -rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F -rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010 -rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000011.partial -rw------- 1 pg10 pg10 16M Dec 19 00:38 000000020000000000000011 -rw------- 1 pg10 pg10 42 Dec 19 00:38 00000002.history drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 archive_status pg10@pghost4->pwd /data/pg10/data/pg_wal pg10@pghost4->more 00000002.history 1 0/11000098 no recovery target specified

将新备库的00000002.history 备份,然后删除。 将新主库的传递到新备库。 然后启动。 pg10@pghost4->scp 00000002.history 10.10.10.13:/data/pg10/data/pg_wal/ pg10@10.10.10.13's password: 00000002.history 100% 42 0.0KB/s 00:00
pg10@pghost4->

pg10@pghost4->pg_ctl start waiting for server to start....2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv4 address "0.0.0.0", port 1921 2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv6 address "::", port 1921 2019-12-20 21:51:25.144 CST [23486] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921" 2019-12-20 21:51:25.155 CST [23487] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST 2019-12-20 21:51:25.155 CST [23487] LOG: entering standby mode 2019-12-20 21:51:25.171 CST [23487] LOG: consistent recovery state reached at 0/11000098 2019-12-20 21:51:25.171 CST [23487] LOG: invalid record length at 0/11000098: wanted 24, got 0 2019-12-20 21:51:25.171 CST [23486] LOG: database system is ready to accept read only connections 2019-12-20 21:51:25.179 CST [23491] LOG: started streaming WAL from primary at 0/11000000 on timeline 2 2019-12-20 21:51:25.225 CST [23487] LOG: redo starts at 0/11000098 done server started pg10@pghost4->

发现已经启动。

pg10@pghost4->pg_controldata|grep cluster Database cluster state: in archive recovery pg10@pghost4->

可以看到新备库处于备库模式。