1. postgres 主备环境搭建

1.1  HA 实现方式

      热备份、冷备份

     冷备份:物理备份,也就是文件复制。主机写完一个WAL日志文件后,才复制到standby服务器上去应用。

     热备份:流复制,也就是同步复制或异步复制。主备机实时同步。

1.2 操作步骤

主备机安装在相同的机器上

(1)主备机PGDATA路径

primary PGDATA路径:

[wln@localhost data]$ echo $PGDATA
 /home/wln/postgres9.3/datastandby PGDATA路径:
[wln@localhost standby]$ echo $PGDATA
 /home/wln/standby

(2)修改primary 的postgresql.conf 文件

wal_level = hot_standby   //热备模式
 max_wal_senders = 2    //primary 最多允许的并发standby数据库
 wal_keep_segments = 32  //in logfile segments, 16MB each; 0 disables。该xlog日志会循环覆盖,请将该值设置适当大些(需考虑磁盘空间、数据量增长情况)logging_collector = on

(3)修改primary的pg_hba.conf

host replication repluser 127.0.0.1/32 trust

数据库名必须填“replication”, 这是一个为standby连接使用了一个虚拟的数据库名称。用户repluser是给standby连接使用的在主库上建的一个超级用户。
127.0.0.1 是standby数据库的IP地址。

(4)创建一个数据库超级用户repluser,用来给standby备机连接主机使用。

[wln@localhost data]$ psql -d postgres
 psql (9.3beta2)
 Type "help" for help.postgres=# create user repluser superuser password 'password';
 CREATE ROLE

(5)重启primary数据库

pg_ctl restart

(6)对主数据库做一个基础备份
用select pg_start_backup();命令把数据库切换到备份状态

[wln@localhost data]$ psql -d postgres
 psql (9.3beta2)
 Type "help" for help.postgres=# select pg_start_backup('/home/wln/standby');
  pg_start_backup 
 -----------------
  0/6000024
 (1 row)(不要退出)

(7)新开1个session2,将主机数据目录copy到备机

[wln@localhost ~]$ cp -r postgres9.3/data/* standby/
 [wln@localhost ~]$ cd standby/[wln@localhost standby]$ ll
 总计 104
 -rw------- 1 wln wln   205 07-06 04:13 backup_label
 drwx------ 5 wln wln  4096 07-06 04:13 base
 drwx------ 2 wln wln  4096 07-06 04:13 global
 drwx------ 2 wln wln  4096 07-06 04:13 pg_clog
 -rw------- 1 wln wln  4461 07-06 04:13 pg_hba.conf
 -rw------- 1 wln wln  1636 07-06 04:13 pg_ident.conf
 drwx------ 2 wln wln  4096 07-06 04:13 pg_log
 drwx------ 4 wln wln  4096 07-06 04:13 pg_multixact
 drwx------ 2 wln wln  4096 07-06 04:13 pg_notify
 drwx------ 2 wln wln  4096 07-06 04:13 pg_serial
 drwx------ 2 wln wln  4096 07-06 04:13 pg_snapshots
 drwx------ 2 wln wln  4096 07-06 04:13 pg_stat
 drwx------ 2 wln wln  4096 07-06 04:13 pg_stat_tmp
 drwx------ 2 wln wln  4096 07-06 04:13 pg_subtrans
 drwx------ 2 wln wln  4096 07-06 04:13 pg_tblspc
 drwx------ 2 wln wln  4096 07-06 04:13 pg_twophase
 -rw------- 1 wln wln     4 07-06 04:13 PG_VERSION
 drwx------ 3 wln wln  4096 07-06 04:13 pg_xlog
 -rw------- 1 wln wln 20312 07-06 04:13 postgresql.conf
 -rw------- 1 wln wln    43 07-06 04:13 postmaster.opts
 -rw------- 1 wln wln    83 07-06 04:13 postmaster.pid(8)copy完成,结束主机备份状态
postgres=# select pg_stop_backup();
 NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
  pg_stop_backup 
 ----------------
  0/60000DC
 (1 row)(9)在session2上操作
修改备机postgresql.conf文件
port=5433
hot_standby=on
 
将share/recovery.conf.sample  复制到standby目录下,并修改如下:
standby_mode = 'on'
 primary_conninfo = 'host=127.0.0.1 port=5432 user=repluser password=password'
 trigger_file = '/home/wln/standby/trigger_activestb' 
删除standby目录下的postmaster.pid文件,执行
[wln@localhost standby]$export PGDATA=/home/wln/standby
[wln@localhost standby]$ echo $PGDATA
 /home/wln/standby(10)启动备机
在session2操作
pg_ctl start
检查有没有正常启动pg_ctl status
若正常启动,pg_log下日志:
LOG:  database system was shut down at 2014-07-06 03:55:40 CST
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 LOG:  database system was interrupted; last known up at 2014-07-06 04:13:09 CST
LOG:  entering standby modeLOG:  redo starts at 0/6000024
 LOG:  record with zero length at 0/60000B8
 LOG:  started streaming WAL from primary at 0/6000000 on timeline 1
 LOG:  consistent recovery state reached at 0/60000DC(11)主机上执行一些操作,查看备机有没有同步过来
主机上:
postgres=# \d
 No relations found.
 postgres=# create table t1(id int);
 CREATE TABLE
 postgres=# insert into t1 select generate_series(1,10);
 INSERT 0 10postgres=# select pg_current_xlog_location();
  pg_current_xlog_location 
 --------------------------
  0/7034470
 (1 row)postgres=# select * from pg_stat_replication ;
 -[ RECORD 1 ]----+------------------------------
 pid              | 10183
 usesysid         | 24715
 usename          | repl
 application_name | walreceiver
 client_addr      | 127.0.0.1
 client_hostname  | 
 client_port      | 41560
 backend_start    | 2014-07-06 04:15:45.185796+08
 state            | streaming
 sent_location    | 0/7034470
 write_location   | 0/7034470
 flush_location   | 0/7034470
 replay_location  | 0/7034470
 sync_priority    | 0
 sync_state       | async
备机上:
postgres=# \d
        List of relations
  Schema | Name | Type  | Owner 
 --------+------+-------+-------
  public | t1   | table | wln
 (1 row)postgres=# select * from t1;
  id 
 ----
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
 (10 rows)postgres=# select pg_last_xlog_receive_location();
  pg_last_xlog_receive_location 
 -------------------------------
  0/7034470
 (1 row)postgres=# select * from pg_stat_replication ;
 (No rows)
 postgres=# create table  t2(id int);
 ERROR:  cannot execute CREATE TABLE in a read-only transaction(12)一些参数设置
synchronous_standby_names = '*' # standby servers that provide sync rep
                                 # comma-separated list of application_name
                                 # from standby(s); '*' = all(默认为‘’)

synchronous_commit = on                # synchronization level;
                                         # off, local, remote_write, or on
synchronous_commit is set to on or remote_write
synchronous_standby_names. The first named standby will be used as the synchronous standby. Standbys listed after this will take over the role of synchronous standby if the first one should fail.



3. 主备切换(方式1)

主备不会自动切换,需借助第三方软件,如heartbeat等,请自行搜索。

(1)如何查看是primary还是standby

方法1:查看进程

主机 sender

[wln@localhost data]$ ps -ef | grep wal
 wln      10421 10416  0 07:52 ?        00:00:00 postgres: wal writer process             
 wln      10691 10416  0 08:04 ?        00:00:00 postgres:  wal sender process repluser 127.0.0.1(59226) streaming 0/6011BAC
 wln      10702  8927  0 08:05 pts/2    00:00:00 grep wal 
备机 receiver
[wln@localhost data]$ ps -ef | grep wal
 wln      10421 10416  0 07:52 ?        00:00:00 postgres: wal writer process             
 wln      10690 10685  3 08:04 ?        00:00:03 postgres:  wal receiver process   streaming 0/6011BAC
 wln      10702  8927  0 08:05 pts/2    00:00:00 grep wal方法2:
主机:
[wln@localhost data]$ pg_controldata | grep Database
 Database system identifier:           6032728846531166188
Database cluster state:               in productionDatabase block size:                  8192 备机:
[wln@localhost standby]$ pg_controldata | grep Database
 Database system identifier:           6032728846531166188
Database cluster state:               in archive recoveryDatabase block size:                  8192(2)备机切换为主机
将主机停掉
[wln@localhost data]$ pg_ctl stop
 waiting for server to shut down.... done
 server stopped在备机PGDATA对应目录下生成recovery.conf文件中指定的文件trigger_file = '/home/wln/standby/trigger_activestb'  (此时备机正常运行状态)
touch  /home/wln/standby/trigger_activestb
会看到 recovery.conf 文件变为 recovery.done
查看原来备机日志:
LOG:  trigger file found: /home/wln/standby/trigger_activestb
 LOG:  redo done at 0/6011C40
 LOG:  last completed transaction was at log time 2014-07-06 08:05:40.152147+08
 LOG:  selected new timeline ID: 2
 LOG:  archive recovery complete
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started登录原来的备机
[wln@localhost standby]$ psql -d postgres -p 5433
 psql (9.3beta2)
 Type "help" for help.postgres=# show transaction_read_only;
  transaction_read_only 
 -----------------------
  off
 (1 row)可以看出现在原备机具有读写功能。即主备切换成功
postgres=# \d
        List of relations
  Schema | Name | Type  | Owner 
 --------+------+-------+-------
  public | t1   | table | wln
 (1 row)postgres=# create table t2(id int);
 CREATE TABLE

4、主备切换(方式2)

备机中recovery.conf中设置为:

standby_mode = on
recovery_target_timeline = 'latest'
primary_conninfo = 'host=localhost port=5433 user=repluser password=password'    
#trigger_file = '/home/wln/standby/trigger_activestb'  #这个注释掉
 备机升为主机方式为:pg_ctl promote -D standbyPath
会看到recovery.conf过几秒后变为recovery.done 表示升级为主机成功。

其他的请参考方式1中内容。

 

 5. 简单脚本

#master
mDATA=/home/$USER/pg94/data
port=5432
wal_level=hot_standby
max_wal_senders=2
wal_keep_segments=5
logging_collector=on
log_line_prefix='%d %p %t %c %i %e '
#standby
sDATA=/home/$USER/pg94/standby
mkdir $sDATA
chmod 700 $sDATA
sport=5433
shot_standby=on

pg_ctl stop -m i
pg_ctl stop -D $sDATA -m i
rm -rf $sDATA/*

#step
sed -i /port\ =/c\port\ =\ $port                                        $mDATA/postgresql.conf
sed -i /wal_level\ =/c\wal_level\ =\ $wal_level                         $mDATA/postgresql.conf
sed -i /max_wal_senders\ =/c\max_wal_senders\ =\ $max_wal_senders       $mDATA/postgresql.conf
sed -i /wal_keep_segments\ =/c\wal_keep_segments\ =\ $wal_keep_segments $mDATA/postgresql.conf
sed -i /logging_collector\ =/c\logging_collector\ =\ $logging_collector $mDATA/postgresql.conf
sed -i /log_line_prefix\ =/c\log_line_prefix\ =\ "'$log_line_prefix'"   $mDATA/postgresql.conf
sed -i 's/#host/host/'  $mDATA/pg_hba.conf

#restart master
pg_ctl restart
psql -c "alter user $USER password '123456'"
psql -c "select pg_start_backup('label');"
#copy data
cp -r $mDATA/* $sDATA
psql -c "select pg_stop_backup();"

#modify standby port
sed -i /port\ =/c\port\ =\ $sport                       $sDATA/postgresql.conf
sed -i /hot_standby\ =/c\hot_standby\ =\ $shot_standby   $sDATA/postgresql.conf
rm -rf $sDATA/postmaster.pid
#recovery.conf
cp $PGHOME/share/recovery.conf.sample  $sDATA/recovery.conf
sed -i /standby_mode/c\standby_mode=on $sDATA/recovery.conf
sed -i /primary_conninfo\ =/c\primary_conninfo="'host=127.0.0.1 port=5432 user=$USER password=123456'"   $sDATA/recovery.conf
sed -i /trigger_file/c\trigger_file="'$sDATA/trigger'" $sDATA/recovery.conf


pg_ctl start -D $sDATA