postgresql中使用pg_repack主要有三种原因:

1.大量删除记录后,从表中回收空闲空间给磁盘。

2.重新构建一个表,以重新排列记录,并将它们压缩/打包到更少的页。这可能会让查询的IO更少,性能更高。

3.从那些因为autovacuum设置不当,而导致过度膨胀的表中回收空闲空间。

pg_repack是PostgreSQL的一个扩展,它可以帮助我们在线重建一个表。这类似于MySQL中用于在线表重建的pt-online-schema-change。然而,pg_repack只适用于具有主键或NOT NULL唯一键的表。

 

安装pg_repack

yum install pg_repack11  

 

加载扩展

修改配置文件后重启postgresql

shared_preload_libraries = 'pg_repack'

  

数据库中创建扩展

$ psql 

\c xxx

CREATE EXTENSION pg_repack;

  

查看帮助

$ pg_repack --help
pg_repack re-organizes a PostgreSQL database.

Usage:
  pg_repack [OPTION]... [DBNAME]
Options:
  -a, --all                 repack all databases
  -t, --table=TABLE         repack specific table only
  -I, --parent-table=TABLE  repack specific parent table and its inheritors
  -c, --schema=SCHEMA       repack tables in specific schema only
  -s, --tablespace=TBLSPC   move repacked tables to a new tablespace
  -S, --moveidx             move repacked indexes to TBLSPC too
  -o, --order-by=COLUMNS    order by columns instead of cluster keys
  -n, --no-order            do vacuum full instead of cluster
  -N, --dry-run             print what would have been repacked
  -j, --jobs=NUM            Use this many parallel jobs for each table
  -i, --index=INDEX         move only the specified index
  -x, --only-indexes        move only indexes of the specified table
  -T, --wait-timeout=SECS   timeout to cancel other backends on conflict
  -D, --no-kill-backend     don't kill other backends when timed out
  -Z, --no-analyze          don't analyze at end
  -k, --no-superuser-check  skip superuser checks in client
  -C, --exclude-extension   don't repack tables which belong to specific extension

Connection options:
  -d, --dbname=DBNAME       database to connect
  -h, --host=HOSTNAME       database server host or socket directory
  -p, --port=PORT           database server port
  -U, --username=USERNAME   user name to connect as
  -w, --no-password         never prompt for password
  -W, --password            force password prompt

Generic options:
  -e, --echo                echo queries
  -E, --elevel=LEVEL        set output message level
  --help                    show this help, then exit
  --version                 output version information, then exit

Read the website for details: <https://reorg.github.io/pg_repack/>.
Report bugs to <https://github.com/reorg/pg_repack/issues>.

  

参数--dry-run检测哪些表支持pg_repack

$ pg_repack --dry-run -d percona --table scott.employee
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.employee"

如果不支持,会提示:

$ pg_repack --dry-run -d percona --table scott.sales
INFO: Dry run enabled, not executing repack
WARNING: relation "scott.sales" must have a primary key or not-null unique keys

可以并行执行

$ pg_repack -d percona -t scott.employee -j 4
NOTICE: Setting up workers.conns
INFO: repacking table "scott.employee"

  

pg_repack也支持对远程节点执行操作。

 

工作原理

通过在原表上使用触发器,避免执行full vacuum的时候发生宕机。以下是工作实现过程:

对表执行全表repack:

1.创建一个记录表,用于记录原表中的记录的修改

2.在原表上创建一个触发器,记录插入、更新、和删除操作到日志表

3.创建一个新表,包含原表中的所有的记录

4.在新表上创建索引

5.将日志表中的变更应用到新表

6.使用system catalogs将原表和新表进行swap,包含索引和toast表

7.删除原先的表

  

对索引执行repack

1.使用concurrently创建新索引

2.将新的索引和老的索引进行swap

3.删除原先的索引