postgresql中使用pg_repack主要有三种原因:
1.大量删除记录后,从表中回收空闲空间给磁盘。
2.重新构建一个表,以重新排列记录,并将它们压缩/打包到更少的页。这可能会让查询的IO更少,性能更高。
3.从那些因为autovacuum设置不当,而导致过度膨胀的表中回收空闲空间。
pg_repack是PostgreSQL的一个扩展,它可以帮助我们在线重建一个表。这类似于MySQL中用于在线表重建的pt-online-schema-change。然而,pg_repack只适用于具有主键或NOT NULL唯一键的表。
安装pg_repack
yum install pg_repack11
加载扩展
修改配置文件后重启postgresql
shared_preload_libraries = 'pg_repack'
数据库中创建扩展
$ psql \c xxx CREATE EXTENSION pg_repack;
查看帮助
$ pg_repack --help pg_repack re-organizes a PostgreSQL database. Usage: pg_repack [OPTION]... [DBNAME] Options: -a, --all repack all databases -t, --table=TABLE repack specific table only -I, --parent-table=TABLE repack specific parent table and its inheritors -c, --schema=SCHEMA repack tables in specific schema only -s, --tablespace=TBLSPC move repacked tables to a new tablespace -S, --moveidx move repacked indexes to TBLSPC too -o, --order-by=COLUMNS order by columns instead of cluster keys -n, --no-order do vacuum full instead of cluster -N, --dry-run print what would have been repacked -j, --jobs=NUM Use this many parallel jobs for each table -i, --index=INDEX move only the specified index -x, --only-indexes move only indexes of the specified table -T, --wait-timeout=SECS timeout to cancel other backends on conflict -D, --no-kill-backend don't kill other backends when timed out -Z, --no-analyze don't analyze at end -k, --no-superuser-check skip superuser checks in client -C, --exclude-extension don't repack tables which belong to specific extension Connection options: -d, --dbname=DBNAME database to connect -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port -U, --username=USERNAME user name to connect as -w, --no-password never prompt for password -W, --password force password prompt Generic options: -e, --echo echo queries -E, --elevel=LEVEL set output message level --help show this help, then exit --version output version information, then exit Read the website for details: <https://reorg.github.io/pg_repack/>. Report bugs to <https://github.com/reorg/pg_repack/issues>.
参数--dry-run检测哪些表支持pg_repack
$ pg_repack --dry-run -d percona --table scott.employee INFO: Dry run enabled, not executing repack INFO: repacking table "scott.employee"
如果不支持,会提示:
$ pg_repack --dry-run -d percona --table scott.sales INFO: Dry run enabled, not executing repack WARNING: relation "scott.sales" must have a primary key or not-null unique keys
可以并行执行
$ pg_repack -d percona -t scott.employee -j 4 NOTICE: Setting up workers.conns INFO: repacking table "scott.employee"
pg_repack也支持对远程节点执行操作。
工作原理
通过在原表上使用触发器,避免执行full vacuum的时候发生宕机。以下是工作实现过程:
对表执行全表repack:
1.创建一个记录表,用于记录原表中的记录的修改 2.在原表上创建一个触发器,记录插入、更新、和删除操作到日志表 3.创建一个新表,包含原表中的所有的记录 4.在新表上创建索引 5.将日志表中的变更应用到新表 6.使用system catalogs将原表和新表进行swap,包含索引和toast表 7.删除原先的表
对索引执行repack
1.使用concurrently创建新索引 2.将新的索引和老的索引进行swap 3.删除原先的索引