注意事项:
参考官方文档 https://github.com/shayonj/pg-osc
DDL变更逻辑,和 pt-osc基本类似:
1. 创建一个审计表来记录对父表所做的更改。
2. 获取一个简短的ACCESS EXCLUSIVE锁以将父表上的触发器(用于插入、更新、删除)添加到审计表。
3. 创建一个新的影子表并在影子表上运行 ALTER/migration。
4. 复制旧表中的所有行。
5. 在新表上建立索引。
6. 针对影子表重放审计表中累积的所有更改。
7. 在重播时删除审计表中的行。
8. 一旦增量(剩余行)为 ~20 行,ACCESS EXCLUSIVE在事务中获取针对父表的锁,并且:
9. 交换表名(影子表 <> 父表)。
10. 通过删除并重新创建带有NOT VALID.
11. ANALYZE在新表上运行。
12. 验证添加的所有 FK NOT VALID。
13. 删除父(现在旧)表(可选)。
依赖ruby高版本, 在centos7上安装没成功,用它提供的docker也提示不兼容(可能是我工作站CPU太老导致的)
下面是我自己编译docker image的步骤,实测可以使用。
# 先挂下代理,加速下载
export https_proxy=http://192.168.31.16:7890 && export http_proxy=http://192.168.31.16:7890
编写 Dockerfile 文件内容如下
cat Dockerfile
FROM ubuntu:22.10
RUN apt-get update && apt-get install -y make gcc libpq-dev ruby ruby-dev
RUN gem install pg_online_schema_change
CMD []
打image
docker build .
打tag
# docker tag
9ddffecd4e3e pg-osc:lee
案例1 删列
export PGPASSWORD="dts"
docker run --network host -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 drop COLUMN purchased ; ALTER TABLE t2 drop COLUMN purchased2;' --dbname
"postgres" --schema "public" --host "192.168.31.181" --username "dts" --pull-batch-count 1000 --delta-count 20 --wait-time-for-lock 5 --kill-backends –drop
案例2 列改名
export PGPASSWORD="dts"
docker run --network host -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 RENAME COLUMN b to new_b;' --dbname "postgres" --schema "public" --host
"192.168.31.181" --username "dts" --pull-batch-count 1000 --delta-count 20 --wait-time-for-lock 5 --kill-backends --drop
一些参数说明:
--wait-time-for-lock 5 --kill-backends 如果该操作是在繁忙的表上执行的,则可以使用pg osc的kill后端功能来杀死可能与pg osc操作竞争的其他后端,以便在短时间内获取锁。pg osc获取的ACCESS EXCLUSIVE锁只持有一小段时间,之后释放。你可以调整pg osc在杀死其他后端之前应该等待多长时间(或者如果pg osc一开始就应该杀死后端)。
--pull-batch-count 2000 --delta-count 500 如果您有一个写容量很高的表,那么默认的重放迭代可能不够。也就是说,您可能会看到pg osc一次从审计表中回放1000行(pull batch count)。pg osc也在进行交换之前等待,直到审计表中的剩余行计数(delta计数)为20。您可以将这些值调整得更高,以便更快地赶上这类工作负载。
还有些其它功能(数据回填的时候执行自定义的sql逻辑),具体可以看官方文档。
参数清单:
Options:
-a, --alter-statement=ALTER_STATEMENT # The ALTER statement to perform the
schema change
-s, --schema=SCHEMA # The schema in which
the table is
# Default: public
-d, --dbname=DBNAME # Name of the
database
-h, --host=HOST # Server host
where the Database is located
-u, --username=USERNAME # Username for the
Database
-p, --port=N # Port for the
Database
# Default: 5432
-w, --password=PASSWORD # DEPRECATED: Password
for the Database. Please pass PGPASSWORD environment variable instead.
-v, [--verbose], [--no-verbose] # Emit logs in debug mode
-f, [--drop], [--no-drop] # Drop the original table
in the end after the swap
-k, [--kill-backends],
[--no-kill-backends] # Kill other
competing queries/backends when trying to acquire lock for the shadow table
creation and swap. It will wait for --wait-time-for-lock duration before
killing backends and try upto 3 times.
-w, [--wait-time-for-lock=N] # Time to wait before killing
backends to acquire lock and/or retrying upto 3 times. It will kill backends if
--kill-backends is true, otherwise try upto 3 times and exit if it cannot
acquire a lock.
# Default: 10
-c, [--copy-statement=COPY_STATEMENT] # Takes a .sql file location where you
can provide a custom query to be played (ex: backfills) when pgosc copies data
from the primary to the shadow table. More examples in README.
-b, [--pull-batch-count=N] # Number of rows to be
replayed on each iteration after copy. This can be tuned for faster catch up
and swap. Best used with delta-count.
# Default: 1000
-e, [--delta-count=N] # Indicates how many
rows should be remaining before a swap should be performed. This can be tuned
for faster catch up and swap, especially on highly volume tables. Best used
with pull-batch-count.
# Default: 20