Notes:

See the official documentation: https://github.com/shayonj/pg-osc

 

The DDL change logic is basically the same as pt-osc's (a rough hand-written SQL sketch of the first two steps follows the list):

1. Create an audit table to record changes made to the parent table.
2. Acquire a brief ACCESS EXCLUSIVE lock to add a trigger on the parent table (for INSERT, UPDATE, DELETE) that writes into the audit table.
3. Create a new shadow table and run the ALTER/migration against it.
4. Copy all rows from the old table.
5. Build indexes on the new table.
6. Replay all changes accumulated in the audit table against the shadow table.
7. Delete rows from the audit table as they are replayed.
8. Once the delta (remaining rows) is down to ~20 rows, acquire an ACCESS EXCLUSIVE lock against the parent table within a transaction, and:
9. Swap the table names (shadow table <> parent table).
10. Update foreign keys on referencing tables by dropping and re-creating them with NOT VALID.
11. Run ANALYZE on the new table.
12. Validate all the FKs that were added with NOT VALID.
13. Drop the parent (now old) table (optional).
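To make steps 1 and 2 more concrete, here is a minimal hand-rolled sketch of the same idea (an audit table plus a row trigger on the parent table), run against the same test database used below (PGPASSWORD must be exported first). All object names here are made up for illustration; they are not pg-osc's internal names, and pg-osc does all of this for you:

psql -h 192.168.31.181 -U dts -d postgres <<'SQL'
-- Hypothetical illustration only: an audit table mirroring t2, plus operation metadata.
CREATE TABLE t2_audit (LIKE t2);
ALTER TABLE t2_audit ADD COLUMN operation text, ADD COLUMN changed_at timestamptz DEFAULT now();

-- Row trigger that records every INSERT/UPDATE/DELETE on the parent table into the audit table.
CREATE FUNCTION t2_audit_fn() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO t2_audit SELECT (OLD).*, TG_OP, now();
  ELSE
    INSERT INTO t2_audit SELECT (NEW).*, TG_OP, now();
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t2_audit_trg AFTER INSERT OR UPDATE OR DELETE ON t2
FOR EACH ROW EXECUTE FUNCTION t2_audit_fn();
SQL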

 

pg-osc requires a fairly new Ruby version; I could not get it installed on CentOS 7, and the Docker image provided upstream also reported an incompatibility (possibly because my workstation's CPU is too old).

 

Below are the steps I used to build my own Docker image; I have tested it and it works.

 

# set up a proxy first to speed up downloads

export https_proxy=http://192.168.31.16:7890 && export http_proxy=http://192.168.31.16:7890

 

Write a Dockerfile with the following content:

cat Dockerfile

FROM ubuntu:22.10
RUN apt-get update && apt-get install -y make gcc libpq-dev ruby ruby-dev
RUN gem install pg_online_schema_change
CMD []

 

Build the image

docker build .

 

Tag the image (use the image ID printed at the end of docker build):

docker tag 9ddffecd4e3e pg-osc:lee
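The build and tag steps can also be combined into one (plain docker usage, nothing pg-osc specific):

docker build -t pg-osc:lee .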

 

Example 1: drop columns

export PGPASSWORD="dts"

docker run --network host -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 drop COLUMN purchased; ALTER TABLE t2 drop COLUMN purchased2;' --dbname "postgres" --schema "public" --host "192.168.31.181" --username "dts" --pull-batch-count 1000 --delta-count 20 --wait-time-for-lock 5 --kill-backends --drop

 

Example 2: rename a column

export PGPASSWORD="dts"

docker run --network host -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 RENAME COLUMN b to new_b;' --dbname "postgres" --schema "public" --host "192.168.31.181" --username "dts" --pull-batch-count 1000 --delta-count 20 --wait-time-for-lock 5 --kill-backends --drop

 

Notes on some of the options:

--wait-time-for-lock 5 --kill-backends  If the operation is being run on a busy table, you can use pg-osc's kill-backends feature to kill other backends that may be competing with the pg-osc operation, so the lock can be acquired quickly. The ACCESS EXCLUSIVE lock that pg-osc takes is only held briefly and then released. You can tune how long pg-osc should wait before killing other backends (or whether pg-osc should kill backends at all).
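If you want to see which sessions would be competing for the lock before letting pg-osc kill them, a plain PostgreSQL query against pg_stat_activity/pg_locks (not a pg-osc feature) gives a quick picture:

psql -h 192.168.31.181 -U dts -d postgres -c "SELECT a.pid, a.state, a.query FROM pg_stat_activity a JOIN pg_locks l ON l.pid = a.pid WHERE l.relation = 'public.t2'::regclass AND a.pid <> pg_backend_pid();"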


--pull-batch-count 2000 --delta-count 500  If you have a table with a very high write volume, the default replay iteration may not be enough. That is, you may see pg-osc replaying only 1000 rows at a time from the audit table (pull-batch-count). pg-osc also waits until the remaining row count in the audit table (delta-count) drops to 20 before performing the swap. You can raise these values to catch up faster on this kind of workload.
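Purely as an illustration, Example 1 re-run with the larger values mentioned above (adjust to your own workload) would look like:

docker run --network host -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 drop COLUMN purchased;' --dbname "postgres" --schema "public" --host "192.168.31.181" --username "dts" --pull-batch-count 2000 --delta-count 500 --wait-time-for-lock 5 --kill-backends --drop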

There are a few other features as well (for example, running custom SQL logic while the data is being backfilled); see the official documentation for details. A rough sketch follows below.
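Sketch of that backfill feature: you put a custom INSERT ... SELECT into a .sql file and pass its path via --copy-statement. The %{shadow_table}/%{parent_table} placeholders follow my reading of the README's example, and the column names (a, b, new_col) are made up for t2; verify against the official docs before relying on this. The file also has to be mounted into the container so the path is visible to pg-osc:

cat > backfill.sql <<'SQL'
-- Hypothetical backfill: pg-osc is expected to substitute %{shadow_table} and %{parent_table} at run time.
INSERT INTO %{shadow_table}(a, b, new_col)
SELECT a, b, upper(b)
FROM ONLY %{parent_table}
SQL

docker run --network host -v "$PWD/backfill.sql:/backfill.sql" -it --rm pg-osc:lee pg-online-schema-change perform --alter-statement 'ALTER TABLE t2 ADD COLUMN new_col text;' --copy-statement "/backfill.sql" --dbname "postgres" --schema "public" --host "192.168.31.181" --username "dts" --drop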


Full option list:

Options:
  -a, --alter-statement=ALTER_STATEMENT        # The ALTER statement to perform the schema change
  -s, --schema=SCHEMA                          # The schema in which the table is
                                               # Default: public
  -d, --dbname=DBNAME                          # Name of the database
  -h, --host=HOST                              # Server host where the Database is located
  -u, --username=USERNAME                      # Username for the Database
  -p, --port=N                                 # Port for the Database
                                               # Default: 5432
  -w, --password=PASSWORD                      # DEPRECATED: Password for the Database. Please pass PGPASSWORD environment variable instead.
  -v, [--verbose], [--no-verbose]              # Emit logs in debug mode
  -f, [--drop], [--no-drop]                    # Drop the original table in the end after the swap
  -k, [--kill-backends], [--no-kill-backends]  # Kill other competing queries/backends when trying to acquire lock for the shadow table creation and swap. It will wait for --wait-time-for-lock duration before killing backends and try upto 3 times.
  -w, [--wait-time-for-lock=N]                 # Time to wait before killing backends to acquire lock and/or retrying upto 3 times. It will kill backends if --kill-backends is true, otherwise try upto 3 times and exit if it cannot acquire a lock.
                                               # Default: 10
  -c, [--copy-statement=COPY_STATEMENT]        # Takes a .sql file location where you can provide a custom query to be played (ex: backfills) when pgosc copies data from the primary to the shadow table. More examples in README.
  -b, [--pull-batch-count=N]                   # Number of rows to be replayed on each iteration after copy. This can be tuned for faster catch up and swap. Best used with delta-count.
                                               # Default: 1000
  -e, [--delta-count=N]                        # Indicates how many rows should be remaining before a swap should be performed. This can be tuned for faster catch up and swap, especially on highly volume tables. Best used with pull-batch-count.
                                               # Default: 20