gh-ost 学习笔记


参考资料:

https://m.aliyun.com/yunqi/articles/62928

https://m.aliyun.com/yunqi/articles/115565?spm=5176.100239.0.0.bIL3Yr

https://yq.aliyun.com/articles/62928   【推荐】

https://github.com/github/gh-ost/tree/master/doc  【推荐】

https://github.com/wing324/helloworld_zh/blob/master/MySQL/gh-ost/GitHub%E5%BC%80%E6%BA%90MySQL%20Online%20DDL%E5%B7%A5%E5%85%B7gh-ost%E5%8F%82%E6%95%B0%E8%A7%A3%E6%9E%90.md

http://blog.csdn.net/qq_36281740/article/details/52711284



下面是一个添加字段的 general_log 截图:

grant ALL on *.* to ptosc@'10.0.20.%' identified by 'ptosc';   -- 添加gh-ost操作的授权

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -debug -table="coc" -alter "add column cl111 char(20) not null" -assume-rbr -execute --initially-drop-old-table --ok-to-drop-table


image.pngimage.pngimage.png

image.png


部分表说明:

coc  原始表

_coc_del  原先的原始表,alter操作完成时,会自动将这个表重命名为_coc_del 【安全起见,默认gh-ost不会删除这个文件】

_coc_gho  影子表,alter操作完成时,会自动将这个表重命名为coc

_coc_ghc  记录操作进度等信息,其表中的内容类似下图:

image.png




过程 https://github.com/github/gh-ost 

image.png




连接到主库的大致的操作过程如下:

1、gh-ost程序连接到主库,检查账号权限,mysqld版本,估算需要操作的表的行数。

2、初始化stream连接,监听binlog,并记录gh-ost操作前的binlog位置。 初始化Applier连接。

3、删除_xxx_ghc表(如果之前存在),重建这个ghc表,用于记录gh-ost的操作changelog。

3、创建_xxx_gho表(最终表),对gho表进行ALTER操作,然后复制原表数据到gho表。(这ALTER期间,gh-ost模拟成slave,将操作期间的DML产生的binlog event获取到,并应用到gho表上)

4、【cut-over新旧表切换阶段】重命名xxx表为_coc_del表,重命名_xxx_gho表为coc表【rename表的瞬间是锁表的】

5、删除_xxx_ghc表

6、删除_xxx_del表 (默认不会删除原始表,需要加-ok-to-drop-table参数)



迁移和切换的细节实现:

关于gh-ost的实现,这里只挑了rowcopy和binlog apply的顺序问题和rename过程做了详细解析。

数据迁移过程

在数据迁移的过程中,数据变量有三个,暂且分为,A:来自原表的rowcopy,B:binlog的apply,C:对原表的dml操作。

C操作会记录binglog从而触发B操作,所以B操作一定在C操作的后面,因此一般情况下,会有ACB,CBA两种组合,同时特殊情况如binlog apply延迟,则会有CAB这种组合。

分析三种组合之前要先了解gh-ost在sql改写方面是如何映射的:

RowCopy

原表操作

新表操作


select

insert ignore into

BinlogApply

原表操作

新表操作


insert

replace into


update

update 新表(全行更新)


delete

delete

在上述原则的基础上,我们再来逐个分析不同顺序组合的影响:

1.insert 操作

binlog是最权威的,gh-ost的原则是以binlog优先,所以无论任何顺序下,数据都是和binlog保持一致,如果rowcopy在后,会insert ignore,如果binlog apply在后会replace into掉。

2.update/delete 操作

一般情况下:

ACB组合,即对已经rowcopy过的数据,出现对原表的update/delete操作。这时候会全部通过binlog apply执行,注意binlog apply的update是对某一条记录的全部列覆盖更新,所以不会有累加的问题。

CBA组合,即对尚未迁移的数据,出现对原表的update/delete操作。这时候对新表的binlog apply会是空操作,具体数据由rowcopy迁移。

特殊情况下:

CAB组合,即先对原表更新完以后,rowcopy在binlog apply之前把数据迁移了过去,而在binlog event过来以后,会再次应用,这里有问题?其实结合gh-ost的binlog aplly的sql映射规则,insert操作会被replace重新替换掉,update 会更新对应记录全部行,delete 会是空操作。最终数据还是一致的状态。

cut-over过程:

在pt-osc或者online ddl中,最后的rename操作一般是耗时比较短,但如果表结构变更过程中,有大查询进来,那么在rename操作的时候,会触发MDL锁的等待,如果在高峰期,这就是个严重的问题。所以gh-ost是怎么做的呢?

gh-ost利用了MySQL的一个特性,就是原子性的rename请求,在所有被blocked的请求中,优先级永远是最高的。gh-ost基于此设计了该方案:一个连接对原表加锁,另启一个连接尝试rename操作,此时会被阻塞住,当释放lock的时候,rename会首先被执行,其他被阻塞的请求会继续应用到新表。

migrator.go:iterateChunks() 函数来确定何时开始 cut-over

具体切换流程如下:

START

  1. 会话A

    1. CREATE table tbl_old

    防止rename过早执行

    1. LOCK TABLES tbl WRITE, tbl_old WRITE

    通过lock_wait_timeout设置为2s控制超时,超时失败会重试次数为配置default-retries,默认60次

  2. 新的请求进来,关于原表的请求被blocked

  3. RENAME TABLE tbl TO tbl_old, ghost TO tbl , 同样被blocked

  4. 新的请求进来,关于原表的请求被blocked

  5. 检查是否有blocked 的RENAME请求,通过show processlist

  6. 会话A: DROP TABLE tbl_old

  7. 会话A: UNLOCK TABLES

  8. RENAME SUCCESS

session1

session2

begin

begin;

lock tables `t1` write, `_t1_del` write

rename table `t1` to `_t1_del`, `_t1_gho` to `t1` -- 会被阻塞

drop table if exists `_t1_del`


unlock tables;



session1释放锁后,session会 rename 成功

不同阶段失败后如何处理:

  • 如果第一步失败,退出程序

  • 如果会话A建表成功,加锁失败,退出程序,未加锁

  • rename请求来的时候,会话A死掉,lock会自动释放,同时因为tbl_old的存在rename也会失败,所有请求恢复正常

  • rename被blocked的时候,会话A死掉,lock会自动释放,同样因为tbl_old的存在,rename会失败,所有请求恢复正常

  • rename死掉,gh-ost会捕获不到rename,会话A继续运行,释放lock,所有请求恢复正常




gh-ost要的权限比较大,需要严格控制访问的IP

grant ALL on *.* to ptosc@'10.0.20.%' identified by 'ptosc';   # 如果gh-ost只在主库通过SSH终端执行的话,可以给本机地址授权即可。

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -debug -table="coc" -alter "add column cl1 varchar(20) not null default \"test col1\" " -assume-rbr -ok-to-drop-table -serve-socket-file="/tmp/gh-ost.hellodb.coc.sock" -execute


gh-ost有三种工作模式:

a:连接到从库,在主库做迁移。

b:连接到主库,迁移过程所有操作都在主上操作,包括读取binlog等等。【推荐使用】

c:在从库做迁移测试。


三种模式对比:

三种方法各有优缺点,但我只关心缺点,先说a的缺点,a会在从上面读取binlog,但数据库主从数据为什么会造成不一致,一个很重要的原因是主库的binlog没有完全在从库执行。所以个人感觉a方法有丢失数据的风险。

b方法任何操作都会再主库操作,或多或少会对主库负载造成影响,但是可以通过调整一些参数降低和时刻关注这些影响,所以个人推荐使用b方法。

至于c方法是偏向测试用的,这里不做过多介绍,但是c方法里有一个细节,cut-over阶段有会stop slave一个操作,其实这个操作风险特别高,有时stop slave 时间会很长,务必会对线上数据库使用造成影响,所以如果使用c方法做测试也要在线下数据库。



gh-ost 参数详解:

  -allow-master-master  允许在双主环境下使用,貌似现在生产环境双主用的比较少,一般都是MHA

        explicitly allow running in a master-master setup


  -allow-nullable-unique-key  允许gh-ost在数据迁移(migrate)依赖的唯一键可以为NULL,默认为不允许为NULL的唯一键。如果数据迁移(migrate)依赖的唯一键允许NULL值,则可能造成数据不正确,请谨慎使用。

        allow gh-ost to migrate based on a unique key with nullable columns. As long as no NULL values exist, this should be OK. If NULL values exist in chosen key, data may be corrupted. Use at your own risk!


  -allow-on-master   允许在主库执行【gh-ost默认会去从库执行命令,这个选项会强制在主库执行,推荐这种方式使用gh-ost】

        allow this migration to run directly on master. Preferably it would run on a replica


  -alter string   这里跟具体的alter语句【需要用双引号或者单引号括起来】

        alter statement (mandatory)


  -approve-renamed-columns ALTER    给列改名的话,必须加这个选项

        in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag approves that gh-ost's interpretation is correct

    例如:

        ./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -table="coc" -approve-renamed-columns -alter "change column c1 newc1  varchar(100) not null default \"测试字段1\" COMMENT \"测试注释1\""  -assume-rbr -execute -ok-to-drop-table    操作完成后删除旧表

   

  -assume-master-host string

        (optional) explicitly tell gh-ost the identity of the master. Format:some.host.com[:port] This is useful in master-master setups where you wish to pick an explicit master, or in a tungsten-replicator where gh-ost is unabel to determine the master


  -assume-rbr  确认gh-ost连接的数据库实例的binlog_format=ROW的情况下,可以指定-assume-rbr,这样可以禁止从库上运行stop slave,start slave,执行gh-ost用户也不需要SUPER权限。

        set to 'true' when you know for certain your server uses 'ROW' binlog_format. gh-ost is unable to tell, event after reading binlog_format, whether the replication process does indeed use 'ROW', and restarts replication to be certain RBR setting is applied. Such operation requires SUPER privileges which you might not have. Setting this flag avoids restarting replication and you can proceed to use gh-ost without SUPER privileges


  -check-flag   检查某个参数是否支持??

        Check if another flag exists/supported. This allows for cross-version scripting. Exits with 0 when all additional provided flags exist, nonzero otherwise. You must provide (dummy) values for flags that require a value. Example: gh-ost --check-flag --cut-over-lock-timeout-seconds --nice-ratio 0


  -chunk-size int  在每次迭代中处理的行数量(允许范围:100-100000),默认值为1000

        amount of rows to handle in each iteration (allowed range: 100-100,000) (default 1000)


  -concurrent-rowcount  该参数如果为True(默认值),则进行row-copy之后,估算统计行数(使用explain select count(*)方式),并调整ETA时间,否则,gh-ost首先预估统计行数,然后开始row-copy。

        (with --exact-rowcount), when true (default): count rows after row-copy begins, concurrently, and adjust row estimate later on; when false: first count rows, then start row copy (default true)


  -conf string   可以指定gh-ost默认去读取配置的文件【一般直接在命令行指定DDL使用的选项即可】


  -critical-load string   一系列逗号分隔的status-name=values组成,当MySQL中status超过对应的values,gh-ost将会退出。【用的较少】

        Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits

        例如:  -critical-load "Threads_running=1000,threads_connected=5000"  指的是当MySQL中的状态值Threads_running>1000,threads_connected>5000的时候,gh-ost将会由于该数据库严重负载而停止并退出。


  -critical-load-hibernate-seconds int  【用的较少】

        When nonzero, critical-load does not panic and bail out; instead, gh-ost goes into hibernate for the specified duration. It will not read/write anything to from/to any server


  -critical-load-interval-millis int    【用的较少】

        When 0, migration immediately bails out upon meeting critical-load. When non-zero, a second check is done after given interval, and migration only bails out if 2nd check still meets critical load 。当值为0时,当达到-critical-load,gh-ost立即退出。当值不为0时,当达到-critical-load,gh-ost会在-critical-load-interval-millis秒数后,再次进行检查,再次检查依旧达到-critical-load,gh-ost将会退出。


  -cut-over string    选择cut-over类型:atomic/two-step,atomic(默认)类型是github的算法,two-step采用的是facebook-OSC的算法。

        choose cut-over type (default|atomic, two-step) (default "atomic")


  -cut-over-lock-timeout-seconds int       gh-ost在cut-over阶段最大的锁等待时间,当锁超时时,gh-ost的cut-over将重试。(默认值:3)

        Max number of seconds to hold locks on tables while attempting to cut-over (retry attempted when lock exceeds timeout) (default 3)


  -database string       指定需要处理的库名

  -debug    设置日志的级别

  -default-retries int   各种操作在panick前重试次数。(默认为60次)

        Default number of retries for various operations before panicking (default 60)


  -discard-foreign-keys  【危险参数】对一个有外键的表,带该参数执行gh-ost后,外键就会丢失。该参数很适合用于删除外键,除此之外,请谨慎使用

        DANGER! This flag will migrate a table that has foreign keys and will NOT create foreign keys on the ghost table, thus your altered table will have NO foreign keys. This is useful for intentional dropping of foreign keys


  -dml-batch-size int  待确认具体的含义

        batch size for DML events to apply in a single transaction (range 1-100) (default 10)


  -exact-rowcount    使用count(*)计算表行数,而不是从information_schema表里估计行数(用于更精确的进度估算)

        actually count table rows as opposed to estimate them (results in more accurate progress estimation)


  -execute   真实执行ALTER操作,不加的话就是dry-run测试运行


  -force-named-cut-over

        When true, the 'unpostpone|cut-over' interactive command must name the migrated table


  -force-table-names string

        table name prefix to be used on the temporary tables


  -heartbeat-interval-millis int       gh-ost心跳频率值,默认为100秒

        how frequently would gh-ost inject a heartbeat value (default 100)


  -hooks-hint string

        arbitrary message to be injected to hooks via GH_OST_HOOKS_HINT, for your convenience


  -hooks-path string

        directory where hook files are found (default: empty, ie. hooks disabled). Hook files found on this path, and conforming to hook naming conventions will be executed


  -host string        主机地址   MySQL hostname (preferably a replica, not the master) (default "127.0.0.1")


  -initially-drop-ghost-table    执行gh-ost时候,先删除同名gho表(如果存在的话)【危险,慎用】

        Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists


  -initially-drop-old-table      gh-ost操作之前,检查并删除已经存在的旧表。【危险,慎用】

        Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists


  -initially-drop-socket-file    gh-ost强制删除已经存在的socket文件。该参数不建议使用,可能会删除一个正在运行的gh-ost程序,导致DDL失败。【危险,慎用】

        Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!


  -master-password string

        MySQL password on master, if different from that on replica. Requires --assume-master-host

  -master-user string

        MySQL user on master, if different from that on replica. Requires --assume-master-host

  -user string   MySQL用户名

  -max-lag-millis int     主从复制最大延迟时间,当主从复制延迟时间超过该值后,gh-ost将采取节流(throttle)措施,默认值:1500s

        replication lag at which to throttle operation (default 1500)


  -max-load string       一系列逗号分隔的status-name=values组成,当MySQL中status超过对应的values,gh-ost将采取节流(throttle)措施。

        Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes

              例如: -max-load "Threads_running=20,Connections=1500"  指的是当MySQL中的状态值Threads_running>20,Connections>1500的时候,gh-ost将采取节流(throttle)措施。


  -migrate-on-replica     在从库上执行gh-ost命令

        Have the migration run on a replica, not on the master. This will do the full migration on the replica including cut-over (as opposed to --test-on-replica)


  -nice-ratio float   每次chunk时间段的休眠时间。【0:每个chunk时间段不休眠;1:每row-copy 1毫秒,则另外休眠1毫秒;0.7:每row-copy 10毫秒,则另外休眠7毫秒】

        force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1: for every 1ms spent copying rows, sleep additional 1ms (effectively doubling runtime); 0.7: for every 10ms spend in a rowcopy chunk, spend 7ms sleeping immediately after


  -ok-to-drop-table   gh-ost操作结束后,删除旧表,默认状态是不删除旧表,会存在_tablename_del表

        Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?

  -panic-flag-file string    当这个文件被创建,gh-ost将会立即退出,且不会清理产生的临时文件及socket文件 【慎用】

        when this file is created, gh-ost will immediately terminate, without cleanup

  -password string   MySQL密码

  -port int         MySQL端口

  -postpone-cut-over-flag-file string   当这个文件存在的时候,gh-ost的cut-over阶段将会被推迟,直到该文件被删除。

        while this file exists, migration will postpone the final stage of swapping tables, and will keep on syncing the ghost table. Cut-over/swapping would be ready to perform the moment the file is deleted.


  -quiet   静默模式

  -replica-server-id uint

        server id used by gh-ost process. Default: 99999 (default 99999)


  -replication-lag-query string

        Deprecated. gh-ost uses an internal, subsecond resolution query


  -serve-socket-file string

        Unix socket file to serve on. Default: auto-determined and advertised upon startup


  -serve-tcp-port int

        TCP port to serve on. Default: disabled


  -skip-foreign-key-checks      跳过外键检查,除非你确认待操作的表没有外键【危险,慎用】

        set to 'true' when you know for certain there are no foreign keys on your table, and wish to skip the time it takes for gh-ost to verify that


  -skip-renamed-columns ALTER  该参数告诉gh-ost跳该列的数据迁移,让gh-ost把重命名列作为无关紧要的列。该操作很危险,你会损失该列的所有值。【危险,慎用】

        in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag tells gh-ost to skip the renamed columns, i.e. to treat what gh-ost thinks are renamed columns as unrelated columns. NOTE: you may lose column data

        例如:gh-ost -user="wing" -host="127.0.0.1" -port=3306 -database="wing" -table="t" -password="wing" -alter="change column c1 col1 int not null default 0" -assume-rbr -execute -skip-renamed-columns ALTER    这个ALTER操作会丢弃到c1列的原先全部数据


  -stack     添加错误堆栈追踪


  -switch-to-rbr     让gh-ost自动将从库的binlog_format转换为ROW格式【生产环境都是用row格式了,基本上不用加这个参数】

        let this tool automatically switch binary log format to 'ROW' on the replica, if needed. The format will NOT be switched back. I'm too scared to do that, and wish to protect you if you happen to execute another migration while this one is running


  -table string    待ALTER操作的表名

  -test-on-replica     在从库上测试gh-ost,包括在从库上数据迁移(migration),数据迁移完成后stop slave,原表和ghost表立刻交换而后立刻交换回来。继续保持stop slave,使你可以对比两张表。

        Have the migration run on a replica, not on the master. At the end of migration replication is stopped, and tables are swapped and immediately swap-revert. Replication remains stopped and you can compare the two tables for building trust


  -test-on-replica-skip-replica-stop    当-test-on-replica执行时,该参数表示该过程中不用stop slave。

        When --test-on-replica is enabled, do not issue commands stop replication (requires --test-on-replica)


  -throttle-additional-flag-file string  

        当该文件被创建后,gh-ost操作立即停止。该参数可以用在多个gh-ost同时操作的时候,创建一个文件,让所有的gh-ost操作停止,或者删除这个文件,让所有的gh-ost操作恢复。


  -throttle-control-replicas string  

        列出所有需要被检查slave lag的从库,如:-throttle-control-replica=192.16.12.22:3306,192.16.12.23:3307,192.16.13.12:3308


  -throttle-flag-file string  

        当该文件被创建后,gh-ost操作立即停止。该参数适合控制单个gh-ost操作。


  -throttle-http string

        when given, gh-ost checks given URL via HEAD request; any response code other than 200 (OK) causes throttling; make sure it has low latency response


  -throttle-query string   节流查询。每秒钟执行一次。当返回值=0时不需要节流,当返回值>0时,需要执行节流操作。该查询会在数据迁移(migrated)服务器上操作,所以请确保该查询是轻量级的。

        when given, issued (every second) to check if operation should throttle. Expecting to return zero for no-throttle, >0 for throttle. Query is issued on the migrated server. Make sure this query is lightweight

 

 -timestamp-old-table

        Use a timestamp in old table name. This makes old table names unique and non conflicting cross migrations


  -tungsten   告诉gh-ost你正在运行的是一个tungsten-replication拓扑结构。

        explicitly let gh-ost know that you are running on a tungsten-replication based topology (you are likely to also provide --assume-master-host)




  -verbose   gh-ost执行时候输出详细信息




示例:

主库:10.0.20.25

从库:10.0.20.26

在主库添加用于alter操作的账号:grant ALL on *.* to ptosc@'10.0.20.25' identified by 'ptosc' ;

### 注意:下面的命令都是在主库执行的。参数里面 - 和-- 的一样用,没任何区别。  另外,操作的表上必须要有主键。

./gh-ost -version  使用的是1.0.42版本。  


添加字段:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-max-load=Threads_running=25 \
-critical-load=Threads_running=1000 \
-chunk-size=1000 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter " add column c1 varchar(100) not null default \"测试字段1\" COMMENT \"测试注释1\"" \
-assume-rbr -ok-to-drop-table -execute



修改某个字段的字符集:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter "change c1 c1  varchar(128) CHARACTER SET utf8mb4" \
-assume-rbr -ok-to-drop-table -execute



删除字段:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter "drop column c1" \
-assume-rbr -ok-to-drop-table -execute



字段改名:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-approve-renamed-columns \
-alter "change column c1 newc1 varchar(100) not null default \"测试字段1\" COMMENT \"测试注释1\"" \
-assume-rbr -ok-to-drop-table -execute



添加索引:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose  -database="hellodb" -table="coc" \
-alter "add index idx_newc1_classid(newc1,classid)" \
-assume-rbr -ok-to-drop-table -execute



删除索引:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter "drop index idx_newc1_classid" \
-assume-rbr -ok-to-drop-table -execute



修改表引擎:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter "engine=InnoDB" \
-assume-rbr -ok-to-drop-table -execute



删除字段的同时不要删除老表:

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 \
-allow-on-master --verbose -database="hellodb" -table="coc" \
-alter "drop column newc1" \
-assume-rbr -execute