在pt-osc的切换时候rename的过程会有短暂的表不存在的情况,应用程序可能会报错。
gh-ost 通过先创建一个切换表,一个会话 lock住原表和切换表,然后在另外一个会话执行rename操作导致rename被故意阻塞,然后第一个会话删除掉切换表,在unlock 之前的锁,第二个会话执行成功,这块有个疑问,在第一个会话unlock后,第二个还是执行的rename,怎么保证之前被阻塞的dml在执行rename的时候不会失败,这个问题答案在https://github.com/github/gh-ost/issues/82 这里有说明。应用这个特性后,可以保证rename先于dml执行,这样实现了原子rename.

2021-05-20 22:59:45 INFO Grabbing voluntary lock: gh-ost.99.lock
2021-05-20 22:59:45 INFO Setting LOCK timeout as 6 seconds
2021-05-20 22:59:45 INFO Looking for magic cut-over table
2021-05-20 22:59:45 INFO Creating magic cut-over table `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:45 INFO Magic cut-over table created
2021-05-20 22:59:45 INFO Locking `db1`.`sbtest1`, `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:45 INFO Tables locked
2021-05-20 22:59:45 INFO Session locking original & magic tables is 99
2021-05-20 22:59:45 INFO Writing changelog state: AllEventsUpToLockProcessed:1621522785483689000
2021-05-20 22:59:45 INFO Intercepted changelog state AllEventsUpToLockProcessed
2021-05-20 22:59:45 INFO Handled changelog state AllEventsUpToLockProcessed
2021-05-20 22:59:45 INFO Waiting for events up to lock
2021-05-20 22:59:46 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1621522785483689000
2021-05-20 22:59:46 INFO Done waiting for events up to lock; duration=594.370665ms
2021-05-20 22:59:46 INFO Setting RENAME timeout as 3 seconds
2021-05-20 22:59:46 INFO Session renaming tables is 102
2021-05-20 22:59:46 INFO Issuing and expecting this to block: rename /* gh-ost */ table `db1`.`sbtest1` to `db1`.`_sbtest1_20210520224218_del`, `db1`.`_sbtest1_gho` to `db1`.`sbtest1`
2021-05-20 22:59:46 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2021-05-20 22:59:46 INFO Checking session lock: gh-ost.99.lock
2021-05-20 22:59:46 INFO Connection holding lock on original table still exists
2021-05-20 22:59:46 INFO Will now proceed to drop magic table and unlock tables
2021-05-20 22:59:46 INFO Dropping magic cut-over table
2021-05-20 22:59:46 INFO Releasing lock from `db1`.`sbtest1`, `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:46 INFO Tables unlocked
2021-05-20 22:59:46 INFO Tables renamed
2021-05-20 22:59:46 INFO Lock & rename duration: 847.094157ms. During this time, queries on `sbtest1` were blocked
2021-05-20 22:59:46 INFO Closed streamer connection. err=<nil>
2021-05-20 22:59:46 INFO Dropping table `db1`.`_sbtest1_ghc`
2021-05-20 22:59:46 INFO Table dropped
2021-05-20 22:59:46 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2021-05-20 22:59:46 INFO -- drop table `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:46 INFO Done migrating `db1`.`sbtest1`
2021-05-20 22:59:46 INFO Removing socket file: /tmp/sbtest1.gh-ost.socket
2021-05-20 22:59:46 INFO Tearing down inspector
2021-05-20 22:59:46 INFO Tearing down applier
2021-05-20 22:59:46 ERROR Error 1146: Table 'db1._sbtest1_ghc' doesn't exist
2021-05-20 22:59:46 INFO Tearing down streamer
2021-05-20 22:59:46 INFO Tearing down throttler

下面是general log

2021-05-21T02:19:19.612586Z       125 Query     START TRANSACTION
2021-05-21T02:19:19.612780Z       125 Query     select connection_id()
2021-05-21T02:19:19.613102Z       125 Query     select get_lock('gh-ost.125.lock', 0)
2021-05-21T02:19:19.613287Z       125 Query     set session lock_wait_timeout:=6
2021-05-21T02:19:19.613478Z       135 Query     show /* gh-ost */ table status from `db1` like '_sbtest1_20210521101744_del'
2021-05-21T02:19:19.613791Z       135 Query     create /* gh-ost */ table `db1`.`_sbtest1_20210521101744_del` (
                        id int auto_increment primary key
                ) engine=InnoDB comment='ghost-cut-over-sentry'
2021-05-21T02:19:19.655425Z       137 Connect   msandbox@localhost on db1 using TCP/IP
2021-05-21T02:19:19.655640Z       137 Query     SET NAMES utf8mb4
2021-05-21T02:19:19.655749Z       137 Query     SET autocommit=true
2021-05-21T02:19:19.655910Z       137 Query     insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
                                (id, hint, value)
                        values
                                (NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:19.654939+08:00')
                        on duplicate key update
                                last_update=NOW(),
                                value=VALUES(value)
2021-05-21T02:19:19.664292Z       125 Query     lock /* gh-ost */ tables `db1`.`sbtest1` write, `db1`.`_sbtest1_20210521101744_del` write
2021-05-21T02:19:19.665650Z       135 Query     insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
                                (id, hint, value)
                        values
                                (NULLIF(2, 0), 'state', 'AllEventsUpToLockProcessed:1621563559665538000')
                        on duplicate key update
                                last_update=NOW(),
                                value=VALUES(value)
2021-05-21T02:19:20.580124Z       135 Query     START TRANSACTION
2021-05-21T02:19:20.580408Z       135 Query     select connection_id()
2021-05-21T02:19:20.580604Z       135 Query     set session lock_wait_timeout:=3
2021-05-21T02:19:20.580807Z       135 Query     rename /* gh-ost */ table `db1`.`sbtest1` to `db1`.`_sbtest1_20210521101744_del`, `db1`.`_sbtest1_gho` to `db1`.`sbtest1`
2021-05-21T02:19:20.581093Z       138 Connect   msandbox@localhost on db1 using TCP/IP
2021-05-21T02:19:20.604537Z       138 Query     SET autocommit=true
2021-05-21T02:19:20.604697Z       138 Query     SET NAMES utf8mb4
2021-05-21T02:19:20.604941Z       138 Query     select id
                        from information_schema.processlist
                        where
                                id != connection_id()
                                and 135 in (0, id)
                                and state like concat('%', 'metadata lock', '%')
                                and info  like concat('%', 'rename', '%')
2021-05-21T02:19:20.605543Z       137 Query     select is_used_lock('gh-ost.125.lock')
2021-05-21T02:19:20.605768Z       125 Query     drop /* gh-ost */ table if exists `db1`.`_sbtest1_20210521101744_del`
2021-05-21T02:19:20.655253Z       138 Query     insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
                                (id, hint, value)
                        values
                                (NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:20.65504+08:00')
                        on duplicate key update
                                last_update=NOW(),
                                value=VALUES(value)
2021-05-21T02:19:20.662734Z       125 Query     unlock tables
2021-05-21T02:19:20.663224Z       125 Query     ROLLBACK
2021-05-21T02:19:20.761440Z       138 Quit
2021-05-21T02:19:20.761575Z       137 Query     insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
                                (id, hint, value)
                        values
                                (NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:20.761404+08:00')
                        on duplicate key update
                                last_update=NOW(),
                                value=VALUES(value)
2021-05-21T02:19:20.804735Z       135 Query     ROLLBACK

插入到影子表的语句

 insert /* gh-ost `db1`.`sbtest1` */ ignore into `db1`.`_sbtest1_gho` (`id`, `k`, `c`, `pad`)
      (select `id`, `k`, `c`, `pad` from `db1`.`sbtest1` force index (`PRIMARY`)
        where (((`id` > _binary'1') or ((`id` = _binary'1'))) and ((`id` < _binary'100000') or ((`id` = _binary'100000')))) lock in share mode
      )

默认情况下ghost是链接到从库上,接受从库的binlog进行解析,如果有很大的延时,那么必定会延缓rename,gh-ost 是通过在ghc变更表中插入一条状态记录,然后在从库的binlog中解析道了这个记录,就会更新状态,进行通信通知进行切换。