在pt-osc的切换时候rename的过程会有短暂的表不存在的情况,应用程序可能会报错。
gh-ost 通过先创建一个切换表,一个会话 lock住原表和切换表,然后在另外一个会话执行rename操作导致rename被故意阻塞,然后第一个会话删除掉切换表,在unlock 之前的锁,第二个会话执行成功,这块有个疑问,在第一个会话unlock后,第二个还是执行的rename,怎么保证之前被阻塞的dml在执行rename的时候不会失败,这个问题答案在https://github.com/github/gh-ost/issues/82 这里有说明。应用这个特性后,可以保证rename先于dml执行,这样实现了原子rename.
2021-05-20 22:59:45 INFO Grabbing voluntary lock: gh-ost.99.lock
2021-05-20 22:59:45 INFO Setting LOCK timeout as 6 seconds
2021-05-20 22:59:45 INFO Looking for magic cut-over table
2021-05-20 22:59:45 INFO Creating magic cut-over table `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:45 INFO Magic cut-over table created
2021-05-20 22:59:45 INFO Locking `db1`.`sbtest1`, `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:45 INFO Tables locked
2021-05-20 22:59:45 INFO Session locking original & magic tables is 99
2021-05-20 22:59:45 INFO Writing changelog state: AllEventsUpToLockProcessed:1621522785483689000
2021-05-20 22:59:45 INFO Intercepted changelog state AllEventsUpToLockProcessed
2021-05-20 22:59:45 INFO Handled changelog state AllEventsUpToLockProcessed
2021-05-20 22:59:45 INFO Waiting for events up to lock
2021-05-20 22:59:46 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1621522785483689000
2021-05-20 22:59:46 INFO Done waiting for events up to lock; duration=594.370665ms
2021-05-20 22:59:46 INFO Setting RENAME timeout as 3 seconds
2021-05-20 22:59:46 INFO Session renaming tables is 102
2021-05-20 22:59:46 INFO Issuing and expecting this to block: rename /* gh-ost */ table `db1`.`sbtest1` to `db1`.`_sbtest1_20210520224218_del`, `db1`.`_sbtest1_gho` to `db1`.`sbtest1`
2021-05-20 22:59:46 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2021-05-20 22:59:46 INFO Checking session lock: gh-ost.99.lock
2021-05-20 22:59:46 INFO Connection holding lock on original table still exists
2021-05-20 22:59:46 INFO Will now proceed to drop magic table and unlock tables
2021-05-20 22:59:46 INFO Dropping magic cut-over table
2021-05-20 22:59:46 INFO Releasing lock from `db1`.`sbtest1`, `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:46 INFO Tables unlocked
2021-05-20 22:59:46 INFO Tables renamed
2021-05-20 22:59:46 INFO Lock & rename duration: 847.094157ms. During this time, queries on `sbtest1` were blocked
2021-05-20 22:59:46 INFO Closed streamer connection. err=<nil>
2021-05-20 22:59:46 INFO Dropping table `db1`.`_sbtest1_ghc`
2021-05-20 22:59:46 INFO Table dropped
2021-05-20 22:59:46 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2021-05-20 22:59:46 INFO -- drop table `db1`.`_sbtest1_20210520224218_del`
2021-05-20 22:59:46 INFO Done migrating `db1`.`sbtest1`
2021-05-20 22:59:46 INFO Removing socket file: /tmp/sbtest1.gh-ost.socket
2021-05-20 22:59:46 INFO Tearing down inspector
2021-05-20 22:59:46 INFO Tearing down applier
2021-05-20 22:59:46 ERROR Error 1146: Table 'db1._sbtest1_ghc' doesn't exist
2021-05-20 22:59:46 INFO Tearing down streamer
2021-05-20 22:59:46 INFO Tearing down throttler
下面是general log
2021-05-21T02:19:19.612586Z 125 Query START TRANSACTION
2021-05-21T02:19:19.612780Z 125 Query select connection_id()
2021-05-21T02:19:19.613102Z 125 Query select get_lock('gh-ost.125.lock', 0)
2021-05-21T02:19:19.613287Z 125 Query set session lock_wait_timeout:=6
2021-05-21T02:19:19.613478Z 135 Query show /* gh-ost */ table status from `db1` like '_sbtest1_20210521101744_del'
2021-05-21T02:19:19.613791Z 135 Query create /* gh-ost */ table `db1`.`_sbtest1_20210521101744_del` (
id int auto_increment primary key
) engine=InnoDB comment='ghost-cut-over-sentry'
2021-05-21T02:19:19.655425Z 137 Connect msandbox@localhost on db1 using TCP/IP
2021-05-21T02:19:19.655640Z 137 Query SET NAMES utf8mb4
2021-05-21T02:19:19.655749Z 137 Query SET autocommit=true
2021-05-21T02:19:19.655910Z 137 Query insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
(id, hint, value)
values
(NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:19.654939+08:00')
on duplicate key update
last_update=NOW(),
value=VALUES(value)
2021-05-21T02:19:19.664292Z 125 Query lock /* gh-ost */ tables `db1`.`sbtest1` write, `db1`.`_sbtest1_20210521101744_del` write
2021-05-21T02:19:19.665650Z 135 Query insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
(id, hint, value)
values
(NULLIF(2, 0), 'state', 'AllEventsUpToLockProcessed:1621563559665538000')
on duplicate key update
last_update=NOW(),
value=VALUES(value)
2021-05-21T02:19:20.580124Z 135 Query START TRANSACTION
2021-05-21T02:19:20.580408Z 135 Query select connection_id()
2021-05-21T02:19:20.580604Z 135 Query set session lock_wait_timeout:=3
2021-05-21T02:19:20.580807Z 135 Query rename /* gh-ost */ table `db1`.`sbtest1` to `db1`.`_sbtest1_20210521101744_del`, `db1`.`_sbtest1_gho` to `db1`.`sbtest1`
2021-05-21T02:19:20.581093Z 138 Connect msandbox@localhost on db1 using TCP/IP
2021-05-21T02:19:20.604537Z 138 Query SET autocommit=true
2021-05-21T02:19:20.604697Z 138 Query SET NAMES utf8mb4
2021-05-21T02:19:20.604941Z 138 Query select id
from information_schema.processlist
where
id != connection_id()
and 135 in (0, id)
and state like concat('%', 'metadata lock', '%')
and info like concat('%', 'rename', '%')
2021-05-21T02:19:20.605543Z 137 Query select is_used_lock('gh-ost.125.lock')
2021-05-21T02:19:20.605768Z 125 Query drop /* gh-ost */ table if exists `db1`.`_sbtest1_20210521101744_del`
2021-05-21T02:19:20.655253Z 138 Query insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
(id, hint, value)
values
(NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:20.65504+08:00')
on duplicate key update
last_update=NOW(),
value=VALUES(value)
2021-05-21T02:19:20.662734Z 125 Query unlock tables
2021-05-21T02:19:20.663224Z 125 Query ROLLBACK
2021-05-21T02:19:20.761440Z 138 Quit
2021-05-21T02:19:20.761575Z 137 Query insert /* gh-ost */ into `db1`.`_sbtest1_ghc`
(id, hint, value)
values
(NULLIF(1, 0), 'heartbeat', '2021-05-21T10:19:20.761404+08:00')
on duplicate key update
last_update=NOW(),
value=VALUES(value)
2021-05-21T02:19:20.804735Z 135 Query ROLLBACK
插入到影子表的语句
insert /* gh-ost `db1`.`sbtest1` */ ignore into `db1`.`_sbtest1_gho` (`id`, `k`, `c`, `pad`)
(select `id`, `k`, `c`, `pad` from `db1`.`sbtest1` force index (`PRIMARY`)
where (((`id` > _binary'1') or ((`id` = _binary'1'))) and ((`id` < _binary'100000') or ((`id` = _binary'100000')))) lock in share mode
)
默认情况下ghost是链接到从库上,接受从库的binlog进行解析,如果有很大的延时,那么必定会延缓rename,gh-ost 是通过在ghc变更表中插入一条状态记录,然后在从库的binlog中解析道了这个记录,就会更新状态,进行通信通知进行切换。