gh-ost in Practice

I. Installation

1. Environment

Go version: 1.10.3
gh-ost version: 1.0.46

2. Installing Go

# Install the packages Go depends on
yum install bison ed gawk gcc glibc-devel make -y

# Configure the Go environment variables (add the export lines below to ~/.bashrc)
vim ~/.bashrc
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin
export GOPATH=/usr/local/go/src/github.com/github/gh-ost
# Reload the environment variables
source ~/.bashrc

# Unpack the Go tarball
# The tarball is at 10.135.2.217:data/online/software/go1.10.3.linux-amd64.tar.gz

tar -zxvf go1.10.3.linux-amd64.tar.gz -C /usr/local/

3. Installing gh-ost

# The tarball is at 10.135.2.217:data/online/software/gh-ost-binary-linux-20180527215024.tar.gz
tar -zxvf gh-ost-binary-linux-20180527215024.tar.gz -C /usr/local
ln -s /usr/local/gh-ost /usr/bin/gh-ost
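
Before moving on, it can help to confirm that the installed binaries resolve on PATH. The helper below is only a minimal sketch: `require_tool` is a hypothetical name, and `socat` is in the list only because the socket commands later in this guide use it.

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME resolves to an executable on PATH.
# (hypothetical helper name, for illustration only)
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Report on the tools this guide relies on; a missing tool is reported, not fatal.
for tool in go gh-ost socat; do
    require_tool "$tool" || true
done
```

When both are found, `go version` and `gh-ost --version` print the installed versions, which should match the ones listed under Environment.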

II. Running on the Master

1. Typical command

gh-ost \
   --max-load=Threads_running=16 \
   --critical-load=Threads_running=32 \
   --chunk-size=1000  \
   --initially-drop-old-table \
   --initially-drop-ghost-table \
   --initially-drop-socket-file \
   --ok-to-drop-table \
   --host="10.249.5.39" \
   --port=3306 \
   --user="dbadmin" \
   --password="12345" \
   --assume-rbr \
   --allow-on-master \
   --assume-master-host=10.249.5.39:3306 \
   --database="gh_ost" \
   --table="gh_01" \
   --alter="add column c4 varchar(50) not null default ''" \
   --panic-flag-file=/tmp/ghost.panic.flag \
   --serve-socket-file=/tmp/ghost.sock \
   --verbose \
   --execute
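
gh-ost performs the real migration only when `--execute` is given; without it, it does a dry run that validates the setup. The wrapper below is a sketch of running the same arguments in both modes. `run_migration` and the `GH_OST` override variable are hypothetical names introduced for illustration, and the flag list is a shortened copy of the command above.

```shell
#!/bin/sh
# run_migration MODE (hypothetical wrapper, not part of gh-ost)
#   MODE=dry-run  omits --execute, so gh-ost only validates the migration
#   MODE=execute  appends --execute and performs the real migration
# GH_OST may be overridden (e.g. GH_OST=echo) to print the arguments instead.
run_migration() {
    mode=${1:-dry-run}
    set -- \
        --host="10.249.5.39" --port=3306 \
        --user="dbadmin" --password="12345" \
        --allow-on-master --assume-rbr \
        --database="gh_ost" --table="gh_01" \
        --alter="add column c4 varchar(50) not null default ''" \
        --verbose
    if [ "$mode" = "execute" ]; then
        set -- "$@" --execute
    fi
    "${GH_OST:-gh-ost}" "$@"
}
```

A typical sequence would be `run_migration dry-run` first, then `run_migration execute` once the dry run reports no problems.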

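2. Common parameters

-allow-on-master
      By default gh-ost expects to connect to a replica and read the binlog there; this flag is required to run directly against the master.
-max-load string
      A status expression. Multiple thresholds are separated by commas, e.g. 'Threads_running=100,Threads_connected=500'. When a threshold is exceeded, the migration pauses and waits.
-critical-load string
      Same format as max-load, but when a threshold is exceeded the migration stops and gh-ost exits.
-chunk-size int
      Number of rows copied from the original table per iteration (allowed range: 100-100000, default 1000).
-initially-drop-ghost-table
      Drop a ghost table that may be left over from a previous run before starting. By default gh-ost aborts if the table exists.
-initially-drop-old-table
      Drop an old table that a previous run may not have removed before starting. By default gh-ost aborts if the table exists.
-initially-drop-socket-file
      Remove an existing socket file.
-ok-to-drop-table
      Drop the old table automatically once the DDL completes.
-panic-flag-file string
      When this flag is set and the file is created, gh-ost aborts immediately without cleaning up the temporary tables and files it produced.
-exact-rowcount
      Count the table rows exactly instead of estimating them. An inaccurate estimate only affects the progress calculation; the rows actually copied are determined by the minimum and maximum key values and do not depend on it.
-serve-socket-file string
      Path of the socket file gh-ost listens on.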
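-assume-rbr
      Explicitly tells gh-ost that the binlog format is already ROW. Without it, gh-ost checks the format on every run and restarts replication so that the setting applies to the replication thread, which requires the SUPER privilege.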
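-assume-master-host
      Explicitly tells gh-ost the master address. If omitted, gh-ost finds the master by following replication from the replica.
-host, -port
      By default gh-ost treats these as the connection details of a replica. When reading the binlog from a replica, give the replica's host and port here; when running directly on the master, give the master's host and port and also pass allow-on-master, otherwise gh-ost reports an error and exits.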
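
Because chunk-size only accepts values in the range listed above, a wrapper script can reject a bad value before gh-ost is started. This is a small sketch; `valid_chunk_size` is a hypothetical helper name, not something gh-ost provides.

```shell
#!/bin/sh
# valid_chunk_size N: succeed only if N is an integer in the documented
# range 100-100000. (hypothetical helper name)
valid_chunk_size() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 100 ] && [ "$1" -le 100000 ]
}

# Example: check a value before passing it as --chunk-size.
if valid_chunk_size 1000; then
    echo "chunk-size 1000 is acceptable"
fi
```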

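3. Reading the log output

gh-ost prints detailed information at each key step so that you can follow the whole migration. You can control the output level:

--verbose: commonly used; prints the useful messages rather than everything.

--debug: prints everything.

The output at startup looks like this: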

2018-08-07 14:17:11 INFO starting gh-ost 1.0.46
2018-08-07 14:17:11 INFO Migrating `darren`.`t4`
2018-08-07 14:17:11 INFO connection validated on 10.249.5.39:3306
2018-08-07 14:17:11 INFO User has ALL privileges
2018-08-07 14:17:11 INFO binary logs validated on 10.249.5.39:3306
2018-08-07 14:17:11 INFO Restarting replication on 10.249.5.39:3306 to make sure binlog settings apply to replication thread
2018-08-07 14:17:11 INFO Inspector initiated on shvm-5-39.58os.org:3306, version 5.7.21-log
2018-08-07 14:17:11 INFO Table found. Engine=InnoDB
2018-08-07 14:17:11 INFO Estimated number of rows via EXPLAIN: 58707
2018-08-07 14:17:11 INFO Recursively searching for replication master
2018-08-07 14:17:11 INFO Master found to be shvm-5-39.58os.org:3306
2018-08-07 14:17:11 INFO log_slave_updates validated on 10.249.5.39:3306
2018-08-07 14:17:11 INFO connection validated on 10.249.5.39:3306
2018/08/07 14:17:11 binlogsyncer.go:79: [info] create BinlogSyncer with config {99999 mysql 10.249.5.39 3306 dbadmin   false false <nil>}
2018-08-07 14:17:11 INFO Connecting binlog streamer at shvm-5-39.000040:337570954
2018/08/07 14:17:11 binlogsyncer.go:246: [info] begin to sync binlog from position (shvm-5-39.000040, 337570954)
2018/08/07 14:17:11 binlogsyncer.go:139: [info] register slave for master server 10.249.5.39:3306
2018/08/07 14:17:11 binlogsyncer.go:573: [info] rotate to (shvm-5-39.000040, 337570954)
2018-08-07 14:17:11 INFO rotate to next log name: shvm-5-39.000040
2018-08-07 14:17:11 INFO connection validated on 10.249.5.39:3306
2018-08-07 14:17:11 INFO connection validated on 10.249.5.39:3306
2018-08-07 14:17:11 INFO will use time_zone='SYSTEM' on applier
2018-08-07 14:17:11 INFO Examining table structure on applier
2018-08-07 14:17:11 INFO Applier initiated on shvm-5-39.58os.org:3306, version 5.7.21-log
2018-08-07 14:17:11 INFO Dropping table `darren`.`_t4_gho`
2018-08-07 14:17:11 INFO Table dropped
2018-08-07 14:17:11 INFO Dropping table `darren`.`_t4_del`
2018-08-07 14:17:11 INFO Table dropped
2018-08-07 14:17:11 INFO Dropping table `darren`.`_t4_ghc`
2018-08-07 14:17:11 INFO Table dropped
2018-08-07 14:17:11 INFO Creating changelog table `darren`.`_t4_ghc`
2018-08-07 14:17:11 INFO Changelog table created
2018-08-07 14:17:11 INFO Creating ghost table `darren`.`_t4_gho`
2018-08-07 14:17:11 INFO Ghost table created
2018-08-07 14:17:11 INFO Altering ghost table `darren`.`_t4_gho`
2018-08-07 14:17:11 INFO Ghost table altered
2018-08-07 14:17:11 INFO Intercepted changelog state GhostTableMigrated
2018-08-07 14:17:11 INFO Waiting for ghost table to be migrated. Current lag is 0s
2018-08-07 14:17:11 INFO Handled changelog state GhostTableMigrated
2018-08-07 14:17:11 INFO Chosen shared unique key is PRIMARY
2018-08-07 14:17:11 INFO Shared columns are id,name,c1,c2,c4,c5,c6
2018-08-07 14:17:11 INFO Listening on unix socket file: /tmp/ghost.sock
2018-08-07 14:17:11 INFO Migration min values: [1]
2018-08-07 14:17:11 INFO Migration max values: [58597]
2018-08-07 14:17:11 INFO Waiting for first throttle metrics to be collected
2018-08-07 14:17:11 INFO First throttle metrics collected
# Migrating `darren`.`t4`; Ghost table is `darren`.`_t4_gho`
# Migrating shvm-5-39.58os.org:3306; inspecting shvm-5-39.58os.org:3306; executing on shvm-5-39.58os.org
# Migration started at Tue Aug 07 14:17:11 +0800 2018
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=64; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/ghost.sock

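Most of these messages are self-explanatory and simply indicate that everything is going well. What you mainly need to watch is whether the migration itself is progressing smoothly. Once the row copy actually starts, you will see output like the following.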

Copy: 0/58707 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: shvm-5-39.000040:337574146; State: migrating; ETA: N/A
Copy: 0/58707 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: shvm-5-39.000040:337581355; State: migrating; ETA: N/A
Copy: 27000/58707 46.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: shvm-5-39.000040:338201054; State: migrating; ETA: 2s
Copy: 58000/58707 98.8%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 3s(copy); streamer: shvm-5-39.000040:338912890; State: migrating; ETA: 0s
2018-08-07 14:17:14 INFO Row copy complete

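Progress fields

Copy: 27000/58707 46.0%: 58707 is the total number of rows to migrate (the EXPLAIN estimate from the startup log), 27000 is the number of rows copied so far, and 46.0% is the completed percentage.
Applied: 0: the number of binlog events processed. In the example above the table receives no traffic, so no events were processed.
Backlog: 0/1000: gh-ost is keeping up with the binary log; no events are waiting in the queue.
Backlog: 7/1000: while rows were being copied, a few binlog events queued up and still need to be applied.
Backlog: 1000/1000: the 1000-event buffer is full (the size is hard-coded; older versions used 100). This means the binlog write volume is very high and gh-ost cannot keep up with the events; it may need to pause binlog reading and apply the buffered events first.
streamer: shvm-5-39.000040:338912890: the binlog file and position reached so far.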
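
The fields described above can also be pulled out of a progress line mechanically, for example to feed a monitoring script. The following is a sketch that assumes the exact line layout shown in the sample output; `parse_progress` is a hypothetical helper name.

```shell
#!/bin/sh
# parse_progress LINE: print "copied total percent backlog" for one progress
# line, or nothing if the line is not a progress line. (hypothetical helper)
parse_progress() {
    printf '%s\n' "$1" | sed -n \
        's|^Copy: \([0-9]*\)/\([0-9]*\) \([0-9.]*\)%; Applied: [0-9]*; Backlog: \([0-9]*\)/[0-9]*;.*|\1 \2 \3 \4|p'
}

line='Copy: 27000/58707 46.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: shvm-5-39.000040:338201054; State: migrating; ETA: 2s'
parse_progress "$line"   # prints: 27000 58707 46.0 0
```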

Status hints

Every so often gh-ost prints a friendly status summary:

# Migrating `darren`.`t4`; Ghost table is `darren`.`_t4_gho`
# Migrating shvm-5-39.58os.org:3306; inspecting shvm-5-39.58os.org:3306; executing on shvm-5-39.58os.org
# Migration started at Tue Aug 07 14:17:11 +0800 2018
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=64; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/ghost.sock

III. Running via a Replica

1. Typical command

gh-ost \
   --max-load=Threads_running=16 \
   --critical-load=Threads_running=32 \
   --chunk-size=1000  \
   --initially-drop-old-table \
   --initially-drop-ghost-table \
   --initially-drop-socket-file \
   --ok-to-drop-table \
   --host="10.249.5.39" \
   --port=3307 \
   --user="dbadmin" \
   --password="12345" \
   --assume-rbr \
   --allow-on-master \
   --assume-master-host=10.249.5.39:3306 \
   --database="gh_ost" \
   --table="gh_01" \
   --alter="add column c4 varchar(50) not null default ''" \
   --panic-flag-file=/tmp/ghost.panic.flag \
   --serve-socket-file=/tmp/ghost.sock \
   --verbose \
   --execute

IV. Test Mode

gh-ost \
   --test-on-replica \
   --max-load=Threads_running=16 \
   --critical-load=Threads_running=32 \
   --chunk-size=1000  \
   --initially-drop-old-table \
   --initially-drop-ghost-table \
   --initially-drop-socket-file \
   --host="10.249.5.39" \
   --port=3307 \
   --user="dbadmin" \
   --password="12345" \
   --assume-rbr \
   --database="gh_ost" \
   --table="gh_01" \
   --alter="add column c4 varchar(50) not null default ''" \
   --panic-flag-file=/tmp/ghost.panic.flag \
   --serve-socket-file=/tmp/ghost.sock \
   --verbose \
   --execute

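Parameter notes

--test-on-replica
      Runs the migration on the replica without leaving a cut-over in effect: the original table stays in place, and the replication threads are left stopped at the end so that testers can compare the data.
--migrate-on-replica
      Migrates and cuts over directly on the replica; the replication threads are not stopped.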
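
One way to do the data comparison after a --test-on-replica run is to dump the shared columns of both tables in primary-key order and compare checksums. The sketch below only illustrates the idea: `same_content` is a hypothetical helper, the column list `id, name` and the output paths are assumptions, and the ghost table name follows the `_<table>_gho` pattern seen in the log output.

```shell
#!/bin/sh
# same_content FILE_A FILE_B: succeed if both files have the same checksum
# and size according to cksum. (hypothetical helper name)
same_content() {
    sum_a=$(cksum < "$1")
    sum_b=$(cksum < "$2")
    [ "$sum_a" = "$sum_b" ]
}

# Example use on the replica (commented out: needs a MySQL client and credentials).
# The column list "id, name" is an assumption; use the columns shared by both tables.
# mysql -h 10.249.5.39 -P 3307 -u dbadmin -p gh_ost \
#     -e 'select id, name from gh_01 order by id' > /tmp/gh_01.orig.txt
# mysql -h 10.249.5.39 -P 3307 -u dbadmin -p gh_ost \
#     -e 'select id, name from _gh_01_gho order by id' > /tmp/gh_01.ghost.txt
# same_content /tmp/gh_01.orig.txt /tmp/gh_01.ghost.txt && echo "tables match"
```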

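V. Pause, Resume, Abort, and Postponed Cut-over

gh-ost listens for requests on the socket file given by --serve-socket-file. Through it you can adjust performance parameters on the fly and pause or resume the migration.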

# Pause
echo throttle | socat - /tmp/ghost.sock

# Resume
echo no-throttle | socat - /tmp/ghost.sock

# Abort
# Tied to the panic-flag-file parameter: gh-ost stops immediately once this file exists under /tmp
touch /tmp/ghost.panic.flag

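# Postpone the cut-over (cut-over phase)
# Pass this flag when starting gh-ost:
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag
# While the file exists the cut-over stays postponed; it proceeds only after you delete the file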

# Adjust performance parameters on the fly
echo chunk-size=100 | socat - /tmp/ghost.sock
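
The socket commands above can be wrapped in a small helper so that a missing socket is reported clearly instead of producing a socat error. This is a sketch under assumptions: `ghost_send` is a hypothetical name, and the default path is simply the --serve-socket-file value used throughout this guide.

```shell
#!/bin/sh
# ghost_send COMMAND [SOCKET]: send one interactive command to a running gh-ost.
# (hypothetical helper; the default socket path matches the examples above)
ghost_send() {
    sock=${2:-/tmp/ghost.sock}
    if [ ! -S "$sock" ]; then
        echo "no gh-ost socket at $sock" >&2
        return 1
    fi
    echo "$1" | socat - "$sock"
}
```

With a migration running, `ghost_send throttle`, `ghost_send no-throttle` and `ghost_send chunk-size=100` correspond to the commands above, and `ghost_send status` asks gh-ost to print its current status.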