1、pt-heartbeat的作用
pt-heartbeat measures replication lag on a MySQL or PostgreSQL server. You can use it to update a master or monitor a replica. If possible, MySQL connection options are read from your .my.cnf file. For more details, please use the --help option, or try 'perldoc /usr/bin/pt-heartbeat' for complete documentation.
pt-heartbeat is a two-part MySQL and PostgreSQL replication delay monitoring system that measures delay by looking at actual replicated data. This avoids reliance on the replication mechanism itself, which is unreliable. (For example, SHOW SLAVE STATUS on MySQL).
2、pt-heartbeat的原理
The first part is an --update instance of pt-heartbeat that connects to a master and updates a timestamp (“heartbeat record”) every --interval seconds. Since the heartbeat table may contain records from multiple masters (see “MULTI-SLAVE HIERARCHY”), the server’s ID (@@server_id) is used to identify records.
主库上存在一个用于检查延迟的表heartbeat,可手动或自动创建
pt-heartbeat使用--update参数连接到主库上并持续(根据设定的--interval参数)使用一个时间戳更新到表heartbeat
The second part is a --monitor or --check instance of pt-heartbeat that connects to a slave, examines the replicated heartbeat record from its immediate master or the specified --master-server-id, and computes the difference from the current system time. If replication between the slave and the master is delayed or broken, the computed difference will be greater than zero and otentially increase if --monitor is specified.
pt-heartbeat使用--monitor 或--check连接到从库,检查从主库同步过来的时间戳,并与当前系统时间戳进行比对产生一个差值,
该值则用于判断延迟。(注,前提条件是主库与从库应保持时间同步)
You must either manually create the heartbeat table on the master or use --create-table. See --create-table for the proper heartbeat table structure. The MEMORY storage engine is suggested, but not re-quired of course, for MySQL.
The heartbeat table must contain a heartbeat row. By default, a heartbeat row is inserted if it doesn’t exist. This feature can be disabled with the --[no]insert-heartbeat-row option in case the database user does not have INSERT privileges.
pt-heartbeat depends only on the heartbeat record being replicated to the slave, so it works regardless of the replication mechanism (built-in replication, a system such as Continuent Tungsten, etc). It works at any depth in the replication hierarchy; for example, it will reliably report how far a slave lags its master’s master’s master. And if replication is stopped, it will continue to work and report (accurately!) that the slave is falling further and further behind the master.
pt-heartbeat has a maximum resolution of 0.01 second. The clocks on the master and slave servers must be closely synchronized via NTP. By default, --update checks happen on the edge of the second (e.g. 00:01) and --monitor checks happen halfway between seconds (e.g. 00:01.5). As long as the servers’ clocks are closely synchronized and replication events are propagating in less than half a second, pt-heartbeat will report zero seconds of delay.
pt-heartbeat will try to reconnect if the connection has an error, but will not retry if it can’t get a connection when it first starts.
The --dbi-driver option lets you use pt-heartbeat to monitor PostgreSQL as well. It is reported to work well with Slony-1 replication.
3、获取pt-heartbeat帮助信息
a、获取帮助信息
[root@DBMASTER01 ~]# pt-heartbeat #直接输入pt-heartbeat可获得一个简要描述,使用pt-heartbeat --help获得一个完整帮助信息
Usage: pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop
Errors in command-line arguments:
* Specify at least one of --stop, --update, --monitor or --check
* --database must be specified
4、演示使用pt-heartbeat
a、首先添加表
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --create-table --update
MASTER> select * from heartbeat;
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| ts | server_id | file | position | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| 2014-12-01T09:48:14.003020 | 11 | mysql-bin.000390 | 677136957 | mysql-bin.000179 | 120 |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
b、更新主库上的heartbeat
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update &
[1] 31249
c、从库上监控延迟
[root@DBBAK01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --monitor --print-master-server-id
1.00s [ 0.02s, 0.00s, 0.00s ] 11 #实时延迟,1分钟延迟,5分钟延迟,15分钟延迟
1.00s [ 0.03s, 0.01s, 0.00s ] 11
1.00s [ 0.05s, 0.01s, 0.00s ] 11
1.00s [ 0.07s, 0.01s, 0.00s ] 11
1.00s [ 0.08s, 0.02s, 0.01s ] 11
1.00s [ 0.10s, 0.02s, 0.01s ] 11
1.00s [ 0.12s, 0.02s, 0.01s ] 11
1.00s [ 0.13s, 0.03s, 0.01s ] 11
d、其他操作示例
#将主库上的update使用守护进程方式调度
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update --daemonize
#修改主库上的更新间隔为2s
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update --daemonize --interval=2
#停止主库上的pt-heartbeat守护进程
[root@DBMASTER01 ~]# pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
[root@DBMASTER01 ~]# rm -rf /tmp/pt-heartbeat-sentinel
#单次查看从库上的延迟情况
[robin@DBBAK01 ~]$ pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --check
1.00
#使用守护进程监控从库并输出日志
[root@DBBAK01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --monitor --print-master-server-id --daemonize --log=/tmp/slave-lag.log