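I'm still a beginner myself, so if anything in this post is wrong or imprecise, please point it out and I'll fix it promptly so that nobody gets misled.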


Test environment:

CentOS 7.3.1611 + MySQL Community Edition 5.7.19

References:

    Articles pushed by the 小菜鸟DBA WeChat official account


Official documentation:

https://dev.mysql.com/doc/internals/en/binary-log-versions.html

https://dev.mysql.com/doc/internals/en/row-based-binary-logging.html

https://dev.mysql.com/doc/internals/en/event-classes-and-types.html

https://dev.mysql.com/doc/internals/en/event-header-fields.html

https://dev.mysql.com/doc/internals/en/event-meanings.html

https://dev.mysql.com/doc/internals/en/event-data-for-specific-event-types.html


Three online tools:

http://tool.oschina.net/hexconvert/  online number-base converter

http://tool.chinaz.com/Tools/unixtime.aspx   Unix timestamp converter

https://www.bejson.com/convert/ox2str/   hex-to-string converter



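A binlog is really just a sequence of binlog events of different types, and each binlog event consists of an event header and an (optional) event data part.

[Note: each event also ends with a 4-byte checksum (a CRC32, written when binlog_checksum is enabled, which as far as I know is the default from 5.6.6 on). The official internals documentation does not mention it, and without it the event lengths will not add up when you analyze the physical format.]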


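A typical binlog file on disk consists of the following parts:

1. A 4-byte magic number at the very start of the binlog file

2. N binlog events of various types

3. A rotate event at the end of the binlog file (the binlog currently in use has no rotate event)


In addition, an index file records which binlog files exist and which one is currently in use (its name looks like mysql-bin.index).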
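
The file layout listed above (magic number, then events laid back to back) can be walked with very little code. Below is a minimal Python sketch, not a full parser: walk_events is a name I made up, and the sample input is synthetic (the two real 19-byte headers from the mysql-bin.000001 dumps later in this post, with the event bodies zero-filled), so treat it as an illustration only.

```python
import struct

MAGIC = b"\xfebin"                    # fe 62 69 6e
HEADER = struct.Struct("<IBIIIH")     # timestamp, type, server_id, length, next_pos, flags

def walk_events(data):
    """Yield (offset, type_code, event_length, next_position) for each event."""
    if data[:4] != MAGIC:
        raise ValueError("not a binlog file: bad magic number")
    pos = 4
    while pos + HEADER.size <= len(data):
        _ts, type_code, _sid, event_length, next_position, _flags = HEADER.unpack_from(data, pos)
        if event_length < HEADER.size:
            break                     # corrupt length, stop instead of looping forever
        yield pos, type_code, event_length, next_position
        pos += event_length

# Synthetic stand-in for mysql-bin.000001: real headers, zero-filled bodies.
fde = bytes.fromhex("e2e3fe59 0f 01000000 74000000 78000000 0000").ljust(116, b"\x00")
rotate = bytes.fromhex("f6e3fe59 04 01000000 2f000000 a7000000 0000").ljust(47, b"\x00")
events = list(walk_events(MAGIC + fde + rotate))
print(events)   # [(4, 15, 116, 120), (120, 4, 47, 167)]
```

On a real file you would read the bytes with open(path, "rb").read() and pass them in; the offsets should then match the Pos and End_log_pos columns of show binlog events.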



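The table below shows the general layout of a binlog event, using the v4 FORMAT_DESCRIPTION_EVENT as the example (each field is given as offset : length in bytes):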

+=====================================+

| event  | timestamp         0 : 4    |

| header +----------------------------+

|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15(binlog v4)

|        +----------------------------+

|        | server_id         5 : 4    |

|        +----------------------------+

|        | event_length      9 : 4    | >= 91

|        +----------------------------+

|        | next_position    13 : 4    |

|        +----------------------------+

|        | flags            17 : 2    |

+=====================================+

| event  | binlog_version   19 : 2    | = 4

| data   +----------------------------+

|        | server_version   21 : 50   |

|        +----------------------------+

|        | create_timestamp 71 : 4    |

|        +----------------------------+

|        | header_length    75 : 1    |

|        +----------------------------+

|        | post-header      76 : n    | = array of n bytes, one byte per event

|        | lengths for all            |   type that the server knows about

|        | event types                |

+=====================================+
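
To make the header half of the table concrete, here is a small Python sketch that unpacks the 19 header bytes with the struct module. The function name parse_event_header and the dictionary keys are my own choices; the sample bytes are the header of the FORMAT_DESCRIPTION_EVENT from the mysql-bin.000002 hexdump shown later in this post. All fields are little-endian.

```python
import struct
from datetime import datetime, timezone

def parse_event_header(buf, offset=0):
    """Decode the 19-byte v4 event header found at buf[offset:offset+19]."""
    ts, type_code, server_id, event_length, next_position, flags = struct.unpack_from(
        "<IBIIIH", buf, offset)
    return {
        "timestamp": ts,
        "type_code": type_code,
        "server_id": server_id,
        "event_length": event_length,
        "next_position": next_position,
        "flags": flags,
    }

header = parse_event_header(bytes.fromhex("f6e3fe59 0f 01000000 74000000 78000000 0100"))
print(header)
# Human-readable form of the timestamp (UTC)
print(datetime.fromtimestamp(header["timestamp"], tz=timezone.utc))
```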


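The commonly seen EVENTs are:

    FORMAT_DESCRIPTION_EVENT: the first event in a binlog file; records metadata such as version numbers

    QUERY_EVENT: stores statement-type information; with the statement-based binlog format it records the SQL statement, and in row mode it records the transaction's BEGIN marker

    XID_EVENT: the xid record for two-phase commit

    TABLE_MAP_EVENT: in row mode, records table metadata, which provides the rules for reading the row records; covered in detail later

    WRITE_ROWS_EVENT/DELETE_ROWS_EVENT/UPDATE_ROWS_EVENT: in row mode, record the corresponding row data changes

    GTID_LOG_EVENT: records the GTID transaction number, used for GTID-based replication from 5.6 onwards

    ROTATE_EVENT: points to the next binlog file

  

For the full list of event types see: https://dev.mysql.com/doc/internals/en/event-classes-and-types.html  (they are all defined in binlog_event.h in the source code; from a quick look, 5.7 adds a few event types compared with 5.6)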


Below is a complete binlog file that I captured; its events are as follows:

image.png


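Nowadays we generally use row-format binlogs, so the mixed and statement formats are not covered here.


For a DML operation in row format, what the binlog actually records is TABLE_MAP_EVENT + ROW_LOG_EVENT (where ROW_LOG_EVENT is further divided into WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT and DELETE_ROWS_EVENT).

Why does one update in ROW mode have to be broken into two events, a Table_map and an Update_rows?

Imagine an update that modifies 10000 rows: should the table structure be recorded 10000 times? All of those changes hit the same table, so the binlog records a single Table_map holding the table structure information, and the Update_rows that follows holds the changed row data. The two are linked through table_id. [Note: table_id is not fixed; it is a variable that takes up space in table_definition_cache and table_open_cache, which is why flush tables makes table_id grow.]


Below is the binlog of an insert of a single row; you can see it is made up of two events, table_map + write_rows.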

image.png

table_map records the table's metadata, such as the database name, table name and column types.


Addendum:

image.png


A few solid articles on table_id:

http://blog.itpub.net/22664653/viewspace-1158547/ 【杨奇龙】

http://agapple.iteye.com/blog/1797061

http://www.cnblogs.com/yuyue2014/p/3721172.html

http://www.sohu.com/a/130698375_610509   【宋利兵】

http://www.cnblogs.com/cenalulu/archive/2012/09/24/2699907.html  【卢钧轶】


A few other EVENT types:

Official documentation: https://dev.mysql.com/doc/internals/en/event-data-for-specific-event-types.html


FORMAT_DESCRIPTION_EVENT

This is the most basic event: every new binlog starts with it, and each binlog file contains exactly one FORMAT_DESCRIPTION_EVENT.

image.png

image.png


Fixed data part:

  • 2 bytes. The binary log format version. This is 4 in MySQL 5.0 and up.

  • 50 bytes. The MySQL server's version (example: 5.0.14-debug-log), padded with 0x00 bytes on the right.

  • 4 bytes. Timestamp in seconds when this event was created (this is the moment when the binary log was created). This value is redundant; the same value occurs in the timestamp header field.

  • 1 byte. The header length. This length - 19 gives the size of the extra headers field at the end of the header for other events.

  • Variable-sized. An array that indicates the post-header lengths for all event types. There is one byte per event type that the server knows about.



FORMAT_DESCRIPTION_EVENT example:

flush logs; produces a brand-new binlog file. Dumped, it looks like this:

master [localhost] {root} ((none)) > show binlog events in 'mysql-bin.000002';

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                  |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| mysql-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.37-log, Binlog ver: 4 |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+


[root@test_mysql26 /root/sandboxes/rsandbox_5_6_37/master/data ]# hexdump -C mysql-bin.000002

00000000  fe 62 69 6e f6 e3 fe 59  0f 01 00 00 00 74 00 00  |.bin...Y.....t..|

00000010  00 78 00 00 00 01 00 04  00 35 2e 36 2e 33 37 2d  |.x.......5.6.37-|

00000020  6c 6f 67 00 00 00 00 00  00 00 00 00 00 00 00 00  |log.............|

00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 13  |................|

00000050  38 0d 00 08 00 12 00 04  04 04 04 12 00 00 5c 00  |8.............\.|

00000060  04 1a 08 00 00 00 08 08  08 02 00 00 00 0a 0a 0a  |................|

00000070  19 19 00 01 08 4c 67 48                           |.....LgH|

00000078


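magic number (4 bytes)

fe 62 69 6e       0xfe followed by the ASCII string "bin"


event header (19 bytes)

f6 e3 fe 59       timestamp

0f                type_code       0x0f = 15, i.e. FORMAT_DESCRIPTION_EVENT (which implies a v4 binlog)

01 00 00 00       server_id

74 00 00 00       event_length    116 bytes

78 00 00 00       next_position   the next event starts at offset 120

01 00             flags           0x0001 means this binlog is still in use


event data:

04 00     binlog version; 4 means the v4 binlog format

35 2e 36 2e 33 37 2d 6c 6f 67 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     server_version; converted to a string it reads 5.6.37-log

00 00 00 00     create_timestamp; it is 0 here, which as far as I know is what you get for a binlog created by flush logs or rotation rather than by a server start

13            the length of the event header; 0x13 is 19 bytes in decimal

38 0d 00 08 00 12 00 04  04 04 04 12 00 00 5c 00 04 1a 08 00 00 00 08 08  08 02 00 00 00 0a 0a 0a 19 19 00 01        36 bytes. As far as I can tell the first 35 are the post-header lengths, one byte per event type (the entry for FORMAT_DESCRIPTION_EVENT itself is 0x5c = 92 = 56 + 1 + 35), and the final 01 is a 1-byte checksum algorithm flag (1 = CRC32)

08 4c 67 48       the 4-byte checksum.


Event types: https://dev.mysql.com/doc/internals/en/event-classes-and-types.html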
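
The byte-by-byte breakdown of this FORMAT_DESCRIPTION_EVENT can be reproduced with a short Python sketch. The event bytes are rebuilt from the hexdump of mysql-bin.000002 above; parse_format_description and its has_checksum parameter are names I invented. One assumption to flag loudly: I treat the last 5 bytes as a 1-byte checksum algorithm plus the 4-byte checksum, which matches this 5.6.37 dump but is my reading rather than something the internals page spells out.

```python
import struct

event = (
    bytes.fromhex("f6e3fe59 0f 01000000 74000000 78000000 0100")   # 19-byte header
    + bytes.fromhex("0400")                                         # binlog_version
    + b"5.6.37-log".ljust(50, b"\x00")                             # server_version
    + bytes.fromhex("00000000 13")                                  # create_timestamp, header_length
    + bytes.fromhex("380d0008001200040404041200005c00"
                    "041a08000000080808020000000a0a0a"
                    "19190001")                                     # 36 trailing bytes
    + bytes.fromhex("084c6748")                                     # 4-byte checksum
)

def parse_format_description(event, has_checksum=True):
    binlog_version, = struct.unpack_from("<H", event, 19)
    server_version = event[21:71].rstrip(b"\x00").decode("ascii")
    create_timestamp, = struct.unpack_from("<I", event, 71)
    header_length = event[75]
    tail = event[76:]
    if has_checksum:
        # Assumption: tail ends with 1 byte of checksum algorithm + 4 bytes of checksum.
        checksum_alg, post_header_lengths = tail[-5], list(tail[:-5])
    else:
        checksum_alg, post_header_lengths = None, list(tail)
    return {
        "binlog_version": binlog_version,
        "server_version": server_version,
        "create_timestamp": create_timestamp,
        "header_length": header_length,
        "post_header_lengths": post_header_lengths,   # index 0 is event type 1
        "checksum_alg": checksum_alg,
    }

fde = parse_format_description(event)
print(fde)
```

With this reading, post_header_lengths[1] is the 13-byte fixed part of QUERY_EVENT (type 2) and post_header_lengths[3] is the 8-byte fixed part of ROTATE_EVENT (type 4), which agrees with the layouts of those events described elsewhere in this post.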




STOP_EVENT:

A stop_event is produced when mysqld is shut down normally, or when reset slave is executed on a slave.

image.png

Stop_log_event is written under these circumstances:

  • A master writes the event to the binary log when it shuts down

  • A slave writes the event to the relay log when it shuts down or when a RESET SLAVE statement is executed


STOP_EVENT example:

/etc/init.d/mysqld restart 

master [localhost] {root} ((none)) > show binlog events in 'mysql-bin.000002';

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                  |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| mysql-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.37-log, Binlog ver: 4 |

| mysql-bin.000002 | 120 | Stop        |         1 |         143 |                                       |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

2 rows in set (0.00 sec)


[root@test_mysql26 /root/sandboxes/rsandbox_5_6_37/master/data ]# hexdump -C mysql-bin.000002 -s 120

00000078  f9 f0 fe 59 03 01 00 00  00 17 00 00 00 8f 00 00  |...Y............|

00000088  00 00 00 39 d3 4f ad                              |...9.O.|

0000008f


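The STOP_EVENT carries no event data: its 0x17 = 23 bytes are just the 19-byte event header (type_code 03, flags 00 00) followed by the 4-byte checksum 39 d3 4f ad.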



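QUERY_EVENT:

A QUERY_EVENT is produced when a transaction is started with the begin command.

image.png

Fixed part:

    4 bytes    thread_id, which can be used for auditing

    4 bytes    execution time of the statement, in seconds

    1 byte     length in bytes of the name of the database that was current when the command ran

    2 bytes    error code

    2 bytes    length of the status-variable block in the data part

Variable part:

    zero or more status variables

    the default database name (followed by a 0x00 terminator)

    SQL_Statement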



master [localhost] {root} (test) > begin;

master [localhost] {root} (test) > insert into tttt2 select 'AAAA';

master [localhost] {root} (test) > commit;


master [localhost] {root} (test) > show binlog events in 'mysql-bin.000001';

+------------------+-----+-------------+-----------+-------------+---------------------------------------------+

| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                        |

+------------------+-----+-------------+-----------+-------------+---------------------------------------------+

| mysql-bin.000001 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.37-log, Binlog ver: 4       |

| mysql-bin.000001 | 120 | Query       |         1 |         199 | BEGIN                                       |

| mysql-bin.000001 | 199 | Query       |         1 |         304 | use `test`; insert into tttt2 select 'AAAA' |

| mysql-bin.000001 | 304 | Xid         |         1 |         335 | COMMIT /* xid=40 */                         |

+------------------+-----+-------------+-----------+-------------+---------------------------------------------+

4 rows in set (0.00 sec)



[root@test_mysql26 /root/sandboxes/rsandbox_5_6_37/master/data ]# hexdump -C mysql-bin.000001  -s 120

00000078  de f3 fe 59 02 01 00 00  00 4f 00 00 00 c7 00 00  |...Y.....O......|

00000088  00 08 00 01 00 00 00 00  00 00 00 04 00 00 21 00  |..............!.|

00000098  00 00 00 00 00 01 00 00  00 40 00 00 00 00 06 03  |.........@......|

000000a8  73 74 64 04 21 00 21 00  08 00 0c 01 74 65 73 74  |std.!.!.....test|

000000b8  00 74 65 73 74 00 42 45  47 49 4e 37 1f 09 57 de  |.test.BEGIN7..W.|

000000c8  f3 fe 59 02 01 00 00 00  69 00 00 00 30 01 00 00  |..Y.....i...0...|

000000d8  00 00 01 00 00 00 00 00  00 00 04 00 00 21 00 00  |.............!..|

000000e8  00 00 00 00 01 00 00 00  40 00 00 00 00 06 03 73  |........@......s|

000000f8  74 64 04 21 00 21 00 08  00 0c 01 74 65 73 74 00  |td.!.!.....test.|

00000108  74 65 73 74 00 69 6e 73  65 72 74 20 69 6e 74 6f  |test.insert into|

00000118  20 74 74 74 74 32 20 73  65 6c 65 63 74 20 27 41  | tttt2 select 'A|

00000128  41 41 41 27 73 f4 e2 90  e0 f3 fe 59 10 01 00 00  |AAA's......Y....|

00000138  00 1f 00 00 00 4f 01 00  00 00 00 28 00 00 00 00  |.....O.....(....|

00000148  00 00 00 17 4f 16 46                              |....O.F|

0000014f



.... I got stuck analyzing this event type; can anyone help me out?  [See http://www.jianshu.com/p/c16686b35807]
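
For what it is worth, the first Query event in the dump above (the BEGIN at position 120) does seem to line up with the fixed part / variable part layout listed earlier. The Python sketch below is only my attempt at it: parse_query_event is a made-up name, the status-variable block is skipped rather than decoded, and the last 4 bytes are assumed to be the checksum.

```python
import struct

# Bytes 0x78..0xc6 of the hexdump above: the Query event shown as BEGIN.
event = bytes.fromhex(
    "def3fe59 02 01000000 4f000000 c7000000 0800"   # 19-byte event header
    "01000000 00000000 04 0000 2100"                # fixed part (13 bytes)
    "0000000000 010000004000000000 0603737464"      # status variables ...
    "04210021000800 0c017465737400"                 # ... 33 bytes in total
    "7465737400"                                    # default database + 0x00
    "424547494e"                                    # SQL statement
    "371f0957"                                      # 4-byte checksum (assumed)
)

def parse_query_event(event):
    event_length, = struct.unpack_from("<I", event, 9)
    thread_id, exec_time, db_len, error_code, status_len = struct.unpack_from(
        "<IIBHH", event, 19)
    pos = 19 + 13 + status_len                # skip the status-variable block
    database = event[pos:pos + db_len].decode()
    pos += db_len + 1                         # skip the 0x00 terminator
    sql = event[pos:event_length - 4].decode()
    return {
        "thread_id": thread_id,
        "exec_time": exec_time,
        "error_code": error_code,
        "status_len": status_len,
        "database": database,
        "sql": sql,
    }

query = parse_query_event(event)
print(query)
```

The second Query event (the insert at position 199) has the same shape; only event_length, next_position and the SQL text differ.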


ROTATE_EVENT:

A ROTATE_EVENT is produced on flush logs or when the binlog is rotated normally.

image.png

When a binary log file exceeds a size limit, a ROTATE_EVENT is written at the end of the file that points to the next file in the sequence. This event is information for the slave to know the name of the next binary log it is going to receive.

Fixed data part:

  • 8 bytes. The position of the first event in the next log file. Always contains the number 4 (meaning the next event starts at position 4 in the next binary log). This field is not present in v1; presumably the value is assumed to be 4.

Variable data part:

  • The name of the next binary log. The filename is not null-terminated. Its length is the event size minus the size of the fixed parts.



XID_EVENT

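For transactional consistency, when the binlog is written the transaction's statements go in first, then the xid marker, and only after that is the COMMIT carried out.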

image.png

Fixed part: empty

Variable part: 8 bytes, the xid number
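
As a quick check of that layout, here is a tiny Python sketch that decodes the Xid event from the hexdump in the QUERY_EVENT section (file position 304, shown there as COMMIT /* xid=40 */). The variable name xid_event is mine, and I read the 8 bytes as a little-endian integer, which is an assumption that happens to give the expected value here.

```python
import struct

# Bytes 0x130..0x14e of the earlier hexdump: header, 8-byte xid, 4-byte checksum.
xid_event = bytes.fromhex(
    "e0f3fe59 10 01000000 1f000000 4f010000 0000"   # header, type_code 0x10 = XID_EVENT
    "2800000000000000"                              # xid
    "174f1646"                                      # checksum
)

type_code = xid_event[4]
event_length, next_position = struct.unpack_from("<II", xid_event, 9)
xid, = struct.unpack_from("<Q", xid_event, 19)
print(type_code, event_length, next_position, xid)   # 16 31 335 40
```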



An example of a rotate event:

master [localhost] {root} ((none)) > show binlog events in 'mysql-bin.000001';

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                  |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

| mysql-bin.000001 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.37-log, Binlog ver: 4 |

| mysql-bin.000001 | 120 | Rotate      |         1 |         167 | mysql-bin.000002;pos=4                |

+------------------+-----+-------------+-----------+-------------+---------------------------------------+

2 rows in set (0.00 sec)


[root@test_mysql26 /root/sandboxes/rsandbox_5_6_37/master/data ]# hexdump -C mysql-bin.000001  -s 4 -n 19     (dumps the header of the format desc event)

00000004  e2 e3 fe 59 0f 01 00 00  00 74 00 00 00 78 00 00  |...Y.....t...x..|

00000014  00 00 00                                          |...|

00000017


00 00    The last 2 bytes (the flags) are 0000, meaning this binlog has been closed; 01 00 would mean the binlog is still in use.



[root@test_mysql26 /root/sandboxes/rsandbox_5_6_37/master/data ]# hexdump -C mysql-bin.000001 -s 120     (dumps the rotate event)

00000078  f6 e3 fe 59 04 01 00 00  00 2f 00 00 00 a7 00 00  |...Y...../......|

00000088  00 00 00 04 00 00 00 00  00 00 00 6d 79 73 71 6c  |...........mysql|

00000098  2d 62 69 6e 2e 30 30 30  30 30 32 ce 7f 95 b8     |-bin.000002....|

000000a7


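The event_header (19 bytes) breaks down as follows:

f6 e3 fe 59  timestamp

04           type_code        the event type is 0x04, i.e. a rotate event

01 00 00 00  server_id        the server id is 1

2f 00 00 00  event_length     the length is 0x2f, i.e. 47 bytes in decimal

a7 00 00 00  next_position    the next event would start at 0xa7, i.e. 167 in decimal

00 00        flags            no flags set (the in-use flag mentioned above belongs to the format desc event)


Then comes the event data part:

04 00 00 00 00 00 00 00   Fixed data part, 8 bytes: the position of the first event in the next binlog, i.e. offset 4

6d 79 73 71 6c 2d 62 69 6e 2e 30 30 30 30 30 32  Variable data part: the file name of the next binlog, mysql-bin.000002

ce 7f 95 b8   the 4-byte checksum that closes every event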
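
The same rotate event can be decoded in a few lines of Python. This is a sketch against the 47 bytes dumped above; parse_rotate_event is a name I made up, and it assumes the event ends with a 4-byte checksum, as this one does.

```python
import struct

rotate_event = bytes.fromhex(
    "f6e3fe59 04 01000000 2f000000 a7000000 0000"   # 19-byte header
    "0400000000000000"                              # fixed data: position in next log
    "6d7973716c2d62696e2e303030303032"              # variable data: next file name
    "ce7f95b8"                                      # 4-byte checksum
)

def parse_rotate_event(event, checksum_len=4):
    position, = struct.unpack_from("<Q", event, 19)
    # The file name is not null-terminated: it runs up to the checksum.
    next_file = event[27:len(event) - checksum_len].decode("ascii")
    return position, next_file

position, next_file = parse_rotate_event(rotate_event)
print(position, next_file)   # 4 mysql-bin.000002
```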


TABLE_MAP_EVENT:

Used for row-based binary logging beginning with MySQL 5.1.5.

Fixed data part:

  • 6 bytes. The table ID.

  • 2 bytes. Reserved for future use.

Variable data part:

  • 1 byte. The length of the database name.

  • Variable-sized. The database name (null-terminated).

  • 1 byte. The length of the table name.

  • Variable-sized. The table name (null-terminated).

  • Packed integer. The number of columns in the table.

  • Variable-sized. An array of column types, one byte per column. To find the meanings of these values, look at enum_field_types in the mysql_com.h header file.

  • Packed integer. The length of the metadata block.

  • Variable-sized. The metadata block; see log_event.h for contents and format.

  • Variable-sized. Bit-field indicating whether each column can be NULL, one bit per column. For this field, the amount of storage required for N columns is INT((N+7)/8) bytes.
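
To show how those fields chain together, here is a Python sketch that parses the data part of a TABLE_MAP_EVENT (fixed part plus variable part, without the event header and checksum). Everything in the sample is synthetic: the bytes were built by hand for a hypothetical one-column table test.tttt2 with table_id 70, not taken from a real dump. The function names are mine, and the bit order of the NULL bit-field (least significant bit first) is my assumption.

```python
def read_packed_int(buf, pos):
    """Read a MySQL packed (length-encoded) integer; return (value, new_pos)."""
    first = buf[pos]
    if first < 251:
        return first, pos + 1
    sizes = {252: 2, 253: 3, 254: 8}
    if first not in sizes:
        raise ValueError("unexpected packed-integer prefix: %d" % first)
    n = sizes[first]
    return int.from_bytes(buf[pos + 1:pos + 1 + n], "little"), pos + 1 + n

def parse_table_map_data(data):
    table_id = int.from_bytes(data[0:6], "little")
    pos = 8                                   # 6-byte table id + 2 reserved bytes
    db_len = data[pos]; pos += 1
    database = data[pos:pos + db_len].decode(); pos += db_len + 1   # +1: 0x00 terminator
    tbl_len = data[pos]; pos += 1
    table = data[pos:pos + tbl_len].decode(); pos += tbl_len + 1
    n_cols, pos = read_packed_int(data, pos)
    column_types = list(data[pos:pos + n_cols]); pos += n_cols
    meta_len, pos = read_packed_int(data, pos)
    metadata = data[pos:pos + meta_len]; pos += meta_len
    null_bits = data[pos:pos + (n_cols + 7) // 8]
    # Assumption: bit i of the bit-field (LSB first) belongs to column i.
    nullable = [bool((null_bits[i // 8] >> (i % 8)) & 1) for i in range(n_cols)]
    return {"table_id": table_id, "database": database, "table": table,
            "column_types": column_types, "metadata": metadata, "nullable": nullable}

# Hand-built sample: one VARCHAR column (type 15), 2 bytes of metadata, nullable.
sample = ((70).to_bytes(6, "little") + b"\x01\x00" + b"\x04test\x00" + b"\x05tttt2\x00"
          + b"\x01" + b"\x0f" + b"\x02" + b"\x14\x00" + b"\x01")
table_map = parse_table_map_data(sample)
print(table_map)
```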




WRITE_ROWS_EVENT, DELETE_ROWS_EVENT and UPDATE_ROWS_EVENT all follow the description below:

Used for row-based binary logging beginning with MySQL 5.1.18.

[TODO: following needs verification; it's guesswork]

Fixed data part:

  • 6 bytes. The table ID.

  • 2 bytes. Reserved for future use.

Variable data part:

  • Packed integer. The number of columns in the table.

  • Variable-sized. Bit-field indicating whether each column is used, one bit per column. For this field, the amount of storage required for N columns is INT((N+7)/8) bytes.

  • Variable-sized (for UPDATE_ROWS_LOG_EVENT only). Bit-field indicating whether each column is used in the UPDATE_ROWS_LOG_EVENT after-image; one bit per column. For this field, the amount of storage required for N columns is INT((N+7)/8) bytes.

  • Variable-sized. A sequence of zero or more rows. The end is determined by the size of the event. Each row has the following format:

    • Variable-sized. Bit-field indicating whether each field in the row is NULL. Only columns that are "used" according to the second field in the variable data part are listed here. If the second field in the variable data part has N one-bits, the amount of storage required for this field is INT((N+7)/8) bytes.

    • Variable-sized. The row-image, containing values of all table fields. This only lists table fields that are used (according to the second field of the variable data part) and non-NULL (according to the previous field). In other words, the number of values listed here is equal to the number of zero bits in the previous field (not counting padding bits in the last byte).
      The format of each value is described in the log_event_print_value() function in log_event.cc.

    • (for UPDATE_ROWS_EVENT only) the previous two fields are repeated, representing a second table row.

For each row, the following is done:

  • For WRITE_ROWS_LOG_EVENT, the row described by the row-image is inserted.

  • For DELETE_ROWS_LOG_EVENT, a row matching the given row-image is deleted.

  • For UPDATE_ROWS_LOG_EVENT, a row matching the first row-image is removed, and the row described by the second row-image is inserted.
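
The interplay of the two bit-fields (which columns are "used", and which of those are NULL in a given row) is easier to see in code. The Python sketch below only works out which columns have a value stored in the row-image; it does not decode the values themselves. The helper names and the sample bitmaps are made up, and I am assuming bits are numbered from the least significant bit of the first byte.

```python
def bits(bitmap, count):
    """Expand the first `count` bits of a bit-field into booleans (LSB first, assumed)."""
    return [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(count)]

def columns_with_values(used_bitmap, null_bitmap, n_columns):
    """Return the column indexes whose values are present in the row-image."""
    used = [i for i, is_used in enumerate(bits(used_bitmap, n_columns)) if is_used]
    # The per-row NULL bit-field only has one bit per *used* column.
    nulls = bits(null_bitmap, len(used))
    return [col for col, is_null in zip(used, nulls) if not is_null]

# Hypothetical 4-column table: columns 0, 1 and 3 are used (0b1011),
# and the second used column (column 1) is NULL in this row (0b010).
present = columns_with_values(b"\x0b", b"\x02", 4)
print(present)   # [0, 3]
```

The number of entries returned equals the number of zero bits in the NULL bit-field, which is the rule stated in the list above.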




HEARTBEAT_LOG_EVENT:

A Heartbeat_log_event is sent by a master to a slave to let the slave know that the master is still alive. Events of this type do not appear in the binary or relay logs. They are generated on a master server by the thread that dumps events and sent straight to the slave without ever being written to the binary log. The slave examines the event contents and then discards it without writing it to the relay log.

In other words, the master-slave heartbeat event is never written to the binlog.