目录

  • 一、MVCC概述
  • 二、Consistent Nonlocking Reads、Locking Reads
  • 1.Consistent Nonlocking Reads
  • 2.Locking Reads
  • 三、Innodb会保存任何改变的副本 undo log
  • 四、记录中隐藏参数
  • 五、read view
  • 六、可见性比较算法


一、MVCC概述

InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
InnoDB是一个多版本的存储引擎:它保留已更改行的旧版本的信息,来支持并发、回滚之类的事务功能。 该信息存储在rollback segment中,InnoDB使用rollback segment中的信息来执行事务回滚中所需的撤消操作。 它还使用该信息来构建行的早期版本,以实现一致的读取(MVCC)。

InnoDB implements multi-version concurrency control (MVCC), meaning that different users will see different versions of the data they are interacting with (sometimes called snapshots, which is a bit of a misleading term). This is done in order to allow users to see a consistent view of the system without expensive and performance-constraining locking which would limit concurrency. (This is where the “concurrency control” part of the term comes from; one alternative is locking everything the user may need.) Undo logging and InnoDB’s “history” system are the mechanisms that underly its implementation of MVCC, but the way this works is generally very poorly understood.
InnoDB实现了多版本并发控制(MVCC),这意味着不同的用户将看到与之交互的数据的不同版本。这样做是为了使用户能够看到系统的一致视图(看到版本,取决于事务隔离级别),而没用用代价很高的锁(锁会限制并发性)。Undo log和InnoDB’s “history” system是MVCC的基础机制。

二、Consistent Nonlocking Reads、Locking Reads

很多中文翻译成了快照读和当前读

1.Consistent Nonlocking Reads

select from where...

A read operation that uses snapshot information to present query results based on a point in time, regardless of changes performed by other transactions running at the same time. If queried data has been changed by another transaction, the original data is reconstructed based on the contents of the undo log. This technique avoids some of the locking issues that can reduce concurrency by forcing transactions to wait for other transactions to finish.
consistent read:使用快照信息可基于某个时间点显示查询结果,而不管同时运行的其他事务执行的更改如何。 如果查询的数据已被另一个事务更改,则将基于undo log的内容来重建原始数据。 通过强制事务等待其他事务完成,该技术避免了一些可以减少并发性的锁定问题。

Consistent read is the default mode in which InnoDB processes SELECT statements in READ COMMITTED and REPEATABLE READ isolation levels. A consistent read does not set any locks on the tables it accesses, and therefore other sessions are free to modify those tables at the same time a consistent read is being performed on the table.
Consistent read是InnoDB在READ COMMITTED和REPEATABLE READ隔离级别中处理SELECT语句的默认模式。 consistent read不会在它访问的表上设置任何锁,因此其他会话可以自由地在对表执行一致读取的同时修改这些表。

Suppose that you are running in the default REPEATABLE READ isolation level. When you issue a consistent read (that is, an ordinary SELECT statement), InnoDB gives your transaction a timepoint according to which your query sees the database. If another transaction deletes a row and commits after your timepoint was assigned, you do not see the row as having been deleted. Inserts and updates are treated similarly.
假设您以默认的REPEATABLE READ隔离级别运行。 当您发出 consistent read(即普通的SELECT语句)时,InnoDB会为您的事务提供一个时间点,根据该时间点您的查询可以看到数据库。 如果在分配了时间点后另一个事务删除了一行并提交,则看不到该行已被删除(若另一个事务插入和更新的处理方式相似)。

With REPEATABLE READ isolation level, the snapshot is based on the time when the first read operation is performed. With READ COMMITTED isolation level, the snapshot is reset to the time of each consistent read operation.
使用REPEATABLE READ隔离级别时,快照基于执行第一次读取操作的时间。 使用READ COMMITTED隔离级别,快照将重置为每次consistent read。

2.Locking Reads

If you query data and then insert or update related data within the same transaction, the regular SELECT statement does not give enough protection. Other transactions can update or delete the same rows you just queried. InnoDB supports two types of locking reads that offer extra safety:
因为事务的regular select用的是Consistent Nonlocking Reads。

  • SELECT … LOCK IN SHARE MODE
  • SELECT … FOR UPDATE

详细见: 二、Row-level locking

当前读:特殊的读操作,插入/更新/删除操作,属于当前读,需要加锁。

当前读,读取记录的最新版本。并且,读取之后,还需要保证其他并发事务不能修改当前记录,对读取记录加锁。其中,除了select in share mode;,对读取记录加S锁 (共享锁)外,其他的操作,都加的是X锁 (排它锁)。

为什么将 插入/更新/删除 操作,都归为当前读?可以看看下面这个 更新 操作,在数据库中的执行流程:

mysql的undo001和undo_001 mysql undolog mvcc_Server


**一个Update操作的具体流程:**当Update SQL被发给MySQL后,MySQL Server会根据where条件,读取第一条满足条件的记录,然后InnoDB引擎会将第一条记录返回,并加锁 (current read)。待MySQL Server收到这条加锁的记录之后,会再发起一个Update请求,更新这条记录。一条记录操作完成,再读取下一条记录,直至没有满足条件的记录为止。因此,Update操作内部,就包含了一个当前读。同理,Delete操作也一样。Insert操作会稍微有些不同,简单来说,就是Insert操作可能会触发Unique Key的冲突检查,也会进行一个当前读。

注:根据上图的交互,针对一条当前读的SQL语句,InnoDB与MySQL Server的交互,是一条一条进行的,因此,加锁也是一条一条进行的。先对一条满足条件的记录加锁,返回给MySQL Server,做一些DML操作;然后在读取下一条加锁,直至读取完毕。

三、Innodb会保存任何改变的副本 undo log

InnoDB keeps a copy of everything that is changed
The key thing to know in InnoDB’s implementation of MVCC is that when a record is modified, the current (“old”) version of the data being modified is first stashed away as an “undo record” in an “undo log”. It’s called an undo log because it contains the information necessary to undo the change made by the user, reverting the record to its previous version.
在InnoDB的MVCC实现中,关键是:在修改记录时,正在修改的数据的当前(“旧”)版本首先会作为"undo record"保存在“undo log”中。“undo log”:该日志包含撤消用户所做更改并将记录恢复为先前版本所必需的信息。

Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
Undo logs分为insert and update undo logs。 insert undo log仅在事务回滚时才需要,并且在事务提交后可以立即将其丢弃。update undo log用于并发一致读,但是只有InnoDB中不存在需要读取早期版本的事务时才可以丢弃它。

四、记录中隐藏参数

Every record contains a reference to its most recent undo record, called a rollback pointer or ROLL_PTR, and every undo record contains a reference to its previous undo record (except for an initial record insert, which can be undone by simply deleting the record), forming a chain of all previous versions of a record. In this way, any previous version of a record can be easily constructed, as long as the the undo records (the “history”) still exist in the undo logs.

每个记录都包含对其最新undo record的引用,称为回滚指针或ROLL_PTR,并且每个undo record都包含指向先前撤消记录的引用(初始记录插入除外,可以通过简单地删除记录来撤消该记录), 形成记录的所有先前版本的链。 这样,只要undo records(“历史记录”)仍存在于撤消日志中,就可以轻松构造任何先前的记录版本。

mysql的undo001和undo_001 mysql undolog mvcc_MySQL_02


Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.

在内部,InnoDB每一行添加三个字段。

DB_TRX_ID: 6字节 ,指示插入或更新该行的最后一个事务ID。

更新时,会新创建一行记录,DB_TRX_ID为此ID;delete时,会用特殊bit标记为删除。

删除也算更新。

DB_ROLL_PTR:7字节,指向写入rollback segment的undo log record.若更新,undo log record包含重建更新前行的重要信息

DB_ROW_ID:6字节,当新行插入时,会自增。这也就解释了为什么没有主键Innodb会生成聚簇索引的原因。(与MVCC关系不大)

更新操作:

①用排他锁锁定该行

②将该行修改前的值Copy到undo log中

③修改当前行的值,填写事务编号DB_TRX_ID,DB_ROLL_PTR指针指向旧记录

mysql的undo001和undo_001 mysql undolog mvcc_MySQL_03

Deleting isn’t really deleting
Whenever a record is deleted, other transactions may still need to see the record as it existed, due to transaction isolation. If, upon delete, the record was immediately removed from the index, other transactions wouldn’t be able to find it, and thus would also not be able to find its reference to the previous record version they may need. (Keep in mind that any number of transactions may see the record in any number of versions, so five different transactions may see up to five separate versions of the record.) In order to handle this, DELETE doesn’t actually delete anything: instead it delete marks the record, flipping a “deleted” flag on.
记录被删除时,由于事务隔离,其他事务可能仍需要查看该记录。 如果在删除记录后立即将其从索引中删除,则其他事务将无法找到它,因此也将无法找到其对他们可能需要的先前记录版本的引用。 (n个事务可以看到n个的版本的记录)。为了处理此问题,DELETE实际上并没有删除任何内容: 它只是做"删除"标记。

五、read view

mysql的undo001和undo_001 mysql undolog mvcc_加锁_04

low_limit_id:目前出现过的最大的事务ID+1,即下一个将被分配的事务ID。

mysql的undo001和undo_001 mysql undolog mvcc_加锁_05


max_trx_id:“还未分配的最小事务ID”,也就是下一个将被分配的事务ID。(low_limit_id 并不是活跃事务列表中最大的事务ID)

mysql的undo001和undo_001 mysql undolog mvcc_Server_06


up_limit_id:活跃事务列表trx_ids中最小的事务ID,如果trx_ids为空,则up_limit_id 为 low_limit_id。

mysql的undo001和undo_001 mysql undolog mvcc_MySQL_07


因为trx_ids中的活跃事务号是逆序的,所以最后一个为最小活跃事务ID。

trx_ids:Read View创建时其他未提交的活跃事务ID列表。意思就是创建Read View时,将当前未提交事务ID记录下来,后续即使它们修改了记录行的值,对于当前事务也是不可见的。

注意:Read View中trx_ids的活跃事务,不包括当前事务自己和已提交的事务(正在内存中)

mysql的undo001和undo_001 mysql undolog mvcc_Server_08


creator_trx_id:当前创建事务的ID,是一个递增的编号(这个编号并不是DB_ROW_ID)

mysql的undo001和undo_001 mysql undolog mvcc_Server_09

六、可见性比较算法

在innodb中,创建一个新事务后,执行第一个select语句的时候,innodb会创建一个快照(read view),快照中会保存系统当前不应该被本事务看到的其他活跃事务id列表(即trx_ids)。当用户在这个事务中要读取某个记录行的时候,innodb会将该记录行的DB_TRX_ID与该Read View中的一些变量进行比较,判断是否满足可见性条件。

假设当前事务要读取某一个记录行,该记录行的DB_TRX_ID(即最新修改该行的事务ID)为trx_id,Read View的活跃事务列表trx_ids中最早的事务ID为up_limit_id,将在生成这个Read Vew时系统出现过的最大的事务ID+1记为low_limit_id(即还未分配的事务ID)。也就是【up_limit_id,low_limit_id】

mysql的undo001和undo_001 mysql undolog mvcc_Server_10

具体的比较算法如下:

  1. 如果 trx_id < up_limit_id, 那么表明“最新修改该行的事务”在“当前事务”创建快照之前就提交了,所以该记录行的值对当前事务是可见的。跳到步骤5。
  2. 如果 trx_id >= low_limit_id, 那么表明“最新修改该行的事务”在“当前事务”创建快照之后才修改该行,所以该记录行的值对当前事务不可见。跳到步骤4。
  3. 如果 up_limit_id <= trx_id < low_limit_id, 表明“最新修改该行的事务”在“当前事务”创建快照的时候可能处于“活动状态”或者“已提交状态”;所以就要对活跃事务列表trx_ids进行查找(源码中是用的二分查找,因为是有序的):(简单来讲,能找到不可见,不能找到可见。)
    (1) 如果在活跃事务列表trx_ids中能找到 id 为 trx_id 的事务,表明在“当前事务”创建快照前,“该记录行的值”被“id为trx_id的事务”修改了,但没有提交;或者在“当前事务”创建快照后,“该记录行的值”被“id为trx_id的事务”修改了(不管有无提交);这些情况下,这个记录行的值对当前事务都是不可见的,跳到步骤4;
    (2)在活跃事务列表中找不到,则表明“id为trx_id的事务”在修改“该记录行的值”后,在“当前事务”创建快照前就已经提交了,所以记录行对当前事务可见,跳到步骤5。
  4. 在该记录行的 DB_ROLL_PTR 指针所指向的undo log回滚段中,取出最新的的旧事务号DB_TRX_ID, 将它赋给trx_id,然后跳到步骤1重新开始判断。
  5. 将该可见行的值返回。

例子参考: (非常详细)

Read Commit和Repetable Read都用这个算法,区别在于Read Commit在一个事务中,每次读时都会创建read view,而Repetable Read只创建一次read view。

参考:
https://dev.mysql.com/doc/refman/5.6/en/innodb-consistent-read.html https://blog.jcole.us/2014/04/16/the-basics-of-the-innodb-undo-logging-and-history-system/ http://ddrv.cn/a/247577