Introduction

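Greenplum is an open-source distributed database designed for OLAP workloads, but it is also widely used in OLTP scenarios in fields such as banking, finance, and logistics. Guaranteeing the consistency of distributed transactions is one of the central problems in distributed systems. Common approaches include two-phase commit (2PC), three-phase commit (3PC), TCC (Try-Confirm-Cancel), and commit protocols based on Paxos.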

1 Key data structures

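GlobalTransactionData: the global transaction struct. It describes a transaction that is in the prepared state or is about to become prepared, and contains the transaction ID, the start and end LSNs of the prepare WAL record, and the gid, the identifier that the QD (query dispatcher, i.e. the coordinator) and the QEs (query executors on the segments) share for this transaction.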

/*
 * This struct describes one global transaction that is in prepared state
 * or attempting to become prepared.
 *
 * The lifecycle of a global transaction is:
 *
 * 1. After checking that the requested GID is not in use, set up an entry in
 * the TwoPhaseState->prepXacts array with the correct GID and valid = false,
 * and mark it as locked by my backend.
 *
 * 2. After successfully completing prepare, set valid = true and enter the
 * referenced PGPROC into the global ProcArray.
 *
 * 3. To begin COMMIT PREPARED or ROLLBACK PREPARED, check that the entry is
 * valid and not locked, then mark the entry as locked by storing my current
 * backend ID into locking_backend.  This prevents concurrent attempts to
 * commit or rollback the same prepared xact.
 *
 * 4. On completion of COMMIT PREPARED or ROLLBACK PREPARED, remove the entry
 * from the ProcArray and the TwoPhaseState->prepXacts array and return it to
 * the freelist.
 *
 * Note that if the preparing transaction fails between steps 1 and 2, the
 * entry must be removed so that the GID and the GlobalTransaction struct
 * can be reused.  See AtAbort_Twophase().
 *
 * typedef struct GlobalTransactionData *GlobalTransaction appears in
 * twophase.h
 */

typedef struct GlobalTransactionData
{
	GlobalTransaction next;		/* list link for free list */
	int			pgprocno;		/* ID of associated dummy PGPROC */
	BackendId	dummyBackendId; /* similar to backend id for backends */
	TimestampTz prepared_at;	/* time of preparation */

	/*
	 * Note that we need to keep track of two LSNs for each GXACT. We keep
	 * track of the start LSN because this is the address we must use to read
	 * state data back from WAL when committing a prepared GXACT. We keep
	 * track of the end LSN because that is the LSN we need to wait for prior
	 * to commit.
	 */
	XLogRecPtr	prepare_start_lsn;	/* XLOG offset of prepare record start */
	XLogRecPtr	prepare_end_lsn;	/* XLOG offset of prepare record end */
	TransactionId xid;			/* The GXACT id */

	Oid			owner;			/* ID of user that executed the xact */
	BackendId	locking_backend;	/* backend currently working on the xact */
	bool		valid;			/* true if PGPROC entry is in proc array */
	bool		ondisk;			/* true if prepare state file is on disk */
	bool		inredo;			/* true if entry was added via xlog_redo */
	char		gid[GIDSIZE];	/* The GID assigned to the prepared xact */
}			GlobalTransactionData;
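The lifecycle comment above hinges on two fields: valid (set once prepare has finished) and locking_backend (claimed before COMMIT/ROLLBACK PREPARED). The sketch below models only those checks on a heavily simplified, hypothetical struct. MiniGXact, try_lock_for_finish and finish_gxact are illustrative names, not PostgreSQL or Greenplum functions; the real code runs these checks while holding TwoPhaseStateLock and also verifies the owner and database, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define INVALID_BACKEND_ID (-1)   /* stands in for InvalidBackendId */
#define MINI_GIDSIZE 64           /* arbitrary size for this sketch */

/* Hypothetical, simplified view of GlobalTransactionData. */
typedef struct MiniGXact
{
    bool valid;                   /* prepare finished (lifecycle step 2) */
    int  locking_backend;         /* backend currently working on the entry */
    char gid[MINI_GIDSIZE];
} MiniGXact;

typedef enum
{
    LOCK_OK,
    LOCK_NOT_FOUND,               /* no valid entry with this gid */
    LOCK_BUSY                     /* another backend already holds the entry */
} LockResult;

/*
 * Step 3: find a valid entry with the requested gid and claim it by storing
 * our backend id. The caller is assumed to hold the array's lock.
 */
static LockResult
try_lock_for_finish(MiniGXact *xacts, int n, const char *gid, int my_backend,
                    MiniGXact **out)
{
    for (int i = 0; i < n; i++)
    {
        MiniGXact *g = &xacts[i];

        if (!g->valid || strcmp(g->gid, gid) != 0)
            continue;             /* still preparing, or a different xact */
        if (g->locking_backend != INVALID_BACKEND_ID)
            return LOCK_BUSY;     /* concurrent commit/rollback in progress */
        g->locking_backend = my_backend;
        *out = g;
        return LOCK_OK;
    }
    return LOCK_NOT_FOUND;
}

/* Step 4 (simplified): the finishing backend retires the entry. */
static void
finish_gxact(MiniGXact *g, int my_backend)
{
    assert(g->locking_backend == my_backend);  /* only the locker may finish */
    g->valid = false;
    g->locking_backend = INVALID_BACKEND_ID;
    g->gid[0] = '\0';                          /* the gid may now be reused */
}
```

Calling try_lock_for_finish twice for the same gid from two different backends returns LOCK_BUSY the second time, which is what step 3 means by preventing concurrent commit or rollback of the same prepared transaction.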

TMGXACT: distributed (global) transaction information, one per backend process (referenced as MyTmGxact)

typedef struct TMGXACT
{
	/*
	 * Like PGPROC->xid to local transaction, gxid is set if distributed
	 * transaction needs two-phase, and it's reset when distributed
	 * transaction ends, with ProcArrayLock held.
	 */
	DistributedTransactionId	gxid;       // used for two-phase commit
	/*
	 * This is similar to xmin of PROC, stores lowest dxid on first snapshot
	 * by process with this as MyTmGxact.
	 */
	DistributedTransactionId	xminDistributedSnapshot;

	bool						includeInCkpt;
	int							sessionId;       // session identifier
}	TMGXACT;
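The comment compares xminDistributedSnapshot to PGPROC->xmin. In PostgreSQL, the minimum xmin over all backends bounds which transactions any running snapshot might still treat as in progress; by analogy, a minimum over all TMGXACT entries would give the same kind of bound for distributed transactions. The sketch shows only that minimum computation. MiniTmGxact, oldest_distributed_xmin, the 64-bit non-wrapping id type and treating 0 as "no snapshot yet" are all assumptions for illustration, not Greenplum's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t DxidSketch;             /* assumed 64-bit, non-wrapping */
#define INVALID_DXID ((DxidSketch) 0)    /* assumed "no snapshot yet" value */

/* Hypothetical subset of TMGXACT. */
typedef struct MiniTmGxact
{
    DxidSketch gxid;                     /* set only while 2PC is needed */
    DxidSketch xminDistributedSnapshot;  /* lowest dxid seen by first snapshot */
} MiniTmGxact;

/*
 * Return the smallest valid xminDistributedSnapshot among all entries,
 * or `fallback` if no entry currently holds a snapshot.
 */
static DxidSketch
oldest_distributed_xmin(const MiniTmGxact *procs, size_t n, DxidSketch fallback)
{
    DxidSketch result = fallback;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        DxidSketch x = procs[i].xminDistributedSnapshot;

        if (x != INVALID_DXID && x < result)
            result = x;                  /* older snapshot found */
    }
    return result;
}
```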

2PC shared state

/*
 * Two Phase Commit shared state.  Access to this struct is protected
 * by TwoPhaseStateLock.
 */
typedef struct TwoPhaseStateData
{
	/* Head of linked list of free GlobalTransactionData structs */
	GlobalTransaction freeGXacts;

	/* Number of valid prepXacts entries. */
	int			numPrepXacts;

	/* There are max_prepared_xacts items in this array */
	GlobalTransaction prepXacts[FLEXIBLE_ARRAY_MEMBER];
} TwoPhaseStateData;

static TwoPhaseStateData *TwoPhaseState;
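TwoPhaseState keeps every GlobalTransactionData either on the singly linked freeGXacts list or in the dense prepXacts array, whose first numPrepXacts slots are in use. Below is a minimal sketch of that bookkeeping under simplifying assumptions: a fixed capacity instead of max_prepared_xacts, backing storage embedded in the struct rather than allocated in shared memory, and no locking (callers would hold TwoPhaseStateLock). All Sketch* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREPARED_SKETCH 4    /* stands in for max_prepared_xacts */

typedef struct SketchGXact
{
    struct SketchGXact *next;    /* free-list link, like GlobalTransactionData.next */
    int id;
} SketchGXact;

typedef struct SketchTwoPhaseState
{
    SketchGXact *freeGXacts;                  /* head of the free list */
    int numPrepXacts;                         /* used slots in prepXacts */
    SketchGXact *prepXacts[MAX_PREPARED_SKETCH];
    SketchGXact storage[MAX_PREPARED_SKETCH]; /* backing structs */
} SketchTwoPhaseState;

/* Thread every struct onto the free list, as shared-memory init would. */
static void
sketch_init(SketchTwoPhaseState *s)
{
    s->freeGXacts = NULL;
    s->numPrepXacts = 0;
    for (int i = 0; i < MAX_PREPARED_SKETCH; i++)
    {
        s->storage[i].id = i;
        s->storage[i].next = s->freeGXacts;
        s->freeGXacts = &s->storage[i];
    }
}

/* Pop a struct off the free list and append it to prepXacts (NULL if full). */
static SketchGXact *
sketch_reserve(SketchTwoPhaseState *s)
{
    SketchGXact *g = s->freeGXacts;

    if (g == NULL)
        return NULL;             /* no free slot: the real code raises an error */
    s->freeGXacts = g->next;
    s->prepXacts[s->numPrepXacts++] = g;
    return g;
}

/* Move the last used slot into g's place, then push g back on the free list. */
static void
sketch_remove(SketchTwoPhaseState *s, SketchGXact *g)
{
    for (int i = 0; i < s->numPrepXacts; i++)
    {
        if (s->prepXacts[i] == g)
        {
            s->prepXacts[i] = s->prepXacts[--s->numPrepXacts];
            g->next = s->freeGXacts;
            s->freeGXacts = g;
            return;
        }
    }
    assert(!"entry not found in prepXacts");
}
```

Removal fills the hole with the last occupied slot instead of shifting the array, so it costs O(1) after the lookup. The price is that prepXacts is unordered, which is harmless here because lookups scan the array anyway.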

2 Source code walkthrough

2.1 Prepare phase

Phase one: prepare

QD: calls doPrepareTransaction to initiate the prepare:

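1) First, obtain the gid from the global transaction ID. The gid can be thought of as a shared token between the QD and the QEs for one distributed transaction: because the QD runs many distributed transactions at once, this token lets the QD and the QEs identify exactly which transaction each message belongs to;
2) Then build the prepare message, serialize it, and dispatch it over the libpq protocol to every QE involved in the transaction.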
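To make the idea of a gid concrete, here is a sketch of forming one on the QD and parsing it back on a QE. The "<QD start time>-<gxid>" layout, the zero padding, and the names sketch_form_gid and sketch_parse_gid are hypothetical choices for illustration, not Greenplum's exact format; prefixing a per-startup timestamp is simply one way to keep gids distinct across QD restarts.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_GIDSIZE 100   /* arbitrary buffer size for this sketch */

/* Hypothetical gid layout: "<QD start time>-<gxid>", gxid zero-padded to 10 digits. */
static int
sketch_form_gid(char *gid, size_t size, uint32_t qd_start_time, uint32_t gxid)
{
    int n = snprintf(gid, size, "%" PRIu32 "-%.10" PRIu32, qd_start_time, gxid);

    assert(n > 0 && (size_t) n < size);   /* refuse a silently truncated gid */
    return n;
}

/* QE side: recover both parts of a gid received from the QD; 1 on success. */
static int
sketch_parse_gid(const char *gid, uint32_t *qd_start_time, uint32_t *gxid)
{
    if (strchr(gid, '-') == NULL)
        return 0;                        /* not in the "<time>-<gxid>" shape */
    return sscanf(gid, "%" SCNu32 "-%" SCNu32, qd_start_time, gxid) == 2;
}
```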

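QE: calls performDtxProtocolCommand to carry out the prepare:
1) Parse and deserialize the prepare request sent by the QD, and enter the corresponding handler, PrepareTransaction;
2) Collect the 2PC state locally, including the TwoPhaseFileHeader, transaction locks, predicate locks, and MultiXact information (used later by COMMIT PREPARED or ROLLBACK PREPARED);
3) Write the prepare WAL record locally and flush it to disk, then release the resources used during this step (but not the resources held by the QE transaction itself).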
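Both sides of the exchange above involve a serialize/deserialize round trip followed by routing to a handler. The sketch below uses a made-up wire format (a 4-byte big-endian command code followed by the NUL-terminated gid) and returns handler names from the text only as labels; SketchDtxCommand, sketch_serialize and sketch_dispatch are hypothetical and do not reproduce Greenplum's actual dispatch protocol or its DtxProtocolCommand enum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes. */
typedef enum
{
    SKETCH_DTX_PREPARE = 1,
    SKETCH_DTX_COMMIT_PREPARED = 2,
    SKETCH_DTX_ABORT_PREPARED = 3
} SketchDtxCommand;

/* QD side: [4-byte command, big-endian][gid][NUL]; bytes written, or -1 if too small. */
static int
sketch_serialize(SketchDtxCommand cmd, const char *gid, unsigned char *buf, size_t cap)
{
    size_t len = strlen(gid) + 1;

    if (cap < 4 + len)
        return -1;
    buf[0] = (unsigned char) ((uint32_t) cmd >> 24);
    buf[1] = (unsigned char) ((uint32_t) cmd >> 16);
    buf[2] = (unsigned char) ((uint32_t) cmd >> 8);
    buf[3] = (unsigned char) cmd;
    memcpy(buf + 4, gid, len);
    return (int) (4 + len);
}

/* QE side: decode the message and pick a handler; returns the handler's label. */
static const char *
sketch_dispatch(const unsigned char *buf, size_t n, const char **gid_out)
{
    assert(buf != NULL && gid_out != NULL);
    if (n < 5 || buf[n - 1] != '\0')
        return "malformed";
    uint32_t cmd = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
                   ((uint32_t) buf[2] << 8) | buf[3];
    *gid_out = (const char *) (buf + 4);

    switch ((SketchDtxCommand) cmd)
    {
        case SKETCH_DTX_PREPARE:
            return "PrepareTransaction";
        case SKETCH_DTX_COMMIT_PREPARED:
            return "performDtxProtocolCommitPrepared";
        case SKETCH_DTX_ABORT_PREPARED:
            return "abort prepared";
    }
    return "unknown command";
}
```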

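If the QD receives a successful prepare result from every QE, it writes a DISTRIBUTED_COMMIT record to its local WAL and flushes it to disk. Otherwise it retries several times and finally rolls the transaction back.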
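The QD's phase-one decision can be sketched as a vote over all QEs with bounded retries. This is a simplified, sequential model (the real dispatch is parallel over libpq connections); PrepareFn, sketch_phase_one and the simulated_prepare test double are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: send PREPARE to one QE; true on a "prepared" reply. */
typedef bool (*PrepareFn)(int qe, void *ctx);

typedef enum { DECIDE_COMMIT, DECIDE_ABORT } Decision;

/*
 * Every QE must vote yes. A failing QE is retried up to max_retries extra
 * times; if it still fails, the whole transaction is aborted. On
 * DECIDE_COMMIT the caller would write and flush the distributed-commit
 * record before starting phase two.
 */
static Decision
sketch_phase_one(int nqe, PrepareFn prepare, void *ctx, int max_retries)
{
    assert(nqe >= 0 && max_retries >= 0);
    for (int qe = 0; qe < nqe; qe++)
    {
        bool ok = false;

        for (int attempt = 0; attempt <= max_retries && !ok; attempt++)
            ok = prepare(qe, ctx);
        if (!ok)
            return DECIDE_ABORT;
    }
    return DECIDE_COMMIT;
}

/* Test double: ctx is an int array of remaining failures per QE (-1 = always fails). */
static bool
simulated_prepare(int qe, void *ctx)
{
    int *remaining = ctx;

    if (remaining[qe] < 0)
        return false;
    if (remaining[qe] > 0)
    {
        remaining[qe]--;
        return false;
    }
    return true;
}
```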
  

2.2 Commit phase

Phase two: commit
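QD: once phase one has completed, calls notifyCommittedDtxTransaction to send a COMMIT PREPARED request to the QEs:
1) First, obtain the gid from the global transaction ID, exactly as in the prepare phase;
2) Then build the COMMIT PREPARED message, serialize it, and dispatch it over the libpq protocol to every QE involved in the transaction.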

QE: calls performDtxProtocolCommand to carry out COMMIT PREPARED (command DTX_PROTOCOL_COMMAND_COMMIT_PREPARED):
1) Parse and deserialize the COMMIT PREPARED request sent by the QD, and enter the corresponding handler, performDtxProtocolCommitPrepared;
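2) Start a new local transaction, update the transaction block and transaction state, and acquire the resources needed to operate on the prepared transaction;
3) Call FinishPreparedTransaction to actually perform COMMIT PREPARED: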
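a: Look up the global transaction by its gid and read back the 2PC state recorded in phase one (from the prepare WAL record at prepare_start_lsn, or from the two-phase state file if the state is already on disk, i.e. ondisk is true), then parse it: the header, the subtransactions, and the databases and tables to be deleted or committed;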
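b: Using the parsed information, write the COMMIT PREPARED WAL record and flush it to disk, then mark the transaction as committed in the local CLOG. Once this record is durable on every QE, the distributed transaction is complete: even if the QD or a QE crashes afterwards, replaying this record restores a consistent state;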
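c: Update the global variable ShmemVariableCache->latestCompletedXid, and remove the transaction's PGPROC entry from the shared ProcArray and its entry from TwoPhaseState;
d: Finally, remove the two-phase state file from disk, if one was written.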
4) Release the memory, locks, and other resources held by the transaction.
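Step c only ever moves latestCompletedXid forward, and because PostgreSQL's 32-bit xids wrap around, "forward" is decided with modulo-2^32 arithmetic rather than a plain `>`. The sketch below mirrors that comparison for normal xids only (PostgreSQL compares its special xids below 3 directly, which is omitted here) and also takes the newest of the top-level xid and its subtransaction xids, the job TransactionIdLatest() does in PostgreSQL. SketchXid and the sketch_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;

/* "a is newer than b" for normal 32-bit xids, using modulo-2^32 distance. */
static int
sketch_xid_follows(SketchXid a, SketchXid b)
{
    return (int32_t) (a - b) > 0;
}

/* Newest xid among the top-level xid and its subtransactions. */
static SketchXid
sketch_latest_of(SketchXid xid, const SketchXid *subxids, int nsub)
{
    SketchXid result = xid;

    assert(nsub == 0 || subxids != NULL);
    for (int i = 0; i < nsub; i++)
        if (sketch_xid_follows(subxids[i], result))
            result = subxids[i];
    return result;
}

/* Step c: advance latestCompletedXid only if xid is newer. */
static void
sketch_advance_latest(SketchXid *latest, SketchXid xid)
{
    if (sketch_xid_follows(xid, *latest))
        *latest = xid;
}
```

With this comparison, xid 5 counts as newer than 4294967290, because the counter has wrapped between them.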

The QD then handles one of two outcomes:

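1) If every QE commits the prepared transaction successfully, the QD calls doInsertForgetCommitted to write an XLOG_XACT_DISTRIBUTED_FORGET record locally, marking the distributed transaction as fully committed; it is then visible to all subsequent transactions.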
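2) If only some QEs report success, the QD retries the same COMMIT PREPARED step for the rest. It cannot roll back at this point: the commit decision is already durable in the QD's DISTRIBUTED_COMMIT record and some QEs may already have committed, so a transaction that still fails after the retry limit has to be completed later (for example during recovery) rather than rolled back.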
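Finally, the QD releases the resources held by the distributed transaction, such as memory and locks, and updates the system state.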
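The QD's phase-two loop can be sketched in the same style as phase one, with one difference: it only reports whether any QE is still pending and leaves the follow-up to the caller. A done[] array remembers which QEs have acknowledged, so a later pass re-sends COMMIT PREPARED only to the stragglers. CommitPreparedFn, sketch_phase_two and the simulated_commit test double are hypothetical names, and the loop is sequential for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: send COMMIT PREPARED to one QE; true on success. */
typedef bool (*CommitPreparedFn)(int qe, void *ctx);

typedef enum
{
    PHASE2_DONE,        /* every QE committed: the FORGET record can be written */
    PHASE2_INCOMPLETE   /* some QE has not acknowledged yet */
} Phase2Result;

/* Retry each unacknowledged QE up to max_retries extra times. */
static Phase2Result
sketch_phase_two(int nqe, bool *done, CommitPreparedFn commit, void *ctx,
                 int max_retries)
{
    bool all = true;

    assert(nqe >= 0 && max_retries >= 0);
    for (int qe = 0; qe < nqe; qe++)
    {
        for (int attempt = 0; attempt <= max_retries && !done[qe]; attempt++)
            done[qe] = commit(qe, ctx);
        all = all && done[qe];
    }
    return all ? PHASE2_DONE : PHASE2_INCOMPLETE;
}

/* Test double: ctx holds remaining failures per QE (-1 = unreachable). */
static bool
simulated_commit(int qe, void *ctx)
{
    int *remaining = ctx;

    if (remaining[qe] < 0)
        return false;
    if (remaining[qe] > 0)
    {
        remaining[qe]--;
        return false;
    }
    return true;
}
```

Keeping done[] outside the function is a deliberate choice: QEs that already committed are never contacted again, which matters because a second COMMIT PREPARED for a gid that no longer exists on that QE would fail.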