PostgreSQL 9.6 includes a commit that reduces lock contention on the ProcArray when many transactions commit concurrently.
Reduce ProcArrayLock contention by removing backends in batches.
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results. When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn. To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group. Benchmarking reveals that this is much
more efficient at very high client counts.

Before digging into this optimization, a few concepts are worth reviewing:

  • procArray: corresponds to the ProcArrayStruct structure; it tracks the PGPROC and PGXACT entries of every backend currently in the system.
  • PGPROC: every connected backend has a PGPROC structure in shared memory that records the backend's state, such as its pid, lock information, and transaction status.
  • PGXACT: before PG 9.2, the information in this structure was stored inside PGPROC. Today it records a backend's xid and xmin, plus some vacuum- and checkpoint-related state. Testing showed that on machines with many CPU cores, storing it separately from PGPROC significantly speeds up GetSnapshotData, because it reduces the amount of data pulled into cache lines when cache lines are invalidated (see the sketch after this list).
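
A minimal, self-contained sketch of the idea behind the split (field names modeled on PGXACT in src/include/storage/proc.h; the exact definition may differ by version):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Abridged model of the dense per-backend transaction-status record.
 * Because these few bytes live in their own tightly packed array,
 * GetSnapshotData's scan over all backends touches only a handful of
 * cache lines, instead of striding through the much larger PGPROC
 * entries and wasting most of each cache line it loads.
 */
typedef struct PGXACT
{
    TransactionId xid;          /* top-level XID being run, if any */
    TransactionId xmin;         /* oldest XID visible when our xact started */
    uint8_t       vacuumFlags;  /* vacuum-related status flags */
    bool          overflowed;   /* did the subxid cache overflow? */
    bool          delayChkpt;   /* should checkpoint start be delayed? */
    uint8_t       nxids;        /* number of cached subxact XIDs */
} PGXACT;

At roughly a dozen bytes per entry, five or so entries fit into one 64-byte cache line, versus one or more cache lines per backend if these fields stayed inside PGPROC.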

Background

Before this optimization, whenever a transaction committed or aborted, its backend had to update the transaction-status fields in its PGPROC and PGXACT to mark the transaction as no longer running. This update must be protected by taking the LWLock on the ProcArray in exclusive mode, because other backends read the status of all transactions in the ProcArray to compute their snapshots when starting a transaction. Under a highly concurrent OLTP workload, this means a large number of processes contend for the same lock at commit time, and performance suffers.
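
Conceptually, the old commit path looked like the following simplified sketch of ProcArrayEndTransaction (not the verbatim source); note the unconditional LWLockAcquire:

/*
 * Pre-9.6 behavior (simplified): every committing backend waits its turn
 * for the exclusive lock, so the lock is handed from one committing
 * process to the next, one at a time.
 */
if (TransactionIdIsValid(latestXid))
{
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);   /* everyone queues here */

    pgxact->xid = InvalidTransactionId;           /* no longer running */
    pgxact->xmin = InvalidTransactionId;
    /* ... clear subxid info, advance latestCompletedXid, etc. ... */

    LWLockRelease(ProcArrayLock);
}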

The ProcArrayLock optimization

With this optimization, when a backend commits a transaction, it first tries to take the ProcArray's exclusive LWLock immediately; if that succeeds, it acquires the lock and performs the transaction-cleanup work itself. If it cannot get the lock, the backend no longer waits on the LWLock as before; instead it adds itself to a group, and the first backend to enter the group becomes the group's leader. The leader then acquires the ProcArray's exclusive LWLock and cleans up the transactions of itself and all the other backends in the group. This reduces contention on the ProcArray's LWLock under high concurrency and thus improves system performance.

The relevant code follows.

At transaction-cleanup time, if the exclusive LWLock on the ProcArray can be acquired right away, the backend takes the lock and performs the cleanup immediately; otherwise it joins the group and the group leader performs the cleanup for all members:

/*
 * If we can immediately acquire ProcArrayLock, we clear our own XID
 * and release the lock.  If not, use group XID clearing to improve
 * efficiency.
 */
if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
{
    ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
    LWLockRelease(ProcArrayLock);
}
else
    ProcArrayGroupClearXid(proc, latestXid);

The backend marks itself as needing group clearing. If it is not the leader, it puts itself to sleep. The group leader acquires the ProcArray's exclusive LWLock, clears the transaction state of itself and every other backend in the group, and finally wakes the other backends:

/*
 * ProcArrayGroupClearXid -- group XID clearing
 *
 * When we cannot immediately acquire ProcArrayLock in exclusive mode at
 * commit time, add ourselves to a list of processes that need their XIDs
 * cleared.  The first process to add itself to the list will acquire
 * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
 * on behalf of all group members.  This avoids a great deal of contention
 * around ProcArrayLock when many processes are trying to commit at once,
 * since the lock need not be repeatedly handed off from one committing
 * process to the next.
 */
static void
ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
{
    volatile PROC_HDR *procglobal = ProcGlobal;
    uint32      nextidx;
    uint32      wakeidx;

    /* We should definitely have an XID to clear. */
    Assert(TransactionIdIsValid(allPgXact[proc->pgprocno].xid));

    /* Add ourselves to the list of processes needing a group XID clear. */
    proc->procArrayGroupMember = true;
    proc->procArrayGroupMemberXid = latestXid;
    while (true)
    {
        nextidx = pg_atomic_read_u32(&procglobal->procArrayGroupFirst);
        pg_atomic_write_u32(&proc->procArrayGroupNext, nextidx);

        if (pg_atomic_compare_exchange_u32(&procglobal->procArrayGroupFirst,
                                           &nextidx,
                                           (uint32) proc->pgprocno))
            break;
    }

    /*
     * If the list was not empty, the leader will clear our XID.  It is
     * impossible to have followers without a leader because the first process
     * that has added itself to the list will always have nextidx as
     * INVALID_PGPROCNO.
     */
    if (nextidx != INVALID_PGPROCNO)
    {
        int         extraWaits = 0;

        /* Sleep until the leader clears our XID. */
        for (;;)
        {
            /* acts as a read barrier */
            PGSemaphoreLock(&proc->sem);
            if (!proc->procArrayGroupMember)
                break;
            extraWaits++;
        }

        Assert(pg_atomic_read_u32(&proc->procArrayGroupNext) == INVALID_PGPROCNO);

        /* Fix semaphore count for any absorbed wakeups */
        while (extraWaits-- > 0)
            PGSemaphoreUnlock(&proc->sem);
        return;
    }

    /* We are the leader.  Acquire the lock on behalf of everyone. */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

    /*
     * Now that we've got the lock, clear the list of processes waiting for
     * group XID clearing, saving a pointer to the head of the list.  Trying
     * to pop elements one at a time could lead to an ABA problem.
     */
    while (true)
    {
        nextidx = pg_atomic_read_u32(&procglobal->procArrayGroupFirst);
        if (pg_atomic_compare_exchange_u32(&procglobal->procArrayGroupFirst,
                                           &nextidx,
                                           INVALID_PGPROCNO))
            break;
    }

    /* Remember head of list so we can perform wakeups after dropping lock. */
    wakeidx = nextidx;

    /* Walk the list and clear all XIDs. */
    while (nextidx != INVALID_PGPROCNO)
    {
        PGPROC     *proc = &allProcs[nextidx];
        PGXACT     *pgxact = &allPgXact[nextidx];

        ProcArrayEndTransactionInternal(proc, pgxact, proc->procArrayGroupMemberXid);

        /* Move to next proc in list. */
        nextidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
    }

    /* We're done with the lock now. */
    LWLockRelease(ProcArrayLock);

    /*
     * Now that we've released the lock, go back and wake everybody up.  We
     * don't do this under the lock so as to keep lock hold times to a
     * minimum.  The system calls we need to perform to wake other processes
     * up are probably much slower than the simple memory writes we did while
     * holding the lock.
     */
    while (wakeidx != INVALID_PGPROCNO)
    {
        PGPROC     *proc = &allProcs[wakeidx];

        wakeidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
        pg_atomic_write_u32(&proc->procArrayGroupNext, INVALID_PGPROCNO);

        /* ensure all previous writes are visible before follower continues. */
        pg_write_barrier();

        proc->procArrayGroupMember = false;

        if (proc != MyProc)
            PGSemaphoreUnlock(&proc->sem);
    }
}
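
One detail worth dwelling on is the comment about the ABA problem: the leader detaches the entire pending list with a single compare-and-swap instead of popping entries one by one. The following hypothetical one-at-a-time pop (illustrative only, not PostgreSQL code) shows what could go wrong:

/* BROKEN: pop one member at a time from the lock-free list. */
for (;;)
{
    uint32 head = pg_atomic_read_u32(&procglobal->procArrayGroupFirst);

    if (head == INVALID_PGPROCNO)
        break;

    uint32 next = pg_atomic_read_u32(&allProcs[head].procArrayGroupNext);

    /*
     * ABA hazard: between the two reads above and the CAS below, the
     * backend `head` could be cleared, woken, commit another transaction,
     * and push itself back onto the list with a different successor.  The
     * CAS still sees the same head value, succeeds, and installs the stale
     * `next`, silently dropping every member that joined in between.
     */
    if (pg_atomic_compare_exchange_u32(&procglobal->procArrayGroupFirst,
                                       &head, next))
    {
        /* ... clear the XID for `head` ... */
    }
}

Swapping the whole chain out with one CAS sidesteps this: once procArrayGroupFirst is reset to INVALID_PGPROCNO, the leader owns a private list that no concurrent push can modify, and any backend arriving afterwards simply starts a fresh group with its own leader.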