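Beware the differences between pthread_cond_signal and SetEvent
Today I helped a colleague track down a multithreading bug in which one thread hung in g_cond_wait and never woke up. Reading the code revealed nothing wrong: every g_cond_wait was paired with a g_cond_signal. After two hours of digging, the log finally showed that the calls were happening in the wrong order: one thread called g_cond_signal first, and only afterwards did the other thread call g_cond_wait.
g_cond_signal is a glib wrapper. On Linux it is implemented on top of pthread_cond_signal, and on Win32 on top of SetEvent. On Win32, the order in which two threads call SetEvent and WaitForSingleObject does not matter. Strange: could the call order really matter on Linux?
I read the pthread source, and sure enough: when pthread_cond_signal finds that no other thread is waiting, it simply returns (see the cn_waiters > 0 check in pth_cond_notify).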

    int pthread_cond_signal(pthread_cond_t *cond)
    {
        if (cond == NULL)
            return pth_error(EINVAL, EINVAL);
        if (*cond == PTHREAD_COND_INITIALIZER)
            if (pthread_cond_init(cond, NULL) != OK)
                return errno;
        if (!pth_cond_notify((pth_cond_t *)(*cond), FALSE))
            return errno;
        return OK;
    }

    int pth_cond_notify(pth_cond_t *cond, int broadcast)
    {
        /* consistency checks */
        if (cond == NULL)
            return pth_error(FALSE, EINVAL);
        if (!(cond->cn_state & PTH_COND_INITIALIZED))
            return pth_error(FALSE, EDEADLK);

        /* do something only if there is at least one waiters (POSIX semantics) */
        if (cond->cn_waiters > 0) {
            /* signal the condition */
            cond->cn_state |= PTH_COND_SIGNALED;
            if (broadcast)
                cond->cn_state |= PTH_COND_BROADCAST;
            else
                cond->cn_state &= ~(PTH_COND_BROADCAST);
            cond->cn_state &= ~(PTH_COND_HANDLED);

            /* and give other threads a chance to awake */
            pth_yield(NULL);
        }

        /* return to caller */
        return TRUE;
    }

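The effect is easy to reproduce with plain pthreads. The following is only a minimal sketch, not the code from the bug: the helper name signal_then_wait is made up for illustration, and the timeout is rounded to whole seconds to keep the deadline arithmetic simple. It signals a condition variable that nobody is waiting on, then waits on it with a timeout.

```c
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not part of pthreads or glib): signal a condition
 * variable while no thread is waiting, then wait on it with a timeout.
 * Returns whatever pthread_cond_timedwait returns. */
int signal_then_wait(int timeout_sec)
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    struct timespec deadline;
    int rc;

    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);

    /* Nobody is waiting yet, so this signal has no thread to wake. */
    pthread_cond_signal(&cond);

    /* Absolute deadline on the realtime clock, the default for condvars. */
    deadline.tv_sec = time(NULL) + timeout_sec;
    deadline.tv_nsec = 0;

    pthread_mutex_lock(&mutex);
    rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Calling signal_then_wait(1) should normally return ETIMEDOUT, which means the earlier signal was not remembered. POSIX does permit spurious wakeups, so a return value of 0 is theoretically possible and would not by itself prove that the signal was delivered.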
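Back home that evening, I also read ReactOS's implementation of SetEvent. The result was as expected: even when no thread is waiting on the Event, it still sets SignalState (see the IsListEmpty branch).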

    LONG STDCALL KeSetEvent(PKEVENT Event, KPRIORITY Increment, BOOLEAN Wait)
    {
        KIRQL OldIrql;
        LONG PreviousState;
        PKWAIT_BLOCK WaitBlock;

        DPRINT("KeSetEvent(Event %x, Wait %x)\n",Event,Wait);

        /* Lock the Dispatcher Database */
        OldIrql = KeAcquireDispatcherDatabaseLock();

        /* Save the Previous State */
        PreviousState = Event->Header.SignalState;

        /* Check if we have stuff in the Wait Queue */
        if (IsListEmpty(&Event->Header.WaitListHead)) {

            /* Set the Event to Signaled */
            DPRINT("Empty Wait Queue, Signal the Event\n");
            Event->Header.SignalState = 1;

        } else {

            /* Get the Wait Block */
            WaitBlock = CONTAINING_RECORD(Event->Header.WaitListHead.Flink,
                                          KWAIT_BLOCK,
                                          WaitListEntry);

            /* Check the type of event */
            if (Event->Header.Type == NotificationEvent || WaitBlock->WaitType == WaitAll) {

                if (PreviousState == 0) {

                    /* We must do a full wait satisfaction */
                    DPRINT("Notification Event or WaitAll, Wait on the Event and Signal\n");
                    Event->Header.SignalState = 1;
                    KiWaitTest(&Event->Header, Increment);
                }

            } else {

                /* We can satisfy wait simply by waking the thread, since our signal state is 0 now */
                DPRINT("WaitAny or Sync Event, just unwait the thread\n");
                KiAbortWaitThread(WaitBlock->Thread, WaitBlock->WaitKey, Increment);
            }
        }

        /* Check what wait state was requested */
        if (Wait == FALSE) {

            /* Wait not requested, release Dispatcher Database and return */
            KeReleaseDispatcherDatabaseLock(OldIrql);

        } else {

            /* Return Locked and with a Wait */
            KTHREAD *Thread = KeGetCurrentThread();
            Thread->WaitNext = TRUE;
            Thread->WaitIrql = OldIrql;
        }

        /* Return the previous State */
        DPRINT("Done: %d\n", PreviousState);
        return PreviousState;
    }

KeWaitForSingleObject, in turn, succeeds immediately when it finds that SignalState is greater than 0 (see the SignalState > 0 check).

    NTSTATUS STDCALL KeWaitForSingleObject(PVOID Object,
                                           KWAIT_REASON WaitReason,
                                           KPROCESSOR_MODE WaitMode,
                                           BOOLEAN Alertable,
                                           PLARGE_INTEGER Timeout)
    {
        ...
        if (CurrentObject->Header.SignalState > 0) {
            /* Another satisfied object */
            KiSatisfyNonMutantWait(CurrentObject, CurrentThread);
            WaitStatus = STATUS_WAIT_0;
            goto DontWait;
        }
        ...
    }

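So g_cond_signal/g_cond_wait as wrapped by glib do not behave exactly the same on Win32 and on Linux. Even if you skip glib and write your own wrapper or call the native APIs directly, watch out for this subtle trap.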
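One common way to stay out of this trap is to keep a flag under the mutex and wait in a loop on that flag, so that a signal sent before the wait is remembered, much as SignalState is on the Win32 side. The sketch below is my own illustration under that assumption: event_t, event_init, event_set, event_wait, event_destroy and set_before_wait_wakes_waiter are hypothetical names, not glib, pthread or Win32 APIs, and error handling is kept to a minimum.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical auto-reset event built from a mutex, a condition variable
 * and a flag. The flag remembers a signal that arrives before the wait. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool signaled;
} event_t;

int event_init(event_t *ev)
{
    int rc = pthread_mutex_init(&ev->mutex, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&ev->cond, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&ev->mutex);
    ev->signaled = false;
    return rc;
}

void event_destroy(event_t *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->mutex);
}

void event_set(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->signaled = true;             /* recorded even if nobody is waiting */
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

void event_wait(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->signaled)            /* the loop also absorbs spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->mutex);
    ev->signaled = false;            /* auto-reset: consume the signal */
    pthread_mutex_unlock(&ev->mutex);
}

/* Thread body: wait on the event, then hand the pointer back as proof. */
static void *waiter(void *arg)
{
    event_wait((event_t *)arg);
    return arg;
}

/* Returns 1 if a waiter that starts after event_set() still gets through. */
int set_before_wait_wakes_waiter(void)
{
    event_t ev;
    pthread_t thread;
    void *result = NULL;

    if (event_init(&ev) != 0)
        return 0;
    event_set(&ev);                  /* signal first, while nobody waits */
    if (pthread_create(&thread, NULL, waiter, &ev) != 0) {
        event_destroy(&ev);
        return 0;
    }
    pthread_join(thread, &result);
    event_destroy(&ev);
    return result == &ev;
}
```

With the flag in place, it no longer matters which thread gets there first: set_before_wait_wakes_waiter() signals before the waiter thread even exists and the waiter still returns. The same flag-plus-loop pattern should carry over to g_cond_wait and g_cond_signal, although I have not shown the glib version here.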