Beware of the Difference Between pthread_cond_signal and SetEvent

 

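Today I helped a colleague track down a multithreading bug: one thread was stuck in g_cond_wait and never woke up. The code itself looked fine, and every g_cond_wait was strictly paired with a g_cond_signal. After two hours of digging, the log finally showed that the two calls were happening in the wrong order: one thread called g_cond_signal first, and only afterwards did the other thread call g_cond_wait.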

 

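g_cond_signal is a glib wrapper. On Linux it is implemented with pthread_cond_signal, and on Win32 with SetEvent. On Win32, the order in which two threads call SetEvent and WaitForSingleObject does not matter. Strange. Could the calling order really make a difference on Linux?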

 

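I read the pthread source (the listing here is GNU Pth's pthread emulation), and sure enough: when pthread_cond_signal finds that no other thread is waiting, it simply returns without doing anything. The relevant code is the cond->cn_waiters > 0 check in pth_cond_notify, which was highlighted in red in the original post.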



int pthread_cond_signal(pthread_cond_t *cond)
{
    if (cond == NULL)
        return pth_error(EINVAL, EINVAL);
    if (*cond == PTHREAD_COND_INITIALIZER)
        if (pthread_cond_init(cond, NULL) != OK)
            return errno;
    if (!pth_cond_notify((pth_cond_t *)(*cond), FALSE))
        return errno;
    return OK;
}

int pth_cond_notify(pth_cond_t *cond, int broadcast)
{
    /* consistency checks */
    if (cond == NULL)
        return pth_error(FALSE, EINVAL);
    if (!(cond->cn_state & PTH_COND_INITIALIZED))
        return pth_error(FALSE, EDEADLK);

    /* do something only if there is at least one waiters (POSIX semantics) */
    if (cond->cn_waiters > 0) {
        /* signal the condition */
        cond->cn_state |= PTH_COND_SIGNALED;
        if (broadcast)
            cond->cn_state |= PTH_COND_BROADCAST;
        else
            cond->cn_state &= ~(PTH_COND_BROADCAST);
        cond->cn_state &= ~(PTH_COND_HANDLED);

        /* and give other threads a chance to awake */
        pth_yield(NULL);
    }

    /* return to caller */
    return TRUE;
}
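The effect is easy to reproduce with plain POSIX threads. The following is only a minimal sketch, not the code from the bug: the function name lost_signal_demo and the 100 ms timeout are made up for illustration. It signals a condition variable while nobody is waiting, then waits on it with pthread_cond_timedwait so that the demo cannot hang. Barring a spurious wakeup (which POSIX permits), the wait should time out, because the earlier signal was not remembered.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical demo: signal first, wait second.
   Returns the result of pthread_cond_timedwait. */
int lost_signal_demo(void)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    struct timespec deadline;
    int rc;

    /* Nobody is waiting yet, so this signal has no effect. */
    pthread_cond_signal(&cond);

    /* Absolute deadline 100 ms from now (default clock is CLOCK_REALTIME). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The earlier signal is gone; this wait can only end by timeout
       (or by a spurious wakeup, which is allowed but unlikely here). */
    pthread_mutex_lock(&lock);
    rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return rc;
}
```

With an untimed pthread_cond_wait in place of the timed wait, the caller would block forever, which matches the hang described at the top of this post.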


 

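After getting home that evening, I also read the ReactOS implementation of SetEvent. The result was what I expected: even when no thread is waiting on the Event, it still sets SignalState. The relevant code is the IsListEmpty branch, which was highlighted in red in the original post.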



LONG
STDCALL
KeSetEvent(PKEVENT Event,
           KPRIORITY Increment,
           BOOLEAN Wait)
{
    KIRQL OldIrql;
    LONG PreviousState;
    PKWAIT_BLOCK WaitBlock;

    DPRINT("KeSetEvent(Event %x, Wait %x)\n",Event,Wait);

    /* Lock the Dispatcher Database */
    OldIrql = KeAcquireDispatcherDatabaseLock();

    /* Save the Previous State */
    PreviousState = Event->Header.SignalState;

    /* Check if we have stuff in the Wait Queue */
    if (IsListEmpty(&Event->Header.WaitListHead)) {

        /* Set the Event to Signaled */
        DPRINT("Empty Wait Queue, Signal the Event\n");
        Event->Header.SignalState = 1;
    } else {

        /* Get the Wait Block */
        WaitBlock = CONTAINING_RECORD(Event->Header.WaitListHead.Flink,
                                      KWAIT_BLOCK,
                                      WaitListEntry);

        /* Check the type of event */
        if (Event->Header.Type == NotificationEvent || WaitBlock->WaitType == WaitAll) {

            if (PreviousState == 0) {

                /* We must do a full wait satisfaction */
                DPRINT("Notification Event or WaitAll, Wait on the Event and Signal\n");
                Event->Header.SignalState = 1;
                KiWaitTest(&Event->Header, Increment);
            }

        } else {

            /* We can satisfy wait simply by waking the thread, since our signal state is 0 now */
            DPRINT("WaitAny or Sync Event, just unwait the thread\n");
            KiAbortWaitThread(WaitBlock->Thread, WaitBlock->WaitKey, Increment);
        }
    }

    /* Check what wait state was requested */
    if (Wait == FALSE) {

        /* Wait not requested, release Dispatcher Database and return */
        KeReleaseDispatcherDatabaseLock(OldIrql);

    } else {

        /* Return Locked and with a Wait */
        KTHREAD *Thread = KeGetCurrentThread();
        Thread->WaitNext = TRUE;
        Thread->WaitIrql = OldIrql;
    }

    /* Return the previous State */
    DPRINT("Done: %d\n", PreviousState);
    return PreviousState;
}

 


 

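KeWaitForSingleObject, in turn, completes the wait immediately when it finds that SignalState is greater than 0. The relevant code is the SignalState > 0 check, which was highlighted in red in the original post.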



NTSTATUS
STDCALL
KeWaitForSingleObject(PVOID Object,
                      KWAIT_REASON WaitReason,
                      KPROCESSOR_MODE WaitMode,
                      BOOLEAN Alertable,
                      PLARGE_INTEGER Timeout)
{
    ...
        if (CurrentObject->Header.SignalState > 0)
        {
            /* Another satisfied object */
            KiSatisfyNonMutantWait(CurrentObject, CurrentThread);
            WaitStatus = STATUS_WAIT_0;
            goto DontWait;
        }
    ...
}


 

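So the g_cond_signal/g_cond_wait pair wrapped by glib does not behave exactly the same on Win32 as on Linux. Even if you skip glib and write your own wrapper, or call the native APIs directly, watch out for this subtle trap.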
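One common way to avoid depending on the calling order is to keep the state in an ordinary flag protected by the mutex, and to use the condition variable only to announce that the flag changed. The sketch below is an assumption about how such a wrapper could look, not glib's implementation: the my_event type and all function names are hypothetical. It roughly imitates an auto-reset event: my_event_set latches the flag, and my_event_wait consumes it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event-like wrapper built on a mutex, a condition
   variable, and a flag that remembers a pending signal. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} my_event;

int my_event_init(my_event *ev)
{
    int rc = pthread_mutex_init(&ev->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&ev->cond, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&ev->lock);
        return rc;
    }
    ev->signaled = false;
    return 0;
}

void my_event_destroy(my_event *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->lock);
}

/* Latch the signal, then wake one waiter if there is one. */
void my_event_set(my_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Block until the flag is set, then consume it (auto-reset style).
   The loop also absorbs spurious wakeups. */
void my_event_wait(my_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->signaled = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Set first, wait second: the wait returns at once because the flag
   was latched. Returns 1 if the flag was consumed by the wait. */
int set_then_wait_demo(void)
{
    my_event ev;
    int consumed;

    if (my_event_init(&ev) != 0)
        return 0;
    my_event_set(&ev);
    my_event_wait(&ev);
    consumed = !ev.signaled;
    my_event_destroy(&ev);
    return consumed;
}
```

The demo makes both calls from one thread only to keep it short; in the bug described above they came from two different threads, but the flag makes the order irrelevant in either case.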

 

~~end~~