Hi,
I'd like to report two bugs in the wxCondition code, in the method wxConditionInternal::WaitTimeout(unsigned long milliseconds), which were quite hard to track down.
The current implementation looks like this (comments shortened):
01 wxCondError wxConditionInternal::WaitTimeout(unsigned long milliseconds)
02 {
03     {
04         wxCriticalSectionLocker lock(m_csWaiters);
05         m_numWaiters++;
06     }
07
08     m_mutex.Unlock();
09     // a race condition can occur at this point in the code
10     wxSemaError err = m_semaphore.WaitTimeout(milliseconds);
11
12     if ( err == wxSEMA_TIMEOUT )
13     {
14         // another potential race condition exists here it is caused when a
15         wxCriticalSectionLocker lock(m_csWaiters);
16         wxSemaError err2 = m_semaphore.WaitTimeout(0);
17
18         if ( err2 != wxSEMA_NO_ERROR )
19             m_numWaiters--;
20     }
21
22     m_mutex.Lock();
23
24     return err == wxSEMA_NO_ERROR ? wxCOND_NO_ERROR
25          : err == wxSEMA_TIMEOUT ? wxCOND_TIMEOUT
26          : wxCOND_MISC_ERROR;
27 }
The first issue is in row 16. The result of the second call to m_semaphore.WaitTimeout() is stored in the temporary variable err2. So if the signal actually arrives during this second call, err == wxSEMA_TIMEOUT while err2 == wxSEMA_NO_ERROR. At the end of the method err is returned, so the caller gets wxCOND_TIMEOUT although the signal was consumed; err2 would be the correct return value.
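The effect of the discarded err2 can be reproduced without any threads. The following C++ sketch is a hypothetical model, not wxWidgets code: RacySemaphore, WaitTimeoutCurrent(), WaitTimeoutProposed(), CheckBothVariants() and the reduced error enums are made-up names, and the locking and m_numWaiters bookkeeping are left out. The fake semaphore is forced to receive its Post() right after the first timed-out wait, which is the interleaving described above.

```cpp
#include <cassert>

// Hypothetical stand-ins for wxSemaError / wxCondError.
enum SemaError { SEMA_NO_ERROR, SEMA_TIMEOUT, SEMA_MISC_ERROR };
enum CondError { COND_NO_ERROR, COND_TIMEOUT, COND_MISC_ERROR };

// Hypothetical single-threaded semaphore: WaitTimeout() never blocks and
// succeeds only if the count is positive. After the first call it performs
// one Post(), simulating a signal that arrives between the timed-out wait
// (row 10) and the zero-timeout re-check (row 16).
struct RacySemaphore {
    int count = 0;
    int calls = 0;
    SemaError WaitTimeout(unsigned long /*milliseconds*/) {
        SemaError result = SEMA_TIMEOUT;
        if (count > 0) {
            --count;
            result = SEMA_NO_ERROR;
        }
        if (++calls == 1)
            ++count;  // the "late" Post() from the signalling thread
        return result;
    }
};

CondError ToCondError(SemaError err) {
    return err == SEMA_NO_ERROR ? COND_NO_ERROR
         : err == SEMA_TIMEOUT  ? COND_TIMEOUT
                                : COND_MISC_ERROR;
}

// Control flow of the current code (locking and counter omitted).
CondError WaitTimeoutCurrent(RacySemaphore& sem) {
    SemaError err = sem.WaitTimeout(100);
    if (err == SEMA_TIMEOUT) {
        SemaError err2 = sem.WaitTimeout(0);
        (void)err2;  // the result of the re-check is dropped here
    }
    return ToCondError(err);
}

// Control flow of the proposed code: the re-check overwrites err.
CondError WaitTimeoutProposed(RacySemaphore& sem) {
    SemaError err = sem.WaitTimeout(100);
    if (err == SEMA_TIMEOUT)
        err = sem.WaitTimeout(0);
    return ToCondError(err);
}

// Usage: both variants see the same interleaving.
void CheckBothVariants() {
    RacySemaphore a, b;
    assert(WaitTimeoutCurrent(a) == COND_TIMEOUT);    // signal consumed, reported as timeout
    assert(WaitTimeoutProposed(b) == COND_NO_ERROR);  // signal reported correctly
}
```

In both variants the semaphore count ends at 0, i.e. the signal is consumed either way; only the proposed variant reports it to the caller.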
The second problem is around row 22 and is a third race condition. Imagine one main thread and one worker thread: the main thread waits in wxConditionInternal::WaitTimeout() while the worker thread sends a signal once it has finished processing.
Now assume the main thread is waiting to regain ownership of the mutex (row 22) and has not yet received a signal. This means m_numWaiters has already been decremented back to 0 in row 19. Then the worker thread, which currently owns the mutex, signals. Because m_numWaiters == 0, the signal is lost (wxConditionInternal::Signal() doesn't do anything in this case). In the main thread, wxConditionInternal::WaitTimeout() then returns wxCOND_TIMEOUT although the signal was sent!
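The lost-signal interleaving can also be reproduced with standard C++ threads. The sketch below is a hypothetical model of the semaphore-based emulation, not the actual wxWidgets source: Semaphore, Condition and RunScenario() are made-up names, std::mutex stands in for both m_mutex and m_csWaiters, and Signal() is assumed to post and decrement the waiter count only when it is non-zero (the report only states that it does nothing when m_numWaiters == 0). The relockFirst flag selects whether the user mutex is re-locked after the re-check (current code) or before it (the reordering proposed in this report). The scenario relies on sleeps, so it is a timing-dependent reproduction rather than a guaranteed one.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

enum SemaError { SEMA_NO_ERROR, SEMA_TIMEOUT };
enum CondError { COND_NO_ERROR, COND_TIMEOUT };

// Minimal counting semaphore standing in for wxSemaphore (hypothetical).
class Semaphore {
public:
    void Post() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
        cv_.notify_one();
    }
    SemaError WaitTimeout(unsigned long ms) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, std::chrono::milliseconds(ms),
                          [this] { return count_ > 0; }))
            return SEMA_TIMEOUT;
        --count_;
        return SEMA_NO_ERROR;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical model of the semaphore-based condition emulation.
class Condition {
public:
    Condition(std::mutex& mutex, bool relockFirst)
        : mutex_(mutex), relockFirst_(relockFirst) {}

    // Posts only if someone is registered as waiting (assumed behaviour).
    void Signal() {
        std::lock_guard<std::mutex> lock(csWaiters_);
        if (numWaiters_ > 0) {
            sem_.Post();
            --numWaiters_;
        }
    }

    // The caller must own 'mutex_' when calling this.
    CondError WaitTimeout(unsigned long ms) {
        {
            std::lock_guard<std::mutex> lock(csWaiters_);
            ++numWaiters_;
        }
        mutex_.unlock();
        SemaError err = sem_.WaitTimeout(ms);
        if (relockFirst_)
            mutex_.lock();  // proposed order: re-lock before the re-check
        if (err == SEMA_TIMEOUT) {
            std::lock_guard<std::mutex> lock(csWaiters_);
            err = sem_.WaitTimeout(0);  // second result kept in both variants
            if (err == SEMA_TIMEOUT)
                --numWaiters_;
        }
        if (!relockFirst_)
            mutex_.lock();  // current order: re-lock after the re-check
        return err == SEMA_NO_ERROR ? COND_NO_ERROR : COND_TIMEOUT;
    }
private:
    std::mutex& mutex_;
    bool relockFirst_;
    std::mutex csWaiters_;
    int numWaiters_ = 0;
    Semaphore sem_;
};

// Main thread waits 100 ms; the worker grabs the mutex as soon as the main
// thread releases it, holds it for 300 ms (past the timeout), then signals.
CondError RunScenario(bool relockFirst) {
    std::mutex mutex;
    Condition cond(mutex, relockFirst);
    mutex.lock();
    std::thread worker([&] {
        std::lock_guard<std::mutex> lock(mutex);  // blocks until WaitTimeout unlocks
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
        cond.Signal();
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // let the worker block
    CondError result = cond.WaitTimeout(100);
    mutex.unlock();
    worker.join();
    return result;
}
```

Under normal scheduling, RunScenario(false) returns COND_TIMEOUT: the counter has already dropped to 0 when the worker signals, so Signal() does nothing. RunScenario(true) returns COND_NO_ERROR, because the main thread is still counted as a waiter while it blocks on the mutex, so the worker's Signal() posts and the zero-timeout re-check picks the signal up.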
Here is my proposal to fix both errors:
01 wxCondError wxConditionInternal::WaitTimeout(unsigned long milliseconds)
02 {
03     {
04         wxCriticalSectionLocker lock(m_csWaiters);
05         m_numWaiters++;
06     }
07
08     m_mutex.Unlock();
09     // a race condition can occur at this point in the code
10     wxSemaError err = m_semaphore.WaitTimeout(milliseconds);
11     // another potential race condition exists here it is caused when a
12     m_mutex.Lock();
13
14     if ( err == wxSEMA_TIMEOUT )
15     {
16         wxCriticalSectionLocker lock(m_csWaiters);
17         err = m_semaphore.WaitTimeout(0);
18
19         if ( err == wxSEMA_TIMEOUT )
20             m_numWaiters--;
21     }
22
23     return err == wxSEMA_NO_ERROR ? wxCOND_NO_ERROR
24          : err == wxSEMA_TIMEOUT ? wxCOND_TIMEOUT
25          : wxCOND_MISC_ERROR;
26 }
I removed the temporary variable err2, moved m_mutex.Lock() directly after the first m_semaphore.WaitTimeout() call, and replaced "err2 != wxSEMA_NO_ERROR" with "err == wxSEMA_TIMEOUT". The last change is just for consistency with the check in row 14.
I have tested these changes, rebuilt wxWidgets, and the multithreading part of my program http://sourceforge.net/projects/freefilesync finally worked perfectly. I hope you can fix this for the next version of wxWidgets.
Best regards,
ZenJu