How to fix order-violation bugs with condition variables.

Let's look at the following code and try to find the bug in it.

Thread 1:
void init() {
  ...
  mThread = PR_CreateThread(mMain, ...)
  ...
}

Thread 2:
void mMain(...) {
  ...
  mState = mThread->State;
  ...
}

The problem is that if thread two executes before thread one, it can access mThread which is not initialized. We need to make thread two wait for thread one. This is done with condition variables. But let's see how we're going to implement it.

Solution 1 (Incorrect) - Simple wait and signal

Let's say that we're not giving this a lot of thought and just put a wait and signal in the code. We have this code:

pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t mtCond = PTHREAD_COND_INITIALIZER;

Thread 1:
void init() {
  ...
  mThread = PR_CreateThread(mMain, ...)
  pthread_mutex_lock(&mtLock);
  pthread_cond_signal(&mtCond);
  pthread_mutex_unlock(&mtLock);
  ...
}

Thread 2:
void mMain(...) {
  ...
  pthread_mutex_lock(&mtLock);
  pthread_cond_wait(&mtCond, &mtLock);
  pthread_mutex_unlock(&mtLock);
  mState = mThread->State;
  ...
}

The problem here is that thread one can execute before thread two. In this case, thread one will signal on the condition variable, but nobody is waiting on it. Once thread two executes, it will start waiting and nobody is going to signal on the condition variable. Thus, thread two will stay in block state.

Solution 2 (Correct) - Using a synchronization variable.

  pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t mtCond = PTHREAD_COND_INITIALIZER;
  int mtInit = 0;

  Thread 1:
  void init() {
    ...
1   mThread = PR_CreateThread(mMain, ...)
2   pthread_mutex_lock(&mtLock);
3   mtInit = 1;
4   pthread_cond_signal(&mtCond);
5   pthread_mutex_unlock(&mtLock);
    ...
  }

  Thread 2:
  void mMain(...) {
    ...
6   pthread_mutex_lock(&mtLock);
7   if (mtInit == 0)
8     pthread_cond_wait(&mtCond, &mtLock);
9   pthread_mutex_unlock(&mtLock);
10  mState = mThread->State;
    ...
  }

Here we have used the mtInit variable to prevent the bug in Solution 1 from hapenning. Let's try to reproduce the conditions that led to a bug in out first solution. Let's say thread one executes before thread two. When thread two runs, it will see that mtInit is 1 and it won't wait. That basically solves our problem. Let's trace things more carfully with two possible scenarios. Of course, there are other possibilities too, but with these two I have enough confidence in the correctness of the solution.

Scenario 1 - Thread 1 before Thread 2.

Line   T1    T2   COMMENT
       1
       2          Locked mutex
             6    Blocked, because mutex is locked.
       3
       4          Signals on the condition variable.
       5          Unblocks mutex.
             6    Continues, mutex was unblocked.
             7    False, jump to line 9.
             9    Unlock mutex, done

So in this scenario, thread one obtains the lock first and the solution is correct.

Scenario 2 - Thread 2 before Thread 1.

Line   T1    T2   COMMENT
             6    Locked mutex.
             7
             8    Unlocks mutex, waiting on condition variable.
       1      
       2          Locks mutex.
       3      
       4          Signals, but mutex is still locked, so thread two doesn't do anything.
       5          Unlocks mutex. Now thread two can continue.
             8    Returns from wait(), locks mutex.
             9    Unlocks mutex.

This scenario works too.

There's one change that we should do though. Instead of an if statement in line 7, we should use a while. Why? Because when thread two wakes up in line 7 and obtains a lock on the mutex, it should re-verify that the condition that made it wait is not true anymore. While not a problem in this code segment, this can be a problem in other cases.

social