Pitfalls of POSIX condition variables.

I'm reading this awesome book about operating systems. I find that with OSs, you can't understand things by just reading the text. You have to write code, write out things on paper and read certain passages multiple times. That's the difference between learning and reading. Hopefully I'm learning something.

So this book has a nice chapter on condition variables. The way condition variables operate is fairly clear, but I found two pitfalls that I had to dig deeper into in order to see what's going on.

First, a crash course in how condition variables operate. I'm using shortened function names for readability. A condition variable is a variable that you can use to make a thread wait for some condition to be true. So roughly, it looks like this:

// Thread one.
lock(mutex);
wait(condition, mutex);
unlock(mutex);

// Thread two.
lock(mutex);
signal(condition);
unlock(mutex);

Let's say that the first thing that happens here is that thread one obtains the mutex. Thread two runs and blocks on its call to lock. Now thread one proceeds and calls wait(condition, mutex). What happens here is that wait() atomically unlocks the mutex and puts thread one to sleep until the condition is signaled. Thread two is now able to obtain the lock, and it signals to thread one that the condition is satisfied. Thread one then obtains a lock on the mutex again, returns from wait() and finishes its job.
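Here is the same walkthrough as a minimal sketch with the real pthread function names. The ready flag, the while loop around the wait and the run_wait_signal_demo helper are my own additions, not something from the book: they keep the sketch from hanging if thread two happens to run before thread one, or if the wait returns spuriously.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condition = PTHREAD_COND_INITIALIZER;
static bool ready = false; /* hypothetical predicate, guarded by mutex */

/* Thread one: sleeps until thread two reports that the condition holds. */
static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    while (!ready) {
        /* Atomically unlocks mutex and sleeps; relocks it before returning. */
        pthread_cond_wait(&condition, &mutex);
    }
    assert(ready); /* we hold the mutex and the condition is true */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Thread two: makes the condition true and signals it. */
static void *signaler(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    ready = true;
    pthread_cond_signal(&condition);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical helper: runs both threads, returns 0 on success. */
int run_wait_signal_demo(void) {
    pthread_t one, two;
    if (pthread_create(&one, NULL, waiter, NULL) != 0) return -1;
    if (pthread_create(&two, NULL, signaler, NULL) != 0) return -1;
    pthread_join(one, NULL);
    pthread_join(two, NULL);
    return ready ? 0 : -1;
}
```

Call run_wait_signal_demo() from main and build with something like cc -pthread. It returns 0 no matter which thread gets the mutex first.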

Now that this is out of the way, let's look at the pitfalls.

Pitfall 1

When a thread signals a condition while holding the mutex, a waiting thread returns from wait() only after the signaling thread unlocks the mutex.

Looking at our example, intuitively it looks like thread one should return from wait() as soon as thread two calls signal(). After all, the signal has been sent. But that's not the case: wait() has to lock the mutex again before it returns, and thread two still holds it. If thread two has more work to do after the signal, all of that work gets done first, and only once thread two calls unlock() can thread one return from wait(). Thread one then holds the mutex, and once it's done with its work, it unlocks it. This is way more logical than the mess that would happen if signal() made wait() return immediately.
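To convince myself, I find it helps to see this as a small sketch. Thread two signals and then keeps working while it still holds the mutex, and thread one records how much of that work it can see when it comes back from the wait. The ready flag, the two counters and run_pitfall_one_demo are names I made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condition = PTHREAD_COND_INITIALIZER;
static bool ready = false;            /* hypothetical predicate */
static int steps_after_signal = 0;    /* work thread two does after signaling */
static int steps_seen_by_waiter = -1; /* what thread one observes on wakeup */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    while (!ready)
        pthread_cond_wait(&condition, &mutex);
    assert(ready);
    /* We hold the mutex again here, so thread two has already unlocked it. */
    steps_seen_by_waiter = steps_after_signal;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

static void *signaler(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    ready = true;
    pthread_cond_signal(&condition);
    /* Still holding the mutex: the waiter cannot return from wait yet. */
    for (int i = 0; i < 3; i++)
        steps_after_signal++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical helper: returns the number of steps thread one saw. */
int run_pitfall_one_demo(void) {
    pthread_t one, two;
    if (pthread_create(&one, NULL, waiter, NULL) != 0) return -1;
    if (pthread_create(&two, NULL, signaler, NULL) != 0) return -1;
    pthread_join(one, NULL);
    pthread_join(two, NULL);
    return steps_seen_by_waiter;
}
```

run_pitfall_one_demo() returns 3: thread one only ever sees ready become true after thread two has finished all of its post-signal work and unlocked the mutex.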

Pitfall 2

Calling signal() doesn't necessarily unblock all threads waiting on the condition.

Well, this one had me scratching my head when I was going through one of the exercises in the book. I assumed a call to signal() simply unblocks all the waiting threads. It doesn't: pthread_cond_signal is only guaranteed to unblock at least one of them, and it typically wakes just one. To unblock all waiting threads, there's a separate function, pthread_cond_broadcast.
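Here is a minimal sketch of the broadcast version, assuming four waiting threads. NUM_WAITERS, the go flag, the woken counter and run_broadcast_demo are hypothetical names of mine. If you swap pthread_cond_broadcast for pthread_cond_signal here, the program can hang, because only one sleeping waiter is guaranteed to be woken and the rest may sleep forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_WAITERS 4 /* hypothetical number of waiting threads */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condition = PTHREAD_COND_INITIALIZER;
static bool go = false; /* hypothetical predicate, guarded by mutex */
static int woken = 0;   /* number of waiters that got past the wait */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    while (!go)
        pthread_cond_wait(&condition, &mutex);
    assert(go);
    woken++; /* safe: wait returned with the mutex locked */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical helper: returns how many waiters were released. */
int run_broadcast_demo(void) {
    pthread_t threads[NUM_WAITERS];
    int created = 0;
    for (int i = 0; i < NUM_WAITERS; i++) {
        if (pthread_create(&threads[i], NULL, waiter, NULL) != 0)
            break;
        created++;
    }

    pthread_mutex_lock(&mutex);
    go = true;
    /* Wakes every thread currently blocked on the condition. */
    pthread_cond_broadcast(&condition);
    pthread_mutex_unlock(&mutex);

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return woken;
}
```

When all four threads are created successfully, run_broadcast_demo() returns 4. Waiters that start late never sleep at all, because they find go already set when they check it.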

So that's it. Happy coding.
