Wednesday, January 18, 2006

Fun with threads

When it came time to rewrite the X-Plane threading code (which has to run on three APIs) I did what any modern lazy programmer does: I typed “posix threads condition variable” into Google and got this helpful page.

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

I make no vouchers for how accurate its contents are, but I can tell you that the authors are a lot smarter than I am; I say that because I made the mistake of changing their code and induced a bunch of bugs. But I learned something in the process!

Posix provides condition variables. Basically one or more threads block on the condition variable, and when the condition variable is signaled, one or more threads wakes up and does something useful. When you block on a condition variable, you specify a mutex to release - that release is atomic, meaning no other thread can acquire that mutex you are dropping before you are asleep and blocking on the condition variable. When your thread wakes up from the condition variable, it then acquires the mutex. (This is not atomic - it’s very possible that when you wake up the mutex is locked by someone else and you’ll then go right back to sleep waiting for the mutex. This sounds to me like the recipe for a box-car problem, but that’s off-topic.)

The zen of the condition variable is this: when you have some kind of producer/consumer problem you use a mutex to protect the precious internals of your object and the condition variable to implement blocking operations. A blocking operation acquires the mutex, looks at the internals, decides it needs to sleep, and atomically drops the mutex and blocks on the condition variable. When it fully wakes up (having been signaled by the condition variable and reacquired the mutex) your thread now can re-examine the internals and decide whether it can proceed.

The pseudo-code looks something like this:

void blocking_op ()
{
lock_mutex(my mutex);
mark guts that I am sleeping
while (guts say I am not ready)
wait on condition (my condition, my mutex)
guts are ready - do something with them
mark guts that I am not sleeping
release mutex (my mutex)
}
void unblocking_op()
{
lock_mutex(my mutex);
if (someone is sleeping)
signal
do something with guts
unlock_mutex(my mutex);
}

My utilization of this was for a message queue, but the technique can work with any producer/consumer-type problem. Now being the rocket scientist I am, I looked at this code and said “while loop - who needs it? The condition variable is always signaled when the unblocking op has occurred - why would the guts not be ready?

Turns out there is a race condition, as the paper above describes. Here’s how it works:

First we must understand that the blocking operation only sleeps on the condition variable when it cannot immediately run. In other words, if our message queue has a message and the mutex is free, we just run straight through without any blocking.

Second we must understand that waking up from the condition variable and acquiring the mutex are not atomic. They cannot be - at the instant the condition is signaled the mutex is never free (because it is owned by the unblocking thread). Even if this was not true (I believe that signaling from outside the mutex would introduce a race condition - but this is not the race condition we are looking for) another thread could certainly own the mutex for random reasons. So by Murphy’s Law of threaded code, something is going to happen between when our signalled blocked thread wakes up and when it can actually look at the guts of our object and do something.

The problem is: what states is our object in during this time? In our message queue’s case, the message queue has a message in it. So if a second reader thread comes along and acquires the mutex before our first reader wakes up and acquires it, then the second reader will consume the message without ever sleeping and by the time our first reader actually wakes up, the message is gone!

This is why the while loop is important. The while loop allows the first reader who has had his message stolen to go back to sleep and wait for the next one.

(An implication of all this is that our message queue based on condition variables is not “fair” - if two readers both read the message queue, the earlier one before it is empty the second one after it is filled, the second one may win the message without blocking, while the first one may stay slept. This is probably good for certain applications in some ways, but the thrashing of the slept thread may be unacceptable for others.)

If there’es a lesson from all of this, I think it’s this: threads are fundamentally complex and non-trivial to code. Think carefully before using threads about what the real benefit is and be sure it outweighs the cost of more complex development, less debuggable code and increased maintenance costs.

No comments:

Post a Comment