Remember to lock around all std::condition_variable “variables”

I recently received a question regarding our dispatch queue implementation:

Quote
I had a question concerning the locking of the mutex in the destructor to set the boolean quit_ variable. The way I see it, the mutex protects the queue access and does not really refer to the quit_ variable. When the variable has been set, the condition_variable is used to broadcast that change.

Question: Why is the mutex locked in this case? Is it to force a CPU cache synchronization?

This question is referring to the following snippet of code:

dispatch_queue::~dispatch_queue()
{
    printf("Destructor: Destroying dispatch threads...
");

    // Signal to dispatch threads that it's time to wrap up
    std::unique_lock<std::mutex> lock(lock_);
    quit_ = true;
    cv_.notify_all();
    lock.unlock();

    // Wait for threads to finish before we exit
    for(size_t i = 0; i < threads_.size(); i++)
    {
        if(threads_[i].joinable())
        {
            printf("Destructor: Joining thread %zu until completion
", i);
            threads_[i].join();
        }
    }
}

This signals the condition_variable that the worker threads are waiting on, as shown here:

void dispatch_queue::dispatch_thread_handler(void)
{
    std::unique_lock<std::mutex> lock(lock_);

    do
    {
        // Wait until we have data or a quit signal
        cv_.wait(lock, [this] {
            return (q_.size() || quit_);
        });

        // after wait, we own the lock
        if(!quit_ && q_.size())
        {
            auto op = std::move(q_.front());
            q_.pop();

            // unlock now that we're done messing with the queue
            lock.unlock();

            op();

            lock.lock();
        }
    } while(!quit_);
}

I vaguely recalled that I needed to lock around the quit_ variable to avoid a race condition during destruction: without the lock, the destructor would sometimes never complete, because some worker threads would never wake up to see that quit_ had been set and thus could never be join()ed. Eventually, I found the original source of the change (hidden away in an archived project in a commit made years ago) and confirmed this memory. Unfortunately, at the time I was much less diligent about documenting the rationales and references that led to my conclusions, so I did some digging into why that lock was necessary.

Note
Links and quotes from my research can be found in the References section below.

Fundamentally, you must both modify and test the shared variable (the one referenced by the condition_variable's predicate) under the same mutex. This guarantees that a thread is either already blocked in the wait or will observe the updated condition, so the condition_variable signal cannot be missed. As cppreference clearly lays out:

Quote
The thread that intends to modify the shared variable has to
  1. acquire a std::mutex (typically via lock_guard)
  2. perform the modification while the lock is held
  3. execute notify_one or notify_all on the std::condition_variable (the lock does not need to be held for notification)

Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
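
Applied to our quit_ flag, those three steps look like the following. This is a minimal sketch, assuming the lock_, cv_, and quit_ members from the dispatch queue above; request_quit() is a hypothetical helper for illustration, since the real logic lives in the destructor.

void dispatch_queue::request_quit()
{
    {
        // Step 1: acquire the same mutex used by the waiting threads
        std::lock_guard<std::mutex> guard(lock_);

        // Step 2: perform the modification while the lock is held
        quit_ = true;
    } // the lock_guard releases the mutex here

    // Step 3: notify; the lock does not need to be held for this step
    cv_.notify_all();
}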

Based on our conversation, related questions on Stack Overflow, and discussions in the C++ Core Guidelines GitHub repository, it seems that a failure to lock around shared variables is a common condition_variable mistake.

Our conversation took another useful turn when exploring this question:

Should you unlock the mutex before or after you invoke notify_one/notify_all on the condition variable?

In the original dispatch queue code, I manually unlocked the mutex before invoking notify, which is a pattern I picked up from the cppreference example code:

    // Manual unlocking is done before notifying, to avoid waking up
    // the waiting thread only to block again (see notify_one for details)
    lk.unlock();
    cv.notify_one();

However, it seems that the C++ Core Guidelines discussions on GitHub prefer this general guideline:

Quote
When notifying a condition_variable, default to doing it before you unlock the mutex instead of after. Rationale: It is never wrong to unlock after the notify, and it is sometimes wrong to unlock before the notify. And there is no performance penalty in unlocking after, as good implementations take this coding pattern into account.

While I agree that unlocking after notifying the condition_variable is a sensible default guideline, I find “there is no performance penalty” to be an overly strong claim. There are implementations of pthreads that detect the spurious-wakeup case and avoid it, which is what motivated that statement. However, if you are implementing a condition_variable on top of an RTOS, be aware that you may be triggering a wake-up-then-block scenario. As with all optimizations, you should confirm that this is a) actually occurring and b) causing a problem in your system.
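
To make the two orderings concrete, here is a sketch of both patterns using the destructor's member names; the trade-offs are exactly those quoted above and in the References:

// Option A: notify before unlocking (the Core Guidelines default).
// Never wrong; good implementations avoid the wake-then-block penalty.
std::unique_lock<std::mutex> lock(lock_);
quit_ = true;
cv_.notify_all();
lock.unlock();

// Option B: unlock before notifying (the cppreference example pattern).
// Avoids a potential wake-then-block on naive implementations, but is
// unsafe when precise scheduling matters – e.g., if a woken waiter may
// destroy the condition_variable before notify_all() runs.
std::unique_lock<std::mutex> lock(lock_);
quit_ = true;
lock.unlock();
cv_.notify_all();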

Why Do You Still Need a Lock if the Variable is Atomic?

Even if you have an atomic variable, you still need the mutex. Without it, a waiting thread can still miss the notification when a wakeup interleaves with an update to the atomic variable. The problematic scenario is the following interleaving, which can happen with or without atomics:

  1. A worker thread is awake for some reason; it acquires the mutex and checks the predicate. At that moment, quit_ is false and the queue is empty, so the thread must wait on the CV again – but it has not yet done so.
  2. The main thread sets quit_ to true.
  3. The main thread issues the notification, which the awake worker thread does not receive because it is not yet waiting on the CV.
  4. The worker thread waits on the CV, but never wakes up: the notification was already sent, and the main thread is now blocked waiting to join().

In other words, the mutex provides synchronization with the condition_variable itself, not just protection for quit_.

Now, you can use an atomic to set the quit_ variable without a lock, but then you need a different wakeup strategy in which quit_ is not part of the predicate in the condition_variable wait(): some other logic must wake the thread, which then checks the quit_ flag. One illustrative strategy is sketched below.
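
As an illustration only (this is my sketch, not the dispatch queue's actual design), one such strategy is a periodic wakeup via wait_for(), so each thread eventually wakes on its own and checks the atomic flag outside the predicate:

std::atomic<bool> quit_{false}; // assumed member; set without holding lock_

void dispatch_queue::dispatch_thread_handler(void)
{
    std::unique_lock<std::mutex> lock(lock_);

    while(!quit_.load())
    {
        // quit_ is deliberately NOT part of the predicate; the timeout
        // guarantees we eventually wake up and re-check it in the loop.
        cv_.wait_for(lock, std::chrono::milliseconds(100), [this] {
            return !q_.empty();
        });

        if(!quit_.load() && !q_.empty())
        {
            auto op = std::move(q_.front());
            q_.pop();

            // unlock while running the operation, as before
            lock.unlock();
            op();
            lock.lock();
        }
    }
}

The cost is plain: shutdown latency is now bounded by the timeout, and every thread pays for periodic wakeups. In most designs, simply taking the lock when setting quit_ is the better trade.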


References

  • Implementing an Asynchronous Dispatch Queue
  • std::condition_variable – cppreference.com

    The condition_variable class is a synchronization primitive that can be used to block a thread, or multiple threads at the same time, until another thread both modifies a shared variable (the condition), and notifies the condition_variable.

    The thread that intends to modify the shared variable has to

    1. acquire a std::mutex (typically via lock_guard)
    2. perform the modification while the lock is held
    3. execute notify_one or notify_all on the std::condition_variable (the lock does not need to be held for notification)

    Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.

  • CP guidelines for condition variables · Issue #554 · isocpp/CppCoreGuidelines

    Condition variables are not semaphores. Notifications will be missed if they are sent when no other thread is blocked waiting on the condition variable, so you must not just rely on notify_one() or notify_all() to signal another thread, you must always have some predicate that is tested, e.g. a boolean flag (protected by the same mutex that is used when waiting on the condition variable). With a predicate that tests some condition, even if the notification is sent before the other thread waits on the condition variable then it won’t be “missed” because the predicate will be true and the wait will return immediately.

    unlocking the mutex before notifying is an optimisation, and not essential. I intentionally didn’t do that, to keep the example simple. There could be a second guideline about that point, but it’s not related to the “always use a predicate” rule. I would object strongly to complicating the example by doing that.

    Failing to use a mutex associated with a condition variable is one of the most common mistakes

  • Renaming CP.25 raii_thread, and fixing CP.26 detached_thread · Issue #925 · isocpp/CppCoreGuidelines

    When notifying a condition_variable, default to doing it before you unlock the mutex instead of after. Rationale: It is never wrong to unlock after the notify, and it is sometimes wrong to unlock before the notify. And there is no performance penalty in unlocking after, as good implementations take this coding pattern into account. [PJ: There may not be a performance penalty on pthread, but there may be on in a custom implementation for an RTOS – but that is RTOS dependent.]

  • std::condition_variable::notify_one – cppreference.com

    The effects of notify_one()/notify_all() and each of the three atomic parts of wait()/wait_for()/wait_until() (unlock + wait, wakeup, and lock) take place in a single total order that can be viewed as the modification order of an atomic variable: the order is specific to this individual condition variable. This makes it impossible for notify_one() to, for example, be delayed and unblock a thread that started waiting just after the call to notify_one() was made.

    The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock. However, some implementations (in particular many implementations of pthreads) recognize this situation and avoid this “hurry up and wait” scenario by transferring the waiting thread from the condition variable’s queue directly to the queue of the mutex within the notify call, without waking it up.

    Notifying while under the lock may nevertheless be necessary when precise scheduling of events is required, e.g. if the waiting thread would exit the program if the condition is satisfied, causing destruction of the notifying thread’s condition variable. A spurious wakeup after mutex unlock but before notify would result in notify called on a destroyed object.

  • CP: std::condition_variable(), unlock mutex before notfifying waiting thread(s)? · Issue #1272 · isocpp/CppCoreGuidelines
  • c++ – std::conditional_variable::notify_all does not wake up all the threads – Stack Overflow

    This is the normal pattern for using condition variables correctly – you need to both test and modify the condition you want to wait on within the same mutex.

  • c++ – Sync is unreliable using std::atomic and std::condition_variable – Stack Overflow

    This works very well if threads enter the fence over a period of time. However, if they try to do it almost simultaneously, it seems to sometimes happen that between the atomic decrementation (1) and starting the wait on the conditional var (3), the thread yields CPU time and another thread decrements the counter to zero (1) and fires the cond. var (2). This results in the previous thread waiting forever in (3), because it starts waiting on it after it has already been notified.

  • c++ – Shared atomic variable is not properly published if it is not modified under mutex – Stack Overflow
  • c++ – Does notify happen-before wakeup of wait? – Stack Overflow

4 Replies to “Remember to lock around all std::condition_variable “variables””

  1. Is there any reason the bool quit_ isn’t a std::atomic<bool>? If quit_ were atomic, I think you could avoid the lock in the destructor entirely. Atomically setting quit_ to true before notification would ensure that all worker threads that aren’t blocked yet would see the change, and the notify_all() would ensure any already-waiting threads would wake up and see quit_.

  2. The silly reason it isn’t atomic is that for whatever reason I didn’t have a lock-free atomic implementation for our platform at the time I wrote this (we do now), and so I would just be using two locks.

    But an atomic doesn’t obviate the need for the mutex – you still have the potential to miss the notification when a wakeup interleaves with an update to the atomic variable. The problematic scenario is the following interleaving, which can happen with or without atomics:

    1. A worker thread is awake for some reason; it acquires the mutex and checks the predicate. At that moment, quit_ is false and the queue is empty, so the thread must wait on the CV again – but it has not yet done so.
    2. The main thread sets quit_ to true.
    3. The main thread issues the notification, which the awake worker thread does not receive because it is not yet waiting on the CV.
    4. The worker thread waits on the CV, but never wakes up: the notification was already sent, and the main thread is now blocked waiting to join().

    In other words, the mutex provides synchronization with the CV itself, not just protection for quit_.

    Now, you can use an atomic to set the quit_ variable without a lock, but then you need a different wakeup strategy in which quit_ is not part of the predicate in the condition_variable wait(): some other logic must wake the thread, which then checks the quit_ flag.

    Quoting from cppreference.com:

    Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.

    Here’s a C++ Core Guidelines discussion mentioning that atomics need mutexes: https://github.com/isocpp/CppCoreGuidelines/issues/554

    And stack overflow questions on the topic:
    c++ – Sync is unreliable using std::atomic and std::condition_variable – Stack Overflow
    c++ – Shared atomic variable is not properly published if it is not modified under mutex – Stack Overflow
    c++ – Does notify happen-before wakeup of wait? – Stack Overflow

  3. “you can use an atomic without a lock to set the quit_ variable for the threads, but you would need a different wakeup strategy, quit_ isn’t part of the predicate in the condition_variable wait(). You would need some other logic to make you wake up in this situation, and then check the quit_ flag.” How can this be achieved? I thought and thought, but I can’t find the answer.

    1. Hi jhone,

      Specifically, you would need a condition that a) does not involve the quit_ variable, and b) has logic that lets the thread wake from the wait so that the body of the thread can then check for quit_ and terminate. Above, our condition checks that the queue is not empty. This isn’t a reliable trigger for exiting the wait, running the thread loop, and checking the quit_ flag: we may have quit_ set while the queue is empty. To wake the thread in that case, we could insert an object into the queue to ensure the threads wake, but that would still require the use of a lock.

      A trivial example might be a condition based on an execution_active_ boolean flag. While that flag is true, quit_ can be a plain atomic – the thread logic would check quit_ after waking. The exact condition here doesn’t matter – just something that ensures we can leave the condition variable wait and see the quit_ flag in the run loop.

      Another way to think about it might be a deferred action – e.g., “quit the next time you’re awake” – you have some condition, and the next time the condition is true, you see that quit_ is set and exit. If you’re not trying to force threads to quit immediately, this might work for you.

      Of course, I anchored this to quit_ logic because that’s what’s referenced in the article. But if you have some condition plus an associated flag, with action on that flag deferred until the next time the condition is true, you can get away with a std::atomic. That flag cannot be used within the condition predicate logic itself, however. In practice, is this useful? Probably only in limited scenarios. Going back to the execution_active_ example: if execution_active_ == false, as in the case above, you will still need to lock a mutex, set the variable, and notify the thread. Sure, now quit_ is atomic… but you didn’t eliminate the need to use a lock to exit the thread. At best, you eliminated the need to always use a lock when quitting – hardly an optimization worth the effort. Perhaps a deferred action is useful in some scenarios, such as a setting that should take effect during the next execution loop. But for an action like quitting, it’s probably not a great design choice.
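
      To sketch the deferred-quit idea in code (a rough illustration only – the worker class, execution_active_, and the structure here are hypothetical):

      std::atomic<bool> quit_{false}; // may be set without the lock
      bool execution_active_ = false; // protected by lock_

      void worker::thread_handler()
      {
          std::unique_lock<std::mutex> lock(lock_);

          for(;;)
          {
              // quit_ is deliberately NOT part of the predicate
              cv_.wait(lock, [this] { return execution_active_; });

              // Deferred action: check the atomic flag once awake
              if(quit_.load())
                  return;

              // ... perform this execution cycle’s work ...
          }
      }

      As noted above, if execution_active_ is false when quit_ is set, something must still lock the mutex, set execution_active_, and notify – the lock isn’t truly eliminated.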
