Threading

FreeRTOS Task Notifications: A Lightweight Method for Waking Threads

I was recently implementing a FreeRTOS-based system and needed a simple way to wake my thread from an ISR. I was poking around the FreeRTOS API manual looking at semaphores when I discovered a new feature: task notifications.

FreeRTOS claims that waking a task with the new notification system is up to 45% faster and uses less RAM than using a binary semaphore.

The following APIs are used to interact with task notifications:

  1. xTaskNotify() and xTaskNotifyFromISR()
  2. xTaskNotifyGive() and vTaskNotifyGiveFromISR()
  3. xTaskNotifyWait()
  4. ulTaskNotifyTake()

Each task has a 32-bit notification value. The notification APIs target a specific task using the task handle (TaskHandle_t) that is set during the task creation process:

BaseType_t xTaskCreate( TaskFunction_t pvTaskCode,
                        const char * const pcName,
                        unsigned short usStackDepth,
                        void *pvParameters,
                        UBaseType_t uxPriority,
                        TaskHandle_t *pxCreatedTask );
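
To make the connection concrete, here's a minimal sketch showing how the handle can be captured at creation time so an ISR can later notify the task. All names, stack sizes, and priorities here are illustrative, not from the original system:

#include "FreeRTOS.h"
#include "task.h"

static TaskHandle_t powerTaskHandle = NULL;

static void powerTask(void *pvParameters)
{
    (void)pvParameters;

    for(;;)
    {
        /* Task body: block on a notification, then handle the event */
    }
}

void create_power_task(void)
{
    /* The final argument captures the handle that an ISR can later
     * pass to vTaskNotifyGiveFromISR() to wake this task. */
    BaseType_t status = xTaskCreate(powerTask,
                                    "PowerTask",
                                    256,                  /* stack depth, in words */
                                    NULL,                 /* no task parameters */
                                    tskIDLE_PRIORITY + 2, /* illustrative priority */
                                    &powerTaskHandle);
    configASSERT(status == pdPASS);
}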

Task notifications can be used to emulate mailboxes, binary semaphores, counting semaphores, and event groups.
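
For example, here is a minimal sketch of counting-semaphore emulation (process_one_item() is a hypothetical work function). Passing pdFALSE as the first parameter of ulTaskNotifyTake() decrements the notification value instead of clearing it, so no "give" is ever lost:

#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical work function */
extern void process_one_item(void);

void countingConsumerTask(void *pvParameters)
{
    (void)pvParameters;

    for(;;)
    {
        /* Block until the notification value is non-zero, then
         * decrement it by one: counting semaphore behavior. */
        ulTaskNotifyTake(pdFALSE, portMAX_DELAY);

        process_one_item();
    }
}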

While task notifications have speed and RAM advantages over other FreeRTOS structures, they are limited in that they can only be used to wake a single task with an event. If multiple tasks must be woken by an event, you must utilize the traditional structures.

Real World Example

I took advantage of the task notifications to implement a simple thread-waking scheme.

I have a button which can be used to request a system power state change. A power state change can take anywhere from one to ten seconds, and we don't want to be stuck inside of our interrupt handler for that long.

Instead, I created a simple thread which sleeps until a notification is received. When the thread wakes, we check to see if there is an action we need to take:

void powerTaskEntry(__unused void const* argument)
{
    static uint32_t thread_notification;

    while(1)
    {
        /* Sleep until we are notified of a state change by an
         * interrupt handler. Note the first parameter is pdTRUE,
         * which has the effect of clearing the task's notification
         * value back to 0, making the notification value act like
         * a binary (rather than a counting) semaphore. */

        thread_notification = ulTaskNotifyTake(pdTRUE, 
                        portMAX_DELAY);

        if(thread_notification)
        {
            system_power_evaluate_state();
        }
    }
}

In order to wake the thread, we send a notification from the interrupt handler:

void system_power_interrupt_handler(uint32_t time)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    if(time < SYS_RESET_SHUTDOWN_THRESHOLD_MS)
    {
        // Perform a reset of computers
        computer_reset_request_ = RESET_ALL_COMPUTERS;
    }
    else if((time >= SYS_RESET_SHUTDOWN_THRESHOLD_MS) &&
            (time < SYS_RESET_HARD_CUTOFF_THRESHOLD_MS))
    {
        system_power_request_state(SYSTEM_POWER_LOW);
    }
    else
    {
        system_power_request_state(SYSTEM_POWER_OFF_HARD);
    }

    // Notify the thread so it will wake up when the ISR is complete
    vTaskNotifyGiveFromISR(powerTaskHandle, 
                         &xHigherPriorityTaskWoken);
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

Note the use of portYIELD_FROM_ISR(). This is required when waking a task from an interrupt handler. If vTaskNotifyGiveFromISR indicates that a higher priority task is being woken, the portYIELD_FROM_ISR() routine will context switch to that task after returning from the ISR.

Failure to use this function will result in execution resuming at the point where the interrupt fired, rather than switching to the newly woken task's context.

Protothreads: a Lightweight Threading Solution for Resource-constrained Systems

The protothreads library provides a lightweight, stackless threading solution targeted for resource-constrained embedded systems. Protothreads provides an event-driven system with a blocking context, without the overhead of per-thread stacks. And I'm serious when I say "lightweight": each protothread requires only 2 bytes of memory! The library is implemented in pure C and can be used with any architecture.

Unlike traditional multithreading solutions, the application is responsible for scheduling its protothreads. Protothreads are driven by repeated calls to a protothread function (from a runloop, for example). Each time a protothread function is invoked, the protothread runs until it blocks or exits.

Protothreads pairs nicely with event-driven embedded systems, or with systems whose memory resources are too constrained for an RTOS.

Important Usage Notes

It is important to keep in mind the following usage notes:

  1. Avoid using local variables within a protothread: stack context is not saved across a blocking call, so local variable values are not preserved (see the sketch after this list).
  2. A protothread cannot use switch statements, because the protothreads implementation is itself built on the C switch statement.
  3. A protothread runs within a single C function and cannot span other functions.
  4. A protothread may call normal C functions, but cannot block inside a called function. If you wish to call a blocking function, you must spawn a separate protothread. This provides an explicit indication of which functions can block and which cannot.
  5. All protothreads run on the same stack, and context switching is done by rewinding the stack.
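
To illustrate note 1, here is a minimal sketch (timer_expired() and toggle_led() are hypothetical helpers). The loop counter is kept in a static variable, because an automatic variable would lose its value across each blocking call:

#include "pt.h"

/* Hypothetical helpers, declared for illustration */
extern int timer_expired(void);
extern void toggle_led(void);

static int blink_i; /* survives blocking calls; a local 'i' would not */

static
PT_THREAD(blink(struct pt *pt))
{
  PT_BEGIN(pt);

  for(blink_i = 0; blink_i < 10; blink_i++) {
    PT_WAIT_UNTIL(pt, timer_expired());
    toggle_led();
  }

  PT_END(pt);
}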

Protothread Examples

Protothreads are used by the Aachen Robotics Club in the modm and xpcc frameworks. The protothread website also contains examples.

Included below is a very simple protothreads example. We have one thread which waits until the counter variable reaches 1000, then prints a string and resets the count. The main() function invokes the thread in a runloop and increments the count.

#include "pt.h"

static int counter;
static struct pt example_pt;

static
PT_THREAD(example(struct pt *pt))
{
  PT_BEGIN(pt);

  while(1) {
    PT_WAIT_UNTIL(pt, counter == 1000);
    printf("Threshold reached\n");
    counter = 0;
  }

  PT_END(pt);
}

int main(void)
{
  counter = 0;
  PT_INIT(&example_pt);

  while(1) {
    example(&example_pt);
    counter++;
  }
  return 0;
}

Refactoring the ThreadX Dispatch Queue To Use std::mutex

Now that we've implemented std::mutex for an RTOS, let's refactor a library that uses RTOS-specific calls so that it uses std::mutex instead.

Since we have a ThreadX implementation for std::mutex, let's update our ThreadX-based dispatch queue. Moving to std::mutex will result in a simpler code structure. We still need to port std::thread and std::condition_variable to achieve true portability, but utilizing std::mutex is still a step in the right direction.

For a quick refresher on dispatch queues, refer to the earlier articles in this series.

Table of Contents

  1. How std::mutex Helps Us
    1. C++ Mutex Wrappers
  2. Refactoring the Asynchronous Dispatch Queue
    1. Class Definition
    2. Constructor
    3. Destructor
    4. Dispatch
    5. Thread Handler
  3. Putting It All Together
  4. Further Reading

How std::mutex Helps Us

Even though we can't yet make our dispatch queue fully portable, we still benefit from using std::mutex in the following ways:

  1. We no longer have to worry about initializing or deleting our mutex, since the std::mutex constructor and destructor handle that for us
  2. We can take advantage of RAII to lock whenever we enter a scope, and to automatically unlock when we leave that scope
  3. We can utilize standard calls (with no arguments!), reducing the burden of remembering the exact ThreadX functions and arguments

If these arguments don't sway you on their own, just compare the code. Here are the native ThreadX calls:

uint8_t status = tx_mutex_get(&mutex_, TX_WAIT_FOREVER);

// do some stuff

status = tx_mutex_put(&mutex_);

And here's the std::mutex equivalent:

mutex_.lock();

// do some stuff

mutex_.unlock();

Don't you prefer the std::mutex version?

C++ Mutex Wrappers

While we could manually call lock() and unlock() on our mutex object, we'll utilize two helpful C++ mutex constructs: std::lock_guard and std::unique_lock.

The std::lock_guard wrapper provides an RAII mechanism for our mutex. When we construct a std::lock_guard, the mutex starts in a locked state (or waits to grab the lock). Whenever we leave that scope the mutex will be released automatically. A std::lock_guard is especially useful in functions that can return at multiple points. No longer do you have to worry about releasing the mutex at each exit point: the destructor has your back.
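
As a quick illustration (a generic sketch, not taken from the dispatch queue), consider a function with multiple return points; every path releases the mutex automatically:

#include <mutex>
#include <queue>

static std::mutex m;
static std::queue<int> q;

bool try_pop(int& out)
{
    std::lock_guard<std::mutex> lock(m);

    if(q.empty())
    {
        return false; // lock_guard's destructor releases the mutex here
    }

    out = q.front();
    q.pop();
    return true; // ...and here
}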

We'll also take advantage of the std::unique_lock wrapper. Using std::unique_lock provides similar benefits to std::lock_guard: the mutex is locked when the std::unique_lock is constructed, and unlocked automatically during destruction. However, it provides much more flexibility than std::lock_guard, as we can manually call lock() and unlock(), transfer ownership of the lock, and use it with condition variables.
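
Condition variables are exactly where this flexibility matters. Here's a generic standard-C++ sketch (not usable on our RTOS until std::condition_variable is ported) showing why wait() requires a std::unique_lock:

#include <condition_variable>
#include <mutex>
#include <queue>

static std::mutex m;
static std::condition_variable cv;
static std::queue<int> q;

int wait_for_item()
{
    std::unique_lock<std::mutex> lock(m);

    // wait() atomically unlocks the mutex while sleeping and re-locks
    // it before returning; a lock_guard cannot do this.
    cv.wait(lock, []{ return !q.empty(); });

    int item = q.front();
    q.pop();
    return item;
}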

Refactoring the Asynchronous Dispatch Queue

We will utilize both std::lock_guard and std::unique_lock to simplify our ThreadX dispatch queue.

Our starting point for this refactor will be the dispatch_threadx.cpp file in the embedded-resources repository.

Class Definition

In our dispatch class, we need to change the mutex definition from TX_MUTEX to std::mutex:

std::mutex mutex_;

Constructor

Mutex initialization is handled for us by the std::mutex constructor. We can remove the tx_mutex_create call from our dispatch queue constructor:

// Initialize the Mutex
uint8_t status = tx_mutex_create(&mutex_, "Dispatch Mutex", TX_INHERIT);
assert(status == TX_SUCCESS && "Failed to create mutex!");

Destructor

Mutex deletion is handled for us by the std::mutex destructor. We can remove the tx_mutex_delete call from the dispatch queue destructor:

status = tx_mutex_delete(&mutex_);
assert(status == TX_SUCCESS && "Failed to delete mutex!");

Dispatch

By using std::lock_guard, we can remove both the mutex get and put calls. RAII will ensure that the mutex is unlocked when we leave the function.

Here's the dispatch() implementation using std::lock_guard:

void dispatch_queue::dispatch(const fp_t& op)
{
    std::lock_guard<std::mutex> lock(mutex_);

    q_.push(op);

    // Notifies threads that new work has been added to the queue
    tx_event_flags_set(&notify_flags_, DISPATCH_WAKE_EVT, TX_OR);
}

If you would rather unlock the mutex before setting the event flag, use std::unique_lock instead of std::lock_guard. Using std::unique_lock allows you to call unlock() manually.

void dispatch_queue::dispatch(const fp_t& op)
{
    std::unique_lock<std::mutex> lock(mutex_);

    q_.push(op);

    lock.unlock();

    // Notifies threads that new work has been added to the queue
    tx_event_flags_set(&notify_flags_, DISPATCH_WAKE_EVT, TX_OR);
}

Either approach is acceptable and looks much cleaner than the native calls.

Why might you care about unlocking early? With std::lock_guard, the mutex is still held when the event flag is set. A waiting thread can wake, attempt to grab the mutex, and immediately go back to sleep until dispatch() exits and releases the lock. Unlocking before setting the event flag spares the woken thread that extra block-and-wake cycle.

Thread Handler

We need to manually lock and unlock around specific points in our thread handler. Instead of std::lock_guard, we will use std::unique_lock so we can call unlock().

Here's our simplified thread handler:

void dispatch_queue::dispatch_thread_handler(void)
{
    ULONG flags;
    uint8_t status;

    std::unique_lock<std::mutex> lock(mutex_);

    do {
        //after wait, we own the lock
        if(q_.size() && !quit_)
        {
            auto op = std::move(q_.front());
            q_.pop();

            //unlock now that we're done messing with the queue
            lock.unlock();

            op();

            lock.lock();
        }
        else if(!quit_)
        {
            lock.unlock();

            // Wait for new work
            status = tx_event_flags_get(&notify_flags_, 
                DISPATCH_WAKE_EVT, 
                TX_OR_CLEAR,
                &flags, 
                TX_WAIT_FOREVER);
            assert(status == TX_SUCCESS && 
                "Failed to get event flags!");

            lock.lock();
        }
    } while (!quit_);

    // We were holding the mutex after we woke up
    lock.unlock();

    // Set a signal to indicate a thread exited
    status = tx_event_flags_set(&notify_flags_, 
        DISPATCH_EXIT_EVT, TX_OR);
    assert(status == TX_SUCCESS && "Failed to set event flags!");
}

Looks a bit saner already!

Putting It All Together

Example source code can be found in the embedded-resources GitHub repository. The original ThreadX dispatch queue implementation can also be found in embedded-resources.

To build the example, run make at the top-level or inside of the examples/cpp folder.

The example is built as a static library. ThreadX headers are provided in the repository, but not binaries or source.

As we implement std::thread and std::condition_variable in the future, we will simplify our RTOS-based dispatch queue even further.
