Implementing an Asynchronous Dispatch Queue with FreeRTOS

Updated: 20190913

We previously provided an implementation of a dispatch queue using ThreadX RTOS primitives.

In this article, I'll provide an example C++ dispatch queue implementation using the popular FreeRTOS.

We'll start with a review of what dispatch queues are. If you're familiar with them, feel free to skip to the following section.

Table of Contents:

  1. A Review of Dispatch Queues
  2. A C++11 and FreeRTOS Dispatch Queue
    1. FreeRTOS Requirements
    2. std::function Refresher
    3. Bounce Refresher
    4. A Queue of Functions
    5. Allocating Queue Threads
    6. Making Our Dispatch Queue Thread-Safe
    7. Constructing Our Dispatch Queue
    8. Dispatch Thread Handler Requirements
    9. Adding Work to the Queue
    10. Exiting
    11. Dispatch Thread Handler Implementation
  3. Thread Priorities and Time Slicing
  4. Putting it All Together
  5. Further Reading

A Review of Dispatch Queues

A dispatch queue contains multiple generic-use threads and a work queue. Consumers can dispatch standalone functional operations to the work queue. Each thread pulls from the work queue (or sleeps and waits for new work).

To quote Apple on the advantages of using dispatch queues instead of threads:

  • It reduces the memory penalty your application pays for storing thread stacks in the application’s memory space.
  • It eliminates the code needed to create and configure your threads.
  • It eliminates the code needed to manage and schedule work on threads.
  • It simplifies the code you have to write.

These benefits are tangible. As we saw in "The Problem With Threads", threading introduces nondeterminism into our system. By controlling our threading models using concurrent and serial dispatch queues, we gain a better grasp on the nondeterminism of our system.

The dispatch queue concept simplifies many of the threading scenarios encountered in embedded programming. Often, I just need to run small, simple tasks asynchronously without blocking the primary thread. This results in spawning numerous threads with small, singular purposes:

  • When the user presses a button, update the drawing on the screen
  • When charging is complete, change LEDs and notify the system
  • When recording starts, turn on an LED and start drawing the elapsed record time on the screen

These simple operations can run on any generic thread. They don't justify the overhead of explicit thread management, the excessive context switching, or the increased potential for threading errors.

A C++11 and FreeRTOS Dispatch Queue

We'll be modifying the C++11 dispatch queue implementation to use FreeRTOS RTOS primitives instead of the C++11 types std::mutex, std::thread, and std::condition_variable. We will stick to C++11 features in places where RTOS primitives are not required.

FreeRTOS Requirements

The asynchronous dispatch queue shown below requires the following FreeRTOS headers:

#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <freertos/event_groups.h>
#include <freertos/semphr.h>

The following FreeRTOSConfig.h settings are required for this project:

  • configSUPPORT_DYNAMIC_ALLOCATION is set to 1 (or undefined)
  • INCLUDE_eTaskGetState is set to 1
  • configUSE_TIME_SLICING is set to 1

Additionally, make sure that FreeRTOS/source/event_groups.c is included in your build.

std::function Refresher

std::function is a useful C++11 feature for capturing Callable objects. As a refresher:

Instances of std::function can store, copy, and invoke any Callable target -- functions, lambda expressions, bind expressions, or other function objects, as well as pointers to member functions and pointers to data members.

For this example, we will prototype our function objects as:

typedef std::function<void(void)> fp_t;

Bounce Refresher

FreeRTOS is implemented in C, and our dispatch queue is being implemented in C++. We'll need to utilize the bounce function to make sure our FreeRTOS thread interfaces with the correct object's dispatch handler. For more information on the bounce function, please see the bounce article.

Here's the implementation of bounce that we will use:

/// This Bounce implementation is pulled from bounce.cpp
template<class T, class Method, Method m, class ...Params>
static auto bounce(void *priv, Params... params) ->
        decltype(((*reinterpret_cast<T *>(priv)).*m)(params...))
{
    return ((*reinterpret_cast<T *>(priv)).*m)(params...);
}

/// Convenience macro to simplify bounce statement usage
#define BOUNCE(c,m) bounce<c, decltype(&c::m), &c::m>

A Queue of Functions

The primary purpose of using a dispatch queue is to provide a first-in, first-out processing model.

C++ luckily provides us a simple std::queue type which we can use for this purpose:

std::queue<fp_t> q_;

To add to the queue, we push:

q_.push(op);

And to get the next item:

auto op = q_.front(); //get the front item
q_.pop(); //and pop it from the queue

Allocating Queue Threads

Our goal is to make our dispatch queue generic enough that we can change the number of threads for each queue we create. This allows us to create concurrent queues that allow generic tasks to run in parallel, as well as serial queues that only utilize one thread to protect a resource.

Instead of using a std::vector of std::thread, we'll build a container based on the FreeRTOS type TaskHandle_t:

/// Thread type
struct freertos_thread_t {
    TaskHandle_t thread;
    std::string name;
};

Each thread's handle and name will be tracked with the internal thread object. We'll then create a std::vector of freertos_thread_t to keep track of our dispatch threads:

std::vector<freertos_thread_t> threads_;

Making Our Dispatch Queue Thread-Safe

Our dispatch queue is a shared resource in two potential directions:

  • Any thread can add work to the queue
  • The queue may have multiple threads which remove work from the queue for processing

In order to make sure we implement this safely, we must rely on a locking mechanism. In this case we will utilize FreeRTOS's semaphore API, which also provides mutexes:

SemaphoreHandle_t mutex_;

The queue itself is the critical piece, so we will lock around all queue modifications.

Constructing Our Dispatch Queue

Our FreeRTOS dispatch queue constructor is responsible for instantiating three components:

  1. The internal mutex which protects the work queue
  2. The event flags which wake the threads
  3. The worker threads

Our constructor prototype will also take an additional argument: thread_stack_size. This can have a default value (such as 1KB). You can also specify a custom value during construction.

dispatch_queue(std::string name, size_t thread_cnt = 1,
        size_t thread_stack_size = 1024) :
    name_{std::move(name)}, threads_(thread_cnt)

Creating the mutex and event flags structures involves straightforward FreeRTOS calls:

// Create the Mutex
mutex_ = xSemaphoreCreateRecursiveMutex();
assert(mutex_ != NULL && "Failed to create mutex!");

// Create the event flags
notify_flags_ = xEventGroupCreate();
assert(notify_flags_ != NULL && "Failed to create event group!");

When constructing our dispatch queue, we can specify the number of threads desired. Our constructor creates the required number of freertos_thread_t objects in our std::vector container. For each thread, we'll generate a unique name and then create the thread.

In this example, I've chosen the xTaskCreate API, which allocates thread stacks from the heap automatically. You can also use the xTaskCreateStatic API if you wish to provide your own thread stack buffers.
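For reference, the statically-allocated variant might look roughly like this. This is a sketch under stated assumptions, not part of the article's implementation: it requires configSUPPORT_STATIC_ALLOCATION set to 1, and thread_function, context_ptr, and the stack depth are illustrative names:

```cpp
// Sketch: statically allocated dispatch thread
// Assumes configSUPPORT_STATIC_ALLOCATION = 1 in FreeRTOSConfig.h
static StackType_t thread_stack[1024];
static StaticTask_t thread_tcb;

TaskHandle_t handle = xTaskCreateStatic(
    thread_function,    // TaskFunction_t to run (illustrative name)
    "Dispatch Thread",  // thread name
    1024,               // stack depth, in words (not bytes)
    context_ptr,        // pvParameters (e.g., our `this` pointer)
    DISPATCH_Q_PRIORITY,
    thread_stack,       // caller-supplied stack buffer
    &thread_tcb);       // caller-supplied task control block
```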

In order for FreeRTOS to find its way to the correct dispatch_queue object, we'll utilize BOUNCE to make sure we get back to the correct object:

    BOUNCE(dispatch_queue, dispatch_thread_handler),

Here's our full thread initialization loop:

// Dispatch thread setup
for(size_t i = 0; i < threads_.size(); i++)
{
    // Define the name
    threads_[i].name = std::string("Dispatch Thread " +
        std::to_string(i));

    // Create the thread
    BaseType_t status = xTaskCreate(
        BOUNCE(dispatch_queue, dispatch_thread_handler),
        threads_[i].name.c_str(),
        thread_stack_size,
        reinterpret_cast<void*>(this),
        DISPATCH_Q_PRIORITY,
        &threads_[i].thread);
    assert(status == pdPASS && "Failed to create thread!");
}

Note that the xTaskCreate function requires you to specify a thread priority. For this example, I've defined a default value:

/// Example thread priority and time slice
#define DISPATCH_Q_PRIORITY (tskIDLE_PRIORITY + 1)

For further discussion on selecting thread priority, see Thread Priorities and Time Slicing below.

Dispatch Thread Handler Requirements

The dispatch queue worker thread handler should be a simple one. Its only requirements are:

  1. Wait until there is something to run
  2. Pop that item from the queue
  3. Run the item
  4. Check whether it is time to quit; if not, wait again

Once we understand our requirements for the worker threads, we encounter a question: how do I know that there's something to execute without keeping these threads awake?

Event Flags: Our Condition Variable Replacement

Instead of using std::condition_variable to wake threads when work is ready, we will utilize the FreeRTOS built-in event flags type:

/// FreeRTOS event flags - like condition variable
EventGroupHandle_t notify_flags_;

We will define two event flags to be used by the queue. One flag will tell threads to wake up, and the other flag will be set when a thread exits.

/// Definitions for dispatch event flags
#define DISPATCH_WAKE_EVT    (0x1)
#define DISPATCH_EXIT_EVT    (0x2)

Adding Work to the Queue

We can let our threads sleep until there is work in the queue. By setting an event flag, the next available thread will wake up, remove work from the queue, and execute.

The mutex will always protect our queue, so we need to lock and unlock before pushing a new piece of work onto the queue.

void dispatch_queue::dispatch(const fp_t& op)
{
    BaseType_t status = xSemaphoreTakeRecursive(mutex_, portMAX_DELAY);
    assert(status == pdTRUE && "Failed to lock mutex!");

    q_.push(op);

    status = xSemaphoreGiveRecursive(mutex_);
    assert(status == pdTRUE && "Failed to unlock mutex!");

    // Notify threads that new work is in the queue
    xEventGroupSetBits(notify_flags_, DISPATCH_WAKE_EVT);
}

The next question is: how do I know when to stop running and exit?

The simplest way is to add an exit_ or active_ boolean flag to our dispatch queue. When instructed to stop() or when destructing the queue, you can set this flag, notify threads that they need to wake up, and wait for confirmation that they have finished.

Because FreeRTOS does not have its own "join" function, we will imitate the behavior. We'll tell threads to wake up until we have confirmation that every thread is destroyed. We set the "wake" flag to wake up any remaining threads, and we wait for an "exit" event. Because we are not guaranteed that threads will exit in order, we will utilize a timeout with xEventGroupWaitBits. This timeout allows us to continue through our loop even if all threads have already exited.

Each thread will delete itself once woken by the exit notification, so our "join" emulation will wait for each thread to report an eDeleted status.

dispatch_queue::~dispatch_queue()
{
    // Signal to dispatch threads that it's time to wrap up
    quit_ = true;

    // We will join each thread to confirm exiting
    for (size_t i = 0; i < threads_.size(); ++i) {
        eTaskState state;

        do {
            // Signal wake - check exit flag
            xEventGroupSetBits(notify_flags_, DISPATCH_WAKE_EVT);

            // Wait until a thread signals exit.
            // Timeout is acceptable.
            xEventGroupWaitBits(notify_flags_, DISPATCH_EXIT_EVT,
                pdTRUE, pdFALSE, 10);

            // If it was not thread_[i], that is ok,
            // but we will loop around
            // until threads_[i] has exited
            state = eTaskGetState(threads_[i].thread);
        } while (state != eDeleted);
    }

    // Cleanup event flags and mutex
    vEventGroupDelete(notify_flags_);
    vSemaphoreDelete(mutex_);
}


We can then add state checking to the thread handler. The thread handler can monitor the quit_ flag and exit when requested.

The thread handler will also need to set the DISPATCH_EXIT_EVT flag when quitting to work with the logic shown above:

// Set a signal to indicate a thread exited
// (xEventGroupSetBits returns the event bits value, not pdTRUE,
// so we do not assert on its return value)
xEventGroupSetBits(notify_flags_, DISPATCH_EXIT_EVT);

After setting the notification, the worker thread then deletes itself:

// Delete the current thread
vTaskDelete(NULL);

Dispatch Thread Handler Implementation

In our worker thread, we primarily sleep until there is new work. Upon waking, the thread will take the lock, get an item from the queue, and resume operation.

If there is no work to execute, we will release the lock and sleep until new work is in the queue.

void dispatch_queue::dispatch_thread_handler(void)
{
    BaseType_t status = xSemaphoreTakeRecursive(mutex_, portMAX_DELAY);
    assert(status == pdTRUE && "Failed to lock mutex!");

    do {
        //after wait, we own the lock
        if(q_.size() && !quit_)
        {
            auto op = std::move(q_.front());
            q_.pop();

            //unlock now that we're done messing with the queue
            status = xSemaphoreGiveRecursive(mutex_);
            assert(status == pdTRUE &&
                "Failed to unlock mutex!");

            op();

            status = xSemaphoreTakeRecursive(mutex_, portMAX_DELAY);
            assert(status == pdTRUE && "Failed to lock mutex!");
        }
        else if(!quit_)
        {
            status = xSemaphoreGiveRecursive(mutex_);
            assert(status == pdTRUE &&
                  "Failed to unlock mutex!");

            // Wait for new work - clear flags on exit
            xEventGroupWaitBits(notify_flags_, DISPATCH_WAKE_EVT,
                pdTRUE, pdFALSE, portMAX_DELAY);

            status = xSemaphoreTakeRecursive(mutex_, portMAX_DELAY);
            assert(status == pdTRUE && "Failed to lock mutex!");
        }
    } while (!quit_);

    // We were holding the mutex after we woke up
    status = xSemaphoreGiveRecursive(mutex_);
    assert(status == pdTRUE && "Failed to unlock mutex!");

    // Set a signal to indicate a thread exited
    // (xEventGroupSetBits returns the event bits value, so no assert)
    xEventGroupSetBits(notify_flags_, DISPATCH_EXIT_EVT);

    // Delete the current thread
    vTaskDelete(NULL);
}

Thread Priorities and Time Slicing

Selecting thread priorities and ensuring your system runs smoothly without any priority inversions can be a difficult task. In general, your dispatch queue should have a mid-level or low thread priority. If the priority of the dispatch queue is too high, low-priority asynchronous work may end up starving the CPU and blocking primary system threads from running.

If you need queues of differing priorities, you can always create multiple queues and utilize different priorities in each queue.

FreeRTOS enables time slicing by default, but you may want to double-check that your configuration doesn't have it disabled. Make sure that configUSE_TIME_SLICING is undefined or set to 1.

By default (if configUSE_TIME_SLICING is not defined, or if configUSE_TIME_SLICING is defined as 1) FreeRTOS uses prioritised preemptive scheduling with time slicing. That means the RTOS scheduler will always run the highest priority task that is in the Ready state, and will switch between tasks of equal priority on every RTOS tick interrupt.

Putting it All Together

I've added the complete FreeRTOS dispatch queue implementation to GitHub.

Because the example uses FreeRTOS function calls, I have only built the example as a static library. It will not link or execute unless you supply a FreeRTOS library for your platform.

For how to use the dispatch queue, please see the original dispatch example:

dispatch_queue q("Phillip's Demo Dispatch Queue", 4);

q.dispatch([]{printf("Dispatch 1!\n");});
q.dispatch([]{printf("Dispatch 2!\n");});
q.dispatch([]{printf("Dispatch 3!\n");});
q.dispatch([]{printf("Dispatch 4!\n");});

Further Reading:

Change Log

  • 20190913:
    • Added table of contents
    • Demoted header levels for consistency across the website
    • Fixed broken link
  • 20190909:
    • The constructor now moves name into the member variable
  • 20190627:
    • Corrected erroneous note about FreeRTOS disabling time slicing by default (Thanks Kamil Kisiel!)
  • 20181219:
    • Updated links to open in external tabs
    • Added more links to Further Reading section
    • Improved grammar