Implementing Malloc with ThreadX

On Wednesday, I demonstrated a free-list malloc implementation. However, what if you are already using an RTOS with a memory allocator?

Let's see how we can use our RTOS to build malloc and free.

ThreadX

ThreadX is an RTOS proced by Express Logic that is widely used. You will see many ThreadX examples on this website.

ThreadX provides two types of memory services: byte pools and block pools. For our purposes of implementing malloc, we'll create a byte pool. Let's take a look at the APIs and see what ThreadX provides.

Creating a byte pool

We notice immediately that ThreadX's memory management is a bit more complicated than a simple malloc(size_t size):

UINT tx_byte_pool_create(TX_BYTE_POOL *pool_ptr, const CHAR *name_ptr, VOID *pool_start, ULONG pool_size);

We are required to create and maintain a byte pool for our block of memory. This will require us to call an initialization function before we can use malloc.

Allocating Memory

Thankfully, allocating memory is simpler than initialization. We pass in the pointer to our byte pool, the pointer for the allocated memory address, the requested size, and a timeout. This should be pretty easy to work with.

UINT tx_byte_allocate(TX_BYTE_POOL *pool_ptr, VOID **memory_ptr, ULONG memory_size, ULONG wait_option);

Freeing Memory

Freeing memory in ThreadX is as easy as expected - we simply pass in the pointer to the memory we want to free.

UINT tx_byte_release(VOID *memory_ptr);

Initialization

We know from reviewing the APIs that we need to maintain a ThreadX memory pool for our malloc implementation:

// ThreadX internal memory pool stucture
static TX_BYTE_POOL malloc_pool_ = {0};

And this memory pool needs to be initialized. Let's prototype a function which we can use to initialize our malloc library:

int init_malloc(uintptr_t heap_start, size_t heap_size)

We can call the init API with the address of our heap in memory, and the size of the heap we want to allocate.

int init_malloc(uintptr_t heap_start, size_t heap_size)
{
    uint8_t r;

    /**
    * This is ThreadX's API to create a byte pool using a memory block.
    * We are essentially just wrapping the API into a simpler form
    */
    r = tx_byte_pool_create(&malloc_pool_, "Heap Memory Pool",
            (void *)heap_start,
            heap_size);

    return (r == TX_SUCCESS) ? 0 : -1;    
}

If tx_byte_pool_create fails for some reason, we will return -1 to the caller as an error code to indicate this failure.

malloc

Now that our byte pool is created, we can implement our malloc function:

void * malloc(size_t size)
{
    void * ptr = NULL;

    if(size > 0)
    {
        // We simply wrap the threadX call into a standard form
        uint8_t r = tx_byte_allocate(&malloc_pool_, &ptr, size,
            TX_WAIT_FOREVER);

        if(r != TX_SUCCESS)
        {
            ptr = NULL;
        }
    } 
    //else NULL if there was no size

    return ptr;
}

free

Our free implementation using ThreadX is very simple, since we just need to wrap the ThreadX call with no other modifications:

void free(void * ptr)
{
    if(ptr) {
        //We simply wrap the threadX call into a standard form
        uint8_t r = tx_byte_release(ptr);
    }
}

Hiding the Initialization

If you look closely, you will notice that the malloc implementation shown above isn't very safe. What if someone tries to call malloc before initialization is finished? There are no protections in place.

The immediate response is probably that we can add an initialized flag and check state. But what if we could remove the requirement to call init_malloc first? Could we hide the details to the end user, only providing a malloc API?

Functions Pointers & Atomics

It turns out that it's quite easy to hide initialization from the end user using three simple ideas:

  • malloc is defined as a function pointer to init_malloc
  • When init_malloc is called, it initializes the memory pool and then passes the requested size onto do_malloc, returning a pointer
  • We can use atomic operations and swap the malloc function pointer to do_malloc after initialization is completed so future calls will only allocate memory

So how do we need to modify our approach above to accomplish these goals?

Memory Definition

In order to define malloc as a function pointer to init_malloc and do_malloc, we need to make sure their function prototypes are the same. How can we remove the input dependences for init_malloc?

int init_malloc(uintptr_t heap_start, size_t heap_size)

Instead of passing in a memory address and size to our init_malloc routine, we should just define this in a platform specific header somewhere.

For example:

#define HEAP_START 0x10008000
#define HEAP_END 0x1000FFFF

You can check in your library if these symbols are defined. If they are not, throw an error:

#ifndef HEAP_START
#error Please define HEAP_START symbol!
#endif

#ifndef HEAP_END
#error Please define HEAP_END symbol!
#endif

Malloc as a Function Pointer

Now that we've abstracted memory location away from the init_malloc API, we can get to work on our function pointer approach.

malloc has a well-defined API, so let's just prototype our functions based on that:

typedef void *(* malloc_ptr_t)(size_t);

We can then morph init_malloc to match:

static void * init_malloc(size_t size);
static void * do_malloc(size_t size);

Now all that's left is simply declaring our malloc variable and initializing it to init_malloc:

volatile malloc_ptr_t malloc = &init_malloc;

Atomic Swap & init_malloc

The basis of our implementation lies in the atomic compare-and-swap functionality:

atomic_compare_and_swap(&malloc, &init_malloc, &do_malloc)

What does the compare-and-swap function actually do?

Since the operation is atomic, we are guaranteed that the value will not be changed by a competing thread during the function. We compare the object (malloc function pointer) to see if the value matches init_malloc or do_malloc. If the malloc pointer is still set to init_malloc, the value is changed to do_malloc and the function returns true. If the value is not set to init_malloc, the function returns false. Using this behavior, we can ensure that init_malloc will only be called once.

I should warn you that the atomic_compare_and_swap function above is not a standard prototype. C11 provides a stdatomic header that can be used, but I have chosen to simply use GNU's compare and swap. This function also works with clang.

#define atomic_compare_and_swap __sync_val_compare_and_swap

Since it's still theoretically possible for multiple functions to try to allocate memory first, we will still need a bool flag for other threads to spin on while the first thread finishes initialization:

volatile static bool initialized_ = false;

Now that we have all the basic pieces, let's see what our new init_malloc function looks like:

void * init_malloc(size_t size)
{
    assert(size > 0);

    uintptr_t heap_start = (uintptr_t)HEAP_START;
    uintptr_t heap_size = (uintptr_t)HEAP_END - heap_start;

    /*
    * When we enter into init_malloc, we check the current value 
    * of the malloc pointer. If it's still init_malloc, we swap 
    * the value to do_malloc and return true. If the value is 
    * do_malloc, another thread beat us to the punch and we fall
    * through to do_malloc(), skipping initialization.
    */

    if(atomic_compare_and_swap(&malloc, &init_malloc, &do_malloc))
    {
        uint8_t r;

        r = tx_byte_pool_create(&malloc_pool_, "Heap Memory Pool",
                (void *)heap_start,
                heap_size);
        assert(r == TX_SUCCESS);

        //Signal to any threads waiting on do_malloc that we are done
        initialized_ = true;
    }

    /*
    * Remember - two situations happen here:
    *     1) malloc initialized on the first call, and then passed through to normal malloc
    *    2) two threads raced to init. One initializes, and the other falls through to malloc
    */
    return do_malloc(size);
}

Let's walk through what init_malloc is doing step by step.

First, we calculate the heap values that we need for tx_byte_pool_create:

uintptr_t heap_start = (uintptr_t)HEAP_START;
uintptr_t heap_size = (uintptr_t)HEAP_END - heap_start;

Next we perform the compare-and-swap routine. If this is the first function to reach the compare-and-swap, we will run initialization. If not, we will skip this initialization code.

if(atomic_compare_and_swap(&malloc, &init_malloc, &do_malloc))
{
    uint8_t r;

    r = tx_byte_pool_create(&malloc_pool_, "Heap Memory Pool",
            (void *)heap_start,
            heap_size);
    assert(r == TX_SUCCESS);

    //Signal to any threads waiting on do_malloc that we are done
    initialized_ = true;
}

Whether we initialized the heap ourselves or fell through, both paths will lead to do_malloc at the end:

return do_malloc(size);

do_malloc

The changes required to do_malloc are very simple. We just need to make sure that the initialization has completed before we can proceed to memory allocation. Note the new while(!initialized) loop below:

void * do_malloc(size_t size)
{
    void * ptr = NULL;

    /**
    * Since multiple threads could be racing to malloc, 
    * if we lost the race we need to make sure the ThreadX pool 
    * has been created before we try to allocate memory, or there 
    * will be an error
    */
    while(!initialized_)
    {
        tx_thread_sleep(1);
    }

    if(size > 0)
    {
        // We simply wrap the threadX call into a standard form
        uint8_t r = tx_byte_allocate(&malloc_pool_, &ptr, size, 
                                    TX_WAIT_FOREVER);

        assert(r == TX_SUCCESS && "malloc failed");
    } //else NULL if there was an error

    return ptr;
}

Making malloc/free available to consumers

How do consumers outside of your library know about this malloc function pointer? You'll need to define a header file and declare this as an extern variable in the header:

extern volatile malloc_ptr_t malloc = &init_malloc;

This allows us to keep consumers blissfully unaware of the existence of do_malloc and init_malloc.

Free can be defined as normal in your header:

void free(void * ptr);

Putting it all together

I've added an example ThreadX malloc implementation to github.

You can build this example by running make c at the toplevel Makefile, or make malloc_tx in the C examples folder.

Note that this source file is only compiled, not linked into a usable binary. The example redefines the symbol malloc, and ThreadX objects are not provided for linking.

If you look into the C examples Makefile, you will notice this compiler flag: -fno-builtin.

We are starting to get into the territory of redefining existing symbols and overriding the default library behaviors. Without this flag, we cannot declare malloc as a function pointer because it is already defined as a function in the compiler. Our -fno-builtin flag tells the compiler that we don't want these builtin functions. Curious about what is a builtin? Check out this list for GNU compilers.

I will go into details on compiler and linker flags for overriding builtin behavior in a future article.

Further Reading