Memory

foonathan/memory: Simplifying the C++ Memory Allocator

The memory library is developed by Jonathan Müller, a C++ library developer and author of foonathan::blog(). This library provides an new STL-compatible C++ memory allocator called RawAllocator. The RawAllocator is similar to the standard Allocator but is easier to use. The library also provides a BlockAllocator type which can be used for allocating large blocks of memory.

The project includes a variety of implementations, adapters, wrappers, and storage classes, including:

  • new allocator
  • heap allocator
  • malloc allocator
  • memory pools
  • static allocator
  • virtual memory allocator
  • make_unique and make_shared replacements which allocate memory using a RawAllocator

We are excited about using this library in our next embedded project and gaining increased control over memory allocations.

Further Reading

For more on the memory library:

Implementing Malloc With FreeRTOS

In the past I've shared malloc implementations which are built using a first-fit free list and ThreadX. Today I'd like to share a malloc implementation based on another popular RTOS: FreeRTOS.

FreeRTOS Memory Allocation

FreeRTOS has two dynamic memory allocation functions that we will utilize:

  • pvPortMalloc
  • vPortFree

FreeRTOS supports multiple heap allocation schemes. We'll implement malloc and free in a way that will apply to schemes 1, 2, 4, and 5. Since scheme 5 allows the heap to span multiple sections of memory, we'll also look at an implementation that allows for us to initialize the heap with multiple regions of memory.

A Simple FreeRTOS malloc

Creating a malloc function that works with the built-in FreeRTOS heap is quite simple. We just wrap the pvPortMalloc call:

void* malloc(size_t size)
{
    void* ptr = NULL;

    if(size > 0)
    {
        // We simply wrap the FreeRTOS call into a standard form
        ptr = pvPortMalloc(size);
    } // else NULL if there was an error

    return ptr;
}

free

The free implementation is equally as simple as malloc:

void free(void* ptr)
{
    if(ptr)
    {
        // We simply wrap the FreeRTOS call into a standard form
        vPortFree(ptr);
    }
}

Remember that free cannot be called if you are using the heap 1 allocation scheme.

Heap Initialization with heap_5

Before we dive into the heap 5 malloc implementation, let's review the heap initialization requirements for heap_5.c.

The portable.h header defines the following type, which is used in heap_5.c:

typedef struct HeapRegion
{
    /* Start address of a block of memory that will be part of the heap.*/
    uint8_t *pucStartAddress;

    /* Size of the block of memory. */
    size_t xSizeInBytes;
} HeapRegion_t;

We can create an array of these structures to define our different memory blocks that can be assigned to the heap:

const HeapRegion_t xHeapRegions[] =
{
    { ( uint8_t * ) 0x80000000UL, 0x10000 },
    { ( uint8_t * ) 0x90000000UL, 0xa0000 },
    { NULL, 0 } /* Terminates the array. */
};

The array of heap blocks must meet two requirements:

  1. The memory regions must appear in address order (from low addresses to high addresses)
  2. The array must be NULL terminated

Once we've defined our heap map, we pass the array into vPortDefineHeapRegions().

Note that FreeRTOS memory allocations will fail prior to passing in heap information to vPortDefineHeapRegions(), so we must make sure that we initialize our heap before we allow malloc() calls to succeed.

Heap 5 Groundwork

In order to encapsulate the functionality of malloc_addblock, we want to manage an internal HeapRegion_t array to track heap memory blocks. We'll also need a few state variables to help us manage the initialization process.

First, we'll create a max limit for heap region entries. We'll use an ifndef directive so applications can redefine FREERTOS_HEAP_REGION_CNT if more entries are needed. Otherwise, we provide a reasonably small default value:

#ifndef FREERTOS_HEAP_REGION_CNT
#define FREERTOS_HEAP_REGION_CNT 2
#endif

Since the HeapRegion_t table needs to be NULL terminated, we allocate an extra entry to ensure there is always space for NULL termination:

static HeapRegion_t heap_regions[FREERTOS_HEAP_REGION_CNT + 1] = {0};

We'll create a const variable representing our maximum number of entries for better type safety:

/// Maximum number of heap regions that can be specified
static const uint8_t heap_region_max = FREERTOS_HEAP_REGION_CNT;

As well as a variable which will track the current number of entries:

/// Current number of allocated heap regions
static volatile uint8_t heap_region_cnt = 0;

Since we aren't allowed to allocate memory until the heap has been initialized, we'll create a boolean variable to check the initialize status:

volatile static bool initialized_ = false;

Adding Memory to the Heap

Since we want to support the ability to use multiple memory blocks with the heap, we need to split up the malloc_addblock and malloc_init functions. This simplified malloc_addblock implementation simply adds a new entry to the heap region table.

Instead of sorting when we add a block, we'll sort prior to initialization: we don't need to re-sort on every insertion.

void malloc_addblock(void* addr, size_t size)
{
    assert(addr && (size > 0));
    assert((heap_region_cnt < heap_region_max) && "Too many heap regions!");

    // Increment the count early to claim a spot in case of multi-threads
    uint8_t cnt = heap_region_cnt++;

    if(cnt < heap_region_max)
    {
        // We have space - add it to the table
        heap_regions[cnt].pucStartAddress = (uint8_t *) addr;
        heap_regions[cnt].xSizeInBytes = size;
    }
    else
    {
        // Decrement the count if we don't have space
        heap_region_cnt--;
    }
}

This function can be called up to FREERTOS_HEAP_REGION_CNT times. Since we statically allocated the table, we don't want extra calls stomping on memory.

Initializing malloc

Once we've added some memory blocks, we can initialize the heap by calling vPortDefineHeapRegions. Prior to calling that function we need to sort the heap regions to make sure they're in order.

Once we have initialized the heap, we set initialized_ to true and allow future malloc() calls to succeed.

void malloc_init()
{
    assert((heap_region_cnt > 0) && !initialized_);

    if(heap_region_cnt > 0 && !initialized_)
    {
        // Sort the heap regions so addresses are in the correct order
        qsort(heap_regions, heap_region_cnt, sizeof(HeapRegion_t), cmp_heap);

        // Pass the array into vPortDefineHeapRegions() to enable malloc()
        vPortDefineHeapRegions(heap_regions);

        // Signal to any waiting threads that we are done initializing
        initialized_ = true;
    }
}

This comparison function will allow us to access and compare the start address for each entry in our heap table:

static int cmp_heap(const void* a, const void* b)
{
    const HeapRegion_t *ua = a, *ub = b;

    return ((ua->pucStartAddress < ub->pucStartAddress) ? -1 :
        ((ua->pucStartAddress != ub->pucStartAddress)));
}

Heap 5 malloc and free Changes

Once the setup code is completed, our heap 5 malloc() is quite simple to implement. Primarily we are just wrapping the pvPortMalloc call, but we also want to make sure that we block all malloc calls until initialization is complete.

void* malloc(size_t size)
{
    void* ptr = NULL;

    while(!initialized_)
    {
        // Thread blocks until application malloc has been correctly initialized
        vTaskDelay(1);
    }

    if(size > 0)
    {
        // We simply wrap the FreeRTOS call into a standard form
        ptr = pvPortMalloc(size);
    } // else NULL if there was an error

    return ptr;
}

We don't really need to modify free, but we can still add a logical check to make sure that nobody is calling free before initialization occurs:

void free(void* ptr)
{
    /// free should NEVER be called before malloc is init'd
    assert(initialized_);

    if(ptr)
    {
        // We simply wrap the FreeRTOS call into a standard form
        vPortFree(ptr);
    }
}

Putting it All Together

If you're using heap 5, you want to call malloc_addblock and malloc_init as early in the startup process as possible. Remember: malloc_init() must be called before any dynamic memory allocations will work.

malloc_addblock(0x80000000, 0x10000);
malloc_addblock(0xf00000000, 0xa000);
malloc_init();

// Now malloc/pvPortMalloc will work!

If you need more memory blocks, you can always supply your own the definition for FREERTOS_HEAP_REGION_CNT.

You can find the source for this FreeRTOS malloc implementation on GitHub.

Further Reading

C++ Smart Pointers with Aligned Malloc/Free

Recall in the aligned_malloc article that we noted the need to pair aligned_malloc with aligned_free. Using the wrong free call can cause serious problems, as we have modified the pointer that malloc originally returned to us.

This problem of pairing allocators and deleters also applies in other situations: new must be paired with delete, while new[] must be paired with delete[].

But what if I free the memory at the later stage and don't realize it's an aligned pointer (or was allocated with new[])? What if I just have a brain fart? How can we protect against using the incorrect free or delete call?

In my earlier article about C++11 smart pointers, I highlighted that we can specify a deleter when declaring our smart pointers. When the pointer goes out of scope or is reset, the correct deleter is automatically called. Let's use this to protect ourselves from introducing unnecessary errors.

std::unique_ptr

If you are using a deleter with std::unique_ptr, you must specify the prototype for your deleter function in the pointer type.

std::unique_ptr<uint8_t, decltype(&aligned_free)>

This can be quite tedious to type, so instead I recommend creating an alias. We can define a unique_ptr_aligned type that includes our aligned_free prototype while leaving the type as templated value:

template<class T> using unique_ptr_aligned = std::unique_ptr<T, 
    decltype(&aligned_free)>;

The primary benefit from the alias is that you can use it in multiple locations and function prototypes without the pesky decltype(&aligned_free) being typed everywhere.

Now that we have our alias, we can use it to create a new aligned pointer. Note that you need to use aligned_malloc rather than new to generate the memory to pass into the constructor. We also must still specify the specific deleter function we need to call.

unique_ptr_aligned<uint8_t[]> p(
    static_cast<uint8_t*>(aligned_malloc(32, 1024)), 
    &aligned_free);

That's still quite a bit of typing. Wouldn't it be better if I could just type the following?

auto p = aligned_uptr<uint8_t>(32, 100);

Luckily, with templates, we can:

template<class T>
unique_ptr_aligned<T> aligned_uptr(size_t align, size_t size)
{
    return unique_ptr_aligned<T>(
        static_cast<T*>(aligned_malloc(align, size)), 
        &aligned_free);
}

We have just hidden the details within our templated function. We can make sure all aligned_uptr calls pass through aligned_malloc and specify aligned_free as the detail, leaving us to simply worry about the type, the alignment, and the memory allocation size.

std::shared_ptr

std::shared_ptr is an easier case to handle than std::unique_ptr. While std::unique_ptr requires the deleter to be part of the pointer type, std::shared_ptr does not. You simply need to include the deleter in the constructor call for your std::shared_ptr:

std::shared_ptr<uint8_t> p(
     static_cast<uint8_t*>(aligned_malloc(32, 128)), 
     &aligned_free);

Like with the std::unique_ptr example, we can shorten this typing significantly with a templated function:

template<class T>
std::shared_ptr<T> aligned_sptr(size_t align, size_t size)
{
    return std::shared_ptr<T>(
        static_cast<T*>(aligned_malloc(align, size)), 
        &aligned_free);
}

Resulting in this much simpler declaration:

auto p = aligned_sptr<uint8_t>(64, 100);

Putting it all together

I have uploaded a "smart pointer aligned" example to the embedded-resources github repo. To build the example in examples/cpp, simply run:

make aligned_ptr

I also added memory.h to examples/c, where the aligned_malloc and aligned_free prototypes live.

The smart pointer example uses malloc_aligned.c, so I have moved the COMPILE_AS_EXAMPLE definition to the C examples Makefile. This change allows the C++ examples to use malloc_aligned.c as a library by removing the main function.

Further Reading