Implementing Malloc: First-fit Free List

Now that we've seen some useful C++ examples that can be applied to embedded systems, you're probably curious about getting C++ code up and running on your embedded platform.

If you try linking C++ code for your platform, you might find that you have numerous undefined symbol errors. Unfortunately, C++ runtimes are not provided for many systems. How can we utilize these features if we don't have one of the out-of-the-box supported platforms, or need a smaller bare metal implementation?

The first step on our journey is an implementation of malloc. Like C++, many C features are probably not natively supported on your platform!

Free List Allocator

Let's assume you don't have an RTOS running and don't have malloc implemented for your platform. How can you dynamically allocate memory?

The simplest allocator we can implement is a first-fit free-list. We initialize our allocator by defining an address in memory and the amount of space available. The allocator then keeps a linked list of available memory (free-list), giving out the first block of space sufficient to contain the requested size (first-fit). We can split up large blocks into smaller ones before vending memory addresses to utilize our space more effectively.

What memory do I give it?

What size and address you provide to the malloc allocator is totally platform and implementation dependent. Does your chip have 128KB of SRAM? Maybe you can only set aside 16KB for dynamic allocations. Do you have 2Gb of DDR? Set aside 32MB at the end of DRAM!

For the purposes of our example, we will simply request a block of memory from the PC's implementation of malloc and pass that into the allocator.

Pre-requisites

The only pre-requisite for the simple malloc implementation is a linked list library. While you can take the time to implement your own, there are plenty on the internet that can be utilized.

The examples here will use this definition of a doubly-linked list node:

typedef struct ll_head {
     struct ll_head *next;
     struct ll_head *prev;
} ll_t;

Implementation

Let's dig into the implementation of our free-list allocator.

Storage

First, we need to know how we're going to keep track of our free list and memory allocations. We need to allocate space for the linked list node that will attach it to the free list, and we definitely need to know the size of each block. Consider this structure:

typedef struct {
     ll_t node;
     size_t size;
     char * block;
} alloc_node_t;

However, we don't want the user to manage this information, so we will only vend the actual data pointer to the consumer. This is quite easy using pointer math (or offsetof & container_of). As long as we have the data pointer, we can look backwards in memory and get the containing structure.

For simplification, we'll define this macro:

#define ALLOC_HEADER_SZ offsetof(alloc_node_t, block)

Adding a Block of Memory

Now that we know how we're going to keep track of our data, how do we add our block of memory to the free list?

We simply take the memory we are provided, initialize a single free-list block matching our data, and add it to the free-list for later use. Easy peasy!

void malloc_addblock(void *addr, size_t size)
{
     alloc_node_t *blk;

     // align the start addr of our block to the next pointer aligned addr
     blk = (void *) align_up((uintptr_t)addr, sizeof(void*));

     // calculate actual size - mgmt overhead
     blk->size = (uintptr_t) addr + size - (uintptr_t) blk 
         - ALLOC_HEADER_SZ;

     //and now our giant block of memory is added to the list!
     list_add(&blk->node, &free_list);
}

Allocating Memory

We'll use the malloc function prototype:

void * malloc(size_t size)

When memory is requested, we need to try to find a block in our free list of acceptable size:

void * ptr = NULL;
alloc_node_t *blk = NULL;

// try to find a big enough block to alloc
list_for_each_entry(blk, &free_list, node)
{
    if(blk->size >= size)
    {
        ptr = &blk->block;
        break;
    }
}

If we found a block (ptr is not NULL), then we should try to split that block if we can. This will let us preserve as much memory for later consumers as possible.

// Can we split the block?
if((blk->size - size) >= MIN_ALLOC_SZ)
{
    alloc_node_t *new_blk;
    new_blk = (alloc_node_t *)((uintptr_t)(&blk->block) + size);
    new_blk->size = blk->size - size - ALLOC_HEADER_SZ;
    blk->size = size;
    list_add(&new_blk->node, &blk->node, blk->node.next);
}

In this example, I subtract the requested size (size) from the available size in the free block (blk->size) and check whether the resulting value is above our minimum allocation threshold (e.g. 32 bytes). If that is not true, it's simply not worth the overhead of splitting the block to keep a small number of available bytes around.

Either way, we need to remove the memory we're allocating from the free list:

list_del(&blk->node);

Now we just need to return our data pointer (ptr) and we are done!

Freeing Memory

When memory is ready to be freed, we need to convert from the vended data pointer to our container structure and add the note back to our free list.

Using the standard free prototype:

void free(void * ptr)

We use container_of to walk backwards in memory from ptr to the container struct:

alloc_node_t *blk, *free_blk;

// we take the pointer and use container_of to get the corresponding block
blk = container_of(ptr, alloc_node_t, block);

Now that we have our block, we can search the free-list and put it back at the correct location:

//Let's put it back in the proper memory order
list_for_each_entry(free_blk, &free_list, node)
{
   if(free_blk > blk)
   {
        list_add_(&blk->node, free_blk->node.prev, &free_blk->node);
        goto blockadded; //break out
   }
}

// if we didn't break, add to the end of the list
list_add_tail(&blk->node, &free_list);

Cleaning Up the Free List

We can review our free list and check to see if any memory blocks can be combined into larger blocks. This helps us fight against memory fragmentation in a simple way - we always try to have the largest contiguous free blocks possible.

void defragment_free_list(void)
{
    alloc_node_t *b, *lb = NULL, *t;

    list_for_each_entry(b, t, &free_list, node)
    {
        if(lb)
        {
            if((((uintptr_t)&lb->block) + lb->size) == (uintptr_t)b)
            {
                lb->size += sizeof(*b) + b->size;
                list_del(&b->node);
                continue;
            }
        }
        lb = b;
    }
}

The sample code provided does this on every iteration of free(). We add this to the blockadded label mentioned in free().

blockadded:
    // Let's see if we can combine any memory
    defrag_free_list();

Putting It All Together

I've added a C examples folder to the embedded-resources git repository.

In that folder, you can find a linked list example, a malloc free-list example, as well as some test code showing the API usage.

I have also added a Makefile at the top level of the repository and at the C level. Simply run make c at the toplevel or make freelist if you are in the C examples folder.

Further Reading