Bringup

Implementing Malloc With FreeRTOS

In the past I've shared malloc implementations which are built using a first-fit free list and ThreadX. Today I'd like to share a malloc implementation based on another popular RTOS: FreeRTOS.

FreeRTOS Memory Allocation

FreeRTOS has two dynamic memory allocation functions that we will utilize:

  • pvPortMalloc
  • vPortFree

FreeRTOS supports multiple heap allocation schemes. We'll implement malloc and free in a way that will apply to schemes 1, 2, 4, and 5. Since scheme 5 allows the heap to span multiple sections of memory, we'll also look at an implementation that allows for us to initialize the heap with multiple regions of memory.

A Simple FreeRTOS malloc

Creating a malloc function that works with the built-in FreeRTOS heap is quite simple. We just wrap the pvPortMalloc call:

void* malloc(size_t size)
{
    void* ptr = NULL;

    if(size > 0)
    {
        // We simply wrap the FreeRTOS call into a standard form
        ptr = pvPortMalloc(size);
    } // else NULL if there was an error

    return ptr;
}

free

The free implementation is equally as simple as malloc:

void free(void* ptr)
{
    if(ptr)
    {
        // We simply wrap the FreeRTOS call into a standard form
        vPortFree(ptr);
    }
}

Remember that free cannot be called if you are using the heap 1 allocation scheme.

Heap Initialization with heap_5

Before we dive into the heap 5 malloc implementation, let's review the heap initialization requirements for heap_5.c.

The portable.h header defines the following type, which is used in heap_5.c:

typedef struct HeapRegion
{
    /* Start address of a block of memory that will be part of the heap.*/
    uint8_t *pucStartAddress;

    /* Size of the block of memory. */
    size_t xSizeInBytes;
} HeapRegion_t;

We can create an array of these structures to define our different memory blocks that can be assigned to the heap:

const HeapRegion_t xHeapRegions[] =
{
    { ( uint8_t * ) 0x80000000UL, 0x10000 },
    { ( uint8_t * ) 0x90000000UL, 0xa0000 },
    { NULL, 0 } /* Terminates the array. */
};

The array of heap blocks must meet two requirements:

  1. The memory regions must appear in address order (from low addresses to high addresses)
  2. The array must be NULL terminated

Once we've defined our heap map, we pass the array into vPortDefineHeapRegions().

Note that FreeRTOS memory allocations will fail prior to passing in heap information to vPortDefineHeapRegions(), so we must make sure that we initialize our heap before we allow malloc() calls to succeed.

Heap 5 Groundwork

In order to encapsulate the functionality of malloc_addblock, we want to manage an internal HeapRegion_t array to track heap memory blocks. We'll also need a few state variables to help us manage the initialization process.

First, we'll create a max limit for heap region entries. We'll use an ifndef directive so applications can redefine FREERTOS_HEAP_REGION_CNT if more entries are needed. Otherwise, we provide a reasonably small default value:

#ifndef FREERTOS_HEAP_REGION_CNT
#define FREERTOS_HEAP_REGION_CNT 2
#endif

Since the HeapRegion_t table needs to be NULL terminated, we allocate an extra entry to ensure there is always space for NULL termination:

static HeapRegion_t heap_regions[FREERTOS_HEAP_REGION_CNT + 1] = {0};

We'll create a const variable representing our maximum number of entries for better type safety:

/// Maximum number of heap regions that can be specified
static const uint8_t heap_region_max = FREERTOS_HEAP_REGION_CNT;

As well as a variable which will track the current number of entries:

/// Current number of allocated heap regions
static volatile uint8_t heap_region_cnt = 0;

Since we aren't allowed to allocate memory until the heap has been initialized, we'll create a boolean variable to check the initialize status:

volatile static bool initialized_ = false;

Adding Memory to the Heap

Since we want to support the ability to use multiple memory blocks with the heap, we need to split up the malloc_addblock and malloc_init functions. This simplified malloc_addblock implementation simply adds a new entry to the heap region table.

Instead of sorting when we add a block, we'll sort prior to initialization: we don't need to re-sort on every insertion.

void malloc_addblock(void* addr, size_t size)
{
    assert(addr && (size > 0));
    assert((heap_region_cnt < heap_region_max) && "Too many heap regions!");

    // Increment the count early to claim a spot in case of multi-threads
    uint8_t cnt = heap_region_cnt++;

    if(cnt < heap_region_max)
    {
        // We have space - add it to the table
        heap_regions[cnt].pucStartAddress = (uint8_t *) addr;
        heap_regions[cnt].xSizeInBytes = size;
    }
    else
    {
        // Decrement the count if we don't have space
        heap_region_cnt--;
    }
}

This function can be called up to FREERTOS_HEAP_REGION_CNT times. Since we statically allocated the table, we don't want extra calls stomping on memory.

Initializing malloc

Once we've added some memory blocks, we can initialize the heap by calling vPortDefineHeapRegions. Prior to calling that function we need to sort the heap regions to make sure they're in order.

Once we have initialized the heap, we set initialized_ to true and allow future malloc() calls to succeed.

void malloc_init()
{
    assert((heap_region_cnt > 0) && !initialized_);

    if(heap_region_cnt > 0 && !initialized_)
    {
        // Sort the heap regions so addresses are in the correct order
        qsort(heap_regions, heap_region_cnt, sizeof(HeapRegion_t), cmp_heap);

        // Pass the array into vPortDefineHeapRegions() to enable malloc()
        vPortDefineHeapRegions(heap_regions);

        // Signal to any waiting threads that we are done initializing
        initialized_ = true;
    }
}

This comparison function will allow us to access and compare the start address for each entry in our heap table:

static int cmp_heap(const void* a, const void* b)
{
    const HeapRegion_t *ua = a, *ub = b;

    return ((ua->pucStartAddress < ub->pucStartAddress) ? -1 :
        ((ua->pucStartAddress != ub->pucStartAddress)));
}

Heap 5 malloc and free Changes

Once the setup code is completed, our heap 5 malloc() is quite simple to implement. Primarily we are just wrapping the pvPortMalloc call, but we also want to make sure that we block all malloc calls until initialization is complete.

void* malloc(size_t size)
{
    void* ptr = NULL;

    while(!initialized_)
    {
        // Thread blocks until application malloc has been correctly initialized
        vTaskDelay(1);
    }

    if(size > 0)
    {
        // We simply wrap the FreeRTOS call into a standard form
        ptr = pvPortMalloc(size);
    } // else NULL if there was an error

    return ptr;
}

We don't really need to modify free, but we can still add a logical check to make sure that nobody is calling free before initialization occurs:

void free(void* ptr)
{
    /// free should NEVER be called before malloc is init'd
    assert(initialized_);

    if(ptr)
    {
        // We simply wrap the FreeRTOS call into a standard form
        vPortFree(ptr);
    }
}

Putting it All Together

If you're using heap 5, you want to call malloc_addblock and malloc_init as early in the startup process as possible. Remember: malloc_init() must be called before any dynamic memory allocations will work.

malloc_addblock(0x80000000, 0x10000);
malloc_addblock(0xf00000000, 0xa000);
malloc_init();

// Now malloc/pvPortMalloc will work!

If you need more memory blocks, you can always supply your own the definition for FREERTOS_HEAP_REGION_CNT.

You can find the source for this FreeRTOS malloc implementation on GitHub.

Further Reading

Implementing libc: stdlib, pt. 2

Earlier this year I was focusing on bringing up an embedded framework. I've been distracted with client work, but am finally resuming framework development. The first stage in that journey is finishing off the Embedded Artistry libc so I can begin work on my memory allocation library and the C++ libraries.

Early warning: If you're unfamiliar with the the standard library functions, I recommend reading this "Guide to C stdlib functions". I will not be describing what each function does.

This article continues from the first stdlib bringup article and cites the primary source and any modifications made for these stdlib functions:

  • atof
  • strtof
  • heapsort
  • qsort
  • qsort_r (reentrant)
  • imaxabs
  • imaxdiv
  • rand
  • rand_r (reentrant)
  • srand

I also implemented the following "support" functions:

  • fls
  • flsl
  • flsll

I have decided to leave strtold (string-to-long-double) unimplemented at this time. The odds that we will need to use it on embedded systems seems exceedingly low, and I will tackle it whenever the time comes. The following functions will be covered in a future article:

  • abort
  • atexit
  • at_quick_exit
  • exit
  • _Exit
  • quick_exit

Header Changes

A new strings.h header has been added due to the fls function family, and stdlib.h has been updated to reflect the new functions.

stdlib.h Updates

The stdlib.h header has been updated to reflect the new features.

The header now include stdint.h due to the imax* function prototypes:

#include <stdint.h>

The imaxdiv_t type was defined to support the imax* functions:

typedef struct
{
    intmax_t quot;
    intmax_t rem;
} imaxdiv_t;

The following function prototypes were added:

float strtof(const char* __restrict, char** __restrict);
intmax_t imaxabs(intmax_t j);
imaxdiv_t imaxdiv(intmax_t numer, intmax_t denom);
int rand_r(unsigned int* ctx);
int rand(void);
void srand(unsigned seed);
int heapsort(void* vbase, size_t nmemb, size_t size, int (*compar)(const void*, const void*));
void qsort_r(void* a, size_t n, size_t es, void* thunk,
             int (*cmp)(void*, const void*, const void*));
void qsort(void* a, size_t n, size_t es, int (*compar)(const void*, const void*));

strings.h

The POSIX header strings.h has been added in a very basic way. The fls family of functions is defined here.

My strings.h implementation:

#ifndef __STRINGS_H_
#define __STRINGS_H_

#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif //__cplusplus

int fls(int mask);
int flsl(long mask);
int flsll(long long mask);

#ifdef __cplusplus
}
#endif //__cplusplus

#endif // __STRINGS_H_

atof

I imported the Apple Open Source atof, which simply serves as a strtod wrapper:

#include <stdlib.h>

double atof(const char* ascii)
{
    return (strtod(ascii, NULL));
}

strtof

I imported the old Sun Microsystems implementation of strtof rather than the more complicated gtoa library.

Since the function is long, I will refer you to the source code. I have converted to the standard type names (TRUE becomes true, CONST becomes const) and removed dead code. The logic is still the same.

imaxabs

I've imported the FreeBSD implementation of imaxabs:

#include <stdlib.h>

intmax_t imaxabs(intmax_t j)
{
    return (j < 0 ? -j : j);
}

imaxdiv

I've imported the FreeBSD implementation of imaxdiv:

#include <stdlib.h>

/* See comments in div.c for implementation details. */
imaxdiv_t imaxdiv(intmax_t numer, intmax_t denom)
{
    imaxdiv_t retval;

    retval.quot = numer / denom;
    retval.rem = numer % denom;
    if(numer >= 0 && retval.rem < 0)
    {
        retval.quot++;
        retval.rem -= denom;
    }
    return (retval);
}

rand and company

I used the Apple Open Source rand.c file as a reference for my basic rand implementation.

Since we're targeting an embedded system, I've cleaned up the code to remove the tests and the random device references. I also converted the rand, srand, and rand_r implementations to be weakly linked, so that we can override these functions on our target architectures.

The resulting rand.c implementation:

#include <stdlib.h>

static unsigned long next = 1;
static int do_rand(unsigned long* ctx)
{
#ifdef USE_WEAK_SEEDING
    /*
     * Historic implementation compatibility.
     * The random sequences do not vary much with the seed,
     * even with overflowing.
     */
    return ((*ctx = *ctx * 1103515245 + 12345) % ((unsigned long)RAND_MAX + 1));
#else /* !USE_WEAK_SEEDING */
    /*
     * Compute x = (7^5 * x) mod (2^31 - 1)
     * without overflowing 31 bits:
     *      (2^31 - 1) = 127773 * (7^5) + 2836
     * From "Random number generators: good ones are hard to find",
     * Park and Miller, Communications of the ACM, vol. 31, no. 10,
     * October 1988, p. 1195.
     */
    long hi, lo, x;

    /* Can't be initialized with 0, so use another value. */
    if(*ctx == 0)
        *ctx = 123459876;
    hi = *ctx / 127773;
    lo = *ctx % 127773;
    x = 16807 * lo - 2836 * hi;
    if(x < 0)
        x += 0x7fffffff;
    return ((*ctx = x) % ((unsigned long)RAND_MAX + 1));
#endif /* !USE_WEAK_SEEDING */
}

__attribute__((weak)) int rand_r(unsigned int* ctx)
{
    unsigned long val = (unsigned long)*ctx;
    int r = do_rand(&val);

    *ctx = (unsigned int)val;
    return (r);
}

__attribute__((weak)) int rand()
{
    return (do_rand(&next));
}

__attribute__((weak)) void srand(unsigned seed)
{
    next = seed;
}

fls Family

The fls function family was imported from Apple's Open Source libplatform. I separated the functions into three files rather than Apple's single source file.

Each implementation will prefer the builtin function version if it exists.

fls

#include <strings.h>

/*
 * Find Last Set bit
 */
int fls(int mask)
{
#if __has_builtin(__builtin_fls)
    return __builtin_fls(mask);
#elif __has_builtin(__builtin_clz)
    if(mask == 0)
        return (0);

    return (sizeof(mask) << 3) - __builtin_clz(mask);
#else
    int bit;

    if(mask == 0)
        return (0);
    for(bit = 1; mask != 1; bit++)
        mask = (unsigned)mask >> 1;
    return (bit);
#endif
}

flsl

#include <strings.h>

/*
 * Find Last Set bit
 */
int flsl(long mask)
{
#if __has_builtin(__builtin_flsl)
    return __builtin_flsl(mask);
#elif __has_builtin(__builtin_clzl)
    if(mask == 0)
        return (0);

    return (sizeof(mask) << 3) - __builtin_clzl(mask);
#else
    int bit;

    if(mask == 0)
        return (0);
    for(bit = 1; mask != 1; bit++)
        mask = (unsigned long)mask >> 1;
    return (bit);
#endif
}

flsll

#include <strings.h>

/*
 * Find Last Set bit
 */
int flsll(long long mask)
{
#if __has_builtin(__builtin_flsll)
    return __builtin_flsll(mask);
#elif __has_builtin(__builtin_clzll)
    if(mask == 0)
        return (0);

    return (sizeof(mask) << 3) - __builtin_clzll(mask);
#else
    int bit;

    if(mask == 0)
        return (0);
    for(bit = 1; mask != 1; bit++)
        mask = (unsigned long long)mask >> 1;
    return (bit);
#endif
}

Sorting Functions

heapsort

I imported the Apple Open Source heapsort implementation. Since the file is quite large, I will refer you directly to the source code.

qsort

I imported the Apple Open Source qsort implementation. I have kept most of the logic in place, but have removed all I_AM_QSORT_B definitions from the file. I left I_AM_QSORT_R defined so that I could implement the qsort_r function in a simple way.

Since the file is quite large, I will refer you directly to the source code.

qsort_r

I have kept the qsort_r.c strategy used in the Apple Open Source implementation:

/*
 * This file is in the public domain.  Originally written by Garrett
 * A. Wollman.
 *
 * $FreeBSD: src/lib/libc/stdlib/qsort_r.c,v 1.1 2002/09/10 02:04:49 wollman Exp $
 */
#define I_AM_QSORT_R
#include "qsort.c"

Putting it All Together

I want to showcase where I find open source function implementations and how I modify them to fit my purposes. I don't believe in re-inventing the wheel, so I am trying to reuse as much code as possible. In many cases in my career, I've seen these functions re-implemented rather than referencing quality implementations that are available and widely used.

The Embedded Artistry libc implementation is nearing completion. The current suite of functions is more than capable of meeting the needs of embedded systems developers. In future articles we will be covering the exit/abort functionality and unit testing support.

Further Reading

libc: stdlib, pt. 1

The C standard library is a widely used resource, as it provides the base functionality for many more advanced features. The C standard library provides functionality like:

  • string-to-number conversion
  • math features like division and absolute value
  • memory allocation (malloc, calloc, realloc)
  • random number generation
  • abort/exit

However, the usefulness of the standard library doesn't stop at C. Many standard library features are also needed to bringup libc++, which we will be tackling in the future.

If you're unfamiliar with the the standard library functions, I recommend reading this "Guide to C stdlib functions". I will not be describing what each function does.

Note: For the remainder of the article, I will be referring to the standard library as stdlib.

Implemented Functions

I implemented the following functions in my first-pass of stdlib. These provide essential building blocks for our systems.

  • abs
  • atoi
  • atol
  • atoll
  • bsearch
  • calloc
  • div
  • free
  • labs
  • ldiv
  • llabs
  • lldiv
  • malloc
  • rand (host-only, non-portable)
  • realloc
  • reallocf
  • srand (host-only, non-portable)
  • strtod
  • strtol
  • strtoll
  • strtoul
  • strtoull

Functions Pending Implementation

The functions listed below currently remain unimplemented and will be tackled in part 2. Note that even with the pending functions stdlib will not be completed. I am only targeting a reduced subset of stdlib that is useful for embedded systems.

  • Floating point functions
    • atof
    • strtof
    • strtold
  • imax* functions
  • Exit/Abort functionality
    • abort
    • atexit
    • at_quick_exit
    • exit
    • _Exit
    • quick_exit
  • qsort

Custom Function Implementations

Many of the stdlib functions are directly ported from Apple open source projects. There are a small subset of functions that I did not pilfer:

  • atoi
  • atol
  • atoll
  • realloc
  • reallocf
  • calloc

Let's see what each of these implementations looks like.

atoi

atoi is a relatively simple function. This version assumes all numbers are in base 10, and builds upon the ctype functions we implemented a few weeks ago.

int atoi(const char *str)
{
    bool neg = false;
    int val = 0;

    switch(*str)
    {
        case '-':
            neg = true;
            //intentional fallthrough to advance str
        case '+':
            str++;
    }

    while (isdigit(*str)) {
        val = (10 * val) + (*str++ - '0');
    }

    return (neg ? -val : val);
}

atol

atol follows the same pattern as atoi, but uses different values to account for the different types.

long atol(const char *str)
{
    long val = 0;
    bool neg = false;

    while(isspace(*str))
    {
        str++;
    }

    switch (*str) {
        case '-':
            neg = true;
        //intentional fallthrough
        case '+':
            str++;
    }

    while (isdigit(*str))
    {
        val = (10 * val) + (*str++ - '0');
    }

    return neg ? -val : val;
}

atoll

atoll is the last in the set, and like its predecessors operates in the same way. C++ function overloading and templating sure starts to look nice, as we could skip these duplicate implementations.

long long atoll(const char *str)
{
    long long val=0;
    bool neg = false;

    while(isspace(*str))
    {
        str++;
    }

    switch (*str) {
        case '-':
            neg = true;
            //Intentional fallthrough
        case '+':
            str++;
    }

    while (isdigit(*str))
    {
        val = (10 * val) + (*str++ - '0');
    }
    return neg ? -val : val;
}

calloc

/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

void *calloc(size_t num, size_t size)
{
    /* num * size unsigned integer wrapping check */
    if ((num >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        num > 0 && SIZE_MAX / num < size)
    {
        return NULL;
    }

    size_t total_size = num * size;
    void *ptr = malloc(total_size);

    if (ptr)
    {
        memset(ptr, 0, total_size);
    }

    return ptr;
}

realloc

I implemented my own realloc because I have an opinion on how I want to handle the realloc(ptr,0) special case: I want to return NULL. The original data will not be freed.

void *realloc(void *ptr, size_t size)
{
    void * new_data = NULL;

    if(size)
    {
        if (!ptr)
        {
            return malloc(size);
        }

        new_data = malloc(size);
        if (new_data)
        {
            memcpy(new_data, ptr, size); //TODO: unsafe copy...
            free(ptr); //we always move the data. free.
        }
    }

    return new_data;
}

reallocf

reallocf is a FreeBSD extension to realloc that I like: if realloc fails, the original data pointer is freed. This helps account for option #2 when calling realloc(ptr,0), since realloc uses option #1.

Implementation is simple:

void *reallocf(void *ptr, size_t size)
{
    void *p = realloc(ptr, size);

    if ((p == NULL) && (ptr != NULL))
    {
        free(ptr);
    }

    return p;
}

Note that I do not hand the BSD case of realloc(ptr,0) freeing the original data pointer.

Ported Functions

I am a staunch believer in not re-inventing the wheel unless absolutely necessary. Many of the stdlib function implementations were ported from various Apple Open Source libraries. I've provided links to each function's original source and noted any modifications made.

Apple Open Source Libc:

I removed errno support from strtoull, as I have not ported the errno functionality.

Apple Open Source XNU

Apple Open Source TCL

I removed errno support, as I have not ported the errno functionality.

Apple Open Source Lukemftp

I removed errno support, as I have not ported the errno functionality.

Current stdlib.h Header

Here's the current stdlib.h header contents, based on the first pass functions:

#ifndef __STDLIB_H_
#define __STDLIB_H_

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif //__cplusplus

typedef struct {
    int quot;
    int rem;
} div_t;

typedef struct {
    long quot;
    long rem;
} ldiv_t;

typedef struct {
    long long quot;
    long long rem;
} lldiv_t;

#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0
#define RAND_MAX (0x7fffffff)

int atoi (const char *);
long atol (const char *);
long long atoll (const char *);
double atof (const char *);

double strtod (const char *__restrict, char **__restrict);
long strtol (const char *__restrict, char **__restrict, int);
unsigned long strtoul (const char *__restrict, char **__restrict, int);
long long strtoll (const char *__restrict, char **__restrict, int);
unsigned long long strtoull (const char *__restrict, char **__restrict, int);

int abs (int);
long labs (long);
long long llabs (long long);

div_t div (int, int);
ldiv_t ldiv (long, long);
lldiv_t lldiv (long long, long long);

int rand (void);
void srand (unsigned);

void *bsearch (const void *, const void *, size_t, size_t, int (*)(const void *, const void *));

void * malloc(size_t size);
void free(void * ptr);
void * calloc(size_t num, size_t size);
void *realloc(void *ptr, size_t size);
void *reallocf(void *ptr, size_t size);

#ifdef __cplusplus
}
#endif //__cplusplus

#endif // __STDLIB_H_

Putting it All Together

You can find the current stdlib implementation in the embedded-resources Github repository.

Further Reading: