Generating Aligned Memory

Embedded systems often have requirements for pointer alignment. These alignment requirements exist in many places, some including:

  • General device/CPU requirements
    • Unaligned access may generate a processor exception
  • Cache line size
    • You don't want to accidentally start performing clean/invalidate operations on random data
  • DMA transfer requirements
  • Optimized pointer / memory access handling
    • Unaligned addresses require multiple read instructions
    • aligned addresses require a single read instruction

How to Align Memory?

Our needs to align memory extend to both static and dynamic memory allocations. Let's look at how to handle both cases.

Compiler Alignment Attribute

For static & stack allocations, we can use the GNU defined alignment attribute.

This attribute will force the compiler to allocate the variable with at least the requested alignment (e.g. you could request 8-byte alignment and get 32-byte alignment).

Example usage of the alignment attribute from the GNU documentation:

struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));

Dynamic Memory Alignment

When we call malloc, we are not guarunteed to have our pointer returned with any particular alignment. So what can we use to get dynamically allocated memory to be aligned?

A common POSIX API that you may be familiar with is memalign. memalign provides exactly what we need:

void *memalign(size_t alignment, size_t size);

Let's see how to implement the equivalent support for our system.

Dynamic Memory Alignment: Overall Approach

Since we have already implemented malloc on our system (or have malloc defined on a host), we can use malloc as our base memory allocator.

Since malloc is not guaranteed to align our memory for us, we'll need to perform two extra steps:

  1. Request extra bytes so we can returned an aligned address
  2. Request extra bytes and store the offset between our original pointer and our aligned pointer

By allocating these extra bytes, we are making a tradeoff between generating aligned memory and wasting some bytes to ensure the alignment requirement can be met.

Now that we have our high-level strategy, let's prototype the calls for our aligned malloc implementation. Mirroring memalign, we will have:

void * aligned_malloc(size_t align, size_t size);
void aligned_free(void * ptr);

//Convenience macro for memalign, the POSIX API
#define memalign(align, size) aligned_malloc(align, size)

Why do we require a separate free API for our aligned allocations? We are going to be storing an offset and returning an address that is different from the address returned by malloc. Before we can call free on that memory, we have to translate from our aligned pointer to the original pointer returned by malloc.

Now that we know what our APIs look like, what definitions do we need to manage our storage overhead?

//Number of bytes we're using for storing the aligned pointer offset
typedef uint16_t offset_t;
#define PTR_OFFSET_SZ sizeof(offset_t)

I've defined the offset_t to be a uint16_t. This supports up to 64KB alignment, a size which is already unlikely to be used for alignment.

I've also generated a convenience macro for the offset size. You could skip this macro and just use sizeof(offset_t) if you prefer.

Finally, we need some way to align our memory. I use this align_up definition:

#ifndef align_up
#define align_up(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))
#endif

Note that this operates on powers of two, so we will have to limit our alignment values to powers of two.

aligned_malloc

Let's start with aligned_malloc. Recall the prototype:

void * aligned_malloc(size_t align, size_t size)

Thinking about our basic function skeleton: we need to ensure align and size are non-zero values before we try to allocate any memory. We also need to check that our alignment request is a power of two, due to our align_up macro.

These requirements result in the following skeleton:

void * ptr = NULL;

//We want it to be a power of two since align_up operates on powers of two
assert((align & (align - 1)) == 0);

if(align && size)
{
    //...
}

return ptr;

Now that we have protections in place, let's work on our actual aligned memory allocation. We know we need to allocate extra bytes, but what do we actually allocate?

Consider the following:

  • I call malloc and get a memory address X.
  • I know I need to store a pointer offset value Y, which is fixed in size.
  • Our alignment Z is variable.
  • To handle this in a generic way, I always need to store alignment offset.
    • This is true even if the pointer is aligned
  • When I allocate memory, X+Y (address + offset size) has the possibility to already be aligned, but it may also be unaligned
    • If X+Y is aligned, we would need no extra bytes
    • If X+Y is unaligned, we would need Z-1 extra bytes in the worst case
  • Example:
    • Requested alignment 8
    • malloc returns 0xF07
    • we add two bytes for our offset storage, which brings us to 0xF09
    • We need 7 extra bytes to get us to 0xF10.
  • Example #2 (let's try to prove we don't need 8):
    • Requested alignment 8
    • malloc returns 0xF06
    • We add two bytes for our offset storage, bringing us to 0xF08
    • We are now 8 byte aligned

So our worst case padding for malloc is:

sizeof(offset_t) + (alignment - 1)

Which translates to our allocation as:

uint32_t hdr_size = PTR_OFFSET_SZ + (align - 1);
void * p = malloc(size + hdr_size);

After we've made the call to malloc, we need to actually align our pointer and store the offset:

if(p)
{
    ptr = (void *) align_up(((uintptr_t)p + PTR_OFFSET_SZ), align);
    *((offset_t *)ptr - 1) = (offset_t)((uintptr_t)ptr - (uintptr_t)p);
}

Note that we align the address after including the offset size, as shown in the example above. Even in the best case scenario where our pointer is already aligned, we need to handle this API in a generic way. Offset storage is always required.

Note: If you are unfamiliar with uintptr_t, it is a standard type that is large enough to contain a pointer address.

Once we have our new aligned address, we move backwards in memory from the aligned location to store the offset. We now know we always need to look one location back from our aligned pointer to find the true offset.

Here's what our finished aligned_malloc looks like:

void * aligned_malloc(size_t align, size_t size)
{
    void * ptr = NULL;

    //We want it to be a power of two since align_up operates on powers of two
    assert((align & (align - 1)) == 0);

    if(align && size)
    {
        /*
         * We know we have to fit an offset value
         * We also allocate extra bytes to ensure we can meet the alignment
         */
        uint32_t hdr_size = PTR_OFFSET_SZ + (align - 1);
        void * p = malloc(size + hdr_size);

        if(p)
        {
            /*
             * Add the offset size to malloc's pointer (we will always store that)
             * Then align the resulting value to the arget alignment
             */
            ptr = (void *) align_up(((uintptr_t)p + PTR_OFFSET_SZ), align);

            //Calculate the offset and store it behind our aligned pointer
            *((offset_t *)ptr - 1) = (offset_t)((uintptr_t)ptr - (uintptr_t)p);

        } // else NULL, could not malloc
    } //else NULL, invalid arguments

    return ptr;
}

aligned_free

As is true in most of the free implementations that we've seen, aligned_free is a much simpler implementation than aligned_malloc.

With aligned_free, we look backwards from the pointer to find the offset:

offset_t offset = *((offset_t *)ptr - 1);

Once we have the offset we can recover the original pointer and pass that to free:

void * p = (void *)((uint8_t *)ptr - offset);
free(p);

Here's what our finished aligned_free looks like:

void aligned_free(void * ptr)
{
    assert(ptr);

    /*
    * Walk backwards from the passed-in pointer to get the pointer offset
    * We convert to an offset_t pointer and rely on pointer math to get the data
    */
    offset_t offset = *((offset_t *)ptr - 1);

    /*
    * Once we have the offset, we can get our original pointer and call free
    */
    void * p = (void *)((uint8_t *)ptr - offset);
    free(p);
}

Warning

Note well: you must be very careful not to mix up free and aligned_free. If you call free on an aligned pointer, free will not recognize the allocation and you may crash or experience other strange effects. Calling aligned_free on an unaligned pointer will likely result in you reading an invalid offset value and calling free with random data.

In a future article I will show you how to protect against these simple error cases by using C++ special pointers with custom allocators and deleters.

Putting it All Together

You can find an aligned malloc example in the embedded-resources git repository.

To build it, simply run make from the top level, or in examples/c/ run make or make malloc_aligned.

If you want to use aligned malloc in your own project, simplly change this line:

#define COMPILE_AS_EXAMPLE

to:

#undef COMPILE_AS_EXAMPLE

Further Reading