Generating Aligned Memory - Embedded Artistry

22 February 2017 by Phillip Johnston • Last updated 9 October 2024

Embedded systems often have pointer alignment requirements. These requirements come from many places, including:

General device/CPU requirements
- Unaligned accesses may generate a processor exception on CPUs or registers that have strict alignment requirements
Cache line size
- A buffer that is not cache-line aligned can share a line with unrelated data, so clean/invalidate operations on the buffer may accidentally affect that data
Peripheral hardware requirements
- DMA and USB peripherals often require 8-, 16-, 32-, or 64-byte alignment of buffers, depending on the hardware design
MPU regions (e.g., 32-byte aligned)
MMU translation tables
Stack pointers
Interrupt vector tables
- ARM requires the vector table base address to be 32-, 64-, 128-, or 256-word aligned, depending on the number of exceptions the table must hold
Optimized pointer / memory access handling
- Unaligned addresses may require multiple read instructions
- Aligned addresses require a single read instruction

In this article, we’ll look at methods for allocating aligned memory and implementing aligned variants of malloc and free.

Table of Contents:

How to Align Memory
Aligning Dynamically Allocated Memory
Putting it All Together
Further Reading

How to Align Memory

Alignment requirements apply to both static and dynamic memory allocations. Let’s look at how to handle both cases.

Compiler Alignment Attribute

For static and stack allocations, we can use the GNU-defined aligned attribute, __attribute__ ((aligned (N))).

This attribute will force the compiler to allocate the variable with at least the requested alignment (e.g. you could request 8-byte alignment and get 32-byte alignment).

Example usage of the alignment attribute from the GNU documentation:

struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
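As a minimal sketch (assuming GCC or Clang, both of which accept this attribute), here is the attribute applied to the type from the GNU example, to a static buffer, and to a local variable. The dma_buffer name and its 32-byte requirement are hypothetical, as are the is_aligned and local_is_aligned helpers introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Type from the GNU documentation: 6 bytes of data, but every object of
 * type S is aligned, and therefore padded, to 8 bytes. */
struct S { short f[3]; } __attribute__ ((aligned (8)));

/* Hypothetical DMA buffer that the hardware requires to be 32-byte aligned. */
static uint8_t dma_buffer[64] __attribute__ ((aligned (32)));

/* Hypothetical helper: nonzero if addr is a multiple of align
 * (align must be a power of two). */
static int is_aligned(const void *addr, size_t align)
{
    return ((uintptr_t)addr & (align - 1)) == 0;
}

/* Hypothetical helper: stack variables can use the attribute too. */
static int local_is_aligned(void)
{
    uint32_t scratch[4] __attribute__ ((aligned (16)));
    scratch[0] = 0;
    return is_aligned(scratch, 16);
}
```

Because the attribute only sets a minimum, is_aligned(dma_buffer, 32) should always hold, even if the linker happens to place the buffer on a stricter boundary.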

Standard Alignment Functionality

C11 and C++11 introduced standard alignment specifiers:

C: _Alignas() and the alignas macro (defined in the header <stdalign.h> until C23, where alignas becomes a keyword)
C++: alignas()

You can use the C and C++ variants in the same way: specify the desired alignment either as a number of bytes or as a type whose alignment should be matched:

// this buffer will be 16-byte aligned, not 1-byte aligned
alignas(16) uint8_t buffer[BUFFER_SIZE];

// This buffer will be 4-byte aligned
alignas(uint32_t) uint8_t another_buffer[BUFFER_SIZE];

// In C++, we can apply alignas to a struct or class. The outcome here is:
// - every object of type my_struct will be aligned to a 32-byte boundary
// - otherwise, normal struct padding/offset rules apply
//     - member x will be 32-byte aligned, as it is the first member
//     - member buffer will be 1-byte aligned
struct alignas(32) my_struct
{
    uint32_t x;
    uint8_t buffer[BUFFER_SIZE];
};

// In C, alignas won't work on a struct definition. You need to align the first member.
// Alternatively, the aligned attribute will work on the struct definition.
struct my_struct
{
    alignas(32) uint32_t x;
    uint8_t buffer[BUFFER_SIZE];
};

// You could also just align the variable declaration, rather than the first member.
// But you would have to do this every time if alignment mattered for every object.
alignas(32) struct my_struct an_object;

// This example uses alignas on a "later" struct member. The outcome here is:
// - every object of type data will be 128-byte aligned
// - member cacheline will also be 128-byte aligned
// - there will be padding between member x and member cacheline
// - overall size of the object increases from 132 bytes to 256 bytes
struct data {
  uint32_t x;
  alignas(128) char cacheline[128]; // over-aligned array of char, 
                                    // not array of over-aligned chars
};

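// If you wanted to reduce the padding in the example above, you could:
// - Put the aligned data _first_, if there were no other ordering requirements forcing it to come later.
//   This removes the padding between the members, but sizeof(struct data) stays at 256 bytes:
//   a struct's size is always rounded up to a multiple of its alignment, so the padding moves to the end.
// - Allocate the buffer externally and keep a pointer in the struct. The tradeoff is that you must
//   manage (and clean up) that separate allocation.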
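To check the padding claims above, here is a C11 sketch of struct data whose layout can be inspected with sizeof and offsetof. The data_reordered and data_external variants are hypothetical names introduced here for comparison. Moving the over-aligned member first does not shrink the struct, because its size must remain a multiple of its 128-byte alignment; only the pointer variant drops the 128-byte requirement.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* struct data from above, in C11: alignas on cacheline raises the
 * alignment of the whole struct to 128 bytes. */
struct data {
    uint32_t x;
    alignas(128) char cacheline[128]; /* 124 bytes of padding precede this member */
};

/* Hypothetical variant: over-aligned array first. The interior padding is
 * gone, but the size is still rounded up to a multiple of 128. */
struct data_reordered {
    alignas(128) char cacheline[128];
    uint32_t x;                       /* followed by 124 bytes of tail padding */
};

/* Hypothetical variant: keep a pointer to a separately allocated,
 * 128-byte aligned buffer. The struct itself needs no extra alignment. */
struct data_external {
    uint32_t x;
    char *cacheline;
};
```

With these definitions, offsetof(struct data, cacheline) is 128, and both struct data and struct data_reordered have a sizeof of 256.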

You can also get the alignment of a specific type using:

C11: _Alignof and the alignof macro (defined in the header <stdalign.h> until C23, where alignof becomes a keyword)
C++11: alignof

These return the alignment in bytes required for the specified type.
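Here is a short C11 sketch of alignof; the struct names are hypothetical. A struct's alignment is the strictest alignment among its members, and alignas on a member (the C-compatible form shown earlier) raises the alignment of the whole type. Because alignof yields a size_t constant expression, it can also initialize other constants.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: its alignment comes from its strictest member. */
struct sample {
    uint8_t  flag;
    uint32_t value;   /* padding is inserted before this member as needed */
};

/* Hypothetical USB buffer type: alignas on the first member makes every
 * struct usb_buffer 32-byte aligned. */
struct usb_buffer {
    alignas(32) uint8_t payload[64];
};

/* alignof is a constant expression, so it can initialize constants. */
static const size_t usb_buffer_align = alignof(struct usb_buffer);
```

Here alignof(struct sample) equals alignof(uint32_t), and usb_buffer_align is 32.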

Dynamic Memory Alignment

When we call malloc, we receive memory with “fundamental alignment”: memory that is suitably aligned to store any object type with a fundamental alignment requirement. In C11 and C++11, that alignment is alignof(max_align_t). The exact value varies from one system to another, but in most cases you will receive memory that is 8-byte aligned on 32-bit systems and 16-byte aligned on 64-bit systems.
Of course, we might have alignment requirements that are greater, such as a USB buffer being 32-byte aligned, or a 128-byte aligned variable that will fit in a cache line. What can we do to get dynamically allocated memory to match these greater alignment requirements?
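One way to see the fundamental alignment on a given system is to compare malloc results against alignof(max_align_t). In this sketch, malloc_blocks_aligned is a hypothetical helper: it allocates several blocks of varying size, checks each one against a requested alignment, and then frees them all.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SAMPLE_COUNT 16

/* Hypothetical helper: allocate SAMPLE_COUNT blocks of at least min_size
 * bytes and check each against align (a power of two).
 * Returns 1 if all blocks were aligned, 0 if any was not, -1 if malloc failed. */
static int malloc_blocks_aligned(size_t min_size, size_t align)
{
    void *blocks[SAMPLE_COUNT] = {0};
    int result = 1;

    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        blocks[i] = malloc(min_size + i * 8);
        if (blocks[i] == NULL) {
            result = -1;
            break;
        }
        if (((uintptr_t)blocks[i] & (align - 1)) != 0) {
            result = 0;
        }
    }

    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        free(blocks[i]); /* free(NULL) is a no-op */
    }
    return result;
}
```

On a conforming C11 implementation, malloc_blocks_aligned(64, alignof(max_align_t)) should return 1. A request for a larger alignment, such as 4096, will usually report 0, because nothing obliges malloc to exceed the fundamental alignment.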

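A common API that you may be familiar with is memalign, which provides exactly what we need. (memalign itself is a legacy, non-standard function; POSIX standardizes posix_memalign, and C11 adds aligned_alloc.)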

void *memalign(size_t alignment, size_t size);
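When your toolchain provides them, you can use the standardized alternatives directly. This is a minimal sketch that assumes a POSIX system for posix_memalign and a C11 library for aligned_alloc. The wrapper names alloc_aligned_posix and alloc_aligned_c11 are hypothetical. Memory from both functions is released with plain free. If your platform lacks these functions, you need your own implementation, which the rest of this article builds.

```c
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: align must be a power of two and a multiple of
 * sizeof(void *). posix_memalign returns 0 on success or an error number
 * on failure; it does not set errno. */
static void *alloc_aligned_posix(size_t align, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}

/* Hypothetical wrapper around C11 aligned_alloc. C11 as originally published
 * required size to be a multiple of align, so round it up to be safe. */
static void *alloc_aligned_c11(size_t align, size_t size)
{
    size_t rounded = (size + align - 1) & ~(align - 1);
    return aligned_alloc(align, rounded);
}
```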

Let’s see how to implement the equivalent support for our system.

Aligning Dynamically Allocated Memory

Since we have already implemented malloc on our system (or have malloc defined on a development machine), we can use malloc as our base memory allocator. Any other allocator will work, such as the built-in FreeRTOS or ThreadX allocators.

Since malloc (or another dynamic memory allocator) is not guaranteed to align memory as we require, we’ll need to perform two extra steps:

Request extra bytes so we can return an aligned address
Request extra bytes to store the offset between our original pointer and our aligned pointer

By allocating these extra bytes, we accept a few wasted bytes per allocation in exchange for guaranteed alignment.

Now that we have our high-level strategy, let’s prototype the calls for our aligned malloc implementation. Mirroring memalign, we will have:

void * aligned_malloc(size_t align, size_t size);
void aligned_free(void * ptr);

// Convenience macro mirroring memalign (a legacy, non-standard API)
#define memalign(align, size) aligned_malloc(align, size)

Why do we require a separate free API for our aligned allocations?

We are going to be storing an offset and returning an address that differs from the address returned by malloc. Before we can call free on that memory, we have to translate from our aligned pointer to the original pointer returned by malloc.

Now that we know what our APIs look like, what definitions do we need to manage our storage overhead?

// Number of bytes we're using for storing 
// the aligned pointer offset
typedef uint16_t offset_t;
#define PTR_OFFSET_SZ sizeof(offset_t)

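I’ve defined offset_t as a uint16_t, which can hold offsets of up to 65535 bytes. The stored offset can be as large as sizeof(offset_t) + (align - 1) bytes (see the worst-case analysis below), so this type supports power-of-two alignments of up to 32 KiB (32768 bytes). That is already larger than you are likely to need.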
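You can also compute the limit instead of reasoning it out. The stored offset can reach PTR_OFFSET_SZ + (align - 1) bytes (derived in the worst-case analysis below). This sketch finds the largest power-of-two alignment whose worst-case offset still fits in offset_t. The max_supported_alignment and OFFSET_MAX names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t offset_t;
#define PTR_OFFSET_SZ sizeof(offset_t)
#define OFFSET_MAX    ((size_t)UINT16_MAX)  /* hypothetical: keep in sync with offset_t */

/* Hypothetical helper: largest power-of-two alignment whose worst-case
 * stored offset, PTR_OFFSET_SZ + (align - 1), still fits in offset_t. */
static size_t max_supported_alignment(void)
{
    size_t align = 1;
    /* Double the alignment while the next candidate is still safe. */
    while (PTR_OFFSET_SZ + (align * 2) - 1 <= OFFSET_MAX) {
        align *= 2;
    }
    return align;
}
```

With a uint16_t offset, this returns 32768. An alignment of 65536 could need an offset of 65537, which no longer fits.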

Note

Should we need to support larger alignments, we can upgrade this type by adjusting the typedef and increasing the number of bytes used to store the offset with each aligned memory pointer.

I’ve also generated a convenience macro for the offset size. You could skip this macro and just use sizeof(offset_t) if you prefer.

Finally, we need some way to align our memory. I use this align_up definition:

#ifndef align_up
#define align_up(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))
#endif

Note that this operates on powers of two, so we will have to limit our alignment values to powers of two.
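Here are a few hand-checked values for align_up that make the bit manipulation concrete. The align_up_examples helper is hypothetical. Adding align - 1 pushes any unaligned value past the next boundary, and & ~(align - 1) clears the low bits so the result lands exactly on it. The last case shows why non-powers of two must be rejected: the mask no longer describes a multiple of the alignment.

```c
#include <assert.h>
#include <stdint.h>

#ifndef align_up
#define align_up(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))
#endif

/* Hypothetical helper: worked align_up values. With align = 8,
 * ~(8 - 1) clears the low three bits. */
static void align_up_examples(void)
{
    /* 0xF09 + 7 = 0xF10, whose low three bits are already clear */
    assert(align_up((uintptr_t)0xF09, (uintptr_t)8) == 0xF10);
    /* Already aligned: 0xF08 + 7 = 0xF0F, masked back down to 0xF08 */
    assert(align_up((uintptr_t)0xF08, (uintptr_t)8) == 0xF08);
    /* One past a boundary rounds up to the next boundary */
    assert(align_up((uintptr_t)0xF01, (uintptr_t)8) == 0xF08);
    /* Not a power of two: 7 should round up to 12, but the mask yields 8 */
    assert(align_up(7u, 6u) == 8u);
}
```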

aligned_malloc

Let’s start with aligned_malloc. Recall the prototype:

void * aligned_malloc(size_t align, size_t size)

Thinking about our basic function skeleton: we need to ensure align and size are non-zero values before we try to allocate any memory. We also need to check that our alignment request is a power of two, because of our align_up macro.

These requirements result in the following skeleton:

void * ptr = NULL;

// We want it to be a power of two since 
// align_up operates on powers of two
assert((align & (align - 1)) == 0);

if(align && size)
{
    //...
}

return ptr;

Now that we have protections in place, let’s work on our actual aligned memory allocation. We know we need to allocate extra bytes, but what do we actually allocate?

Consider:

I call malloc and get a memory address X.
I know I need to store a pointer offset value Y, which is fixed in size.
Our alignment Z is variable.
To handle this generically, I always need to store the alignment offset.
- This is true even if the pointer is already aligned
When I allocate memory, X+Y (address + offset size) may or may not be aligned
- If X+Y is aligned, we would need no extra bytes
- If X+Y is unaligned, we would need Z-1 extra bytes in the worst case
Example:
- Requested alignment 8
- malloc returns 0xF07
- we add two bytes for our offset storage, which brings us to 0xF09
- We need 7 extra bytes to get us to 0xF10.
Example #2 (let’s try to prove we don’t need 8):
- Requested alignment 8
- malloc returns 0xF06
- We add two bytes for our offset storage, bringing us to 0xF08
- We are now 8 byte aligned
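A brute-force check generalizes the two examples. This sketch tries every possible starting address X modulo the alignment and reports the largest distance from X to the aligned pointer. That distance should equal sizeof(offset_t) + (align - 1). The worst_case_header helper is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t offset_t;
#define PTR_OFFSET_SZ sizeof(offset_t)

#ifndef align_up
#define align_up(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))
#endif

/* Hypothetical helper: for a power-of-two align, try every starting address
 * X modulo align and return the largest number of bytes between X and the
 * aligned pointer (offset storage plus alignment padding). */
static size_t worst_case_header(uintptr_t align)
{
    size_t worst = 0;
    for (uintptr_t x = 0; x < align; x++) {
        uintptr_t aligned = align_up(x + PTR_OFFSET_SZ, align);
        size_t header = (size_t)(aligned - x);
        if (header > worst) {
            worst = header;
        }
    }
    return worst;
}
```

For align = 8, the worst case is X ≡ 7 (mod 8), as in the 0xF07 example: 2 bytes of offset storage plus 7 bytes of padding, 9 in total.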

So the worst-case number of extra bytes we must request from malloc is:

sizeof(offset_t) + (alignment - 1)

Which translates to our allocation as:

size_t hdr_size = PTR_OFFSET_SZ + (align - 1);
void * p = malloc(size + hdr_size);

After we’ve made the call to malloc, we need to actually align our pointer and store the offset:

if(p)
{
    ptr = (void *) align_up(((uintptr_t)p + PTR_OFFSET_SZ), align);
    *((offset_t *)ptr - 1) = (offset_t)((uintptr_t)ptr - (uintptr_t)p);
}

Note that we align the address after including the offset size, as shown in the example above. Even in the best-case scenario where our pointer is already aligned, we need to handle this API generically. Offset storage is always required.

Note

If you are unfamiliar with uintptr_t, it is an unsigned integer type from <stdint.h> that is large enough to hold any object pointer converted to an integer.

Once we have our new aligned address, we step back one offset_t from the aligned location and store the offset there. As a result, aligned_free always knows to look exactly one offset_t behind the aligned pointer to find the offset.

Here’s what our finished aligned_malloc looks like:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// offset_t, PTR_OFFSET_SZ, and align_up are defined as shown above

void * aligned_malloc(size_t align, size_t size)
{
    void * ptr = NULL;

    // We want it to be a power of two since
    // align_up operates on powers of two
    assert((align & (align - 1)) == 0);

    if(align && size)
    {
        /*
         * We know we have to fit an offset value
         * We also allocate extra bytes to ensure we
         * can meet the alignment
         */
        size_t hdr_size = PTR_OFFSET_SZ + (align - 1);

        // Guard against size + hdr_size wrapping around
        if(size > SIZE_MAX - hdr_size)
        {
            return NULL;
        }

        void * p = malloc(size + hdr_size);

        if(p)
        {
            /*
             * Add the offset size to malloc's pointer
             * (we will always store that)
             * Then align the resulting value to the
             * target alignment. Casting align to uintptr_t
             * computes the mask at full pointer width.
             */
            ptr = (void *) align_up(((uintptr_t)p + PTR_OFFSET_SZ),
                                    (uintptr_t)align);

            // Calculate the offset and store it
            // behind our aligned pointer
            *((offset_t *)ptr - 1) =
                (offset_t)((uintptr_t)ptr - (uintptr_t)p);

        } // else NULL, could not malloc
    } //else NULL, invalid arguments

    return ptr;
}

aligned_free

As is true in most of the free implementations that we’ve seen, aligned_free is a much simpler implementation than aligned_malloc.

With aligned_free, we look backwards from the pointer to find the offset:

offset_t offset = *((offset_t *)ptr - 1);

Once we have the offset we can recover the original pointer and pass that to free:

void * p = (void *)((uint8_t *)ptr - offset);
free(p);

Here’s what our finished aligned_free looks like:

void aligned_free(void * ptr)
{
    assert(ptr);

    /*
    * Walk backwards from the passed-in pointer 
    * to get the pointer offset. We convert to an offset_t 
    * pointer and rely on pointer math to get the data
    */
    offset_t offset = *((offset_t *)ptr - 1);

    /*
    * Once we have the offset, we can get our 
    * original pointer and call free
    */
    void * p = (void *)((uint8_t *)ptr - offset);
    free(p);
}
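To exercise both functions together, here is a self-contained sketch that bundles the two implementations from this article with a small smoke test. The aligned_malloc_smoke_test name is hypothetical. This version uses size_t for the header size, guards against size + hdr_size overflowing, and casts align to uintptr_t before align_up so the mask is computed at pointer width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t offset_t;
#define PTR_OFFSET_SZ sizeof(offset_t)

#ifndef align_up
#define align_up(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))
#endif

void *aligned_malloc(size_t align, size_t size)
{
    void *ptr = NULL;
    assert((align & (align - 1)) == 0); /* power of two (or zero) */

    if (align && size) {
        size_t hdr_size = PTR_OFFSET_SZ + (align - 1);
        if (size > SIZE_MAX - hdr_size) {
            return NULL; /* size + hdr_size would overflow */
        }
        void *p = malloc(size + hdr_size);
        if (p) {
            ptr = (void *)align_up((uintptr_t)p + PTR_OFFSET_SZ, (uintptr_t)align);
            /* Store the distance back to malloc's pointer just below ptr */
            *((offset_t *)ptr - 1) = (offset_t)((uintptr_t)ptr - (uintptr_t)p);
        }
    }
    return ptr;
}

void aligned_free(void *ptr)
{
    assert(ptr);
    offset_t offset = *((offset_t *)ptr - 1);
    free((uint8_t *)ptr - offset); /* recover malloc's original pointer */
}

/* Hypothetical smoke test: for each power-of-two alignment up to 4096,
 * allocate, check alignment, write every requested byte, and free.
 * Returns 1 on success, 0 on failure. */
static int aligned_malloc_smoke_test(void)
{
    for (size_t align = 1; align <= 4096; align *= 2) {
        uint8_t *buf = aligned_malloc(align, 100);
        if (buf == NULL) {
            return 0;
        }
        if (((uintptr_t)buf & (align - 1)) != 0) {
            aligned_free(buf);
            return 0;
        }
        for (size_t i = 0; i < 100; i++) {
            buf[i] = (uint8_t)i; /* the whole request must be usable */
        }
        aligned_free(buf);
    }
    return 1;
}
```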

Warning

Note well: you must be very careful not to mix up free and aligned_free. If you call free on a pointer returned by aligned_malloc, you pass free an address that malloc never returned. This is undefined behavior, and you may crash or corrupt the heap. Calling aligned_free on a pointer that came directly from malloc will read a garbage offset value and pass an invalid address to free.

In a future article, I will show you how to protect against these simple error cases by using C++ smart pointers with custom allocators and deleters.

Other Dynamic Allocation Strategies

The above dynamic aligned allocation example is just one approach. It costs up to sizeof(offset_t) + (align - 1) extra bytes per allocation, and depending on the required alignment and memory constraints, this overhead can be significant. However, many strategies for dynamically allocating aligned memory involve some “wasted storage”, so the choice comes down to adopting a strategy that minimizes waste for your particular use case.

If you’re making allocations at a common fixed size and with a particular alignment requirement (e.g., image frames from a camera), you can reduce the overhead with a custom allocator that returns blocks of memory from a pre-allocated pool with the required alignment and size (called, unsurprisingly, a “block” or “pool” allocator). The downside here is that blocks are of a fixed size, so you will still end up wasting memory if you need less than a block’s worth of memory. For a case like an image frame from a camera, however, this isn’t too much of a concern: you rarely know in advance just how big a frame will be, and so you want to allocate with the maximum frame size you might receive.
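Here is a minimal sketch of such a pool. It assumes a single-threaded caller (there is no locking) and a hypothetical configuration of four 256-byte blocks with 32-byte alignment; pool_alloc and pool_free are illustrative names. The backing storage is aligned with alignas, and the block size is a multiple of the alignment, so every block inherits the alignment with no per-allocation overhead.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration: four 256-byte blocks, each 32-byte aligned. */
#define POOL_BLOCK_SIZE  256u
#define POOL_BLOCK_COUNT 4u
#define POOL_ALIGN       32u

_Static_assert(POOL_BLOCK_SIZE % POOL_ALIGN == 0,
               "block size must be a multiple of the alignment");

/* Backing storage: aligning the first block aligns every block. */
static alignas(POOL_ALIGN) uint8_t pool_storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
static uint8_t pool_in_use[POOL_BLOCK_COUNT];

/* Returns a free block, or NULL if the pool is exhausted. Not thread-safe. */
static void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1;
            return pool_storage[i];
        }
    }
    return NULL;
}

/* Returns a block to the pool; ignores pointers that did not come from it. */
static void pool_free(void *block)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (block == (void *)pool_storage[i]) {
            pool_in_use[i] = 0;
            return;
        }
    }
}
```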

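Alternatively, you can investigate a buddy allocation strategy. Buddy allocation has the desired property that blocks are always aligned to a power of two. A block of size 2^k always starts at an offset from the start of the arena that is a multiple of 2^k, so if the arena itself is aligned to the largest block size, every block is aligned to its own size. You can tune the allocator’s largest block size and lower limit to your system’s needs and to minimize the memory that can be wasted per allocation.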
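The alignment property follows from how buddy blocks are addressed. This sketch is not a full allocator. It shows only the address arithmetic, assuming a hypothetical 1 KiB arena aligned to its own size; buddy_of, parent_of, and block_address are illustrative names. A block's buddy differs from it only in the bit equal to the block size, and merging the pair clears that bit.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: 1 KiB, aligned to its own size (the largest block). */
#define ARENA_SIZE 1024u
static alignas(ARENA_SIZE) uint8_t arena[ARENA_SIZE];

/* Offset of the buddy of the block of size block_size at offset.
 * Block offsets are multiples of their size, so the buddy differs
 * only in the bit equal to block_size. */
static size_t buddy_of(size_t offset, size_t block_size)
{
    return offset ^ block_size;
}

/* Offset of the parent block formed when a block merges with its buddy
 * (the lower of the two offsets). */
static size_t parent_of(size_t offset, size_t block_size)
{
    return offset & ~block_size;
}

/* Address of a block. It is aligned to the block size because the arena
 * base is aligned to ARENA_SIZE and offset is a multiple of the block size. */
static void *block_address(size_t offset)
{
    return arena + offset;
}
```

For example, the 128-byte block at offset 512 has its buddy at offset 640, and the two merge into the 256-byte block at offset 512.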

Putting it All Together

You can find an aligned malloc example in the embedded-resources git repository.

To build it, simply run make from the top level, or in examples/c/ run make or make malloc_aligned.

If you want to use aligned_malloc in your own project, simply change this line:

#define COMPILE_AS_EXAMPLE

to:

#undef COMPILE_AS_EXAMPLE

The production version of this code is also found in embeddedartistry/libmemory, in the file aligned_malloc.c. The file link is to the version corresponding to the implementation approach in this article.

7 Replies to “Generating Aligned Memory”

Anonymous says:

8 May 2021 at 22:44

Awesome post, but a uint16_t allows for alignment values of up to 32Kb, not 64Kb,
Anonymous says:

15 April 2022 at 15:58

uint16_t is unsigned short int, a 16-bit unsigned integer.
You can see the max value in limits.h at the define USHRT_MAX; you can see 65536.
65536/1024=64kb.
64kb is correct.
Smeet Somaiya says:

7 August 2022 at 15:56

Can you elaborate more on the different data types used while doing pointer arithmetic? For ex: align_up(((uintptr_t)p + PTR_OFFSET_SZ), align);

Here if uintptr_t is 4 bytes, wouldn’t p + PTR_OFFSET_SZ = p + 2*sizeof(uintptr_t)? This would add 8 bytes instead of 2 making it incorrect. Am I missing something?
Phillip Johnston says:

8 August 2022 at 08:12

Here if uintptr_t is 4 bytes, wouldn’t p + PTR_OFFSET_SZ = p + 2*sizeof(uintptr_t)? This would add 8 bytes instead of 2 making it incorrect. Am I missing something?

The cast to uintptr_t converts the value of p to a number (uintptr_t: an unsigned integer type large enough to hold an address), so when we add PTR_OFFSET_SZ (2 bytes), we are in “normal math” land instead of “pointer math” land.
Conor ORourke says:

20 October 2022 at 08:12

Minor corner cases:

It might be worthwhile asserting that align is not 0.

You’ve made align unsigned. Now a sizeof(size_t) should be equal to sizeof(uintptr_t) but it’s worth thinking about a world where, say, a uintptr_t is 64 bit and a size_t is 32 bit. Then ~(align - 1) ends up being inverted only within 32 bits, without sign extension. That would zero the top 32 bits of your pointer.
Anonymous says:

12 October 2024 at 14:10

In this piece of code :
*((offset_t *)ptr - 1) =
(offset_t)((uintptr_t)ptr - (uintptr_t)p);

why is it not
*((offset_t *)ptr - PTR_OFFSET_SZ ) =
(offset_t)((uintptr_t)ptr - (uintptr_t)p);
?

Similarly in free.

Thank you!
Phillip Johnston says:

14 October 2024 at 08:37

We’re operating on a pointer type, so pointer arithmetic is in place, and we’re moving back by the size of one element. PTR_OFFSET_SZ in that context would move back too far.
