memset, memcpy, memcmp, and memmove

We need to knock out many more libc functions before we can start with our C++ runtime bringup. Today we'll tackle the mem* funtions:

  1. memcmp
  2. memcpy
  3. memmove
  4. memset

These functions are vital for our system and are used throughout the standard libaries (e.g. calloc, realloc). These functions are also used during the startup process, when we move any relocatable sections and initialize variables in the bss section to 0.

The implementations I have chosen can be optimized further depending on your platform and time commitment. I have chosen portable implementations that will generally work well.

Many of these functions already have multiple implementations out in the wild. The strategy I have highlighted here, and one I recommend, is to find a good implementation and use it on your projects. There truly is no need to invent the wheel.


This memcmp implementation is my own (probably hacked together from various implementations I've seen before). This version short circuits if the memory is the same (it totally happens), sparing you the need to compare the buffers.

If the buffers are different, we simply iterate through the memory until we check each byte or we find a difference.

int memcmp(const void *p1, const void *p2, size_t n)
    size_t i;

    * p1 and p2 are the same memory? easy peasy! bail out
    if (p1 == p2)
        return 0;

    // This for loop does the comparing and pointer moving...
    for (i = 0; (i < n) && (*(uint8_t *)p1 == *(uint8_t *)p2);
        i++, p1 = 1 + (uint8_t *)p1, p2 = 1 + (uint8_t *)p2);

    //if i == length, then we have passed the test
    return (i == n) ? 0 : (*(uint8_t *)p1 - *(uint8_t *)p2);

You could always go with a simple implementation, like musl libc. Note that this version does not short-circuit.

int memcmp(const void *vl, const void *vr, size_t n)
    const unsigned char *l=vl, *r=vr;
    for (; n && *l == *r; n--, l++, r++);
    return n ? *l-*r : 0;


memcpy is certainly a function with many optimized versions. I have chosen this open source memcpy implementation and directly imported it into my repository. I like this version because it can be made portable very easily: change the unsigned long types to uintptr_t.

Another important aspect of this version is highlighted in a code comment:

 * Copy a block of memory, handling overlap.
 * This is the routine that actually implements
 * (the portable versions of) bcopy, memcpy, and memmove.

Because this version of memcpy handles overlap, we can actually use this implementation for memmove as well.

memcpy is an example of a function which can be optimized particularly well for specific platforms. If performance is a problem, some time searching for a platform-specific implementation that may better suit your needs.


As I mentioned above, our memcpy implementation handles overlapping memory regions. This means easiest way to implement memmove is to simply call memcpy. I have seen this implementation in many places, most notably with BSD and Apple's open source libraries.

void * memmove(void *s1, const void *s2, size_t n)
    return memcpy(s1, s2, n);

If your chosen memcpy implementation does NOT handle overlapping regions, you will need to actually implement memmove. Here is an example standalone memmove function from musl libc.


For memset, I have chosen to use a modified musl memset implementation. I stripped out the #ifdef __GNUC__ portion of the musl memset and kept the "pure C fallback" portion of the function.

I like this version because of the head/tail filling. Once the unaligned portions are filled, we can use more efficient aligned access functions.

void * __attribute__((weak)) memset(void * dest, int c, size_t n)
    unsigned char *s = dest;
    size_t k;

    /* Fill head and tail with minimal branching. Each
     * conditional ensures that all the subsequently used
     * offsets are well-defined and in the dest region. */

    if (!n) return dest;
    s[0] = s[n-1] = c;
    if (n <= 2) return dest;
    s[1] = s[n-2] = c;
    s[2] = s[n-3] = c;
    if (n <= 6) return dest;
    s[3] = s[n-4] = c;
    if (n <= 8) return dest;

    /* Advance pointer to align it at a 4-byte boundary,
     * and truncate n to a multiple of 4. The previous code
     * already took care of any head/tail that get cut off
     * by the alignment. */

    k = -(uintptr_t)s & 3;
    s += k;
    n -= k;
    n &= -4;
    n /= 4;

    uint32_t *ws = (uint32_t *)s;
    uint32_t wc = c & 0xFF;
    wc |= ((wc << 8) | (wc << 16) | (wc << 24));

    /* Pure C fallback with no aliasing violations. */
    for (; n; n--, ws++) *ws = wc;

    return dest;

Putting it All Together

I have started a libc example implementation in the embedded-resources repository. You can run make or make libc at the top level, or simply run make in examples/libc.

You can find my example memcmp, memcpy, memmove, and memset implementations in the string directory.

libc currently compiles as a library (libc.a).

Weak Symbols

If you check the source on github, you will notice that these functions are marked with __attribute__((weak)):

void * __attribute__((weak)) memmove(void *s1, const void *s2, size_t n)
    return memcpy(s1, s2, n);

A weak symbol can be overridden by another function definition.

Weak symbols allow you to keep a generic implementation that is portable across platforms. If you have an optimized version valid only for specific platforms, you can override the default implementation and get the optimization benefits for that platform.

Further Reading