Metal.Serial: Line Information Without Strings

Today we have a guest post from Klemens Morgenstern, an embedded C++ consultant. You can learn more about Klemens on his website.



I am currently working on my metal.serial tool. It will make it easy to communicate with a device through a serial port (or any byte stream), with minimal overhead on the target. This series of articles goes through some issues that I found interesting.

Macro Magic

When debugging anything, it is necessary to know where an event or error occurred. This is why logging and testing frameworks usually output the source location. The most common way to do this is to use the __FILE__ and __LINE__ macros¹.

This works fine in most environments. You can just write a simple macro like this:

#include <iostream>
#define LOG(Expression) std::clog << __FILE__ << "(" << __LINE__ << "): " << Expression << std::endl;

void log_this(int i)
{
    LOG("i is " << i);
    // logs:
    // /home/embedded-artistry/manhattan-project/src/main.cpp(9): i is 42
}

The first problem we might face in an embedded environment is the number of allocations and the time needed to convert the int to a string at runtime. Let’s see if we can optimize this operation using additional macros:

#define STRINGIFY(Value) #Value

#define LINE_AS_CSTR_IMPL(Value) STRINGIFY(Value)
#define LINE_AS_CSTR LINE_AS_CSTR_IMPL(__LINE__)

This macro uses the preprocessor to convert the current line number into a string literal. This enables us to use string literal concatenation like so:

#define LOCATION_STR __FILE__ ":" LINE_AS_CSTR
// /home/embedded-artistry/manhattan-project/src/main.cpp:9
const char * location = LOCATION_STR;
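
To make the benefit concrete, here is a minimal sketch (my own addition, not from the original post) of a LOG macro built on LOCATION_STR. The location collapses into a single string literal at compile time, so no integer-to-string conversion happens at runtime:

#include <iostream>

#define STRINGIFY(Value) #Value
#define LINE_AS_CSTR_IMPL(Value) STRINGIFY(Value)
#define LINE_AS_CSTR LINE_AS_CSTR_IMPL(__LINE__)
#define LOCATION_STR __FILE__ ":" LINE_AS_CSTR

// The file and line are concatenated into one string literal by the
// preprocessor, so nothing is converted to text at runtime.
#define LOG(Expression) std::clog << LOCATION_STR << ": " << Expression << std::endl;

void log_this(int i)
{
    LOG("i is " << i); // logs e.g. ".../src/main.cpp:14: i is 42"
}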

Now this approach brings up another issue in embedded environments, namely space. If we embed one of these strings for every log site, we generate a lot of text in the binary. For example, say the example string above is 57 bytes long; with 100 such sites we fill up 5.7 kB with text.

Jacek Galowicz has developed a method to strip the full path from the __FILE__ macro, leaving only the filename. We’ll incorporate his method and combine it with our existing __LINE__ concatenation:

#define STRINGIFY(Value) #Value

#define LINE_AS_CSTR_IMPL(Value) STRINGIFY(Value)
#define LINE_AS_CSTR LINE_AS_CSTR_IMPL(__LINE__)

using cstr = const char * const;

static constexpr cstr past_last_slash(cstr str, cstr last_slash)
{
    return
        *str == '\0' ? last_slash :
        *str == '/'  ? past_last_slash(str + 1, str + 1) :
                       past_last_slash(str + 1, last_slash);
}

static constexpr cstr past_last_slash(cstr str) 
{ 
    return past_last_slash(str, str);
}

#define __SHORT_FILE_AND_LINE__ past_last_slash(__FILE__ ":" LINE_AS_CSTR)
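
For reference, the disassembly below assumes a usage along these lines (my assumption; the variable name location is chosen to match the symbol in the compiler output):

const char * location = __SHORT_FILE_AND_LINE__;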

This gives us a compile-time constant value for the current source location, but if you look at the disassembly (Compiler Explorer) you will see the following (with -O3):

.LC0:
        .string "./example.cpp:23"
location:
        .quad   .LC0+2

The compiler actually keeps the full string and generates code that gives us an offset into it (note the .LC0+2, which skips the leading "./"). We do get a more readable, compile-time constant value, but we might still end up with a binary full of text.

Enter Inline Assembly

Note that this only works with GCC/Clang, not MSVC. It also needs debug symbols.

Here’s my idea: we place a custom label using inline assembly and use that to identify the source location.

void test()
{
    asm("test_label:");
}

This gives us the following assembly outline:

test():
        push    rbp
        mov     rbp, rsp
test_label:
        nop
        pop     rbp
        ret

This tells us that we can put a label in the middle of a function. Now that we have placed a marker, how do we get its address?

extern const std::uintptr_t test_label;

This declares an object located at test_label, so taking its address gives us the label's location, provided the name is not mangled. Since C++ does mangle names, we need to use the GCC asm label extension to give the code marker an explicit name, like this:

extern const std::uintptr_t random_name asm("test_label");

Now let’s make it unique, using a bit of macro magic to get the location value:

#define LOCATION_IMPL(CNT) \
    []{ \
        __asm("__metal_serial_" #CNT ":" ); \
        extern const std::uintptr_t __code_location ## CNT __asm("__metal_serial_" #CNT);   \
        return & __code_location ## CNT; \
    }()

#define LOCATION_IMPL2(CNT) LOCATION_IMPL(CNT)
#define LOCATION() LOCATION_IMPL2(__COUNTER__)

So now we can get a pointer representing our code location, like so:

#include <cstdint>
#include <iostream>

int main()
{
    auto loc = LOCATION();
    std::cout << loc << std::endl;
    return 0;
}

This will now print the location in the form of an address. You can use addr2line to figure out exactly where the call came from.

Except there is one more potential complication. If you run your binary on a bare-metal device, it is quite common for the binary in memory to start at address zero. For more detail, see the appendix.
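
In that case, assuming the image really is linked at address zero, the printed pointer can be passed to addr2line as-is. For example, if the program printed 0x1234:

arm-none-eabi-addr2line -e my_binary 0x1234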

For hosted environments this is not the case, so you need to calculate an offset and map the virtual address back to an offset within the binary. The easiest way to do that is to print the address of a function, e.g. main, and compare it to the address you find in the output of nm. That is:

int main()
{
    std::cout << "main: " << reinterpret_cast<void*>(&main) << std::endl;
    auto loc = LOCATION();
    std::cout << "location: " << loc << std::endl;
    return 0;
}

The above program might then output something like this:

main: 0x401191
location: 0x401234

Then we grab the symbol table with nm my_binary, and find the location of main in a line looking like this:

> arm-none-eabi-nm my_binary
0000000000001191 T main

This lets us calculate the offset: 0x401191 - 0x1191 = 0x400000, so the location maps to 0x401234 - 0x400000 = 0x1234, which we can pass to addr2line:

arm-none-eabi-addr2line -e my_binary 0x1234
/home/embedded-artistry/manhattan-project/src/main.cpp:9

Summary

We’ve shown two ways to obtain code locations for logging or testing in this article. The latter is more complicated, but it reduces the size from a string to a single pointer. I am currently working on a tool called metal.serial that will automate this process for you, and I will of course post about its progress on this blog.

Appendix: Binaries in memory

Any program needs to be in an address space for the CPU to execute it. The CPU holds a pointer to the address of the current instruction and increments it to move on to the next one.

In modern operating systems a program is usually loaded into memory, meaning that on every execution it may sit at a different physical location in RAM, which changes the absolute addresses of functions. The program is, however, loaded as one chunk, so the internal relative addresses (which is how function calls work) still fit. This is why we can calculate the offset and recover the right address: the function symbols in the binary are stored relative to a base of zero and relocated at load time.

In an embedded system this is different. Since no operating system loads the program, the CPU simply has a fixed address at which it starts execution. If this happens to be zero, all our values work as-is. If the CPU looks for the first instruction at 0x42, this won’t work.

As a further optimization, microcontrollers can also map the ROM to this address, so there is no need to load the program into memory at all; the ROM behaves like readable RAM. As minor trivia: cartridges as used by the Commodore 64 also map directly into memory, so no data gets loaded. The cartridge data is read as if it were already in memory.


  1. We assume the line is enough, since it is idiomatic for C++ to keep it to one expression per line.  ↩
