Library

Enforce Correct Integer Arithmetic using the C++ Safe Numerics Library

The Boost Safe Numerics library, created by Robert Ramey aims to enforce correct mathematical operations with the C++ language.

Why are safer numeric operations needed?

C++ inherited the behavior of the integral types (int, unsigned, long, etc.) from the C language, where they were designed to map closely to the underlying processor hardware. The types were originally mapped to a specific number of bits. When the result of an arithmetical operation exceeded the fixed capacity, an overflow occurs and the result is incorrect. Adding to the difficulty is the fact that C/C++ will automatically convert among integral types, for example when implementing binary operations.

I know that in my own programs, I have been bitten by arithmetical overflow numerous times.

The Safe Numerics library provides drop-in replacements for built-in integral types to ensure that mathematical operations on integral types are verified for correctness with as little runtime overhead as possible. Operations are guaranteed to be either arithmetically correct, to emit a compilation error, or to trigger a runtime exception.

#include <boost/safe_numeric/safe_integer.hpp>
using namespace boost::numeric;
safe<int> f(safe<int> x, safe<int> y){
  return x + y; // throw exception if correct
                // result cannot be returned
}

Now, if you're an embedded developer you may have stopped reading at "exception". No need to fear - exceptions are not actually required for this library. You can select or define an exception policy class to:

  • Trap any case which might generate an exception at compile-time (using the trap_exception policy)
  • Specify a custom function to invoke at runtime (pick your favorite variant of panic(), assert(), abort(), exit())

The library has a handful of other features, such as the ability to define promotion policies and enforce ranges on an integer, and to define a safe numeric literal.

The Safe Numerics library is well-documented. Documentation includes tutorials, case studies, and advice for eliminating runtime penalties.

The library requires C++14, as features specific to that version allow for minimization of runtime overhead.

You must install the following Boost Libraries to use this library:

  • MPL
  • Integer
  • Config
  • Concept Checking
  • Tribool
  • Enable_if

You can find this library on GitHub or clone it directly:

$ git clone git@github.com:boostorg/safe_numerics.git

Further Reading

For more information about the Safer Numerics library, check out the following:

Regular Expressions for Embedded C++

Today we have a guest post from Klemens Morgenstern, an embedded C++ consultant. You can learn more about Klemens on his website.

Table of Contents:

  1. Regular Expressions for Embedded C++
  2. A Quick Note on Versioning
  3. How to Get the Library
  4. Regex
    1. Matching a Number
    2. Compile time magic
    3. Matching a Protocol
      1. Frame Detection
      2. Fixed Frame Sizes
      3. Payload Types
    4. Binary Data
  5. Summary
  6. Further Reading

Regular Expressions for Embedded C++

C++11 added regular expression (regex) support to the standard library. However, this implementation does not work well for embedded systems. The regex objects have the regex specification passed in the constructor call, and then they construct an underlying state machine. Not only does this require dynamic memory allocations (and not to forget runtime), but it will also throw an exception if an invalid regex is provided.

There are libraries that try to be more lightweight, e.g. tiny-regex-cpp. But even the tiny-regex-cpp library computes the regex at startup, increasing the startup-time.

But what if new language features of C++ allowed us to build our regex state machine at compile-time? Well there's no need to wonder, since Hana Dusikova implemented a library that does it: Compile time regular expressions v2

In this article we will go through code examples and show the generated ARM assembly output using GCC 8.

A quick note on versioning

C++20 finally allows string to be passed as template parameters. But this is only supported by GCC9, while the newest release of the arm-gcc is version 8.

ctre::match<"REGEX">(subject); // C++20
"REGEX"_ctre.match(subject); // C++17 + N3599 extension

For this article we will stick to the C++17 version.

How to get the library

To use this library, download single_header/ctre.hpp from GitHub.

Regex

Regular expressions are a language to express character sequences. With the help of regular expressions you can find a substring (regex_search) or match a full string regex_match). To get familiar with regex, I highly recommend looking at Regex101.

Matching a number

Let's consider a simple regex that matches a number:

[0-9]+

This translates to: "match any character between 0 and 9 for one or more times (+)".

Here's how we would write that regex expression with CTRE:

#include <ctre.hpp>

using namespace ctre::literals;

//Declare the regex
constexpr auto rx = "[0-9]+"_ctre;

//Test it matches a string at compile time
static_assert(rx.match("1234567890"));

//Test it doesn't match the wrong one
static_assert(!rx.match("foo"));

Above, we have the regex defined and invoke it at compile-time to assert it matches a number, but not a name, as expected. This is already useful, but we might want actually use the result.

So let's match the string corresponding to the integer and then transform it into an actual int value. Note that we won't consider overflows for brevity sake

//return std::nullopt n case it can't match
constexpr std::optional<unsigned int> to_int(std::string_view sv)
{
    if (!rx.match(sv))
        return std::nullopt;

    int res = 0;
    for (auto c : sv)
    {
        res = (c - '0') + res*10;
    }

    return res;
}

//Check it matches
static_assert(to_int("12345"));
//Check the value
static_assert(*to_int("12345") == 12345);

Now we have a function that can convert a string to int at compile time. Thus far no assembly is generated: everything is resolved by the compiler.

To generate assembly output (for demonstration purposes), we can add a non-constexpr function call:

//force the compiler to export something
auto to_int_rt(std::string_view sv)
{
    return to_int(sv);
}

GCC generates the following assembly code:

to_int_rt(std::basic_string_view<char, std::char_traits<char> >):
        push    {r4, lr}
        sub     sp, sp, #8
        add     r3, sp, #8
        stmdb   r3, {r1, r2}
        ldm     sp, {r1, r3}
        add     r1, r3, r1
        cmp     r1, r3
        beq     .L2
        ldrb    r2, [r3]        @ zero_extendqisi2
        sub     r2, r2, #48
        cmp     r2, #9
        bhi     .L2
        mov     r2, r3
        sub     r4, r1, #1
        rsb     lr, r3, #1
.L3:
        cmp     r4, r2
        add     ip, lr, r2
        beq     .L16
        ldrb    ip, [r2, #1]!   @ zero_extendqisi2
        sub     ip, ip, #48
        cmp     ip, #9
        bls     .L3
.L2:
        mov     r3, #0
        strb    r3, [r0, #4]
        add     sp, sp, #8
        pop     {r4, pc}
.L16:
        cmp     ip, #0
        beq     .L2
        mov     r2, #0
.L6:
        ldrb    ip, [r3], #1    @ zero_extendqisi2
        add     r2, r2, r2, lsl #2
        sub     ip, ip, #48
        cmp     r1, r3
        add     r2, ip, r2, lsl #1
        bne     .L6
        mov     r3, #1
        str     r2, [r0]
        strb    r3, [r0, #4]
        add     sp, sp, #8
        pop     {r4, pc}

The fact that we can post the entire function here shows how efficient the code generation is. This regex function requires only 40 lines of assembly, without any function call or memory allocations. We can match regular expressions on the stack and we can use both runtime and compile time matching.

You can see the whole example on Compiler Explorer.

Compile time magic

With the ability of using a regex at compile time, we can make use of information that is already available in the project. Of course, this is highly dependent on the use-case, so this article can only provide very generic examples. Let's take our previous example and enhance it for versioning analysis.

Let's assume we use semantic versioning for our project and we have a VERSION_STRING globally defined. We want to use the version number for some logic.

In this example, we're going to use capture groups. This allows us to get several values out of the regex. Here's the regex for parsing our version number:

([0-9]+)\.([0-9]+)\.([0-9]+)

Here's the compile-time function that can parse the major, minor, and patch numbers from a version string:

#include <ctre.hpp>
#include <optional>

struct version
{
    int major;
    int minor;
    int patch;
};

constexpr std::optional<version> parse_version(std::string_view sv)
{
    using namespace ctre::literals;

    //Declare the regex
    constexpr auto version_rx = "([0-9]+)\\.([0-9]+)\\.([0-9]+)"_ctre;

    //If we use capture groups we get a tuple, so structured bindings work wonders.
    const auto [match, major_str, minor_str, patch_str] = version_rx.match(sv);

    //We need that when running it at compile time, BUT 
    if (!match)
    {
        return std::nullopt;
    }

    //C++17 constexpr lambda
    auto to_int= [](std::string_view sv) constexpr
         {
            int res = 0;
            for (auto c : sv)
                res = (c - '0') + res*10;
            return res;
         };

    //Return the integer values
    return version{
            to_int(major_str),
            to_int(minor_str),
            to_int(patch_str)
        };
}

constexpr auto version = *parse_version("1.3.5");
static_assert(version.major == 1);
static_assert(version.minor == 3);
static_assert(version.patch == 5);

Of course, this function can also be used at runtime.

Let's take our example a step further. Our version string is defined by an external tool, but we have the VERSION_STRING preprocessor variable available. We can make use of it in the code like this:

static_assert(version.major == 1, "This feature is only available in version 1");
static_assert(version.major == 1 && version.minor < 5, 
 "This workaround is a patch, and should be fixed properly with minor .6");

The full example is available on Compiler Explorer.

Matching a protocol

Frame detection

As a more practical example on a controller, let's assume we have a simple protocol with a variadic frame defined in the following way:

  • ~ start frame
  • payload
  • check_sum (4 hex chars)
  • # end frame

The payload itself is not defined in greater details, but we know that ~ and $ are escaped.

We can use a regex to match this simple protocol definition:

~([^~$]+)([0-9A-Fa-f]{4})#

~ //Literally match
[^~$]+ // match any character except ~$
[0-9A-Fa-f]{4} // match any hex character four times

And here's the C++ code that we need to match a frame:

#include <ctre.hpp>
#include <optional>

using namespace ctre::literals;
constexpr auto frame_rx = "~([^~$]+)([0-9A-Fa-f]{4})#"_ctre;

auto match_frame(std::string_view sv)
{
    auto [match, payload, check_sum] = frame_rx.match(sv);
    return std::make_tuple(payload, check_sum);
}

If you look into the assembly, you will notice a few more calls into the CRTE library. This is because the regex needs to look ahead, since [^~$] also matches the checksum. If we don't know the size of the payload, this behavior is unavoidable.

Fixed frame sizes

Let's say we do know the size of the payload. We'll assume that there are three different kinds of messages contained in the payload:

Type Size Regex
f 6 f.{6}
b 2 b.{2}
n 0 n

Combining those with our regex, we can now precisely predict the size and remove the recursive function call:

~(f.{4}|b.{2}|n)([0-9A-Fa-f]{4})#

You can see the resulting assembly on Compiler Explorer.

Payload types

Since we now already know how to detect the payload type, why not make use of it?

We already use alternatives, so we can modify our payload-capturing group to capture each type individually:

~(?:(f.{4})|(b.{2})|(n))([0-9A-Fa-f]{4})#

Note that (?:) represents a non-capturing group.

The regex above gives us four capture groups, f, b, n and check_sum. However only one of the payload groups will contain a value, so the regex allows us to check which one.

We can use this method to obtain the payload data:

constexpr auto frame_rx = "~(?:(f.{4})|(b.{2})|(n))([0-9A-Fa-f]{4})#"_ctre;

enum class payload_type {
    f, b, n
};

auto match_frame(std::string_view sv)
{
    auto [match, f,b,n, check_sum] = frame_rx.match(sv);
    if (f)
    {
        return std::make_tuple(payload_type::f, f.to_view(), check_sum);
    }
    else if (b)
    {
        return std::make_tuple(payload_type::b, b.to_view(), check_sum);
    }
    else
    {
        return std::make_tuple(payload_type::n, n.to_view(), check_sum);
    }
}

This is a very basic approach for demonstration purposes. Instead of using an enumerator, each payload type could be parsed into a class and returned as a std::variant.

See the full example on Compiler Explorer.

Binary data

So far we have only used ASCII data (the library also supports Unicode). What if we are supposed to work with binary data?

Regex does allow matching arbitrary values, but it should be noted that this refers to the index of the encoding. So testing this code for is advised. If values are in the ASCII range (<128) data-preserve-html-node="true" there should be no issues.

So let's take our former protocol, but make it binary:

  • STX start frame
  • payload
  • check_sum 2 bytes
  • ETX end frame

STX and ETX have to be escaped in the payload, but not in the check_sum, yielding us a bit of overhead

\x02([^\x02\x03]+)(.{2})\x03

We can extract the payload & check_sum. Transforming the binary checksum into an integer is also trivial.

constexpr auto frame_rx = "\\x02([^\\x02\\x03]+)(.{2})\\x03"_ctre;
auto match_frame(std::string_view sv)
{
    auto [match, payload, check_sum_str] = frame_rx.match(sv);
    unsigned char check_sum = 0;

    if (check_sum_str)
    {
        check_sum = (check_sum_str.to_view()[0] << 4) | check_sum_str.to_view()[1];
    }

    return std::make_tuple(payload, check_sum);
}

It might seem strange to use regex for binary data. Regular expressions are by far the most popular pattern matching tool, and the behaviour of binary data is well defined. Since we can make use of the powerful engine that compile-time expressions give us, that surely makes it worth our while.

You can see the full example on Compiler Explorer.

Summary

With the increased support for compile-time evaluation that C++ gets in every standard revision, code optimization not only increases, but more powerful tools become available. The compile-time regular expression library allows us to use regular expressions without requiring a heap or exception support, thus making regex usable in the most constrained software environments.

The full power of compile-time string evaluation is apparent in C++20, not C++17, so it is reasonable to expect more than just regular expressions in our future. We should expect to see full parsers implemented in this way.

Further Reading

Embedded Template Library

Updated: 20181219

We've been hard at work on our C++ embedded systems framework. A critical design requirement is that the framework core components do not utilize dynamic memory allocation. Unfortunately, this excludes most of the STL containers, as they get storage from the heap by default. Rather than reinventing the wheel, we spent some time searching for existing solutions to this problem. We were quite lucky to discover the Embedded Template Library, an open-source project created by John Wellbelove at Aster Consulting Ltd.

The Embedded Template Library (ETL) is complementary to the Standard Template Library (STL). The ETL provides fixed-size alternatives to STL container type. The ETL containers specify a maximum capacity at compile-time and make no calls to malloc/free/new/delete. As an added bonus, the containers are all cache friendly: storage for all container types is allocated in contiguous blocks.

Most of the ETL containers mimic, as best as possible, the existing STL containers. Additional container types are also provided, such as for memory pools.

We have been heavily using the following ETL modules while developing our framework:

The ETL goes further than just providing fixed-size STL container alternatives. The entire library does not use dynamic memory allocation. All storage is allocated either at compile time or on the stack. The library also uses no run-time type information (RTTI). The library provides configurable error handling by allowing the user to choose exceptions, assertions, error logging, or to completely disable error checking.

The library also provides a variety of additional algorithms, utilities, and other features that are useful to embedded systems developers, including:

The ETL is designed to work with C++03, since many projects and chipsets cannot support the newer standards. The ETL is an excellent way to gain access to some modern C++ features on C++03 projects.

The project is well-documented, with reference information available on the ETL Website. John also provides free email support for the ETL.

The ETL is released under the MIT License.

Further Reading

Change Log

  • 20181219:
    • Fixed links for ETL modules

Related Articles