Improve volatile Usage with volatile_load() and volatile_store()

Updated: 20190322

A C++ proposal for deprecating the volatile keyword has surfaced. This may surprise our readers, because as Michael Caisse said, "volatile is the embedded keyword."

The original intent of the volatile keyword in C89 is to suppress read/write optimizations:

No cacheing through this lvalue: each operation in the abstract semantics must be performed (that is, no cacheing assumptions may be made, since the location is not guaranteed to contain any previous value). In the absence of this qualifier, the contents of the designated location may be assumed to be unchanged except for possible aliasing.

The problem with its use in C++ is that the meaning is much less clear, as it is mentioned 322 times in the C++17 draft of the C++ Standard.

One problematic and common assumption is that volatile is equivalent to "atomic". This is not the case. All the volatile keyword denotes is that the variable may be modified externally, and thus reads/writes cannot be optimized. This means that the volatile keyword only has a meaningful impact on load and store operations.

Where programmers run into trouble is using volatile variables in a read-modify-write operation, such as with the increment (++) and decrement (--) operators. Such operations create a potential for a non-obvious race condition, depending on how the operation is implemented in the compiler and platform.

volatile int i = 2; //probably atomic
i++; //not atomic ...

Other problematic volatile use cases can be found, such as chained assignments of volatile values:

// is b re-read before storing the value to a, or not?
a = b = c

We recommend using volatile_load<T>() and volatile_store<T>() template functions to encourage better volatile behavior in our programs.

auto r = volatile_load(&i);
volatile_store(&i, r);

You can use these functions to refactor your programs and control volatile use cases. While this implementation does not meet the proposed specification, it's a step toward cleaning up our use of the volatile keyword.

#include <cassert>
#include <type_traits>

/** Read from a volatile variable
 * @tparam TType the type of the variable. This will be deduced by the compiler.
 * @note TType shall satisfy the requirements of TrivallyCopyable.
 * @param target The pointer to the volatile variable to read from.
 * @returns the value of the volatile variable.
template<typename TType>
constexpr inline TType volatile_load(const TType* target)
        "Volatile load can only be used with trivially copiable types");
    return *static_cast<const volatile TType*>(target);

/** Write to a volatile variable
 * Causes the value of `*target` to be overwritten with `value`.
 * @tparam TType the type of the variable. This will be deduced by the compiler.
 * @note TType shall satisfy the requirements of TrivallyCopyable.
 * @param target The pointer to the volatile variable to update.
 * @param value The new value for the volatile variable.
template<typename TType>
inline void volatile_store(TType* target, TType value)
        "Volatile store can only be used with trivially copiable types");
    *static_cast<volatile TType*>(target) = value;

As Odin Holmes pointed out in the comments, refactoring our code to use volatile_load() and volatile_store() can also boost the performance of our programs. This is because we are constraining the optimizer more clearly.

This traditional volatile code:

volatile uint32_t* register_x;
* register_x &= ~mask;
* register_x |= value;

Will not be as performant as this version:

auto r = volatile_load(&register_x);
r &=~mask;
r |= value;
volatile_store(&register_x, r);

Further Reading

Change Log

  • 20190322:
    • Added comments from Odin Holmes regarding optimizations.

Converting between timespec & std::chrono

I was working with some POSIX APIs recently and needed to supply a timespec value. I primarily work with std::chrono types in C++ and was surprised that there were no (obvious) existing conversion methods. Below are a few utility functions that I came up with to handle common conversions.

Table of Contents

  1. timespec Refresher
  2. Conversion Functions
  3. Bonus: timeval conversions
  4. Further Reading

timespec Refresher

As a quick refresher, timespec is a type defined in the ctime header (aka time.h). The timespec type can be used to store either a time interval or absolute time. The type is a struct with two fields:

struct timespec {
   time_t   tv_sec;
   long     tv_nsec;

The tv_sec field represents either a general number of seconds, or seconds elapsed since 1970, and tv_nsec represents the count of nanoseconds.

Conversion Functions

A timespec can represent either an absolute time or time interval. With std::chrono, these are two separate concepts: std::chrono::duration represents an interval, while std::chrono::time_point represents an absolute time.

We need for four functions to convert between the two C++ time concepts and timespec:

  1. timespec to std::chrono::duration
  2. std::chrono::duration to timespec
  3. timespec to std::chrono::timepoint
  4. std::chrono::time_point to timespec

timespec to std::chrono::duration

Converting from a timespec to a std::chrono::duration (nanoseconds below) is straightforward: we convert tv_sec to std::chrono::seconds and tv_nsec to std::chrono::nanoseconds, and then cast the result to our target return type, std::chrono::nanoseconds.

using std::chrono; // for example brevity

constexpr nanoseconds timespecToDuration(timespec ts)
    auto duration = seconds{ts.tv_sec} 
        + nanoseconds{ts.tv_nsec};

    return duration_cast<nanoseconds>(duration);

std::chrono::duration to timespec

Converting from std::chrono::duration to timespec is a two step process. First we capture the portion of the duration which can be represented by a round number of seconds. We subtract this count from the total duration to get the remaining nanosecond count.

Once we have the two components, we can create our timespec value.

using std::chrono; // for example brevity

constexpr timespec durationToTimespec(nanoseconds dur)
    auto secs = duration_cast<seconds>(dur);
    dur -= secs;

    return timespec{secs.count(), dur.count()};

timespec to std::chrono::timepoint

For the std::chrono::time_point examples, I've used the system_clock as the reference clock.

To convert a timespec value to std::chrono::time_point, we first use our timespecToDuration() function to get a std::chrono::duration. We then use a duration_cast to convert std::chrono::duration to our reference clock duration (system_clock::duration).

We can then create a std::chrono::time_point value from our std::chrono::system_clock::duration.

using std::chrono; // for example brevity

constexpr time_point<system_clock, nanoseconds>
    timespecToTimePoint(timespec ts)
    return time_point<system_clock, nanoseconds>{

std::chrono::time_point to timespec

To convert from a std::chrono::time_point to timespec, we take a similar approach to the std::chrono::duration conversion.

First we capture the portion of the duration which can be represented by a round number of seconds. We subtract this count from the total duration to get the remaining nanosecond count.

Once we have the two components, we can create our timespec value.

using std::chrono; // for example brevity

constexpr timespec timepointToTimespec(
    time_point<system_clock, nanoseconds> tp)
    auto secs = time_point_cast<seconds>(tp);
    auto ns = time_point_cast<nanoseconds>(tp) -

    return timespec{secs.time_since_epoch().count(), ns.count()};

Bonus: timeval conversions

Another common time structure with POSIX systems is timeval, which is defined in the sys/time.h. This type is very similar to timespec:

struct timeval
    time_t         tv_sec;
    suseconds_t    tv_usec;

We can convert between timeval and std::chrono types in the same manner shown above, except std::chrono::microseconds is used in place of std::chrono::nanoseconds.

using std::chrono; // for example brevity

constexpr microseconds timevalToDuration(timeval tv)
    auto duration = seconds{tv.tv_sec} + microseconds{tv.tv_usec};

    return duration_cast<microseconds>(duration);

Further Reading

Related Articles

Musings on Tight Coupling Between Firmware and Hardware

Firmware applications are often tightly coupled to their underlying hardware and RTOS. There is a real cost associated with this tight coupling, especially in today's increasingly agile world with its increasingly volatile electronics market.

I've been musing about the sources of coupling between firmware and the underlying platform. As an industry, we must focus on creating abstractions in these areas to reduce the cost of change.

Let's start the discussion with a story.

Table of Contents:

  1. The Hardware Startup Phone Call
  2. Coupling Between Firmware and Hardware
    1. Processor Dependencies
    2. Platform Dependencies
    3. Component Dependencies
    4. RTOS Dependencies
  3. Why Should I Care?

The Hardware Startup Phone Call

I'm frequently contacted by companies that need help porting their firmware from one platform to another. These companies are often on tight schedules with a looming development build, production run, or customer release. Their stories follow a pattern:

  1. We built our first version of software on platform X using the vendor SDK and vendor-recommended RTOS
  2. We need to switch to platform Y because:
    1. X is reaching end of life
    2. We cannot buy X in sufficient quantities because Big Company bought the remaining stock
    3. Y is cheaper
    4. Y's processor provides better functionality / power profile / peripherals / GPIO availability
    5. Y's components are better for our application's use case
  3. Platform Y is based on a different processor vendor (i.e. SDK) and/or RTOS
  4. Our engineer is not familiar with Platform Y's processor/components/SDK/RTOS
  5. The icing on the cake: We need to have our software working on Platform Y within 30-60 days

After hearing the details of the project, I ask my first question, which is always greeted with the same answer:

Phillip: Did you create abstractions to keep your code isolated from the vendor SDK or RTOS?

Company: No. We're a startup and we were focused on moving as quickly as possible

I'll then ask my second question, which is always greeted with the same answer:

Phillip: Do you have a set of unit/functional tests that I can run to make sure the software is working correctly after the port?

Company: No. We're a startup and we were focused on moving as quickly as possible

Then I'll ask the final question, which is always greeted with the same answer:

Phillip: How can I tell whether or not the software is working correctly after I port it?

Company: We'll just try it out and make sure everything works

Given these answers, there's practically no chance I can help the company and meet their deadlines. If there are large differences in SDKs and RTOS interfaces, the software has to be rewritten from scratch using the old code base as a reference.

I also know that if I take on the project, I'm in for a risky business arrangement. How can I be sure that my port was successful? How can I defend myself from the client's claim that I introduced issues without having a testable code base to compare against?

Why am I telling you this story?

Because this scenario arises from a single strategic failure: failure to decouple the firmware application from the underlying RTOS, vendor SDK, or hardware. And as an industry we are continually repeating this strategic failure in the name of "agility" and "time to market".

These companies fail to move quickly in the end, since the consequences of this strategic blunder are extreme: schedule delays, lost work, reduced morale, and increased expenditures.

Coupling Between Firmware and Hardware

Software industry leaders have been writing about the dangers of tight coupling since the 1960s, so I'm not going to rehash coupling in detail. If you're unfamiliar with the concept, here is some introductory reading:

In Why Coupling is Always Bad, Vidar Hokstad brings up consequences of tight coupling, two of which are relevant for this musing:

  • Changing requirements that affect the suitability of some component will potentially require wide ranging changes in order to accommodate a more suitable replacement component.
  • More thought needs to go into choices at the beginning of the lifetime of a software system in order to attempt to predict the long term requirements of the system because changes are more expensive.

We see these two points play out in the scenario above.

If your software is tightly coupled to the underlying platform, changing a single component of the system - such as the processor - can cause your company to effectively start over with firmware development.

The need to swap components late in the program (and the resulting need to start over with software) is a failure to perform the up-front long-term thinking required by tightly coupled systems. Otherwise, the correct components would have been selected during the first design iteration, rendering the porting process unnecessary.

Let's review on a quote from Quality Code is Loosely Coupled:

Loose coupling is about making external calls indirectly through abstractions such as abstract classes or interfaces. This allows the code to run without having to have the real dependency present, making it more testable and more modular.

Decoupling our firmware from the underlying hardware is As Simple As That™.

Up front planning and design is usually minimized to keep a company "agile". However, without abstractions that easily enable us to swap out components, our platform becomes tied to the initial hardware selection.

You may argue that taking the time to design and implement abstractions for your platform introduces an unnecessary schedule delay. How does that time savings stack up against the delay caused by the need to rewrite your software?

We all want to be "agile", and abstractions help us achieve agility.

What is more agile than the ability to swap out components without needing to rewrite large portions of your system? You can try more designs at a faster pace when you don't need to rewrite the majority of your software to support a new piece of hardware.

Your abstractions don't need to be perfect. They don't need to be reusable on other systems. But they need to exist if you want to move quickly.

We need to start producing abstractions that minimize the four sources of tight coupling in our embedded systems:

  1. Processor Dependencies
  2. Platform Dependencies
  3. Component Dependencies
  4. RTOS Dependencies

Processor Dependencies

Processor dependencies are the most common form of coupling and arise from two major sources:

  1. Using processor vendor SDKs
  2. Using APIs or libraries which are coupled to a target architecture (e.g. CMSIS)

Processor-level function calls are commonly intermixed with application logic and driver code, ensuring that the software becomes tightly coupled to the processor. De-coupling firmware from the underlying processor is one of the most important for design portability and reusability.

In the most common cases, teams will develop software using a vendor's SDK without an intermediary abstraction layer. When the team is required to migrate to another processor or vendor, the coupling to a specific vendor's SDK often triggers a rewrite of the majority of the system. At this point, many teams realize the need for abstraction layers and begin to implement them.

In other cases, software becomes dependent upon the underlying architecture. Your embedded software may work on an ARM system, but not be readily portable to PIC, MIPS, AVR, or x86 machine. This is common when utilizing libraries such as CMSIS, which provides an abstraction layer for ARM Cortex-M processors.

A subtler form of architecture coupling can occur even when abstraction layers are used. Teams can create abstractions which depend on a specific feature, an operating model particular to a single vendor, or an architecture-specific interaction. This form of coupling is less costly, as the changes are at least isolated to specific areas. Interfaces may need to be updated and additional files may need to change, but at least we don't need to rewrite everything.

Platform Dependencies

Embedded software is often written specifically for the underlying hardware platform. Rather than abstracting platform-specific functionality, embedded software often interacts directly with the hardware.

Without being aware of it, we develop our software based on the assumptions about our underlying hardware. We write our code to work with four sensors, and then in the second version we only need two sensors. However, you need to support both version one and version two of the product with a single firmware image.

Consider another common case, where our software supports multiple versions of a PCB. Whenever a new PCB revision is released, the software logic must be updated to support the changes. Supporting multiple revisions often leads to #ifdefs and conditional logic statements scattered throughout the codebase. What happens when you move to a different platform, with different revision numbers? Wouldn't it be easier if your board revision decisions were contained in a single location?

When these changes come, how much of your code needs to be updated? Do you need to add #ifdef statements everywhere? Do your developers cringe and protest because of the required effort? Or do they smile and nod because it will only take them 15 minutes?

We can abstract our platform/hardware functionality behind an interface (commonly called a Board Support Package). What features is the hardware platform actually providing to the software layer? What might need to change in the future, and how can we isolate the rest of the system from those changes?

Multiple platforms & boards can be created that provide same set of functionality and responsibilities in different ways. If our software is built upon a platform abstraction, we can move between supported platforms with greater ease.

Component Dependencies

Component Dependencies are a specialization of the platform dependency, where software relies on the presence of a specific hardware component instance.

In embedded systems, software is often written to use specific driver implementations rather than generalized interfaces. This means that instead of using a generalized accelerometer interface, software typically works directly with a BMA280 driver or LIS3DH driver. Whenever the component changes, code interacting with the driver must be updated to use the new part. Similar to the board revision case, we will probably find that #ifdefs or conditionals are added to select the proper driver for the proper board revision.

Higher-level software can be decoupled from component dependencies by working with generic interfaces rather than specific drivers. If you use generic interfaces, underlying components can be swapped out without the higher-level software being aware of the change. Whenever parts need to be changed, your change will be isolated to the driver the declaration (ideally found within your platform abstraction).

RTOS Dependencies

An RTOS's functions are commonly used directly by embedded software. When a processor change occurs, the team may find that the RTOS they were previously using is not supported on the new processor.

Migrating from one RTOS to another requires a painful porting process, as there are rarely straightforward mappings between the functionality and usage of two different RTOSes.

Providing an RTOS abstraction allows platforms to use any RTOS that they choose without coupling their application software to the RTOS implementation.

Abstracting the RTOS APIs also allows for host-machine simulation, since you can provide a pthread implementation for the RTOS abstraction.

Why Should I Care?

It's a fair question. Tight coupling in firmware has been the status quo for a long time. You may claim it still must remain that way due to resource constraints.

Vendor SDKs are readily available. You can start developing your platform immediately. The rapid early progress feels good. Perhaps you picked all the right parts, and the reduced time-to-market will actually happen for your team.

If not, you will find yourself repeating the cycle and calling us for help.

It's not all doom and gloom, however. There are great benefits from reducing coupling and introducing abstractions.

  • We can rapidly prototype hardware without triggering software rewrites
  • We can take better advantage of unit tests, which are often skipped on embedded projects due to hardware dependencies
  • We can implement the abstractions on our host machines, enabling developers to write and test software on their PC before porting it to the embedded system
  • We can reuse subsystems, drivers, and embedded system applications on across an entire product line

I'll be diving deeper into some of these beneficial areas in the coming months.

In the meantime - happy hacking! (and get to those abstractions!)

Related Posts