A Guide to Undefined Behavior in C and C++, Part 1

Many programmers know that certain behaviors in C/C++ are "undefined". Undefined behavior covers common bugs like dereferencing a NULL pointer, reading past the end of an array, or overflowing a signed integer. Many developers assume that a bug involving undefined behavior will surface as an obvious error such as a crash, but this is not always the case.

Why worry about undefined behavior at all? The C99 standard alone lists 191 kinds of undefined behavior. Chances are that your code exhibits at least one of them. Are you fully aware of the consequences for your program?

John Regehr dives into undefined behavior in C, as well as the dangers of relying on undefined behavior. Check out his article to better grok undefined behavior in C and C++.

My Highlights

One might say: Some of these compilers are behaving improperly because the C standard says a relational operator must return 0 or 1. But since the program has no meaning at all, the implementation can do whatever it likes. Undefined behavior trumps all other behaviors of the C abstract machine.
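
The code under discussion is essentially the following (a reconstruction along the lines of the article's example; the value printed varies with compiler and optimization level because the addition overflows a signed int):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* INT_MAX + 1 overflows a signed int, which is undefined behavior,
           so the program as a whole has no defined meaning: different
           compilers and optimization levels are free to print different
           results, not necessarily 0 or 1. */
        printf("%d\n", (INT_MAX + 1) > INT_MAX);
        return 0;
    }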

Do not rely on undefined behavior:

Moreover, there are compilers (like GCC) where integer overflow behaved a certain way for many years and then at some point the optimizer got just a little bit smarter and integer overflows suddenly and silently stopped working as expected. This is perfectly OK as far as the standard goes. While it may be unfriendly to developers, it would be considered a win by the compiler team because it will increase benchmark scores.
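
A sketch of the kind of silent breakage he means (the function names are mine, for illustration): a wrap-around overflow check that happened to work for years can be deleted outright once the optimizer starts assuming that signed overflow never happens.

    #include <limits.h>

    /* Relies on wrap-around to detect overflow of x + delta. Because signed
       overflow is undefined, the compiler may assume x + delta can never be
       less than x and remove the check entirely. */
    int will_overflow_broken(int x, int delta)
    {
        return x + delta < x;             /* undefined when x + delta overflows */
    }

    /* A well-defined check (for positive delta): compare before adding. */
    int will_overflow_safe(int x, int delta)
    {
        return delta > 0 && x > INT_MAX - delta;
    }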

Why is undefined behavior even included? Shouldn't it all be well-defined?

The good thing — the only good thing! — about undefined behavior in C/C++ is that it simplifies the compiler’s job, making it possible to generate very efficient code in certain situations. Usually these situations involve tight loops. For example, high-performance array code doesn’t need to perform bounds checks, avoiding the need for tricky optimization passes to hoist these checks outside of loops. Similarly, when compiling a loop that increments a signed integer, the C compiler does not need to worry about the case where the variable overflows and becomes negative: this facilitates several loop optimizations. I’ve heard that certain tight loops speed up by 30%-50% when the compiler is permitted to take advantage of the undefined nature of signed overflow. Similarly, there have been C compilers that optionally give undefined semantics to unsigned overflow to speed up other loops.
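
A minimal sketch of the kind of loop in question (my own example, assuming a plain array sum): since signed overflow is undefined, the compiler may assume i never wraps past INT_MAX, treat the trip count as exactly n + 1, and unroll, vectorize, or widen i to a 64-bit induction variable without emitting any wrap-around checks.

    /* Sum the first n + 1 elements of a. The compiler is allowed to assume
       that i++ never overflows, which simplifies its reasoning about how
       many times the loop body runs. */
    long sum_array(const int *a, int n)
    {
        long sum = 0;
        for (int i = 0; i <= n; i++)
            sum += a[i];
        return sum;
    }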

One suspects that the C standard body simply got used to throwing behaviors into the “undefined” bucket and got a little carried away. Actually, since the C99 standard lists 191 different kinds of undefined behavior, it’s fair to say they got a lot carried away.

John defines three types of functions (sketched in code after the list):

  • Type 1: Behavior is defined for all inputs
  • Type 2: Behavior is defined for some inputs and undefined for others
  • Type 3: Behavior is undefined for all inputs
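
A sketch of what each category can look like in practice (these particular functions are my own illustrations, not taken from the article):

    #include <stdint.h>

    /* Type 1: defined for all inputs -- division by zero and
       INT32_MIN / -1 are handled explicitly. */
    int32_t safe_div(int32_t a, int32_t b)
    {
        if (b == 0 || (a == INT32_MIN && b == -1))
            return 0;                  /* or report an error */
        return a / b;
    }

    /* Type 2: defined for some inputs, undefined for others -- division
       by zero and INT32_MIN / -1 are undefined behavior. */
    int32_t unsafe_div(int32_t a, int32_t b)
    {
        return a / b;
    }

    /* Type 3: undefined for all inputs -- the out-of-bounds read happens
       on every call, whatever the argument. */
    int always_undefined(int x)
    {
        int a[2] = { 0, 1 };
        return a[3] + x;               /* out-of-bounds access */
    }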

You won't always be told that undefined behavior is biting you:

This case-collapsing view of undefined behavior provides a powerful way to explain how compilers really work. Remember, their main goal is to give you fast code that obeys the letter of the law, so they will attempt to forget about undefined behavior as fast as possible, without telling you that this happened.
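
A sketch of this "forgetting" in action (my own example): once a pointer has been dereferenced, the compiler may conclude it cannot be NULL and silently delete a later NULL check.

    #include <stdio.h>

    void print_twice(int *p)
    {
        int first = *p;        /* if p is NULL, this is undefined behavior */
        if (p == NULL)         /* so the compiler may assume p != NULL here */
            return;            /* and drop this check entirely */
        printf("%d %d\n", first, *p);
    }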

John ends the article with a summary of rules to follow:

  • Enable and heed compiler warnings, preferably using multiple compilers
  • Use static analyzers (like Clang’s, Coverity, etc.) to get even more warnings
  • Use compiler-supported dynamic checks; for example, gcc’s -ftrapv flag generates code to trap signed integer overflows
  • Use tools like Valgrind to get additional dynamic checks
  • When functions are “type 2” as categorized above, document their preconditions and postconditions
  • Use assertions to verify that functions’ preconditions and postconditions actually hold (see the sketch after this list)
  • Particularly in C++, use high-quality data structure libraries
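
A minimal sketch tying a few of these rules together (the function, its contract, and the file name are mine, for illustration): a "type 2" function with its precondition documented and asserted, compiled with warnings and gcc's -ftrapv so that any signed overflow that does slip through traps at run time rather than passing silently.

    /* Compile with, for example:  gcc -Wall -Wextra -ftrapv -c midpoint.c  */
    #include <assert.h>

    /* Precondition: 0 <= lo <= hi. Postcondition: lo <= result <= hi. */
    int midpoint(int lo, int hi)
    {
        assert(lo >= 0 && lo <= hi);      /* verify the precondition */
        int mid = lo + (hi - lo) / 2;     /* avoids the overflow-prone (lo + hi) / 2 */
        assert(mid >= lo && mid <= hi);   /* verify the postcondition */
        return mid;
    }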

Basically: be very careful, use good tools, and hope for the best.