A Guide to Undefined Behavior in C and C++, Part 1

Many programmers know that certain behaviors in C/C++ are "undefined". Undefined behavior covers common bugs like dereferencing a NULL pointer, reading past the end of an array, or overflowing a signed integer. Many developers assume that a bug involving undefined behavior will surface as an obvious error such as a crash, but this is not always the case.

Why worry about undefined behavior at all? The C99 standard alone lists 191 kinds of undefined behavior. Chances are that your code exhibits at least one of them. Are you fully aware of the consequences for your program?

John Regehr dives into undefined behavior in C, as well as the dangers of relying on undefined behavior. Check out his article to better grok undefined behavior in C and C++.

My Highlights

One might say: Some of these compilers are behaving improperly because the C standard says a relational operator must return 0 or 1. But since the program has no meaning at all, the implementation can do whatever it likes. Undefined behavior trumps all other behaviors of the C abstract machine.
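
The code under discussion is essentially the following (a reconstruction along the lines of the article's example; the value printed varies with compiler and optimization level because the addition overflows a signed int):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* INT_MAX + 1 overflows a signed int, which is undefined behavior,
           so the program as a whole has no defined meaning: different
           compilers and optimization levels are free to print different
           results, not necessarily 0 or 1. */
        printf("%d\n", (INT_MAX + 1) > INT_MAX);
        return 0;
    }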

Do not rely on undefined behavior:

Moreover, there are compilers (like GCC) where integer overflow behaved a certain way for many years and then at some point the optimizer got just a little bit smarter and integer overflows suddenly and silently stopped working as expected. This is perfectly OK as far as the standard goes. While it may be unfriendly to developers, it would be considered a win by the compiler team because it will increase benchmark scores.
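
A sketch of the kind of silent breakage he means (the function names are mine, for illustration): a wrap-around overflow check that happened to work for years can be deleted outright once the optimizer starts assuming that signed overflow never happens.

    #include <limits.h>

    /* Relies on wrap-around to detect overflow of x + delta. Because signed
       overflow is undefined, the compiler may assume x + delta can never be
       less than x and remove the check entirely. */
    int will_overflow_broken(int x, int delta)
    {
        return x + delta < x;             /* undefined when x + delta overflows */
    }

    /* A well-defined check (for positive delta): compare before adding. */
    int will_overflow_safe(int x, int delta)
    {
        return delta > 0 && x > INT_MAX - delta;
    }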

Why is undefined behavior even included? Shouldn't it all be well-defined?

The good thing — the only good thing! — about undefined behavior in C/C++ is that it simplifies the compiler’s job, making it possible to generate very efficient code in certain situations. Usually these situations involve tight loops. For example, high-performance array code doesn’t need to perform bounds checks, avoiding the need for tricky optimization passes to hoist these checks outside of loops. Similarly, when compiling a loop that increments a signed integer, the C compiler does not need to worry about the case where the variable overflows and becomes negative: this facilitates several loop optimizations. I’ve heard that certain tight loops speed up by 30%-50% when the compiler is permitted to take advantage of the undefined nature of signed overflow. Similarly, there have been C compilers that optionally give undefined semantics to unsigned overflow to speed up other loops.
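
A minimal sketch of the kind of loop in question (my own example, assuming a plain array sum): since signed overflow is undefined, the compiler may assume i never wraps past INT_MAX, treat the trip count as exactly n + 1, and unroll, vectorize, or widen i to a 64-bit induction variable without emitting any wrap-around checks.

    /* Sum the first n + 1 elements of a. The compiler is allowed to assume
       that i++ never overflows, which simplifies its reasoning about how
       many times the loop body runs. */
    long sum_array(const int *a, int n)
    {
        long sum = 0;
        for (int i = 0; i <= n; i++)
            sum += a[i];
        return sum;
    }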

One suspects that the C standard body simply got used to throwing behaviors into the “undefined” bucket and got a little carried away. Actually, since the C99 standard lists 191 different kinds of undefined behavior, it’s fair to say they got a lot carried away.

John defines three types of functions (sketched in code after the list):

  • Type 1: Behavior is defined for all inputs
  • Type 2: Behavior is defined for some inputs and undefined for others
  • Type 3: Behavior is undefined for all inputs
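
A sketch of what each category can look like in practice (these particular functions are my own illustrations, not taken from the article):

    #include <stdint.h>

    /* Type 1: defined for all inputs -- division by zero and
       INT32_MIN / -1 are handled explicitly. */
    int32_t safe_div(int32_t a, int32_t b)
    {
        if (b == 0 || (a == INT32_MIN && b == -1))
            return 0;                  /* or report an error */
        return a / b;
    }

    /* Type 2: defined for some inputs, undefined for others -- division
       by zero and INT32_MIN / -1 are undefined behavior. */
    int32_t unsafe_div(int32_t a, int32_t b)
    {
        return a / b;
    }

    /* Type 3: undefined for all inputs -- the out-of-bounds read happens
       on every call, whatever the argument. */
    int always_undefined(int x)
    {
        int a[2] = { 0, 1 };
        return a[3] + x;               /* out-of-bounds access */
    }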

You won't always be told that undefined behavior is biting you:

This case-collapsing view of undefined behavior provides a powerful way to explain how compilers really work. Remember, their main goal is to give you fast code that obeys the letter of the law, so they will attempt to forget about undefined behavior as fast as possible, without telling you that this happened.
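
A sketch of this "forgetting" in action (my own example): once a pointer has been dereferenced, the compiler may conclude it cannot be NULL and silently delete a later NULL check.

    #include <stdio.h>

    void print_twice(int *p)
    {
        int first = *p;        /* if p is NULL, this is undefined behavior */
        if (p == NULL)         /* so the compiler may assume p != NULL here */
            return;            /* and drop this check entirely */
        printf("%d %d\n", first, *p);
    }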

John ends the article with a summary of rules to follow:

  • Enable and heed compiler warnings, preferably using multiple compilers
  • Use static analyzers (like Clang’s, Coverity, etc.) to get even more warnings
  • Use compiler-supported dynamic checks; for example, gcc’s -ftrapv flag generates code to trap signed integer overflows
  • Use tools like Valgrind to get additional dynamic checks
  • When functions are “type 2” as categorized above, document their preconditions and postconditions
  • Use assertions to verify that functions’ preconditions and postconditions actually hold (see the sketch after this list)
  • Particularly in C++, use high-quality data structure libraries
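
A minimal sketch tying a few of these rules together (the function, its contract, and the file name are mine, for illustration): a "type 2" function with its precondition documented and asserted, compiled with warnings and gcc's -ftrapv so that any signed overflow that does slip through traps at run time rather than passing silently.

    /* Compile with, for example:  gcc -Wall -Wextra -ftrapv -c midpoint.c  */
    #include <assert.h>

    /* Precondition: 0 <= lo <= hi. Postcondition: lo <= result <= hi. */
    int midpoint(int lo, int hi)
    {
        assert(lo >= 0 && lo <= hi);      /* verify the precondition */
        int mid = lo + (hi - lo) / 2;     /* avoids the overflow-prone (lo + hi) / 2 */
        assert(mid >= lo && mid <= hi);   /* verify the postcondition */
        return mid;
    }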

Basically: be very careful, use good tools, and hope for the best.