Demystifying ARM Floating Point Compiler Options

When I first started bringing up new ARM platforms, I was pretty confused by the various floating point options such as -mfloat-abi=softfp or -mfpu=fpv4-sp-d16. I imagine this is confusing to other developers as well, so I'd like to share my ARM floating-point cheat sheet with the world.

An Overview of the ARM Floating-Point Architecture

Before we dive into compiler options, there are a few ARM floating-point details we should familiarize ourselves with: the ARM EABI, VFP, and NEON.

ARM EABI

An ABI is a specification which defines the rules that a generated program must follow to work with a specific platform or interface. The ARM EABI defines the rules for an ARM platform, and your compiler will build your program according to those rules.

The ARM EABI specification defines two incompatible ABIs: one which uses floating-point registers for function arguments, and another which does not. If we are not using hardware floating-point operations, we can simply build our program without using the floating-point compatible ABI.

Since ARM defines a standard floating-point instruction set, we can still utilize the floating-point ABI even if our chip does not support the actual hardware. If floating-point hardware is not present, the instructions will be trapped and executed by a floating-point emulation module instead. The only real difference in functionality is slower execution speed when using software emulation.

Since the ABI defines interfaces for our programs, we must compile and link all of our components and libraries using the same ABI.

Vector Floating Point (VFP)

Vector Floating Point (VFP) is the name for ARM's floating-point extension. Prior to ARMv8, VFP was implemented as a coprocessor extension. The VFP coprocessor supports both single and double-precision floating point operations according to the IEEE 754 standard. For practical purposes, VFP is not useful for vector operations and should be considered a normal scalar floating-point unit (FPU). VFP has been replaced with NEON as of ARMv8.

The VFP extensions are optional parts of the ARM architecture, though the majority of Cortex-A processors do provide a floating-point unit. Some Cortex-A8 devices may utilize a reduced VFPLite module instead of a full VFP module. This VFPLite module requires roughly a 10x increase in clock cycles per floating-point operation.

VFP Versions

Here's a high-level summary of the different VFP versions that have been released throughout the years.

  • VFPv1
    • Obsoleted by ARM
  • VFPv2
    • 16 64-bit FPU registers
    • Optional extension to the ARM instruction set in the ARMv5TE, ARMv5TEJ, ARMv6, and ARMv6K architectures
    • Optional extension to the ARM and Thumb instruction set in the ARMv6T2 architecture
    • Supports standard FPU arithmetic (add, sub, neg, mul, div), full square root
  • VFPv3
    • Backwards compatible with VFPv2, except that it cannot trap floating-point exceptions
    • Adds VCVT instructions to convert between scalar, float and double
    • Adds immediate mode to VMOV such that constants can be loaded into FPU registers.
    • VFPv3-D32
      • 32 64-bit FPU registers
      • Implemented on most Cortex-A8 and A9 ARMv7 processors
    • VFPv3-D16
      • 16 64-bit FPU registers
      • Implemented on Cortex-R4 and R5 processors and the Tegra 2 (Cortex-A9).
    • VFPv3-F16
      • Uncommon
      • Supports IEEE754-2008 half-precision (16-bit) floating point as a storage format
    • VFPv3U
      • A variant of VFPv3 that supports the trapping of floating-point exceptions to support code.
      • Can support single- or half-precision floating point
  • VFPv4
    • Built on VFPv3
    • Adds half-precision support as a storage format
    • Adds fused multiply-accumulate instructions
    • VFPv4-D32
      • 32 64-bit FPU registers
      • Implemented on the Cortex-A12 and A15 ARMv7 processors
      • Cortex-A7 optionally has VFPv4-D32 (in the case of an FPU with NEON)
    • VFPv4-D16
      • 16 64-bit FPU registers
      • Implemented on Cortex-A5 and A7 processors (in case of an FPU without NEON)
    • VFPv4U
      • A variant of VFPv4 that supports the trapping of floating-point exceptions to support code
      • Can support single- or half-precision floating point
  • VFPv5
    • Implemented on Cortex-M7 when single and double-precision floating-point core option exists

NEON

NEON, the "Advanced Single Instruction Multiple Data (SIMD) Extension", is ARM's successor to the VFP coprocessor. NEON is a VFP extension which allows for efficient matrix and vector data manipulation and is commonly used in signal-processing applications. Prior to ARMv8, the ARM architecture distinguished between VFP and NEON floating-point support. NEON was not fully IEEE 754 compliant, and there were instructions that VFP supported which NEON did not. These issues have been resolved with ARMv8.

NEON sports a combined 64- and 128-bit SIMD instruction set and shares the same floating-point registers as used in VFP. Some devices, such as the Cortex-A8 and Cortex-A9 lines, support 128-bit vectors but operate on 64 bits at a time. Newer processors such as the Cortex-A15 can operate on 128 bits at a time.
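
As a rough illustration of the 128-bit register view, the sketch below adds four single-precision values at once using intrinsics from arm_neon.h, with a plain C fallback for builds where NEON is not enabled. The function name add4 is an assumption made for this example, and __ARM_NEON is the ACLE feature-test macro, so check that your toolchain defines it.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Adds four floats element-wise: out[i] = a[i] + b[i] for i in 0..3.
 * add4 is a hypothetical name used only for this sketch. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__ARM_NEON)
    float32x4_t va = vld1q_f32(a);      /* load 4 floats into a 128-bit register */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));  /* one SIMD add, then store 4 results */
#else
    /* Scalar fallback when NEON is not enabled for this build. */
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Whether the NEON path is compiled depends on the -mfpu and -mfloat-abi settings described later in this article.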

NEON remains an optional part of the ARM architecture. However, NEON is included in all Cortex-A8 devices.

SVE

The Scalable Vector Extension (SVE) is the next-generation ARM SIMD instruction set. Currently it is only targeting ARMv8-A and the aarch64 ISA.

Compiler Options

Now that we have a high-level understanding of ARM floating-point technologies, let's take a look at the compiler options we can use. I will be providing information relevant to the GNU and clang toolchains. For more information on the ARM compiler options, please see this reference documentation.

Let's dive into the two major compilation options: -mfloat-abi and -mfpu.

float-abi

The -mfloat-abi=<name> option is used to select which ARM ABI is used. This option also controls whether floating-point instructions may be used.

Here are your float-abi options:

  • soft: full software floating-point support
  • softfp: Allows use of floating-point instructions but maintains compatibility with the soft-float ABI
  • hard: Uses floating-point instructions and the floating-point ABI.

Each target architecture has a default value which is used if no option is supplied.

Note well: the two ARM ABIs (hard-float and soft-float) are not link-compatible. Your entire program must be compiled using the same ABIs. If a pre-compiled library is not supplied with your target floating-point ABI, you will need to recompile it for your own purposes.
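
One way to catch an ABI mismatch early is to have the code report which ABI it was built for. The sketch below assumes the predefined macros __ARM_PCS_VFP and __SOFTFP__ as GCC defines them for 32-bit ARM targets. The function name float_abi_name is hypothetical, and it is worth confirming the macros on your own toolchain by dumping its predefined macros with the -dM -E options.

```c
/* Hypothetical helper: reports which float ABI this translation unit was
 * compiled for, based on compiler-predefined macros. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    /* Not a 32-bit ARM build, so -mfloat-abi does not apply. */
    return "n/a";
#elif defined(__ARM_PCS_VFP)
    /* Floating-point arguments are passed in FPU registers. */
    return "hard";
#elif defined(__SOFTFP__)
    /* No FPU instructions are generated at all. */
    return "soft";
#else
    /* FPU instructions allowed, soft-float calling convention. */
    return "softfp";
#endif
}
```

Logging this string at startup, or comparing it against the value a library reports, can make a mismatched build easier to spot than a cryptic linker error.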

soft

The soft option enables full software floating-point support. The compiler will not generate FPU instructions in soft mode. Instead, the compiler generates library calls to handle floating-point operations. The compiler also generates prologue and epilogue functions to pass floating-point arguments (float, double) into integer registers (one for float, two for double).
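
To make the library-call behavior concrete, here is a small sketch. The function name add_floats is made up for illustration. The helper named in the comment, __aeabi_fadd, is the single-precision addition routine from the ARM run-time ABI, which is what GCC typically calls for this code in soft mode; inspect the assembly produced with -S to confirm what your toolchain emits.

```c
/* In soft mode this addition is not compiled to an FPU instruction.
 * The compiler emits a call to a runtime helper instead (typically
 * __aeabi_fadd on ARM EABI targets), with both arguments and the
 * result passed in integer registers. */
float add_floats(float a, float b)
{
    return a + b;
}
```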

When using the soft option, the -mfpu flag is ignored.

softfp

The softfp option is a hybrid between hard and soft. The compiler is allowed to generate hardware floating-point instructions, but it still uses the soft-float ABI. Like with soft, the compiler generates functions to pass floating-point arguments to integer registers. Depending on the chosen FPU (-mfpu), the compiler can choose when to use emulated or hardware floating-point instructions.

Since both soft and softfp use the same soft-float ABI, code built with either option can be linked together. However, when copying data from integer to floating-point registers, a pipeline stall is incurred for every copy. This additional overhead can impact the performance of your application, since data is being copied back-and-forth from the FPU registers when using floating-point arguments.

hard

The hard option enables full hardware floating-point support. The compiler generates floating-point instructions and uses the floating-point ABI. Floating-point function arguments are passed directly into FPU registers. Since there are no function prologue or epilogue requirements, no pipeline stalls are incurred with floating-point arguments. The hard float option will provide you with the highest performance, but does limit your compiled binary to the selected FPU.

When using the hard option, you must define an FPU using -mfpu.

fpu

When using the hard or softfp float-abi, you should specify the FPU type using the -mfpu compiler flag. This flag specifies what floating-point hardware (or emulation) is available on your target architecture. When using the soft-float ABI, fpu determines the format of the floating-point values.

The -mfpu=<name> option supports the following FPU types: vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16, neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16, neon-vfpv4, fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.

Each of the FPU options corresponds to the floating-point architectures described above, and some options represent supersets. If you don't care about the specific VFP type, you can select supersets (vfp, neon). You can also generalize VFP versions as supersets (vfpv3, vfpv4).
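
Because some FPU options are single-precision only (fpv4-sp-d16, for example), it can be useful to check what the selected FPU supports in hardware. This sketch relies on the ACLE __ARM_FP macro, a bitmask in which bit 2 (0x4) indicates single-precision and bit 3 (0x8) indicates double-precision hardware support. The function names are hypothetical, and the macro is left undefined when no floating-point hardware is targeted.

```c
/* Returns 1 if __ARM_FP reports hardware single-precision support,
 * 0 otherwise (including builds where __ARM_FP is not defined). */
int has_hw_single(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if __ARM_FP reports hardware double-precision support,
 * 0 otherwise. On an ARM target with a single-precision-only FPU,
 * double arithmetic falls back to software library routines. */
int has_hw_double(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return 1;
#else
    return 0;
#endif
}
```

A check like this can guard code paths that would otherwise silently pull in slow software double-precision routines.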

fp16-format

The -mfp16-format=<name> option allows you to specify the format of the half-precision floating-point type (__fp16). Valid options are none, ieee, and alternative. The default option is none, meaning __fp16 is not defined.
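
A minimal sketch of using __fp16 as a storage format follows. It assumes the ACLE macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE, which are defined when a half-precision format has been selected. The function name through_half is hypothetical. When no format is selected, the fallback simply returns the input unchanged.

```c
/* Stores a value as half precision and converts it back to float.
 * On ARM, GCC documents __fp16 as a storage format: values are promoted
 * to float before arithmetic is performed on them. */
float through_half(float x)
{
#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    __fp16 stored = (__fp16)x;  /* narrow to 16 bits for storage */
    return (float)stored;       /* widen again before further use */
#else
    return x;                   /* no half-precision format selected */
#endif
}
```

Values that are not exactly representable in 16 bits lose precision on the round trip, which is the trade-off for halving the storage size.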

For more information, see the GNU Half-Precision Floating Point documentation.

Performance Impacts

In general, applications relying on floating-point operations will benefit from using the hard-float ABI.

Debian has some notes on VFP performance improvements and cites a proof-of-concept Ubuntu build which noted significant performance improvements with floating-point heavy libraries.

Further Reading

Warnings: -Weverything and the Kitchen Sink

Updated: 20190627

I am a fan of warnings. By highlighting dangerous or ambiguous areas of our code, warnings provide valuable insight and advice for keeping your programs tidy. Since I like warnings so much, I try to turn on as many of them as possible, while sifting out any annoying warnings like -Wunknown-pragmas.

Many developers are familiar with some of the common warning flags like -Wall, but I'd like to give you a quick refresher on the following flags:

  • -Wall
  • -Wextra
  • -pedantic/-Wpedantic
  • -Weverything (clang only)

-Wall

Many programmers know about the -Wall flag already. Given my inclusion of other flags such as -Wextra and -Weverything, I hope it is clear that -Wall does not actually enable ALL the warning flags. Regardless, enabling -Wall in your code will provide you with a decent amount of warning coverage and boost your program's resiliency.

According to the GCC manual, the -Wall flag "enables all the warnings about constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros. This also enables some language-specific warnings described in C++ Dialect Options and Objective-C and Objective-C++ Dialect Options."

Here is a full list of the 48 flags enabled by -Wall:

  • -Waddress
  • -Warray-bounds=1 (only with -O2)
  • -Wbool-compare
  • -Wbool-operation
  • -Wc++11-compat
  • -Wc++14-compat
  • -Wcatch-value (C++ and Objective-C++ only)
  • -Wchar-subscripts
  • -Wcomment
  • -Wduplicate-decl-specifier (C and Objective-C only)
  • -Wenum-compare (in C/ObjC; this is on by default in C++)
  • -Wformat
  • -Wint-in-bool-context
  • -Wimplicit (C and Objective-C only)
  • -Wimplicit-int (C and Objective-C only)
  • -Wimplicit-function-declaration (C and Objective-C only)
  • -Winit-self (only for C++)
  • -Wlogical-not-parentheses
  • -Wmain (only for C/ObjC and unless -ffreestanding)
  • -Wmaybe-uninitialized
  • -Wmemset-elt-size
  • -Wmemset-transposed-args
  • -Wmisleading-indentation (only for C/C++)
  • -Wmissing-braces (only for C/ObjC)
  • -Wnarrowing (only for C++)
  • -Wnonnull
  • -Wnonnull-compare
  • -Wopenmp-simd
  • -Wparentheses
  • -Wpointer-sign
  • -Wreorder
  • -Wreturn-type
  • -Wsequence-point
  • -Wsign-compare (only in C++)
  • -Wsizeof-pointer-div
  • -Wsizeof-pointer-memaccess
  • -Wstrict-aliasing
  • -Wstrict-overflow=1
  • -Wswitch
  • -Wtautological-compare
  • -Wtrigraphs
  • -Wuninitialized
  • -Wunknown-pragmas
  • -Wunused-function
  • -Wunused-label
  • -Wunused-value
  • -Wunused-variable
  • -Wvolatile-register-var
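
As one example of what this set catches, -Wsizeof-pointer-memaccess flags a classic mistake: passing the size of a pointer where the size of the pointed-to object was intended. The struct and function names below are invented for this sketch.

```c
#include <string.h>

/* Hypothetical configuration record used only for this example. */
struct config {
    int baud;
    int parity;
    char name[16];
};

void config_reset(struct config *cfg)
{
    /* Writing sizeof(cfg) here would clear only as many bytes as a
     * pointer occupies; -Wsizeof-pointer-memaccess reports that mistake.
     * sizeof(*cfg) is the size of the whole struct. */
    memset(cfg, 0, sizeof(*cfg));
}
```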

-Wextra

After seeing the list of warnings provided by -Wall, you may be wondering why you need any others. -Wextra provides warnings that are helpful but much more pedantic, covering topics such as empty function bodies, unused parameters, and sign mismatches in comparisons. These warnings are often viewed as a nuisance, but they also help eliminate bad coding styles and point out potential bugs (maybe you did intend to use that parameter).

Here's the full list of flags enabled by -Wextra:

  • -Wclobbered
  • -Wempty-body
  • -Wignored-qualifiers
  • -Wimplicit-fallthrough=3
  • -Wmissing-field-initializers
  • -Wmissing-parameter-type (C only)
  • -Wold-style-declaration (C only)
  • -Woverride-init
  • -Wsign-compare (C only)
  • -Wtype-limits
  • -Wuninitialized
  • -Wshift-negative-value (in C++03 and in C99 and newer)
  • -Wunused-parameter (only with -Wunused or -Wall)
  • -Wunused-but-set-parameter (only with -Wunused or -Wall)

Enabling -Wextra also enables warnings for the following conditions:

  • A pointer is compared against integer zero with <, <=, >, or >=.
  • (C++ only) An enumerator and a non-enumerator both appear in a conditional expression.
  • (C++ only) Ambiguous virtual bases.
  • (C++ only) Subscripting an array that has been declared register.
  • (C++ only) Taking the address of a variable that has been declared register.
  • (C++ only) A base class is not initialized in the copy constructor of a derived class.
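
To illustrate two of the more common -Wextra diagnostics, the sketch below shows code written to compile cleanly under -Wunused-parameter and -Wsign-compare. The function and parameter names are invented for this example.

```c
#include <stddef.h>

/* Counts how many entries in values[] are below limit.
 * 'context' stands in for an argument required by a (hypothetical)
 * callback signature that this implementation does not need. */
size_t count_below(const int *values, size_t len, int limit, void *context)
{
    (void)context;  /* marks the parameter as intentionally unused,
                       which avoids -Wunused-parameter */

    size_t count = 0;
    /* Using size_t for the index keeps both sides of 'i < len' unsigned.
     * Declaring the index as int would trigger -Wsign-compare. */
    for (size_t i = 0; i < len; i++) {
        if (values[i] < limit) {
            count++;
        }
    }
    return count;
}
```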

-Wpedantic

-Wpedantic takes our warnings even further. The -pedantic set contains "all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used." -Wpedantic also rejects certain GNU extensions and C/C++ features that are not ISO-compliant.

The GNU manual notes that many will try to use -Wpedantic to check for ISO C conformance, but keep in mind: -Wpedantic only checks for non-ISO practices for which diagnostics are required or have already been added. Even so, -Wpedantic is a useful flag if you are aiming for ISO C/C++ conformance. Primarily, I enable this flag temporarily to find areas where my programs can be improved.
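
As a small example of the kind of construct involved: a zero-length array at the end of a struct is a GNU extension that -Wpedantic reports, while the C99 flexible array member expresses the same idea in ISO C. The struct and function names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;
    unsigned char data[];  /* ISO C99 flexible array member.
                              The GNU form 'data[0]' is reported by
                              -Wpedantic as a zero-size array. */
};

/* Allocates a packet holding a copy of len bytes from src.
 * Returns NULL if allocation fails; the caller frees the result. */
struct packet *packet_create(const unsigned char *src, size_t len)
{
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL) {
        return NULL;
    }
    p->len = len;
    memcpy(p->data, src, len);
    return p;
}
```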

Clang: -Weverything

Clang helpfully provides a flag called -Weverything. Unlike -Wall, the -Weverything flag really will enable all warnings. Literally every warning in clang.

Turning on -Weverything can be an eye-opening experience, even for those who religiously squash warnings. I often turn on -Weverything temporarily to review any of the less-common warnings and see what's worth fixing in my code base. I often discover new warning flags this way.

You will be annoyed at new warnings popping up in your CI system (and potentially causing build failures) every time there's a toolchain update. I don't recommend using this flag in your production build rules.

Turning On Specific Warnings

-Wall and -Wextra provide a very comprehensive set of warnings, but many developers are picky about the warnings they want to deal with in their projects.

Rather than cause strife by turning on more warnings than your team can tolerate, I recommend an alternative approach: start with -Wall (something most developers can stomach) and specifically enable warnings that will benefit your team.

You can enable specific warnings by combining the -W prefix with the warning name, such as:

-Wmissing-prototypes
-Wformat-security

This approach allows you to enable valuable warnings in your project without a flood of other minor warnings that come with enabling something like -Wextra.
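
For a sense of what these two flags look for, here is a short sketch written to compile cleanly under both. The function name format_message is an assumption made for the example.

```c
#include <stdio.h>

/* A prototype ahead of the definition satisfies -Wmissing-prototypes.
 * In a real project this declaration would normally live in a header. */
int format_message(char *out, size_t out_size, const char *msg);

int format_message(char *out, size_t out_size, const char *msg)
{
    /* Passing msg directly as the format string would be reported by
     * -Wformat-security, because any '%' sequences inside msg would be
     * interpreted as conversions. A literal "%s" format avoids that. */
    return snprintf(out, out_size, "%s", msg);
}
```

With the literal format, a message containing a percent sign is copied through unchanged instead of being treated as a conversion specifier.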

Turning Off Specific Warnings

Similar to using the -W prefix with a warning name to enable a warning, you can use the -Wno- prefix to disable a warning explicitly. For example:

-Wno-deprecated-declarations
-Wno-unused-parameter

Disabling warnings explicitly can be useful in cases where the code that triggers the warning is intentional and the warning cannot be suppressed any other way.

Clang: Locally Disabling Warnings

Clang provides even further granular control over disabling warnings using the diagnostic pragma. You can disable warnings over a small region of code:

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wconversion"
    serverAddress->ss_family = host->h_addrtype;
#pragma clang diagnostic pop

I often use this method for including external libraries and headers that have warnings which I won't be fixing.
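
GCC offers an equivalent "#pragma GCC diagnostic" form, which clang also accepts for compatibility. The sketch below applies it to a complete function. The function name to_u8 is hypothetical and stands in for any spot where a narrowing conversion is intentional.

```c
#include <stdint.h>

/* Keeps only the low 8 bits of value. The narrowing is intentional, so
 * -Wconversion is silenced for this function only. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
uint8_t to_u8(uint32_t value)
{
    uint8_t low = value;  /* would be reported under -Wconversion */
    return low;
}
#pragma GCC diagnostic pop
```

The push and pop pair restores the previous diagnostic state, so code after the function is still checked with the project's normal warning settings.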

Isn't -Wall Enough?

Some people struggle even with enabling -Wall, as some of the constructions that cause warnings can be hard to avoid or suppress. At a minimum, we recommend -Wall -Wextra.

Suck it up, it's better to fix these warnings.

Further Reading

Change Log

  • 20190627:
    • Added some clarifying notes about -Weverything

-Werror is Not Your Friend

I have never quite understood the obsession with the -Werror compiler flag. I regularly come across projects with the flag enabled, and it's not uncommon for me to fend off rabid developers who want the flag enabled in projects I work on.

In case you have been living under a rock, -Werror is a compiler flag that causes all warnings to be treated as build errors. On the surface, the stated motivation behind enabling -Werror is benevolent. Developers who enable -Werror are making a statement: we care about our code base, and we won't accept warnings here. I also maintain a 0-warning policy for my projects, and I hate when developers ignore warnings. I understand the motivation for enabling the -Werror flag.

However, from the project maintenance perspective, -Werror is not your friend. I am always frustrated when I find a project with -Werror, because inevitably my first clean build of the project fails due to a spectacular mess of warnings. If I made no changes to the source code, why the hell is it not compiling?

-Werror creates a project dependency on a specific compiler version. Even worse, this toolchain dependency is often not recognized by the development team and is therefore not noted anywhere. I need to scour the web to find the secret dependency link, or I need to start hacking up the project to get the build to finish. Is that really the experience you want your consumers to have when they use your project?

-Werror lays the groundwork for maintenance headaches. When a new compiler version is released, new warnings are added or other risk areas are discovered. These new warnings will now cause your previously working build to fail, often for no good reason. Since many developers have the "never update" mindset, these new warnings go unnoticed until someone on the team eventually updates. These failures are often localized rather than systematic, so the team as a whole tends to overlook the effect of -Werror:

  • Your build server doesn't work since the server software was updated, causing your build guru to spend time investigating and rolling back software
  • A single developer updated and now must assume the burden of fixing new warnings before resuming the actual work
  • Your new hire can't get your software compiling, and time is wasted finding out that it's the toolchain version that matters

Furthermore, there are lots of warnings that don't need to cause build failures, such as -Wunknown-pragmas. I am in the habit of using #pragma mark in my projects to provide nicer editor interactions. If I use an older GNU toolchain then #pragma mark is unrecognized and generates a warning - but it doesn't affect my final binary at all!

To get around issues like that, now you need to start disabling individual errors that you don't want: -Wno-error=unknown-pragmas. You have to maintain these settings for all new benign warnings that get added.

I don't say all of this to support tolerating warnings in your project. In my projects, I fix all warnings and continually drive the teams I work with to get to 0 warnings. My Jenkins builds all have a warning graph so I can see the warning trend over time, when they are introduced, and who regularly introduces them.

Rather than having all warnings turned into errors, I think that warnings that lead to major problems or are often ignored should be selectively promoted into errors. You can do this by specifying -Werror=warning-name, which will cause that specific warning (e.g. unknown-pragmas) to generate an error if it is encountered.

For example, a warning that I promote to an error is -Wreturn-type. This warning seems innocuous on the surface, but you can get into a dangerous situation easily:

Missing return statement in function with return expected
aws.c:158:1: warning: control reaches end of non-void function [-Wreturn-type]

If your function should return a value but does not, your compiler is going to start picking up random garbage as the return value rather than the value you intended, leading to weird behavior and tricky bugs. Definitely worth being an error!
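
Here is a sketch of how this kind of bug tends to look. The function name clamp_percent is invented for illustration. The commented-out version falls off the end of the function for any in-range input, which is what -Wreturn-type reports; the version below it returns a value on every path.

```c
/*
 * Problematic version: for inputs between 0 and 100, control reaches
 * the end of a non-void function and the caller reads an undefined value.
 *
 * int clamp_percent(int value)
 * {
 *     if (value < 0)   return 0;
 *     if (value > 100) return 100;
 * }
 */

/* Fixed version: every path returns a value. */
int clamp_percent(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 100) {
        return 100;
    }
    return value;
}
```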

If you're still convinced that you need to use -Werror, I suggest that you wire up a way to turn this behavior off, such as a make variable. Then, to disable it, developers can simply run:

$ make all WARNINGS_AS_ERRORS=n

This allows you to keep -Werror enabled by default but also saves developers from having to hack up your project if they are using a newer/older toolchain version with different warnings.

Before you enable -Werror on your projects, make sure that you really want to sign up for the maintenance headaches that come with it. You can utilize better strategies instead:

  • Promote specific warnings to errors
  • Track and drive down warning count using build metrics and developer feedback
  • Locally/globally disable benign warnings that you don't need to worry about in your project (e.g. -Wunknown-pragmas)

If you must enable -Werror, at least provide an easy method to disable the -Werror behavior.