Demystifying ARM Floating Point Compiler Options

When I first started bringing up new ARM platforms, I was pretty confused by the various floating point options such as -mfloat-abi=softfp or -mfpu=fpv4-sp-d16. I imagine this is confusing to other developers as well, so I'd like to share my ARM floating-point cheat sheet with the world.

An Overview of the ARM Floating-Point Architecture

Before we dive into compiler options, there are a few ARM floating-point details we should familiarize ourselves with: the ARM EABI, VFP, and NEON.


ARM EABI

An application binary interface (ABI) is a specification that defines the rules a generated program must follow to work with a specific platform or interface. The ARM EABI defines those rules for ARM platforms, and your compiler builds your program according to them.

The ARM EABI specification defines two incompatible ABIs: one which uses floating-point registers for function arguments, and another which does not. If we are not using hardware floating-point operations, we can simply build our program without using the floating-point compatible ABI.

Since ARM defines a standard floating-point instruction set, we can still use the floating-point ABI even if our chip lacks the actual hardware, provided the platform supplies floating-point emulation. In that case, the unsupported instructions are trapped and executed by a floating-point emulation module instead. The main practical difference is much slower execution under software emulation.

Since the ABI defines interfaces for our programs, we must compile and link all of our components and libraries using the same ABI.

Vector Floating Point (VFP)

Vector Floating Point (VFP) is the name for ARM's floating-point extension. Prior to ARMv8, VFP was implemented as a coprocessor extension. The VFP coprocessor supports both single and double-precision floating point operations according to the IEEE 754 standard. For practical purposes, VFP is not useful for vector operations and should be considered a normal scalar floating-point unit (FPU). VFP has been replaced with NEON as of ARMv8.

The VFP extensions are optional parts of the ARM architecture, though the majority of Cortex-A processors do provide a floating-point unit. Some Cortex-A8 devices may utilize a reduced VFPLite module instead of a full VFP module. This VFPLite module requires roughly a 10x increase in clock cycles per floating-point operation.


VFP Versions

Here's a high-level summary of the different VFP versions that have been released throughout the years.

  • VFPv1
    • Obsoleted by ARM
  • VFPv2
    • 16 64-bit FPU registers
    • Optional extension to the ARM instruction set in the ARMv5TE, ARMv5TEJ, ARMv6, and ARMv6K architectures
    • Optional extension to the ARM and Thumb instruction set in the ARMv6T2 architecture
    • Supports standard FPU arithmetic (add, sub, neg, mul, div), full square root
  • VFPv3
    • Backwards compatible with VFPv2, except that it cannot trap floating-point exceptions
    • Adds VCVT instructions to convert between scalar, float and double
    • Adds immediate mode to VMOV such that constants can be loaded into FPU registers.
    • VFPv3-D32
      • 32 64-bit FPU registers
      • Implemented on most Cortex-A8 and A9 ARMv7 processors
    • VFPv3-D16
      • 16 64-bit FPU registers
      • Implemented on Cortex-R4 and R5 processors and the Tegra 2 (Cortex-A9).
    • VFPv3-F16
      • Uncommon
      • Supports IEEE754-2008 half-precision (16-bit) floating point as a storage format
    • VFPv3U
      • A variant of VFPv3 that supports the trapping of floating-point exceptions to support code.
      • Can support single- or half-precision floating point
  • VFPv4
    • Built on VFPv3
    • Adds half-precision support as a storage format
    • Adds fused multiply-accumulate instructions
    • VFPv4-D32
      • 32 64-bit FPU registers
      • Implemented on the Cortex-A12 and A15 ARMv7 processors
      • Cortex-A7 optionally has VFPv4-D32 (in the case of an FPU with NEON)
    • VFPv4-D16
      • 16 64-bit FPU registers
      • Implemented on Cortex-A5 and A7 processors (in case of an FPU without NEON)
    • VFPv4U
      • A variant of VFPv4 that supports the trapping of floating-point exceptions to support code
      • Can support single- or half-precision floating point
  • VFPv5
    • Implemented on Cortex-M7 when single and double-precision floating-point core option exists
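
To make the "fused multiply-accumulate" bullet under VFPv4 a bit more concrete, here is a minimal sketch of a dot product. The function name dot_f32 is hypothetical. The loop body is a multiply followed by an add, which is the pattern a compiler may contract into a single fused instruction when it targets an FPU that has one and floating-point contraction is permitted. Whether that actually happens depends on your compiler, flags, and optimization level, so treat this as something to verify in the generated assembly rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two float arrays.
 * Each iteration is a multiply-accumulate (acc + a[i] * b[i]).
 * When targeting an FPU with fused multiply-accumulate (VFPv4 and later),
 * the compiler may emit one fused instruction per iteration, which rounds
 * only once. Earlier VFP versions round the product before the add. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Calling dot_f32 on two three-element arrays is enough to inspect the loop body in a disassembly.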


NEON

NEON, the "Advanced Single Instruction Multiple Data (SIMD) Extension", is ARM's successor to the VFP coprocessor. NEON is a VFP extension which allows for efficient matrix and vector data manipulation and is commonly used in signal-processing applications. Prior to ARMv8, the ARM architecture distinguished between VFP and NEON floating-point support. NEON was not fully IEEE 754 compliant, and there were instructions that VFP supported which NEON did not. These issues have been resolved with ARMv8.

NEON sports a combined 64- and 128-bit SIMD instruction set and shares the same floating-point registers used by VFP. Some devices, such as the Cortex-A8 and Cortex-A9 lines, support 128-bit vectors but operate on 64 bits at a time. Newer processors such as the Cortex-A15 can operate on 128 bits at a time.
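
As a rough sketch of what the 128-bit SIMD instruction set looks like from C, the example below adds two float arrays four lanes at a time using the intrinsics from arm_neon.h (vld1q_f32, vaddq_f32, vst1q_f32). The function name add_f32 is hypothetical, and I am assuming the compiler defines __ARM_NEON (or the older __ARM_NEON__) when NEON is enabled. A scalar loop handles leftover elements and non-NEON builds.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats (one 128-bit vector) per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The same source builds with or without NEON, which makes it easy to compare the generated code for different -mfpu settings.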

NEON remains an optional part of the ARM architecture. However, NEON is included in all Cortex-A8 devices.


Scalable Vector Extension (SVE)

The Scalable Vector Extension (SVE) is the next-generation ARM SIMD instruction set. It currently targets only ARMv8-A and the AArch64 ISA.

Compiler Options

Now that we have a high-level understanding of ARM floating-point technologies, let's take a look at the compiler options we can use. I will be providing information relevant to the GNU and clang toolchains. For more information on the ARM compiler options, please see this reference documentation.

Let's dive into the two major compilation options: -mfloat-abi and -mfpu.


-mfloat-abi

The -mfloat-abi=<name> option is used to select which ARM ABI is used. This option also controls whether floating-point instructions may be used.

Here are your float-abi options:

  • soft: full software floating-point support
  • softfp: Allows use of floating-point instructions but maintains compatibility with the soft-float ABI
  • hard: Uses floating-point instructions and the floating-point ABI.

Each target architecture has a default value which is used if no option is supplied.

Note well: the two ARM ABIs (hard-float and soft-float) are not link-compatible. Your entire program must be compiled using the same ABIs. If a pre-compiled library is not supplied with your target floating-point ABI, you will need to recompile it for your own purposes.
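
If you are unsure which float ABI a given build is using, the compiler's predefined macros can tell you. The sketch below is a hypothetical helper (arm_float_abi is my name, not a toolchain API) that assumes GCC- or clang-style macros: __ARM_PCS_VFP for the hard-float calling convention and __SOFTFP__ for pure software floating point. Treat the mapping as an assumption to confirm against your toolchain's documentation.

```c
/* Hypothetical helper: report the float ABI this translation unit was
 * compiled for, based on predefined macros (assumed GCC/clang behavior). */
const char *arm_float_abi(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard";      /* floating-point arguments passed in FPU registers */
#elif defined(__SOFTFP__)
    return "soft";      /* no FPU instructions, soft-float ABI */
#else
    return "softfp";    /* FPU instructions allowed, soft-float ABI */
#endif
}
```

The same macros can drive an #error check in a library header so that a build with a mismatched ABI fails at compile time rather than at link time.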


soft

The soft option enables full software floating-point support. The compiler will not generate FPU instructions in soft mode. Instead, it generates library calls to handle floating-point operations. The compiler also generates function prologue and epilogue code to pass floating-point arguments (float, double) in integer registers (one for float, two for double).
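
To illustrate the "library calls" mentioned above, consider the two hypothetical functions below. The source contains ordinary floating-point operators. With -mfloat-abi=soft on an EABI toolchain, I would expect each operator to turn into a call to a runtime helper such as __aeabi_fadd or __aeabi_dmul, but the exact helper names come from the runtime ABI your toolchain implements, so check the disassembly.

```c
/* Hypothetical examples; the names are for illustration only. */

/* With -mfloat-abi=soft, the addition is typically a call to a helper
 * such as __aeabi_fadd. Each float occupies one integer register. */
float add_floats(float a, float b)
{
    return a + b;
}

/* A double occupies two integer registers, and the multiplication is
 * typically a call to a helper such as __aeabi_dmul. */
double mul_doubles(double a, double b)
{
    return a * b;
}
```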

When using the soft option, the -mfpu flag is ignored.


softfp

The softfp option is a hybrid between hard and soft. The compiler is allowed to generate hardware floating-point instructions, but it still uses the soft-float ABI. As with soft, the compiler generates code to pass floating-point arguments in integer registers. Depending on the chosen FPU (-mfpu), the compiler decides when to use hardware floating-point instructions and when to fall back to emulation.

Since both soft and softfp use the same soft-float ABI, code built with either option can be linked together. However, when copying data from integer to floating-point registers, a pipeline stall is incurred for every copy. This additional overhead can impact the performance of your application, since data is being copied back-and-forth from the FPU registers when using floating-point arguments.


hard

The hard option enables full hardware floating-point support. The compiler generates floating-point instructions and uses the floating-point ABI. Floating-point function arguments are passed directly into FPU registers. Since there are no function prologue or epilogue requirements, no pipeline stalls are incurred with floating-point arguments. The hard float option will provide you with the highest performance, but does limit your compiled binary to the selected FPU.
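
A single small function is enough to compare the three settings side by side. The function below is hypothetical, and the comment summarizes where the ARM procedure call standard places the arguments in each case. To see it for yourself, compile the file to assembly once per setting. For example, assuming an arm-none-eabi-gcc toolchain and a Cortex-A9 target: arm-none-eabi-gcc -O2 -S -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard scale.c

```c
/* Hypothetical function for comparing -mfloat-abi settings.
 *
 *   soft:   x arrives in r0, k in r1; the multiply is a library call;
 *           the result is returned in r0.
 *   softfp: x arrives in r0, k in r1; the values are moved into FPU
 *           registers, multiplied in hardware, and the result is moved
 *           back to r0.
 *   hard:   x arrives in s0, k in s1; multiplied in hardware; the result
 *           is returned in s0 with no register-to-register copies.
 */
float scale(float x, float k)
{
    return x * k;
}
```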

When using the hard option, you must define an FPU using -mfpu.


-mfpu

When using the hard or softfp float-abi, you should specify the FPU type using the -mfpu compiler flag. This flag specifies what floating-point hardware (or emulation) is available on your target architecture. When using the soft-float ABI, -mfpu determines the format of the floating-point values.

The -mfpu=<name> option supports the following FPU types: vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16, neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16, neon-vfpv4, fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.

Each of the FPU options corresponds to the floating-point architectures described above, and some options represent supersets. If you don't care about the specific VFP type, you can select supersets (vfp, neon). You can also generalize VFP versions as supersets (vfpv3, vfpv4).
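
The chosen -mfpu value matters most on single-precision-only FPUs such as fpv4-sp-d16, where double arithmetic still ends up in software. One way to check what the compiler believes it may use is the ACLE __ARM_FP macro, a bit mask of hardware-supported precisions. The helper names below are hypothetical, and the bit values are my reading of ACLE (0x02 half, 0x04 single, 0x08 double), so confirm them against the ACLE document for your toolchain.

```c
/* Bit values of the ACLE __ARM_FP macro (assumed; see lead-in). */
#define FP_HW_HALF   0x02u
#define FP_HW_SINGLE 0x04u
#define FP_HW_DOUBLE 0x08u

/* Hypothetical helper: mask of precisions the compiler may implement
 * with hardware instructions, or 0 if __ARM_FP is not defined. */
unsigned hw_fp_mask(void)
{
#if defined(__ARM_FP)
    return (unsigned)__ARM_FP;
#else
    return 0u;
#endif
}

/* Nonzero if double-precision arithmetic can use the FPU. */
int has_hw_double(void)
{
    return (hw_fp_mask() & FP_HW_DOUBLE) != 0;
}
```

If has_hw_double() returns 0 on a target with an FPU, it is worth auditing the code for accidental double usage, such as unsuffixed literals like 1.0 instead of 1.0f.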


-mfp16-format

The -mfp16-format=<name> option allows you to specify the format of the half-precision floating-point type (__fp16). Valid options are none, ieee, and alternative. The default option is none, meaning __fp16 is not defined.

For more information, see the GNU Half-Precision Floating Point documentation.
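
Here is a small sketch of how __fp16 tends to be used in practice: as a compact storage type whose values are widened to float for arithmetic. The type alias sample_t and the function sum_samples are hypothetical. I am assuming the compiler defines __ARM_FP16_FORMAT_IEEE or __ARM_FP16_FORMAT_ALTERNATIVE when a half-precision format has been selected; when neither is defined, the code falls back to float so that it still builds.

```c
#include <stddef.h>

#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
typedef __fp16 sample_t;   /* 16-bit storage format */
#else
typedef float sample_t;    /* fallback when __fp16 is not available */
#endif

/* Hypothetical helper: sum an array of stored samples.
 * Each sample is converted to float before the addition, so the
 * arithmetic itself happens in single precision. */
float sum_samples(const sample_t *samples, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += samples[i];
    }
    return acc;
}
```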

Performance Impacts

In general, applications relying on floating-point operations will benefit from using the hard-float ABI.

Debian has some notes on VFP performance improvements and cites a proof-of-concept Ubuntu build which noted significant performance improvements with floating-point heavy libraries.

Further Reading

Warnings: -Weverything and the Kitchen Sink

Updated: 20190627

I am a fan of warnings. By highlighting dangerous or ambiguous areas of our code, warnings provide valuable insight and advice for keeping your programs tidy. Since I like warnings so much, I try to turn on as many of them as possible, while sifting out any annoying warnings like -Wunknown-pragmas.

Many developers are familiar with some of the common warning flags like -Wall, but I'd like to give you a quick refresher on the following flags:

  • -Wall
  • -Wextra
  • -pedantic/-Wpedantic
  • -Weverything (clang only)


-Wall

Many programmers know about the -Wall flag already. Given my inclusion of other flags such as -Wextra and -Weverything, I hope it is clear that -Wall does not actually enable ALL the warning flags. Regardless, enabling -Wall in your code will provide you with a decent amount of warning coverage and boost your program's resiliency.

According to the GCC manual, the -Wall flag "enables all the warnings about constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros. This also enables some language-specific warnings described in C++ Dialect Options and Objective-C and Objective-C++ Dialect Options."

Here is a full list of the 48 flags enabled by -Wall:

  • -Waddress
  • -Warray-bounds=1 (only with -O2)
  • -Wbool-compare
  • -Wbool-operation
  • -Wc++11-compat
  • -Wc++14-compat
  • -Wcatch-value (C++ and Objective-C++ only)
  • -Wchar-subscripts
  • -Wcomment
  • -Wduplicate-decl-specifier (C and Objective-C only)
  • -Wenum-compare (in C/ObjC; this is on by default in C++)
  • -Wformat
  • -Wint-in-bool-context
  • -Wimplicit (C and Objective-C only)
  • -Wimplicit-int (C and Objective-C only)
  • -Wimplicit-function-declaration (C and Objective-C only)
  • -Winit-self (only for C++)
  • -Wlogical-not-parentheses
  • -Wmain (only for C/ObjC and unless -ffreestanding)
  • -Wmaybe-uninitialized
  • -Wmemset-elt-size
  • -Wmemset-transposed-args
  • -Wmisleading-indentation (only for C/C++)
  • -Wmissing-braces (only for C/ObjC)
  • -Wnarrowing (only for C++)
  • -Wnonnull
  • -Wnonnull-compare
  • -Wopenmp-simd
  • -Wparentheses
  • -Wpointer-sign
  • -Wreorder
  • -Wreturn-type
  • -Wsequence-point
  • -Wsign-compare (only in C++)
  • -Wsizeof-pointer-div
  • -Wsizeof-pointer-memaccess
  • -Wstrict-aliasing
  • -Wstrict-overflow=1
  • -Wswitch
  • -Wtautological-compare
  • -Wtrigraphs
  • -Wuninitialized
  • -Wunknown-pragmas
  • -Wunused-function
  • -Wunused-label
  • -Wunused-value
  • -Wunused-variable
  • -Wvolatile-register-var
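
To pick one entry from the list above, here is the kind of code -Wswitch is meant to catch. The enum and function names are hypothetical. The switch has no default label and does not handle COLOR_BLUE, so a compiler with -Wall enabled should report the unhandled enumerator, which is usually a sign that a new enum value was added without updating every switch.

```c
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/* Hypothetical example: COLOR_BLUE is not handled and there is no
 * default label, which is the situation -Wswitch reports. */
const char *color_name(enum color c)
{
    switch (c) {
    case COLOR_RED:
        return "red";
    case COLOR_GREEN:
        return "green";
    }
    return "unknown";   /* COLOR_BLUE silently ends up here */
}
```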


-Wextra

After seeing the list of warnings provided by -Wall, you may be wondering why you need any others. -Wextra provides warnings that are helpful but much more pedantic, covering topics such as empty function bodies, unused parameters, and sign mismatches in comparisons. These warnings are often viewed as a nuisance, but they also help eliminate bad coding styles and point out potential bugs (maybe you did intend to use that parameter).

Here's the full list of flags enabled by -Wextra:

  • -Wclobbered
  • -Wempty-body
  • -Wignored-qualifiers
  • -Wimplicit-fallthrough=3
  • -Wmissing-field-initializers
  • -Wmissing-parameter-type (C only)
  • -Wold-style-declaration (C only)
  • -Woverride-init
  • -Wsign-compare (C only)
  • -Wtype-limits
  • -Wuninitialized
  • -Wshift-negative-value (in C++03 and in C99 and newer)
  • -Wunused-parameter (only with -Wunused or -Wall)
  • -Wunused-but-set-parameter (only with -Wunused or -Wall)

Enabling -Wextra also enables warnings for the following conditions:

  • A pointer is compared against integer zero with <, <=, >, or >=.
  • (C++ only) An enumerator and a non-enumerator both appear in a conditional expression.
  • (C++ only) Ambiguous virtual bases.
  • (C++ only) Subscripting an array that has been declared register.
  • (C++ only) Taking the address of a variable that has been declared register.
  • (C++ only) A base class is not initialized in the copy constructor of a derived class.
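
The "sign mismatches in comparisons" case (-Wsign-compare, which -Wextra enables for C) is a good example of a pedantic-looking warning that points at a real bug. In the hypothetical functions below, the naive version compares an int against an unsigned value. The int is converted to unsigned first, so a negative value becomes a very large number and the comparison gives the wrong answer.

```c
/* Hypothetical example: -Wsign-compare flags the comparison below,
 * because 'value' is converted to unsigned before the comparison. */
int is_below_naive(int value, unsigned limit)
{
    return value < limit;
}

/* One possible fix: handle negative values explicitly, then compare
 * two unsigned quantities. */
int is_below(int value, unsigned limit)
{
    return value < 0 || (unsigned)value < limit;
}
```

With value set to -1 and limit set to 10, is_below_naive returns 0 (false) while is_below returns 1 (true).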


-Wpedantic

-Wpedantic takes our warnings even further. The -pedantic set contains "all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used." -Wpedantic also rejects certain GNU extensions and C/C++ features that are not ISO-compliant.

The GNU manual notes that many will try to use -Wpedantic to check for ISO C conformance, but keep in mind: -Wpedantic only checks for non-ISO practices for which diagnostics are required or have already been added. Even so, -Wpedantic is a useful flag if you are aiming for ISO C/C++ conformance. Primarily, I enable this flag temporarily to find areas where my programs can be improved.
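
As a sketch of the kind of construct -Wpedantic reports, the example below uses GNU case ranges, an extension that GCC and clang accept but that I would expect -Wpedantic to flag when compiling against an ISO C standard that does not include it. The function names are hypothetical, and the extension version is guarded by __GNUC__ so the file still builds on other compilers. The second function is a portable equivalent.

```c
#if defined(__GNUC__)
/* GNU extension: case ranges. Accepted by GCC and clang, but reported
 * as non-standard under -Wpedantic (assumption: an ISO -std mode that
 * does not define case ranges). */
int is_digit_gnu(char c)
{
    switch (c) {
    case '0' ... '9':
        return 1;
    default:
        return 0;
    }
}
#endif

/* Portable ISO C equivalent. */
int is_digit_iso(char c)
{
    return c >= '0' && c <= '9';
}
```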

Clang: -Weverything

Clang helpfully provides a flag called -Weverything. Unlike -Wall, the -Weverything flag really will enable all warnings. Literally every warning in clang.

Turning on -Weverything can be an eye-opening experience, even for those who religiously squash warnings. I often turn on -Weverything temporarily to review any of the less-common warnings and see what's worth fixing in my code base. I often discover new warning flags this way.

Be aware that with -Weverything enabled, new warnings will pop up in your CI system (and potentially cause build failures) every time there's a toolchain update. I don't recommend using this flag in your production build rules.

Turning On Specific Warnings

-Wall and -Wextra provide a very comprehensive set of warnings, but many developers are picky about the warnings they want to deal with in their projects.

Rather than cause strife by turning on more warnings than your team can tolerate, I recommend an alternative approach: start with -Wall (something most developers can stomach) and specifically enable warnings that will benefit your team.

You can enable specific warnings by combining the -W prefix with the warning name, such as:
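
One hypothetical illustration (the flag and the function are my choice for this sketch): -Wshadow is not part of -Wall, so a build that wants it has to ask for it explicitly, for example with cc -Wall -Wshadow -c sum.c. The function below contains the kind of bug that -Wshadow is designed to surface.

```c
/* Hypothetical example of a bug that -Wshadow reports.
 * The inner 'total' hides the outer one, so the outer variable is
 * never updated and the function always returns 0. */
int sum_buggy(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        int total = values[i];  /* -Wshadow: shadows the outer 'total' */
        total *= 2;             /* modifies the inner copy only */
    }
    return total;
}
```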


This approach allows you to enable valuable warnings in your project without a flood of other minor warnings that come with enabling something like -Wextra.

Turning Off Specific Warnings

Similar to using the -W prefix with a warning name to enable a warning, you can use the -Wno- prefix to disable a warning explicitly. For example:
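
As a hypothetical illustration, consider a callback whose signature is fixed by an interface but whose user_data argument is not needed. -Wextra (together with -Wall) enables -Wunused-parameter, which would flag every such callback. A build could keep the rest of -Wextra and drop just that warning with cc -Wall -Wextra -Wno-unused-parameter -c events.c. The type and function names below are made up for this sketch.

```c
/* Hypothetical callback type whose signature is dictated by an API. */
typedef void (*event_cb)(int event_id, void *user_data);

static int last_event = -1;

/* Matches event_cb but has no use for user_data. With
 * -Wunused-parameter enabled this is reported; with
 * -Wno-unused-parameter it is not. */
void record_event(int event_id, void *user_data)
{
    last_event = event_id;
}

int get_last_event(void)
{
    return last_event;
}
```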


Disabling warnings explicitly can be useful in cases where the flagged behavior is intentional and the warning cannot reasonably be avoided in the code itself.

Clang: Locally Disabling Warnings

Clang provides even further granular control over disabling warnings using the diagnostic pragma. You can disable warnings over a small region of code:

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wconversion"
    serverAddress->ss_family = host->h_addrtype;
#pragma clang diagnostic pop

I often use this method for including external libraries and headers that have warnings which I won't be fixing.
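
If the same code also has to build with compilers other than clang, one option is to wrap the pragmas in macros using the C99 _Pragma operator, so they expand to nothing elsewhere. The macro and function names below are hypothetical. This is a sketch of one common idiom, not the only way to do it.

```c
/* Hypothetical wrapper macros around clang's diagnostic pragmas.
 * On other compilers they expand to nothing. */
#if defined(__clang__)
#define IGNORE_CONVERSION_BEGIN \
    _Pragma("clang diagnostic push") \
    _Pragma("clang diagnostic ignored \"-Wconversion\"")
#define IGNORE_CONVERSION_END \
    _Pragma("clang diagnostic pop")
#else
#define IGNORE_CONVERSION_BEGIN
#define IGNORE_CONVERSION_END
#endif

/* Narrowing from int to short is intentional here, so the warning is
 * silenced for this one statement only. */
short to_short(int wide)
{
    IGNORE_CONVERSION_BEGIN
    short narrow = wide;
    IGNORE_CONVERSION_END
    return narrow;
}
```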

Isn't -Wall Enough?

Some people struggle even with enabling -Wall, as some of the constructions that cause warnings can be hard to avoid or suppress. At a minimum, I recommend -Wall -Wextra.

Suck it up, it's better to fix these warnings.

Further Reading

Change Log

  • 20190627:
    • Added some clarifying notes about -Weverything