Simple Fixed-Point Conversion in C

Operating on fixed-point numbers is a common embedded systems task. Our microcontrollers may not have floating-point support, our sensors may provide data in fixed-point formats, or we may want to use fixed-point mathematics to control a value’s range and precision.

There are numerous fixed-point mathematics libraries around the internet, such as fixed_point or the Compositional Numeric Library for C++. If you are looking for a reliable long-term solution, spend some time reviewing these libraries to identify candidates for integration.

However, we don’t always have the time required to select a library. Perhaps you just need to convert a fixed-point number for prototyping purposes, or you need to do a quick implementation for Friday’s demo.

Below is a quick-and-dirty approach for converting between fixed-point and floating-point numbers. If you need to handle mathematical operations on fixed-point numbers, look for a library to integrate.

Lossy Conversion of Fixed-Point Numbers

First, we need to select our fixed-point type. For this example, we’ll use 16-bit fixed-point numbers in an 11.5 format (11 integral bits, 5 fractional bits):

#include <stdint.h>

/// Fixed-point Format: 11.5 (16-bit)
typedef uint16_t fixed_point_t;

We’ll make a quick macro for the number of fractional bits:

#define FIXED_POINT_FRACTIONAL_BITS 5

Then we’ll define two conversion functions:

/// Converts 11.5 format -> double
double fixed_to_double(fixed_point_t input);

/// Converts double to 11.5 format
fixed_point_t double_to_fixed(double input);

Now that we’ve gotten the groundwork out of the way, we’ll write our fixed-point to floating-point conversion function. Converting from fixed-point to floating-point is straightforward. We take the input value and divide it by 2^(fractional_bits), putting the result into a double:

inline double fixed_to_double(fixed_point_t input)
{
    return ((double)input / (double)(1 << FIXED_POINT_FRACTIONAL_BITS));
}
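
As a quick sanity check of the division, here is a minimal sketch that repeats the definitions above and adds two helpers. The names fixed_resolution and fixed_max_value are hypothetical, introduced only for illustration; they show the precision and range that the 11.5 format implies.

```c
#include <stdint.h>

#define FIXED_POINT_FRACTIONAL_BITS 5

typedef uint16_t fixed_point_t;

/* Same conversion as above: divide the raw value by 2^5 = 32. */
static double fixed_to_double(fixed_point_t input)
{
    return (double)input / (double)(1 << FIXED_POINT_FRACTIONAL_BITS);
}

/* Hypothetical helper: smallest step the format can represent (1/32). */
static double fixed_resolution(void)
{
    return fixed_to_double(1);
}

/* Hypothetical helper: largest representable value (65535/32). */
static double fixed_max_value(void)
{
    return fixed_to_double(UINT16_MAX);
}
```

For example, the raw value 0x0030 (48) converts to 48 / 32 = 1.5. The resolution is 1 / 32 = 0.03125, and the largest representable value is 65535 / 32 = 2047.96875.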

To convert from floating-point to fixed-point, we follow this algorithm:

  1. Calculate x = floating_input * 2^(fractional_bits)
  2. Round x to the nearest whole number (e.g. round(x))
  3. Store the rounded x in an integer container

Using the algorithm above, we would implement our float-to-fixed conversion as follows:

// round() is declared in <math.h>
inline fixed_point_t double_to_fixed(double input)
{
    return (fixed_point_t)(round(input * (1 << FIXED_POINT_FRACTIONAL_BITS)));
}

However, not every embedded system provides the full standard library, so round() may not be available. In that case, you can rely on the truncation that happens when converting to an integer. For positive inputs, truncation can leave the stored value just under one fractional step (1/32 in this format) below the input, but for a quick-and-dirty solution that may be acceptable:

inline fixed_point_t double_to_fixed(double input)
{
    return (fixed_point_t)(input * (1 << FIXED_POINT_FRACTIONAL_BITS));
}
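
If round() is unavailable but truncation loses too much, one common workaround is to add 0.5 before truncating. The sketch below assumes non-negative inputs that stay within the 11.5 range after scaling. The function names double_to_fixed_trunc and double_to_fixed_nearest are made up for this comparison.

```c
#include <stdint.h>

#define FIXED_POINT_FRACTIONAL_BITS 5

typedef uint16_t fixed_point_t;

/* Truncating conversion, as in the function above. */
static fixed_point_t double_to_fixed_trunc(double input)
{
    return (fixed_point_t)(input * (1 << FIXED_POINT_FRACTIONAL_BITS));
}

/* Round-to-nearest without math.h: adding 0.5 before the truncating cast
 * rounds correctly for non-negative inputs only (an assumption of this
 * sketch). */
static fixed_point_t double_to_fixed_nearest(double input)
{
    return (fixed_point_t)(input * (1 << FIXED_POINT_FRACTIONAL_BITS) + 0.5);
}
```

With an input of 1.99, the scaled value is 63.68. Truncation stores 63 (which reads back as 1.96875), while the add-0.5 version stores 64 (which reads back as 2.0). Rounding keeps the error within half a step.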

If you need to support multiple fixed-point styles, you can provide interfaces for various integer widths and add the fractional bit count as an input argument:

// Convert 16-bit fixed-point to double
double fixed16_to_double(fixed_point_t input, uint8_t fractional_bits)
{
    return ((double)input / (double)(1 << fractional_bits));
}

// Equivalent of our 11.5 conversion function above
double r = fixed16_to_double(input, 5);
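
The reverse direction can take the same extra argument. The following is a sketch, not part of the original code: double_to_fixed16 is a hypothetical name, and the clamping is an added assumption. It is included because converting a double whose truncated value does not fit in the destination integer type is undefined behavior in C.

```c
#include <stdint.h>

typedef uint16_t fixed_point_t;

/* Hypothetical counterpart to fixed16_to_double().
 * Assumes fractional_bits <= 16 and truncates instead of rounding. */
static fixed_point_t double_to_fixed16(double input, uint8_t fractional_bits)
{
    double scaled = input * (double)(1UL << fractional_bits);

    /* Clamp so the cast below never sees an out-of-range value. */
    if (!(scaled > 0.0)) {
        return 0; /* negative, zero, or NaN */
    }
    if (scaled >= (double)UINT16_MAX) {
        return UINT16_MAX;
    }
    return (fixed_point_t)scaled;
}
```

Calling double_to_fixed16(1.5, 5) yields the raw value 48, matching the 11.5 format used earlier, while a negative input saturates to 0 and an input too large for the format saturates to UINT16_MAX.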

There you have it: quick-and-dirty fixed-point conversion methods.

6 Replies to “Simple Fixed-Point Conversion in C”

    1. Not surprised, it’s quite simple. Although IEEE 754 describes floating point, which is handled by the compiler in this case. Perhaps I’m missing something obvious! What needs to be modified to make it compliant?

  1. If you define fixed_point_t as a signed 16-bit integer (int16_t not uint16_t) the code can convert negative floating point values to fixed-point integer (and back again).

  2. The conversion functions above seem to work only if the fixed-point format has an integer part, i.e., a Qm.n format where m is non-zero. If m is zero (1 sign bit, and all other bits are fractional bits), then fixed_to_double() returns a result with the sign inverted.

    I cannot reproduce this in the tests with newly added test cases. Using a 16-bit fixed-point value of 0x1F to represent UQ11.5 or Q10.5, I get the correct conversion with the correct sign: 0.968750. See test cases here: https://github.com/embeddedartistry/embedded-resources/blob/master/examples/c/fixed_point/fixed_point_tests.c
