Firmware Update Support

Firmware update support is an essential capability for contemporary embedded devices, regardless of whether or not they are connected to the internet on a regular basis. Firmware updates are used to add new capabilities after launch, correct errors, and address security vulnerabilities. Firmware update support also ensures that devices can remain useful for a longer period of time, as the development team can respond to changes in the operating environment and customer expectations.

Supporting firmware updates for your system requires a number of device-side and infrastructure capabilities. Update reliability is also significantly improved by adopting supporting processes in your organization.

Firmware updates are a good example of Software Engineering as applied to embedded systems. We have to carefully design the update mechanism, account for failure modes, and make tradeoffs based on our system’s design goals and constraints.

Table of Contents:

  1. Device Capabilities
  2. Infrastructure Capabilities
  3. Supporting Processes
  4. Accounting for the Possibility of Failure
  5. Sub-topics and Variations
  6. Case Studies of Update-related Problems and Vulnerabilities
  7. Related Blog Posts
  8. References

Device Capabilities

Required

  • Device software is split into multiple images.
    • At a minimum, you will need to split the device software into a Bootloader and an Application.
    • Software may be further divided depending on reliability and update schemes, such as into a Loader, a distinct Updater, or a Fallback Image.
  • Fail-safe support in case of an update failure or bad update
  • An integrity check, ensuring that the provided binary has not been corrupted during transfer
  • The device can report its software version
  • The update mechanism is resilient against power and network loss during the update process. This is necessary to avoid bricking devices!
    • Ideally, updates will be “atomic”: either the whole update is applied, or no update occurs at all.
  • A method for receiving firmware updates (whether via USB connection, SD card, or Over-the-Air)
  • Code signing support, which is used to verify both provenance and integrity of an update
  • Support for rolling back to a previous version on command (many implementations only allow you to increase the version)
  • Version data storage, schemas, and communication protocols to support data migrations in response to an update process
  • Ability to specify pre- and post-update actions (e.g., a script) in addition to the firmware update
    • This can be extremely useful for supporting actions like data migrations or file removals, which might need to be executed after an update has completed.
    • This can be useful for implementing post-update sanity checks to make sure that the update process completed successfully. If the checks do not pass, roll back to the previous version.
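As a rough illustration of the post-update actions described above, the sketch below runs a list of actions after an update and requests a rollback on the first failure. The action signature, `request_rollback()`, and the example actions are hypothetical placeholders, not part of any particular update framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a post-update action: returns true on success. */
typedef bool (*post_update_action_t)(void);

/* Hypothetical hook: on a real device this would record the request in
 * persistent storage for the bootloader. Here it only sets a flag. */
static bool rollback_requested = false;
static void request_rollback(void) { rollback_requested = true; }

/* Placeholder actions standing in for a data migration and a sanity check. */
static bool sensor_present = true;
static bool migrate_settings(void) { return true; }
static bool check_sensor_responds(void) { return sensor_present; }

/* Run each action in order. On the first failure, request a rollback and
 * stop; return true only if every action succeeded. */
static bool run_post_update_actions(const post_update_action_t *actions,
                                    size_t count)
{
    assert(actions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!actions[i]()) {
            request_rollback();
            return false;
        }
    }
    return true;
}
```

Ordering matters here: migrations usually run before the checks, so that the checks exercise the migrated data.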

Infrastructure Capabilities

Required

  • Minimally, produce a checksum that can be used to verify the binary was transferred without error. Ideally, code signing will be used instead, as you can also verify that the update is coming from an authorized source.
  • Cohort binning of devices enables you to control which devices receive specific firmware updates.
    • This is useful for deploying beta builds to an interested population of beta testers.
    • This is also a common way of implementing staged rollouts.
  • Staged rollouts of firmware updates provide a safer update mechanism than a “deploy to everyone” approach. You start with a small population of devices to make sure that the update succeeds and does not introduce significant new issues. If everything looks good, you continue to roll out the update to increasingly large segments of your population.
  • Ability to roll back firmware to a previous version in the event of a bad update
  • Check-in and heartbeat messages are useful for determining:
    • Whether or not an update was successful (a device will check in with a new firmware version)
    • The distribution of versions throughout the fleet
      • Often, teams are surprised to realize that there’s a distribution of versions, even when a new OTA update is released. Also, you will find that some devices never update.
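One common way to implement cohorts for a staged rollout is to hash each device identifier into a stable bucket and compare it against the current rollout percentage. The sketch below assumes string device IDs and uses 32-bit FNV-1a as the hash; both are illustrative choices, and a server could just as well store explicit cohort assignments in a database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated device identifier. */
static uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Map a device to a stable bucket in [0, 99]. */
static unsigned rollout_bucket(const char *device_id)
{
    return fnv1a_32(device_id) % 100u;
}

/* A device is eligible when its bucket falls below the rollout percentage.
 * Because buckets never change, raising the percentage (say from 5 to 25)
 * keeps every device already updated and adds new ones. */
static bool rollout_includes(const char *device_id, unsigned percent)
{
    assert(percent <= 100u);
    return rollout_bucket(device_id) < percent;
}
```

A beta cohort can be handled the same way by checking an explicit opt-in flag before falling back to the percentage test.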

Supporting Processes

  • Exclusive use of the customer-facing firmware update mechanism to ensure its reliability
    • Many teams leave OTA updates to the end of the project, for example, which is far too late in the process to ensure reliability. A better approach is implementing OTA updates first, and then requiring all development and internal testing to use OTA updates rather than JTAG or USB. This way, the update mechanisms receive significant mileage, and the kinks are worked out before the product is released to customers.
  • Significant testing of the update process, especially with the use of fault injection to ensure that fallbacks and fail-safes work as intended
  • Version Data Storage, Protocols, and Schemas
    • Data migrations are a common challenge that you will need to deal with when updating devices. For example, you might update the “device settings” layout, change a communication protocol, or update an SQLite database schema.
    • Without versioning these items, you cannot safely perform a migration as part of the firmware update process.
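To make the migration point concrete, here is a minimal sketch assuming a settings blob whose first field is always a version number. The two layouts and the default value are hypothetical; the key idea is that the stored version tells the new firmware which conversion to apply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts. Version 1 stored only a brightness percentage;
 * version 2 adds a reporting interval. The version is always first. */
typedef struct {
    uint16_t version;
    uint8_t brightness;
} settings_v1_t;

typedef struct {
    uint16_t version;
    uint8_t brightness;
    uint16_t report_interval_s;
} settings_v2_t;

#define SETTINGS_CURRENT_VERSION 2u
#define DEFAULT_REPORT_INTERVAL_S 300u

/* Upgrade a stored settings blob to the current layout.
 * Returns 0 on success, or -1 if the stored version is unknown (corrupted,
 * or written by newer firmware), in which case the caller should use defaults. */
static int settings_migrate(const void *stored, settings_v2_t *out)
{
    uint16_t version;
    memcpy(&version, stored, sizeof version);

    switch (version) {
    case 1: {
        settings_v1_t v1;
        memcpy(&v1, stored, sizeof v1);
        out->version = SETTINGS_CURRENT_VERSION;
        out->brightness = v1.brightness;
        out->report_interval_s = DEFAULT_REPORT_INTERVAL_S; /* new field */
        return 0;
    }
    case 2:
        memcpy(out, stored, sizeof *out);
        return 0;
    default:
        return -1;
    }
}
```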

Accounting for the Possibility of Failure

Firmware updates can go wrong in many ways. It is important to ensure that your update system is resilient to all of these failures. After all, if you brick devices, you cannot remotely fix them.

To protect against data corruption during the transfer, you should compare the received contents against a checksum to ensure the integrity of the update. Ideally, however, you will use code signing to provide both an integrity check and an assurance that the build comes from an approved source.
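For reference, here is a minimal bitwise CRC-32 sketch (the common reflected polynomial 0xEDB88320, as used by zlib and Ethernet). The update function can be called once per received chunk, which suits devices that stream the payload rather than buffering it. `update_image_valid()` and the assumption that the expected CRC travels with the update package are illustrative. A CRC only detects accidental corruption; it does not replace signature verification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (reflected polynomial 0xEDB88320). Pass 0 as the initial value
 * and feed the previous result back in for each subsequent chunk. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical check: 'expected' is assumed to come from the update
 * package metadata. */
static bool update_image_valid(const uint8_t *image, size_t len,
                               uint32_t expected)
{
    assert(image != NULL || len == 0);
    return crc32_update(0, image, len) == expected;
}
```

When streaming, call `crc32_update()` on each chunk as it is written to flash and compare the final value before marking the image bootable.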

Updates should, ideally, be atomic: either the whole update is applied, or no update occurs at all. This is especially important in guarding against corruption due to loss of network connectivity or loss of power during an update. The most common approach to atomic updates is to have dual application partitions in device storage, which we will call “A” and “B”. This approach is akin to the common “double buffering” pattern.

  • The bootloader will boot from partition A, which is currently active.
  • When an update is received, the contents will be placed into partition B.
    • If the update process fails for any reason, the bootloader will continue to boot from partition A.
    • If the update succeeds, the bootloader will boot from partition B.
      • If there is a problem identified during the boot process, this can be indicated to the bootloader, which can automatically fall back to partition A
  • When the next update is received, it will be placed into partition A.
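The A/B scheme above can be sketched as a small piece of bootloader state. The metadata layout, the three-attempt limit, and the function names are hypothetical; a real bootloader would persist `boot_meta_t` in flash (with its own integrity check) and then jump to the selected slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SLOT_A = 0, SLOT_B = 1 } slot_t;

/* Hypothetical boot metadata kept in a reserved flash region. */
typedef struct {
    slot_t active;        /* last known-good slot */
    bool trial_pending;   /* a new image in the other slot awaits confirmation */
    uint8_t trial_boots;  /* boots attempted on the new image so far */
} boot_meta_t;

#define MAX_TRIAL_BOOTS 3u

static slot_t other_slot(slot_t s) { return (s == SLOT_A) ? SLOT_B : SLOT_A; }

/* Called by the updater only after the inactive slot has been completely
 * written and verified. A failed transfer never reaches this point, so the
 * bootloader keeps using the active slot. */
static void mark_update_ready(boot_meta_t *m)
{
    m->trial_pending = true;
    m->trial_boots = 0;
}

/* Bootloader decision: try the new image a limited number of times; if the
 * application never confirms it, fall back to the known-good slot. */
static slot_t select_boot_slot(boot_meta_t *m)
{
    if (m->trial_pending) {
        if (m->trial_boots < MAX_TRIAL_BOOTS) {
            m->trial_boots++;
            return other_slot(m->active);
        }
        m->trial_pending = false;   /* give up on the new image */
    }
    return m->active;
}

/* Called by the new application once its post-boot checks pass. The next
 * update will then be written to the slot that was previously active. */
static void confirm_running_image(boot_meta_t *m)
{
    if (m->trial_pending) {
        m->active = other_slot(m->active);
        m->trial_pending = false;
    }
}
```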

However, memory and storage constraints can make atomic updates difficult to achieve with many embedded devices.

  • RAM may be sufficiently limited such that an entire update payload cannot be received before being applied, but must be streamed to flash instead.
    • This means that the contents of flash could be overwritten with data before it can be determined that the checksum or signature matches the expected value.
  • Flash may be sufficiently limited such that there is not space for a bootloader, two complete applications, and other artifacts.

In cases where the dual partition approach cannot be used, we will create a “fallback” application, which effectively takes the place of the second partition:

  • Updates will always be placed into the main application slot.
  • If an update fails, or some problem is identified during the application boot process, the bootloader will automatically boot into the fallback application.
  • The fallback application contains only the minimal amount of support to configure the processor and its components so that it can connect to the server and request a new update. This allows it to be much smaller than a complete second application.
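A comparable sketch for the fallback scheme follows. It assumes the bootloader can verify the main image and keeps a counter of boots that the application has not yet reported as healthy; the names and the limit of three are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BOOT_MAIN_APP, BOOT_FALLBACK_APP } boot_target_t;

/* Hypothetical persistent state shared by bootloader and applications. */
typedef struct {
    bool main_image_valid;   /* integrity/signature check of the main slot */
    uint8_t failed_boots;    /* consecutive boots without a health report */
} fallback_meta_t;

#define MAX_FAILED_BOOTS 3u

/* Boot the fallback if the main image failed verification or keeps
 * failing to come up; otherwise count this attempt and boot the main app. */
static boot_target_t select_boot_target(fallback_meta_t *m)
{
    if (!m->main_image_valid || m->failed_boots >= MAX_FAILED_BOOTS) {
        return BOOT_FALLBACK_APP;
    }
    m->failed_boots++;
    return BOOT_MAIN_APP;
}

/* Called by the main application once it is running correctly. */
static void app_report_healthy(fallback_meta_t *m) { m->failed_boots = 0; }

/* Called by the fallback application after writing a new image to the
 * main slot; 'verified' is the result of the integrity check. */
static void fallback_install_done(fallback_meta_t *m, bool verified)
{
    m->main_image_valid = verified;
    m->failed_boots = 0;
}
```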

This fallback firmware must be heavily tested so that it can be trusted to restore a system to a working state in the event of an update failure. Ideally, it will not need any updates once the device is deployed, because nothing protects the device if an update to the fallback firmware itself fails.

If you cannot support any of these schemes with your current resources, you will need to add more storage or avoid OTA updates completely. Otherwise, you run the risk of a power or network failure completely disabling your devices in the field. Wired updates are less sensitive here, as long as you provide a tool that can be used to re-flash the device from a corrupted state (e.g., a DFU utility).

Sub-topics and Variations

  • “Another vulnerability in the LPC55S69 ROM” (Oxide) describes a problem in the LPC55S69 In-System Programming code for the signing mechanism, which allows an attacker to gain non-persistent code execution with a carefully crafted update, regardless of whether the update is signed

References

Device Metrics

14 July 2022 by Phillip Johnston • Last updated 18 July 2022

Device metric collection is a device-side capability that gives us insight into the operation of a system. Metrics are useful for debugging, data collection, improvement programs, and preventative maintenance activities. A metric is a quantifiable measure that is used to track and assess the status of a specific process or a system as a whole. Unlike logs or crash dumps, metrics are primarily key-value pairs of data. These pairs are typically collected and reported over varying intervals of time. Further reading: If you want to learn more about metrics, …

Persistent Data Storage

14 July 2022 by Phillip Johnston • Last updated 18 July 2022

Persistent device storage is a key device-side capability. Persistent storage can be either a dedicated hardware device (e.g., processor NVRAM, dedicated EEPROM, dedicated SPI-NOR), a reserved region in a shared device (e.g., specific region of flash memory), or a file (e.g., on an SD card or hard drive). Persistent storage can be used to track device-specific information (e.g., serial number), user & developer configuration settings, and calibration data. This type of data is usually kept in a key-value store or on-device database. Key debugging information should also be saved …

Check-in and Heartbeat Messages

14 July 2022 by Phillip Johnston • Last updated 18 July 2022

Check-in and heartbeat messages are a fundamental device-side and infrastructure capability that enable you to remotely observe deployed systems. Devices should send periodic messages to a monitoring system (usually a remote server) to indicate that they are alive and well. Messages should be sent on boot (check-in messages) and at a regular interval during operation (heartbeat messages). Aside from indicating that a device is alive and well, these messages can be used to transmit information about the device. For example, check-in messages can include basic device information, such as …

Reset Reason Detection

14 July 2022 by Phillip Johnston • Last updated 12 December 2023

Reset reason detection is a device-side capability that gives us insight into the cause of a system (re)boot. Many processors have a status register that indicates why the system (re)booted, such as “power on”, “watchdog timeout”, “brown-out”, “software reset”. On boot, this information can be read, logged for debugging purposes, and used to determine whether specific fallback behaviors will be executed (e.g., boot into a fail-safe build when an infinite processor fault loop is detected). Without this capability, it becomes difficult-to-impossible to: detect device reset problems, determine the cause …

On-Device Logging

14 July 2022 by Phillip Johnston • Last updated 18 July 2022

Device logging is a device-side capability that enables developers to write strings and values in a human-readable format that can be read back later to find error messages or recreate a sequence of events that occurred on the device. Without logging capabilities, debugging sealed or production units is often difficult or impossible, since a debugger frequently cannot be attached. Devices that cannot be retrieved (e.g., they are in another country) are impossible to debug in a systematic way without an RMA process.

Implementation Considerations

A logging implementation should allow you to …

Versioning Software

20 January 2022 by Phillip Johnston • Last updated 11 February 2026

Versioning software projects is important, especially on embedded software projects: every device is likely to have a different software version flashed on it (especially during development), and there will be a range of software versions flashed onto customer devices at any given time. Because every binary is slightly different in some way or another, debugging the system requires access to the proper debug symbols, addresses, and change logs that correspond to the version we are working with. For these reasons, we need the …

Versioning PCBs

We consider hardware versioning an essential device-side capability and process for embedded systems development. Every PCB used in an embedded system is bound to have multiple revisions throughout the product’s development cycle, and it is important that we can easily identify the revision we are currently working with – especially from within the software system.

The version number for a PCB should be incremented every time a new version is released, no matter how trivial the changes may seem.

Version Silkscreen on PCB

The PCB version should be printed on the board via silkscreen so that engineers and developers can unambiguously determine the version. Silkscreen is preferred to a sticker because stickers often come off, and a board may very well be mislabeled.

Version Listed on Schematic

At a minimum, print the PCB version on the schematic’s front page. Ideally, the version will be on every page of the schematic. This makes it quite easy for anyone on the team to check whether the schematic they are looking at refers to the PCB they have in hand.

Software-Interpretable Versioning

While the PCB silkscreen and schematic versions aid developers, they do not give the software a way to determine which hardware revision it is running on.

Some PCB revisions will be agnostic to software and merely involve tweaking some board design parameters. However, many revisions end up being significant enough to necessitate corresponding software changes: pinouts are changed, peripheral devices are changed, features are added/removed, etc.

In the “ideal” case, all of the old hardware can be immediately scrapped and the software will only need to support the newer hardware. However, this ideal case rarely plays out. Because development hardware is often in limited supply due to its high production cost, teams often need to support multiple hardware revisions concurrently. Likewise, we may already be selling our devices, and now we need to maintain support for existing devices as well as the new variant. If we have a way to read the hardware version from software, we can support multiple variants with a single software build. We can also use the revision to select the proper firmware variant in the event that we maintain different binaries for different variants.
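One way to support several revisions with a single build is a table of per-revision settings selected at startup. The fields below (a moved LED pin and a removed sensor) are hypothetical examples of the kinds of differences described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-revision differences. */
typedef struct {
    uint8_t status_led_pin;
    bool has_humidity_sensor;
} board_config_t;

/* Index 0 is unused so that the index equals the hardware revision. */
static const board_config_t board_configs[] = {
    [1] = { .status_led_pin = 5,  .has_humidity_sensor = true  },
    [2] = { .status_led_pin = 12, .has_humidity_sensor = true  },
    [3] = { .status_led_pin = 12, .has_humidity_sensor = false },
};

#define BOARD_CONFIG_COUNT (sizeof board_configs / sizeof board_configs[0])

/* Returns NULL for revisions this build does not know about, so the caller
 * can refuse to run instead of driving the wrong pins. */
static const board_config_t *board_config_for(unsigned revision)
{
    if (revision == 0 || revision >= BOARD_CONFIG_COUNT) {
        return NULL;
    }
    return &board_configs[revision];
}
```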

Frustratingly, the majority of embedded systems projects we work on still do not provide a method for the software to determine the hardware revision. This feature isn’t considered until it is too late, and there are multiple variants that need to be supported with no good way to distinguish them in software. At this point, the problem becomes one of build management: we must manually ensure that existing devices are flashed with the proper software version. This is an error-prone process.

Use GPIO to Encode PCB Version

The simplest and most common technique for making the PCB revision accessible to software is to encode it using GPIO. In a typical scheme, the software reads three GPIO lines and interprets them as a 3-bit binary version number. The GPIO are configured in software with internal pull-ups, and the version number in hardware is controlled by placing resistors that connect selected pins to GND (a pin with a resistor reads 0, a pin without one reads 1).

Note

The same system works if the polarity is reversed as well. The directionality of the system and the value of the resistors are usually driven by power concerns (to minimize leakage current).

Version | GPIO 2 | GPIO 1 | GPIO 0
1       | 0      | 0      | 1
2       | 0      | 1      | 0
3       | 0      | 1      | 1
4       | 1      | 0      | 0
5       | 1      | 0      | 1
6       | 1      | 1      | 0
7       | 1      | 1      | 1
8       | 0      | 0      | 0

Note

Some teams use 0b000 to represent version 1 and 0b111 to represent version 8, while others simply treat 0b000 as version 0. In other cases, we have seen teams reverse the ordering such that they count down instead of up.
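Reading the three lines and applying the mapping from the table above might look like the following sketch. `version_pin_read()` is a hypothetical HAL hook, simulated here with an array so that the code runs on a host machine.

```c
#include <assert.h>
#include <stdbool.h>

#define VERSION_PIN_COUNT 3u

/* Hypothetical HAL hook returning the level of version pin n (0 = GPIO 0).
 * On hardware, the pins are inputs with internal pull-ups, so an empty
 * position reads 1 and a resistor to GND reads 0. This host-side
 * stand-in reads a simulated array instead. */
static bool sim_version_pins[VERSION_PIN_COUNT];
static bool version_pin_read(unsigned n)
{
    assert(n < VERSION_PIN_COUNT);
    return sim_version_pins[n];
}

/* Assemble GPIO 2..GPIO 0 into a 3-bit value and apply the table's
 * mapping: 0b001 -> 1, ..., 0b111 -> 7, and 0b000 -> 8. */
static unsigned board_version_read(void)
{
    unsigned raw = 0;
    for (unsigned n = 0; n < VERSION_PIN_COUNT; n++) {
        if (version_pin_read(n)) {
            raw |= 1u << n;
        }
    }
    return (raw == 0u) ? 8u : raw;
}
```

Teams using one of the alternative conventions from the note only need to change the final mapping line.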

Three GPIO lines are often sufficient, as they provide a set of 8 possible versions before the numbering must restart. We have worked on teams that used only two GPIO lines due to GPIO constraints, but we have always found this insufficient: almost every system we have worked on has gone through four revisions before or after product launch. Four GPIO lines give you much more wiggle room, but we have rarely been through that many revisions. In our opinion, it is better to restart the version numbers once production hardware is ready, scrapping support for all development hardware at that time. Version increments after that point apply to production hardware changes.

Include High-Z as a Pin State

Most GPIO have a high-Z state, and you could leverage this to get three states per pin instead of two (27 combinations from three pins instead of 8).

High-Z is detected in the following way:

  1. Drive the line low
  2. Convert to input (high-Z) and read
  3. Drive the line high
  4. Convert to input (high-Z) and read

If the two input readings hold different values, the pin is in high-Z state.
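The detection steps above can be sketched as follows. The pin functions are hypothetical and simulated: a floating pin is modeled as briefly holding the last driven level, which is the behavior this technique depends on. On real hardware, the read must happen soon after the pin is released, and the usable window depends on the board's capacitance and leakage. The sketch also combines the three pin states into a base-3 code.

```c
#include <assert.h>
#include <stdbool.h>

#define STRAP_PIN_COUNT 3u

typedef enum { PIN_LOW = 0, PIN_HIGH = 1, PIN_HIGH_Z = 2 } pin_state_t;

/* Host-side simulation (hypothetical). Each pin is strapped low (0),
 * strapped high (1), or left floating (-1). */
static int sim_strap[STRAP_PIN_COUNT];
static bool sim_last_driven[STRAP_PIN_COUNT];

static void pin_drive(unsigned n, bool level) { sim_last_driven[n] = level; }
static void pin_make_input(unsigned n) { (void)n; }
static bool pin_read(unsigned n)
{
    return (sim_strap[n] < 0) ? sim_last_driven[n] : (sim_strap[n] != 0);
}

/* Steps 1-4: drive low, release, read; drive high, release, read. */
static pin_state_t pin_detect_state(unsigned n)
{
    pin_drive(n, false);
    pin_make_input(n);
    bool after_low = pin_read(n);
    pin_drive(n, true);
    pin_make_input(n);
    bool after_high = pin_read(n);

    if (after_low != after_high) {
        return PIN_HIGH_Z;   /* the pin followed what we drove */
    }
    return after_low ? PIN_HIGH : PIN_LOW;
}

/* Treat each pin as one base-3 digit, GPIO 2 most significant. */
static unsigned tristate_version_code(void)
{
    unsigned code = 0;
    for (unsigned n = STRAP_PIN_COUNT; n-- > 0;) {
        code = code * 3u + (unsigned)pin_detect_state(n);
    }
    return code;
}
```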

Use an ADC to Encode Version

Another method for supporting version numbers is to use an ADC to convert a voltage to a version. A resistor network is used to control the voltage seen at the ADC (the midpoint of the resistor network is connected to the ADC input). Versions correspond to “binned” ADC readings, representing equal divisions of the full ADC range (e.g., 0-4095 for a 12-bit ADC).

Version | ADC reading range (12-bit counts)
1       | 0-511
2       | 512-1023
3       | 1024-1535
4       | 1536-2047
5       | 2048-2559
6       | 2560-3071
7       | 3072-3583
8       | 3584-4095

Values for the resistor network should be chosen so that the target voltage/reading falls at the midpoint of the specified range. This scheme also necessitates using high-precision components (1%) that will not drift over the product’s lifetime far enough for the version to fall outside of the specified range.

Resistor values need to be appropriately selected to minimize leakage current. In addition, an extra GPIO can be assigned to the versioning scheme. This pin supplies VCC to the resistor network, and it can be turned on only when the software needs to read the board version.
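The binning and power-gating described above can be sketched as follows, assuming a 12-bit ADC and eight equal bins. The HAL hooks, the 100 microsecond settling delay, and the function names are hypothetical; the real settling time depends on the divider's resistance and the ADC input circuitry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADC_MAX_COUNTS 4096u                          /* readings 0..4095 */
#define VERSION_COUNT 8u
#define BIN_WIDTH (ADC_MAX_COUNTS / VERSION_COUNT)    /* 512 counts */

/* Hypothetical HAL hooks, simulated so the sketch runs on a host. */
static uint16_t sim_adc_counts;
static bool sim_divider_powered;
static void version_divider_power(bool on) { sim_divider_powered = on; }
static void delay_us(unsigned us) { (void)us; }
static uint16_t version_adc_read(void)
{
    return sim_divider_powered ? sim_adc_counts : 0;
}

/* Map a raw reading to a version using the equal-width bins in the table. */
static unsigned adc_counts_to_version(uint16_t counts)
{
    assert(counts < ADC_MAX_COUNTS);
    return counts / BIN_WIDTH + 1u;
}

/* Power the divider only for the duration of the read, then bin it. */
static unsigned board_version_read_adc(void)
{
    version_divider_power(true);
    delay_us(100);                  /* assumed settling time */
    uint16_t counts = version_adc_read();
    version_divider_power(false);
    return adc_counts_to_version(counts);
}

/* Design target for a version: the center of its bin. */
static uint16_t version_target_counts(unsigned version)
{
    assert(version >= 1u && version <= VERSION_COUNT);
    return (uint16_t)((version - 1u) * BIN_WIDTH + BIN_WIDTH / 2u);
}
```

Aiming each divider at the bin center leaves 256 counts of margin on either side for tolerance and drift.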

Note

A larger range of versions can be supported by tightening the binning on each version. However, tightening bins too much may lead to resistor drift causing erroneous version reads later in the product’s lifetime.

You can also combine the ADC and GPIO approaches, using one GPIO pin + one ADC pin. This would double the number of values supported by the ADC approach.
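As a sketch of the combined approach, assuming the GPIO acts as the most significant bit and the ADC uses the 512-count bins from the table above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADC_BIN_WIDTH 512u   /* 12-bit range split into 8 bins */

/* Versions 1-8 when the GPIO reads 0, versions 9-16 when it reads 1. */
static unsigned combined_version(bool gpio_high, uint16_t adc_counts)
{
    assert(adc_counts <= 4095u);
    unsigned adc_index = adc_counts / ADC_BIN_WIDTH;   /* 0..7 */
    return (gpio_high ? 8u : 0u) + adc_index + 1u;
}
```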

Write Hardware Version in Persistent Memory

Some processors provide user-configurable non-volatile memory. These registers can be written during manufacturing, storing the associated hardware revision directly in processor memory. You can also use a dedicated configuration EEPROM or a reserved region of flash to store this information.

Ideally, these registers/regions could be locked so that the version (and any other configuration information) cannot be accidentally overwritten.
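A stored revision is easier to trust if it carries a marker and a checksum, so that erased or corrupted memory is not mistaken for a valid version. The record layout, magic value, and function names below are hypothetical; locking the region is vendor-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HWINFO_MAGIC 0x48574931u   /* hypothetical marker value */

/* Hypothetical record written once during manufacturing. */
typedef struct {
    uint32_t magic;
    uint16_t board_revision;
    uint16_t reserved;
    uint32_t crc;            /* CRC-32 over all preceding bytes */
} hwinfo_t;

/* Bitwise CRC-32, reflected polynomial 0xEDB88320. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Used by a (hypothetical) manufacturing tool to build the record. */
static hwinfo_t hwinfo_make(uint16_t revision)
{
    hwinfo_t rec = { .magic = HWINFO_MAGIC, .board_revision = revision,
                     .reserved = 0 };
    rec.crc = crc32_update(0, (const uint8_t *)&rec, offsetof(hwinfo_t, crc));
    return rec;
}

/* Returns true and sets *revision only when the record is present and
 * intact; otherwise the caller should treat the revision as unknown. */
static bool hwinfo_read_revision(const hwinfo_t *rec, uint16_t *revision)
{
    if (rec->magic != HWINFO_MAGIC) {
        return false;
    }
    uint32_t crc = crc32_update(0, (const uint8_t *)rec,
                                offsetof(hwinfo_t, crc));
    if (crc != rec->crc) {
        return false;
    }
    *revision = rec->board_revision;
    return true;
}
```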

Prototyping New Revisions

With software-readable board versioning, it becomes possible for software to be updated and tested prior to having new hardware produced. Existing boards can be reworked to match the new version as closely as possible, and that board can have its resistor network changed to the next revision. Software developers can use this reworked hardware to add support for the new revision in software.

Note

Not all changes are easily supported via rework, so some changes will need to land once the final hardware is ready. Nonetheless, we have used this strategy on multiple occasions to make significant progress ahead of receiving new hardware.

“Securing” Hardware Revision

Common practice involves stuffing resistors on traces present on the PCB, which enables easier rework. However, for teams that are concerned about CMs “cheating” by marking old PCBs with a new revision ID, you can hard-code the board revision using the GPIO method by laying it out within the PCB (i.e., wiring the traces internally to encode the 0’s and 1’s in the PCB layout). We do not recommend this during development, as the rework method above is often quite useful, but it is suitable for production.

Further Reading

Device Command and Control Interface (Shell)

20 January 2022 by Phillip Johnston • Last updated 18 July 2022

An essential device-side capability to create for our devices is a command and control interface. Most commonly, this interface is implemented as a command-line shell, but it may also be implemented in other ways (such as through an IDL). For simplicity’s sake, we will generically refer to the command and control interface as the “shell”. Shells are useful in development for testing device behaviors without having to go through full production flows. You can create fine-tuned commands that let you test individual pieces of the system in isolation. …