Use Different Keys for Development and Production

29 March 2023 by Phillip Johnston

When using techniques like Code Signing, you need to implement Secure Secret Storage. Your private keys cannot be leaked, or else the whole signing and verification mechanism breaks. A common failure mode with this strategy is sharing private keys for development work. Proper key management is inconvenient, especially when you want to restrict access as much as possible. But developers need to create images and test them on-device, so they need access to the private keys to sign their images. Keys end up in git repositories, passed around via email or Slack, or uploaded to …

Store and Index Software Build Artifacts

27 March 2023 by Phillip Johnston

Embedded device populations always have a mix of software versions. This is true whether you are in development or the device has been released to customers. Because function and variable address locations will change from one build to another, you must be able to access build artifacts for all versioned builds to debug them appropriately. Without a foolproof system in place for storing and indexing artifacts, these files can go missing or be mislabeled. Indexing these artifacts is essential for finding the proper files when needed. You …

Firmware Update Support

Firmware update support is an essential capability for contemporary embedded devices, regardless of whether or not they are connected to the internet on a regular basis. Firmware updates are used to add new capabilities after launch, correct errors, and address security vulnerabilities. Firmware update support also ensures that devices can remain useful for a longer period of time, as the development team can respond to changes in the operating environment and customer expectations.

Supporting firmware updates for your system requires a number of supporting device-side and infrastructure capabilities. Update reliability is also significantly aided by adopting supporting processes in your organization.

Firmware updates are a good example of Software Engineering as applied to embedded systems. We have to carefully design the update mechanism, account for failure modes, and make tradeoffs based on our system’s design goals and constraints.

Table of Contents:

  1. Device Capabilities
  2. Infrastructure Capabilities
  3. Supporting Processes
  4. Accounting for the Possibility of Failure
  5. Sub-topics and Variations
  6. Case Studies of Update-related Problems and Vulnerabilities
  7. Related Blog Posts
  8. References

Device Capabilities

Required

  • Device software is split into multiple images.
    • At a minimum, you will need to split the device into a Bootloader and Application.
    • Software may be further divided depending on reliability and update schemes, such as into a Loader, a distinct Updater, or a Fallback Image.
  • Fail-safe support in case of an update failure or bad update
  • An integrity check (e.g., a checksum), ensuring that the provided binary has not been corrupted during transfer
  • The device can report its software version
  • The update mechanism is resilient against power and network loss during the update process. This is necessary to avoid bricking devices!
    • Ideally, updates will be “atomic”: either the whole update is applied, or no update occurs at all.
  • A method for receiving firmware updates (whether via USB connection, SD card, or Over-the-Air)
  • Code signing support, which is used to verify both provenance and integrity of an update
  • Support for rolling back to a previous version on command (many implementations only allow you to increase the version)
  • Version data storage, schemas, and communication protocols to support data migrations in response to an update process
  • Ability to specify pre- and post-update actions (e.g., a script) in addition to the firmware update
    • This can be extremely useful for supporting actions like data migrations or file removals, which might need to be executed after an update has completed.
    • This can be useful for implementing post-update sanity checks to make sure that the update processes completed successfully. If the checks do not pass, roll back to the previous version.
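The version-reporting and rollback items above can be sketched as a small acceptance check. This is a minimal illustration, not a prescribed implementation: the `MAJOR.MINOR.PATCH` format, the function names `parse_version` and `should_install`, and the `allow_rollback` flag are all assumptions made for the example.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse an assumed 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def should_install(current: str, offered: str, allow_rollback: bool = False) -> bool:
    """Decide whether the device should install the offered image."""
    cur, new = parse_version(current), parse_version(offered)
    if new > cur:
        return True  # normal upgrade
    if new < cur:
        return allow_rollback  # downgrade only when explicitly commanded
    return False  # same version: nothing to do
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as treating "1.10.0" as older than "1.9.5". A device that only ever accepts `new > cur` cannot be rolled back on command, which is the limitation noted in the list above.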

Infrastructure Capabilities

Required

  • Minimally, produce a checksum that can be used to verify the binary was transferred without error. Ideally, code signing will be used instead, as you can also verify that the update is coming from an authorized source.
  • Cohort binning of devices enables you to control which devices receive specific firmware updates.
    • This is useful for deploying beta builds to an interested population of beta testers.
    • This is also a common way of implementing staged rollouts.
  • Staged rollouts of firmware updates provide a safer update mechanism than a “deploy to everyone” approach. You start with a small population of devices to make sure that the update succeeds and does not introduce significant new issues. If everything looks good, you continue to roll out the update to increasingly large segments of your population.
  • Ability to roll back firmware to a previous version in the event of a bad update
  • Check-in and heartbeat messages are useful for determining:
    • Whether or not an update was successful (a device will check in with a new firmware version)
    • The distribution of versions throughout the fleet
      • Often, teams are surprised to realize that there’s a distribution of versions, even when a new OTA update is released. Also, you will find that some devices never update.
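One possible way to combine cohort binning, staged rollouts, and check-in data on the server side is sketched below. The hashing scheme, the cohort names, and the helper names (`rollout_bucket`, `is_eligible`, `version_distribution`) are assumptions for illustration only; a real fleet-management service may work quite differently.

```python
import hashlib
from collections import Counter
from typing import Iterable, Set, Tuple


def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_eligible(device_id: str, cohort: str,
                target_cohorts: Set[str], rollout_percent: int) -> bool:
    """A device gets the update if its cohort is targeted and its
    bucket falls inside the current rollout percentage."""
    if cohort not in target_cohorts:
        return False
    return rollout_bucket(device_id) < rollout_percent


def version_distribution(checkins: Iterable[Tuple[str, str]]) -> Counter:
    """Count devices per firmware version, using each device's latest
    (device_id, version) check-in. Check-ins are assumed to be in time order."""
    latest = {}
    for device_id, version in checkins:
        latest[device_id] = version
    return Counter(latest.values())
```

Because the bucket is derived from the device ID rather than chosen at random per request, raising `rollout_percent` from 10 to 50 only adds devices; no device that already received the update drops out of the eligible set.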

Supporting Processes

  • Exclusive use of the customer-facing firmware update mechanism to ensure its reliability
    • Many teams leave OTA updates to the end of the project, for example, which is far too late in the process to ensure reliability. A better approach is implementing OTA updates first, and then requiring all development and internal testing to use OTA updates rather than JTAG or USB. This way, the update mechanisms receive significant mileage, and the kinks are worked out before the product is released to customers.
  • Significant testing of the update process, especially with the use of fault injection to ensure that fallbacks and fail-safes work as intended
  • Version Data Storage, Protocols, and Schemas
    • Data migrations are a common challenge that you will need to deal with when updating devices. For example, you might update the “device settings” layout, change a communication protocol, or update an sqlite database schema.
    • Without versioning these items, you cannot safely perform a migration as part of the firmware update process.
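As a rough sketch of why versioning matters for migrations, the example below stores a schema version inside a hypothetical “device settings” record and applies one migration step per version. The field names (`schema_version`, `brightness`, `led_brightness`, `timezone`) and the specific changes are invented for illustration.

```python
from typing import Callable, Dict


def _v1_to_v2(settings: dict) -> dict:
    # Hypothetical change: "brightness" was renamed to "led_brightness".
    out = dict(settings)
    out["led_brightness"] = out.pop("brightness", 50)
    out["schema_version"] = 2
    return out


def _v2_to_v3(settings: dict) -> dict:
    # Hypothetical change: a new field is added with a default value.
    out = dict(settings)
    out.setdefault("timezone", "UTC")
    out["schema_version"] = 3
    return out


# Each entry upgrades settings from the keyed version to the next one.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_SCHEMA = 3


def migrate(settings: dict) -> dict:
    """Upgrade stored settings step by step to CURRENT_SCHEMA."""
    version = settings.get("schema_version")
    if version is None:
        raise ValueError("settings are not versioned; cannot migrate safely")
    if version > CURRENT_SCHEMA:
        raise ValueError("settings are newer than this firmware understands")
    while version < CURRENT_SCHEMA:
        settings = MIGRATIONS[version](settings)
        version = settings["schema_version"]
    return settings
```

Chaining single-step migrations means a device that skipped several releases can still be brought up to date, and an unversioned record is rejected instead of being guessed at.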

Accounting for the Possibility of Failure

Firmware updates can go wrong in many ways. It is important to ensure that your update system is resilient to all of these failures. After all, if you brick devices, you cannot remotely fix them.

To protect against data corruption during the transfer, you should compare the received contents against a checksum to ensure the integrity of the update. Ideally, however, you will use code signing to provide both an integrity check and an assurance that the build comes from an approved source.
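A minimal sketch of the checksum comparison, assuming SHA-256 as the digest and a hex string published alongside the image (the source does not name a specific algorithm). The function names are illustrative. Note that this checks integrity only; it does not establish that the image comes from an approved source, which is what code signing adds.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_image(image: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the received image matches the expected digest."""
    actual = sha256_hex(image)
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

For example, a single flipped byte in the received image changes the digest, so `verify_image` returns `False` and the update can be rejected before it is applied.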

Updates should, ideally, be atomic: either the whole update is applied, or no update occurs at all. This is especially important in guarding against corruption due to loss of network connectivity or loss of power during an update.

The most common approach to atomic updates is to have dual application partitions in device storage, which we will call “A” and “B”. This approach is akin to the common “double buffering” pattern.

  • The bootloader will boot from partition A, which is currently active.
  • When an update is received, the contents will be placed into partition B.
    • If the update process fails for any reason, the bootloader will continue to boot from partition A.
    • If the update succeeds, the bootloader will boot from partition B.
      • If a problem is identified during the boot process, this can be indicated to the bootloader, which can automatically fall back to partition A.
  • When the next update is received, it will be placed into partition A.
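The A/B sequence above can be modeled with a small simulation. This is a sketch of the idea rather than bootloader code: the `Device` class, its fields, and the `write_ok` / `boot_ok` flags (which stand in for injected write and boot failures) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Device:
    active: str = "A"
    # Version stored in each partition; None means empty or incomplete.
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "1.0.0", "B": None})

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running_version(self) -> Optional[str]:
        return self.slots[self.active]

    def apply_update(self, version: str,
                     write_ok: bool = True, boot_ok: bool = True) -> bool:
        """Write to the inactive partition, then switch only on success."""
        target = self.inactive()
        self.slots[target] = None  # inactive partition is erased first
        if not write_ok:
            return False  # active partition was never touched
        self.slots[target] = version
        previous = self.active
        self.active = target  # bootloader now boots the new partition
        if not boot_ok:
            self.active = previous  # problem at boot: fall back
            return False
        return True
```

The key property is that the active partition is never written during an update, so a failed transfer or a bad boot leaves the device running the version it had before.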

However, memory and storage constraints can make atomic updates difficult to achieve with many embedded devices.

  • RAM may be sufficiently limited such that an entire update payload cannot be received before being applied, but must be streamed to flash instead.
    • This means that the contents of flash could be overwritten with data before it can be determined that the checksum or signature matches the expected value.
  • Flash may be sufficiently limited such that there is not space for a bootloader, two complete applications, and other artifacts.
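When the payload must be streamed to flash, one way to limit the damage is to compute the digest incrementally and mark the slot as bootable only after the final comparison. The sketch below assumes SHA-256 and uses a hypothetical `Slot` class as a stand-in for a flash region plus a persistent "valid" marker; neither comes from the source.

```python
import hashlib
import hmac
from typing import Iterable


class Slot:
    """Hypothetical stand-in for a flash region plus a 'valid' marker."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.valid = False


def stream_update(chunks: Iterable[bytes],
                  expected_sha256_hex: str, slot: Slot) -> bool:
    """Write chunks as they arrive; mark the slot valid only if the
    digest of everything written matches the expected value."""
    slot.valid = False  # invalidate before the first write
    slot.data.clear()
    hasher = hashlib.sha256()
    for chunk in chunks:
        slot.data.extend(chunk)  # simulated flash write
        hasher.update(chunk)     # digest is updated without buffering the image
    slot.valid = hmac.compare_digest(hasher.hexdigest(), expected_sha256_hex)
    return slot.valid
```

Only one chunk needs to be held in RAM at a time. The slot contents are still overwritten before verification finishes, so this only helps if the bootloader refuses to boot a slot whose marker is not set and has something else to boot instead.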

In cases where the dual partition approach cannot be used, we will create a “fallback” application, which effectively takes the place of the second partition:

  • Updates will always be placed into the main application slot.
  • If an update fails, or some problem is identified during the application boot process, the bootloader will automatically boot into the fallback application.
  • The fallback application contains only the minimal amount of support to configure the processor and its components so that it can connect to the server and request a new update. This allows it to be much smaller than a complete second application.
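The bootloader's choice between the main and fallback applications might look like the sketch below. The failed-boot counter and the `MAX_BOOT_ATTEMPTS` threshold are assumptions for illustration; the source only says that the bootloader falls back when an update fails or a boot problem is identified.

```python
MAX_BOOT_ATTEMPTS = 3  # hypothetical threshold, not taken from the source


def select_image(main_valid: bool, failed_boots: int) -> str:
    """Return which image the bootloader should start."""
    if not main_valid:
        return "fallback"  # update failed or image did not verify
    if failed_boots >= MAX_BOOT_ATTEMPTS:
        return "fallback"  # main application keeps failing to start
    return "main"
```

In this sketch the main application would clear the counter once it has started successfully, so an occasional failed boot does not push a healthy device into the fallback image.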

This fallback firmware must be heavily tested so that it can be trusted to restore a system to a working state in the event of an update failure. Ideally, it will not need any updates once the device is deployed, as there is no fallback in place when the fallback firmware update fails.

If you cannot support any of these schemes with your current resources, you will need to add more storage or avoid OTA updates completely. You run the risk of a power or network failure completely disabling your devices in the field. Wired updates are less sensitive here, as long as you provide a tool that can be used to re-flash the device from a corrupted state (e.g., a DFU utility).

Sub-topics and Variations

  • “Another vulnerability in the LPC55S69 ROM” (Oxide) describes a problem in the LPC55S69 In-System Programming code for the signing mechanism, which allows an attacker to gain non-persistent code execution with a carefully crafted update, regardless of whether the update is signed.

References

Infrastructure Capabilities that Aid Embedded Systems Development

14 July 2022 by Phillip Johnston • Last updated 29 March 2023

Every embedded systems project and organization is different, but there is a consistent set of capabilities that are valuable for the majority of embedded software projects. Mature embedded software organizations can be identified by their investments in infrastructure and automation to improve their build, release, debugging, and monitoring capabilities.

Note: Many of the capabilities described below are built on top of corresponding device-side capabilities, while some are standalone infrastructure. Each entry will identify any dependencies on the device side.

Secret Management Secure Secret Storage Key Rotation Use Different …

Versioning Software

20 January 2022 by Phillip Johnston • Last updated 11 February 2026

Versioning software projects is important, and especially so on embedded software projects: every device is likely to have a different software version flashed on it (especially during development), and there will be a range of software versions flashed onto customer devices at any given time. Because every binary is slightly different in some way or another, in order to debug the system we need to access the proper debug symbols, addresses, and change logs that correspond to the version we are working with. For these reasons, we need the …
