Embedded Software Development Maturity Model Archives

18 May 2023 • Last updated 12 June 2023

From Concept to Launch: What It Takes to Build and Ship a New Device

Embedded Artistry and Memfault are joining forces to host a quarterly embedded discussion panel that is focused on the technical aspects of building embedded systems at scale. We will be featuring guest panel members who are at the cutting edge of embedded development. Our goal is to spread beneficial techniques and practices throughout the industry. …

Exclusive Use of the Customer-Facing Firmware Update Mechanism

29 March 2023 by Phillip Johnston • Last updated 6 February 2025

Firmware update solutions must be reliable – especially over-the-air updates. A failure in these processes can be catastrophic: completely bricked devices that cannot be remotely recovered, but instead must be RMA’d. In the worst-case scenario, one bad update might mean bricking all of the devices in your fleet. This means the goal is to get as many iterations with these mechanisms as possible before releasing devices to your customers. There is one simple step you can take to maximize the number of iterations you get with the customer-facing …

Use Different Keys for Development and Production

29 March 2023 by Phillip Johnston

When using techniques like Code Signing, you need to implement Secure Secret Storage. Your private keys cannot be leaked, or else the whole signing and verification mechanism breaks. A common failure mode with this strategy is sharing private keys for development work. Proper key management is inconvenient, especially when you want to restrict access as much as possible. But developers need to create images and test them on-device, so they need access to the private keys to sign their images. Keys end up in git repositories, passed around via email or Slack, or uploaded to …

Mature Teams Prioritize End-to-End Interactions

27 March 2023 by Phillip Johnston

Embedded systems are increasingly complex, especially those connected to the internet. In many cases, they are better viewed as one part of a larger distributed system involving the device, one or more phone applications, backend servers, fleet management, CI/CD servers, and more. Many teams do not approach product development with this distributed nature in mind. The “old ways” are followed instead: first, make all of the individual components work on their own, then integrate them together. The firmware team will work in their own world until the system is mostly completed, as will the app …

Store and Index Software Build Artifacts

27 March 2023 by Phillip Johnston

Embedded device populations always have a mix of software versions. This is true whether you are in development or the device has been released to customers. Because function and variable address locations will change from one build to another, you must be able to access build artifacts for all versioned builds to debug them appropriately. Without a foolproof system in place for storing and indexing artifacts, these files can go missing or be mislabeled. Indexing these artifacts is essential for finding the proper files when needed. You …

Firmware Update Support

23 March 2023 by Phillip Johnston • Last updated 2 April 2026

Firmware update support is an essential capability for contemporary embedded devices, regardless of whether or not they are connected to the internet on a regular basis. Firmware updates are used to add new capabilities after launch, correct errors, and address security vulnerabilities. Firmware update support also ensures that devices can remain useful for a longer period of time, as the development team can respond to changes in the operating environment and customer expectations.

Supporting firmware updates for your system requires a number of supporting device-side and infrastructure capabilities. Update reliability is also significantly aided by adopting supporting processes in your organization.

Firmware updates are a good example of Software Engineering as applied to embedded systems. We have to carefully design the update mechanism, account for failure modes, and make tradeoffs based on our system’s design goals and constraints.

Table of Contents:

Device Capabilities
Infrastructure Capabilities
Supporting Processes
Accounting for the Possibility of Failure
Sub-topics and Variations
Case Studies of Update-related Problems and Vulnerabilities
Related Blog Posts
References

Device Capabilities

Required

Device software is split into multiple images.
- At a minimum, you will need to split the device into a Bootloader and Application.
- Software may be further refined depending on reliability and update schemes, such as into a Loader, a distinct Updater, or a Fallback Image
Fail-safe support in case of an update failure or bad update
An integrity check (such as a checksum), ensuring that the provided binary has not been corrupted during transfer
The device can report its software version
The update mechanism is resilient against power and network loss during the update process. This is necessary to avoid bricking devices!
- Ideally, updates will be “atomic”: either the whole update is applied, or no update occurs at all.
A method for receiving firmware updates (whether via USB connection, SD card, or Over-the-Air)

Recommended

Code signing support, which is used to verify both provenance and integrity of an update
Support for rolling back to a previous version on command (many implementations only allow you to increase the version)
Version data storage, schemas, and communication protocols to support data migrations in response to an update process
Ability to specify pre- and post-update actions (e.g., a script) in addition to the firmware update
- This can be extremely useful for supporting actions like data migrations or file removals, which might need to be executed after an update has completed.
- This can be useful for implementing post-update sanity checks to make sure that the update processes completed successfully. If the checks do not pass, roll back to the previous version.
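
As a minimal sketch of the post-update sanity check idea, the function below confirms a new version only if every check passes and otherwise selects the previous version. The function name `finalize_update` and the idea of passing checks as callables are assumptions made for illustration, not part of any particular update framework.

```python
from typing import Callable, Iterable


def finalize_update(
    new_version: str,
    previous_version: str,
    post_update_checks: Iterable[Callable[[], bool]],
) -> str:
    """Return the version the device should keep running.

    Every check must return True for the new version to be confirmed.
    A check that returns False or raises an exception selects the
    previous version, which models a rollback.
    """
    for check in post_update_checks:
        try:
            if not check():
                return previous_version
        except Exception:
            # A crashing check is treated the same as a failing check.
            return previous_version
    return new_version
```

On a real device the checks might confirm that settings were migrated, that peripherals respond, or that the device can reach its server; those specifics depend on the product.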

Infrastructure Capabilities

Required

The build system must produce unique software versions
Store and index software build artifacts
A mechanism for pushing a new update or indicating that a new update is available

Recommended

Minimally, produce a checksum that can be used to verify the binary was transferred without error. Ideally, code signing will be used instead, as you can also verify that the update is coming from an authorized source.
Cohort binning of devices enables you to control which devices receive specific firmware updates.
- This is useful for deploying beta builds to an interested population of beta testers.
- This is also a common way of implementing staged rollouts.
Staged rollouts of firmware updates provide a safer update mechanism than a “deploy to everyone” approach. You start with a small population of devices to make sure that the update succeeds and does not introduce significant new issues. If everything looks good, you continue to roll out the update to increasingly large segments of your population.
Ability to roll back firmware to a previous version in the event of a bad update
Check-in and heartbeat messages are useful for determining:
- Whether or not an update was successful (a device will check in with a new firmware version)
- The distribution of versions throughout the fleet
  - Often, teams are surprised to realize that there’s a distribution of versions, even when a new OTA update is released. Also, you will find that some devices never update.
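
One way to sketch cohort binning and staged rollouts on the server side is to hash each device ID into a stable bucket and compare it against the current rollout percentage. Everything here is an assumption for illustration: the function names, the cohort labels, and the choice of SHA-256 for bucketing.

```python
import hashlib
from typing import Set


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in the range 0..99 for a release.

    Mixing the release name into the hash means a device does not land in
    the same early group for every release.
    """
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_offer_update(
    device_id: str,
    cohort: str,
    release: str,
    beta_cohorts: Set[str],
    rollout_percent: int,
) -> bool:
    """Decide whether a device is offered the release.

    Devices in a beta cohort always receive the build. All other devices
    receive it once their bucket falls below the rollout percentage.
    """
    if cohort in beta_cohorts:
        return True
    return rollout_bucket(device_id, release) < rollout_percent
```

Because the bucket is deterministic, raising `rollout_percent` from 10 to 50 only adds devices; every device that was offered the update at 10% is still offered it at 50%.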

Supporting Processes

Exclusive use of the customer-facing firmware update mechanism to ensure its reliability
- Many teams leave OTA updates to the end of the project, for example, which is far too late in the process to ensure reliability. A better approach is implementing OTA updates first, and then requiring all development and internal testing to use OTA updates rather than JTAG or USB. This way, the update mechanisms receive significant mileage, and the kinks are worked out before the product is released to customers.
Significant testing of the update process, especially with the use of fault injection to ensure that fallbacks and fail-safes work as intended
Version Data Storage, Protocols, and Schemas
- Data migrations are a common challenge that you will need to deal with when updating devices. For example, you might update the “device settings” layout, change a communication protocol, or update an SQLite database schema.
- Without versioning these items, you cannot safely perform a migration as part of the firmware update process.
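
To illustrate why the version field matters, here is a minimal sketch of a stepwise settings migration. The schema versions, field names, and migration steps are all hypothetical; the point is only that a stored `schema_version` lets the new firmware know which migrations to run.

```python
from typing import Callable, Dict

CURRENT_SCHEMA_VERSION = 3  # hypothetical version expected by this firmware


def _v1_to_v2(settings: dict) -> dict:
    # Hypothetical change: the "name" field was renamed to "device_name".
    migrated = dict(settings)
    migrated["device_name"] = migrated.pop("name", "")
    return migrated


def _v2_to_v3(settings: dict) -> dict:
    # Hypothetical change: a new field was added with a default value.
    migrated = dict(settings)
    migrated.setdefault("telemetry_enabled", False)
    return migrated


# Each entry upgrades settings from the keyed version to the next one.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def migrate_settings(settings: dict) -> dict:
    """Upgrade stored settings one version at a time to the current schema."""
    version = settings.get("schema_version")
    if not isinstance(version, int) or version < 1:
        raise ValueError("settings have no usable schema_version")
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError("settings were written by newer firmware")
    while version < CURRENT_SCHEMA_VERSION:
        settings = MIGRATIONS[version](settings)
        version += 1
        settings["schema_version"] = version
    return settings
```

Rejecting settings from a newer schema is one possible policy; it matters when rollbacks to older firmware are supported.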

Accounting for the Possibility of Failure

Firmware updates can go wrong in many ways. It is important to ensure that your update system is resilient to all of these failures. After all, if you brick devices, you cannot remotely fix them.

To protect against data corruption during the transfer, you should compare the received contents against a checksum to ensure the integrity of the update. Ideally, however, you will use code signing to provide both an integrity check and an assurance that the build comes from an approved source.
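
The checksum comparison can be sketched with Python's standard library. This example shows a CRC-32 check and a SHA-256 digest check; the function names are assumptions for illustration. Neither check says anything about where the image came from, which is the part that code signing adds.

```python
import hashlib
import zlib


def crc32_matches(payload: bytes, expected_crc32: int) -> bool:
    """Return True if the payload's CRC-32 equals the expected value."""
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected_crc32


def sha256_matches(payload: bytes, expected_hex_digest: str) -> bool:
    """Return True if the payload's SHA-256 digest equals the expected hex string."""
    return hashlib.sha256(payload).hexdigest() == expected_hex_digest.lower()
```

A device would compute the value over the bytes it actually received and compare it against the value published alongside the update, rejecting the image on any mismatch.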

Updates should, ideally, be atomic: either the whole update is applied, or no update occurs at all. This is especially important in guarding against corruption due to loss of network connectivity or loss of power during an update.

The most common approach to atomic updates is to have dual application partitions in device storage, which we will call “A” and “B”. This approach is akin to the common “double buffering” pattern.

The bootloader will boot from partition A, which is currently active.
When an update is received, the contents will be placed into partition B.
- If the update process fails for any reason, the bootloader will continue to boot from partition A.
- If the update succeeds, the bootloader will boot from partition B.
  - If there is a problem identified during the boot process, this can be indicated to the bootloader, which can automatically fall back to partition A
When the next update is received, it will be placed into partition A.
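
The A/B flow above can be sketched as a small simulation. The class and method names (`Slot`, `ABDevice`, `apply_update`, `report_boot_failure`) are hypothetical, and a real bootloader would keep the active-slot marker in non-volatile storage rather than in an object.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Slot:
    version: str = ""
    valid: bool = False


def _initial_slots() -> Dict[str, Slot]:
    # Assumed starting state: version 1.0.0 in A, nothing in B.
    return {"A": Slot("1.0.0", True), "B": Slot()}


@dataclass
class ABDevice:
    slots: Dict[str, Slot] = field(default_factory=_initial_slots)
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def apply_update(self, version: str, transfer_ok: bool) -> None:
        """Write the update into the inactive slot, then switch slots last."""
        target = self.inactive()
        self.slots[target] = Slot()  # erased: invalid until fully written
        if not transfer_ok:
            return  # the active slot was never touched
        self.slots[target] = Slot(version, True)
        self.active = target  # the single switch that makes the update take effect

    def report_boot_failure(self) -> None:
        """Fall back to the other slot if it still holds a valid image."""
        other = self.inactive()
        if self.slots[other].valid:
            self.slots[self.active].valid = False
            self.active = other

    def boot(self) -> str:
        return self.slots[self.active].version
```

The property that makes this atomic is that the running image is never modified; only the active-slot marker changes, and only after the new image is complete.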

However, memory and storage constraints can make atomic updates difficult to achieve with many embedded devices.

RAM may be sufficiently limited such that an entire update payload cannot be received before being applied, but must be streamed to flash instead.
- This means that the contents of flash could be overwritten with data before it can be determined that the checksum or signature matches the expected value.
Flash may be sufficiently limited such that there is not space for a bootloader, two complete applications, and other artifacts.
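
When the payload must be streamed, one common mitigation is to compute the digest incrementally while writing and to mark the image as bootable only after the digest matches. The sketch below models that idea; `StagingSlot` and `stream_update` are hypothetical names, and the in-memory `bytearray` stands in for flash.

```python
import hashlib
from typing import Iterable


class StagingSlot:
    """In-memory stand-in for the flash region that receives the update."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.valid = False

    def erase(self) -> None:
        self.data = bytearray()
        self.valid = False

    def write(self, chunk: bytes) -> None:
        self.data.extend(chunk)


def stream_update(
    chunks: Iterable[bytes], expected_sha256: str, slot: StagingSlot
) -> bool:
    """Write chunks as they arrive and validate the image at the end.

    The slot is marked invalid before the first write, so an interrupted or
    corrupted transfer never leaves an image that is flagged as bootable.
    """
    slot.erase()
    hasher = hashlib.sha256()
    for chunk in chunks:
        slot.write(chunk)
        hasher.update(chunk)  # hash incrementally; no full copy held in RAM
    slot.valid = hasher.hexdigest() == expected_sha256
    return slot.valid
```

Note what this does not solve: if the slot being written is the only application slot, the previous application is already gone by the time the check fails, so the device still needs some other image to boot into.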

In cases where the dual partition approach cannot be used, we will create a “fallback” application, which effectively takes the place of the second partition:

Updates will always be placed into the main application slot.
If an update fails, or some problem is identified during the application boot process, the bootloader will automatically boot into the fallback application.
The fallback application contains only the minimal amount of support to configure the processor and its components so that it can connect to the server and request a new update. This allows it to be much smaller than a complete second application.
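
The bootloader's decision in the fallback scheme can be sketched as a small pure function. The boot-attempt counter and its limit are assumptions added for illustration; the text above only says that a failed update or a problem during application boot sends the device to the fallback application.

```python
from enum import Enum


class BootTarget(Enum):
    MAIN = "main"
    FALLBACK = "fallback"


# Hypothetical limit on consecutive failed boots of the main application.
MAX_BOOT_ATTEMPTS = 3


def select_boot_target(main_image_valid: bool, failed_boot_count: int) -> BootTarget:
    """Choose which image the bootloader should start.

    main_image_valid:  result of the integrity or signature check on the
                       main application slot.
    failed_boot_count: number of consecutive boots in which the main
                       application did not report a successful start.
    """
    if not main_image_valid:
        return BootTarget.FALLBACK
    if failed_boot_count >= MAX_BOOT_ATTEMPTS:
        return BootTarget.FALLBACK
    return BootTarget.MAIN
```

In this sketch the main application would be responsible for clearing the counter once it has started successfully, so that a single transient failure does not strand the device in the fallback image.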

This fallback firmware must be heavily tested so that it can be trusted to restore a system to a working state in the event of an update failure. Ideally, it will not need any updates once the device is deployed, as there is no fallback in place when the fallback firmware update fails.

If you cannot support any of these schemes with your current resources, you will need to add more storage or avoid OTA updates completely. Otherwise, you run the risk of a power or network failure completely disabling your devices in the field. Wired updates are less sensitive here, as long as you provide a tool that can be used to re-flash the device from a corrupted state (e.g., a DFU utility).

Sub-topics and Variations

Over-the-Air Update (OTA) refers to sending updates to a system wirelessly.
Delta Update is an optimization of a full firmware update process that only involves transmitting pieces of the program that have changed. It is useful for systems with limited communication bandwidth or high communication costs.
The Update Nightmare: Bricking Devices in the Field looks at real-world examples of failed updates as a motivation to support Staged Rollouts.
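
To make the delta update idea concrete, here is a deliberately simplified sketch that compares two images in fixed-size blocks and transmits only the blocks that differ. This is not how production delta tools work (they typically use binary diff algorithms that cope with shifted content); the block size, function names, and delta format are all assumptions for illustration.

```python
BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def make_delta(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Collect the blocks of `new` that differ from the same offset in `old`."""
    changed = {}
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != block:
            changed[offset] = block
    return {"new_length": len(new), "blocks": changed}


def apply_delta(old: bytes, delta: dict) -> bytes:
    """Rebuild the new image from the old image plus the changed blocks."""
    length = delta["new_length"]
    # Start from the old image, truncated or padded to the new length.
    image = bytearray(old[:length].ljust(length, b"\xff"))
    for offset, block in delta["blocks"].items():
        image[offset:offset + len(block)] = block
    return bytes(image)
```

For example, with `old = b"AAAABBBBCCCCDDDD"` and `new = b"AAAAXXXXCCCCDDDDEE"`, only the blocks at offsets 4 and 16 are transmitted. A device applying a delta should still verify the reconstructed image against a checksum or signature of the full new image.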

Case Studies of Update-related Problems and Vulnerabilities

Another vulnerability in the LPC55S69 ROM / Oxide describes a problem in the LPC55S69 In-System Programming code for the signing mechanism, which allows an attacker to gain non-persistent code execution with a carefully crafted update regardless of whether the update is signed.

References

Not Invented Here Syndrome is a Business Problem

8 December 2022 by Phillip Johnston • Last updated 14 February 2024

“Not Invented Here” (NIH) syndrome is a significant problem for technology companies. NIH is a tendency to avoid code, products, standards, or techniques that come from outside of an organization. With software development, NIH is often associated with the idea that your internal team could do a better job, with a higher quality result, and incur a lower overall cost than any existing solution. In today’s market, you can purchase or find open source solutions for large portions of your system, including supporting infrastructure. You can buy pre-certified radio …

Software is a cost, not an asset

6 December 2022 by Phillip Johnston • Last updated 28 March 2024

Software companies often think of their code (or their software application) as an asset. Given one perspective, this makes sense: software has value, and you can buy it or sell it. This applies to organizations whose product is the code that they are selling. But for most teams, the code is not the product. Therefore, it is not an asset – it is a cost. The real asset is the business capability/value the software provides. Augmented by software, the business (or customer) can now do something they couldn’t do …

Mature Organizations Include Quality Assurance

18 November 2022 by Phillip Johnston

A strong indicator of a mature organization is the presence of a quality assurance (QA) role, and ideally a team.

Lack of QA is Common

Most organizations we have encountered neglect this role, especially startups and smaller teams. With startups, the focus is on building – the idea of releasing to customers can feel impossibly far away, and thus QA can seem unimportant. Small teams with limited resources may take a slightly different view, focusing their limited resources on development to try to generate more funds. One common justification for the lack of dedicated …

18 November 2022 • Last updated 2 March 2023

Building Embedded Teams in the Modern Era
