How Much Should I Trust My CM?

Today we have a guest post by Pete Staples about finding the right levels of trust and responsibility with your contract manufacturing (CM) partner.

Pete Staples has been building embedded devices for many years. He is the President and Co-Founder of Blue Clover Devices, a full-service electronics ODM with offices in San Francisco, Hong Kong, and Shenzhen. Pete also publishes articles on the BCD blog. You can contact Pete via email.

Outside of the food and aerospace industries, you would be hard-pressed to find an entrepreneur who is planning to build the factory that builds their product. Everyone else relies on a contract manufacturer (CM) of some kind to put hammer to metal and bang out their product.

While many hardware entrepreneurs are close to their lawyers, investors, and accountants, they often have trouble rattling off the names of key people at their CM. While the services of an attorney are not inexpensive, they are a sliver of the spending pie next to the major wedge for the CM. Ben Einstein, now at Eclipse Ventures, wrote a useful blog post about CM selection with rules of thumb about the economics of manufacturing. In his example, a typical hardware startup has an early valuation of $4.5M to cover the product launch, and $1M (22% of their entire valuation) goes to the CM. Given the hefty investment put into CMs, why don’t hardware company leaders spend more time learning how their CM operates?

There are a couple of common reasons. One is wishful thinking. Retailers have perfected the art of making their merchandise look so plentiful and organized that we imagine the factories supplying them to be stainless steel cathedrals connected to sophisticated software stacks that calculate demand with AI and effortlessly order all of the parts to arrive at the right place at the right time. The optimistic entrepreneur thinks CMs are a lot more efficient than they really are and figures they are eager to ship anyway. The assumption is made that alignment of interests will power the project forward. It’s true that factories, like wild stallions, want to run. But factories have limited engineering resources to tackle the thorny issues that appear in the early stages of production, and this can significantly delay the launch. Product launches are a struggle and they always will be.

The second big reason for the lack of a personal connection is the effect of distance. Entrepreneurs tend to congregate around the sources of capital, professional services, and one another. Factories need more space to maneuver equipment, pallet jacks, inventory, trucks, and workers. CMs are located somewhere cheaper, and the world they live and work in is influenced by their environment. In 2016, Flex tried running a factory in downtown San Francisco (by scooping up the ashes of Quirky) but it did not survive.

Factories are run by people. We are still many decades away from robots running the show. The enduring health of your company rests upon finding CMs that you can maintain a healthy long-term relationship with. Like the pilot in command of an aircraft, someone is running your production line and their decisions will have an impact on your product. It’s expensive to keep moving production, and even if you don’t do all production with a single CM, you’ll need to invest in these important relationships.

There are many ways to manage this relationship, and the chosen style depends on how much you intend to lean on your CM’s operational capabilities.


The graph above illustrates the trust scenarios we will consider. In reality there are an infinite number of possible relationships, but I’ll simplify this article by describing three very different approaches:

  • A: No trust
  • B: Medium trust
  • C: Full trust

In scenario A, you are calling all of the shots and essentially managing the factory yourself. You’re deciding which parts to buy and when to buy them. You provide the complete design and instruct workers how to assemble and test your product. The CM is essentially giving you a workforce, equipment, and space. With this arrangement, the output is 100% your own responsibility. If you are new to manufacturing, you will learn a ton with this option, but you should not make other plans during the build, as it’s all-encompassing.

In scenario B, there is a handoff generally defined by the design package. You are responsible for the design package (more on this later) and the CM is responsible for building the product per the spec. When products are returned (known as RMAs, for “return material authorization”), there is an assessment of whether the return is due to a design issue or a manufacturing issue (sometimes there is no issue, often dubbed NTF for “no trouble found” or NDF for “no defect found”). The CM is off the hook if they built the returned unit according to the design. The sharing of responsibility is demonstrated by the CM managing their upstream suppliers and guaranteeing their work.

In scenario C, you are really counting on the CM to do right by you. You may be simply rebranding their existing product. You are providing some high-level requirements and acceptance criteria, and then letting the CM work out both design and manufacturing details. Even in this scenario, you will be signing a ‘Golden Sample’ but you won’t have visibility into the detailed bill of materials (BOM) or the supply chain that is supporting production. Obviously, it is nearly impossible to move the build to another CM in this scenario, but you also don’t have much responsibility to shoulder. If units are found to be defective or visibly different from the Golden Sample, it’s clearly the CM’s job to fix or replace the units.

These are all viable ways of operating, and I can think of successful companies for each arrangement described. It really comes down to being self-aware about your firm’s strengths and how much responsibility you can profitably manage.

However, there are also companies who do not see the trade-off between trust and responsibility. These companies try to push the relationship into one of two danger zones. At one extreme is the zero-zero scenario, in which the brand holder gives no trust to the CM and dictates every aspect of production, then suddenly forgets all of these instructions when something goes wrong and claims that everything was at the discretion of the factory. This isn’t fair to a CM, who naturally assumes that if their input is not valued, they can’t be held responsible when things backfire. At the other extreme, there are CMs who admonish their customers to “trust us, we are the experts,” work in a lot of their preferred suppliers and processes, and then don’t admit that they have co-opted the design in important ways. It’s not fair to you for a CM to insist on blind trust and then fail to step up when issues emerge. There can be higher margins for the CM in scenario C, but they must accept the heightened risk as well.

How do you even know which kind of relationship you are in? This should be discussed in the early stages of the relationship and hopefully be hammered out in a Manufacturing Services Agreement (MSA). We recommend that you have an MSA in place once the order size gets over 100,000 USD. There are complexities about warranties, excess materials, and returns that are spelled out in agreements like this, and it’s best to sort these out before the unpleasant day arrives when you have a recall on your hands.

The design package is another key element in defining responsibility. Different companies have different definitions of the design package, but here is a list of how we (at Blue Clover Devices) do it, which may be more thorough than average:

  • PRD - Product Requirements Document, overall requirements
  • CAD - mechanical and PCB design files including 2-D drawings
  • BOM - Bill of Materials, list of components in spreadsheet format
  • FMEA - Failure Modes and Effects Analysis, a risk assessment tool
  • SOP - Standard Operating Procedure, assembly instructions
  • Test Plan with specific acceptance criteria

Each document needs version control, a responsible engineer, and an approver. In smaller firms, the documentation is often managed with shared drives and a naming convention, but this approach should be abandoned like training wheels on a bike. It’s an easier way to get started, but maintenance discipline tends to break down and drags down faith in the documentation with it. We suggest using a Product Lifecycle Management (PLM) tool like Arena Solutions, Duro Labs, or Propel PLM for storing, sharing, and versioning the design package. Like starting a major construction project without blueprints, it’s reckless to begin production without a reviewed design package.
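Whatever tool you use, the record-keeping requirements above are simple to state. Here is a minimal sketch of the metadata every design-package document should carry; the field names and example values are invented for illustration, and a real PLM tool tracks far more (change orders, approval workflows, effectivity dates):

```python
# Illustrative metadata record for a design-package document.
# Field names and values are hypothetical; a PLM system manages this for
# you, but even a spreadsheet should capture these basics.
from dataclasses import dataclass

@dataclass
class DesignDocument:
    doc_type: str             # e.g. "PRD", "BOM", "SOP", "Test Plan"
    revision: str             # e.g. "B1"; bump on every released change
    responsible_engineer: str # one named owner per document
    approver: str             # a second person signs off on each release
    released: bool            # only released revisions go to the CM

bom = DesignDocument("BOM", "B1", "j.lee", "m.chen", released=True)
assert bom.released  # never hand an unreleased revision to the factory
```

The point of the `released` flag is the handoff: the CM builds against a specific released revision, so when an RMA arrives you can reconstruct exactly what the factory was told to build.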

As a bonus, let me also share some of the tricks CMs play, so you can go into the relationship as an alert gazelle rather than a slow-moving calf. Smaller CMs generally have a different bag of tricks than large CMs, so I’ll group them accordingly. Note: most of these tricks are not acts of malice, but they may still have an unpleasant impact on your project.

Tricks of Small CMs

  1. Substitution - This one is really offensive, but when you can’t get the component you love, there is a strong magnetism to the one you’re with. When the CM is far away, there is a real risk of the invisible internal parts being substituted with materials that are cheaper, have shorter lead times, or -- sad to say -- came from some form of bribery.
  2. Procedure Drift - If a procedure is conducted by a person, there is a risk that it will be done differently on different days. This happens because workers are put into different positions all the time. Clearly written instructions (called a Standard Operating Procedure or “SOP” or work instructions “WI”) and a good line manager are supposed to ensure consistency, but intricate procedures are very hard to maintain and the line manager cannot simultaneously watch everyone.
  3. Skipping Tests - You might notice if a screw is missing but when you look at a product, it is impossible to tell if it’s been tested. If a tester gets up for a restroom break, there could be some units that actually skip this station. I received excellent advice from a factory manager when I was starting out. He said, “When you’re watching the line, don’t look at the assemblers; look at the testers. Really watch them, and see if they are filtering correctly.” I’ve seen lines where it was evident the tester didn’t have training to know what to inspect and just passed products randomly!
  4. WIP Growth - Work in Progress, or “WIP”, is pretty awful for a manufacturing organization. You cannot sell a half-built car. Taiichi Ohno impressed its horrors upon me in the Toyota Way, as it is an egregious form of waste. So why is WIP so persistent? In a production line, especially when the line is short-handed, there is a natural desire to pick a simple process and just “make a big batch of it,” leaving the trickier processes for tomorrow. Imagine you’re the production manager: you come in tired and you have some new workers on the floor. You’d prefer to keep them busy with something that doesn’t lead to 5,000 questions and lots of line stoppages. So you put them on some easier task and they build WIP! Then, when you have to integrate the whole product and make sure it all works, you might find out that the easy process actually has a little nuance you forgot to highlight, and now you’ve got a giant batch of rework… more waste. You may think this isn’t your problem, but it can be if the goods are no longer reworkable.
  5. Packaging - Smaller CMs have a tendency to not push for clear specifications on the packaging design and then throw it together at the last minute. The package is the protector of your precious product, and poor packaging can make an otherwise good product fail to sell through. Nobody wants goods in a crappy banged-up package, but if the packaging design isn’t done early enough, that is what you’re going to get. If you’re new to product design, you should know that retail packaging has surprisingly long lead times and may need multiple revisions. Treat it with as much care as any other part of the BOM.

Tricks of Large CMs

Large CMs usually got big by putting a lot of effort into the unglamorous job of contract assembly. If you are curious about their origin stories, check out our blog post on it. That said, they have all been around for a pretty long time and have settled into some habits that can be dangerous if you are not a significant chunk of their revenue. If you don’t work at Apple, Google, or Amazon, then I’m talking to you.

  1. MSA - I encourage the use of an MSA, but some CMs use these as a weapon to absolve themselves of responsibility. Don’t assume that the agreement is fair because the author says it is. It might be, but you do have to read it carefully.
  2. NRE - Non-recurring expenses (NRE) can get out of hand with the large CMs. Sometimes they put together Cadillac solutions before the volumes justify such a big outlay with the attitude that “it’s not my money”. Definitely probe any line items in excess of $25K.
  3. Ownership of IP - Large CMs are powerful, and they generally build for you as well as your competitors. You need to think about which IP you want to protect as your own. ODMs (Original Design Manufacturer) are often specialists of a particular kind of product and might intend to compete with you in the future. Another term in this space is JDM (Joint Design Manufacturer), which implies the client and the CM develop some technology together. IP ownership is typically a section of the MSA.
  4. “D” Team - Big CMs have a lot of moving parts, so it can be difficult to track who is actually making decisions about ordering parts and allocating capacity for you. Talk to some other customers first to understand who is really in a position of power at the CM. You don’t want to find out too late that the only people you know there are on the “D” team, which receives the lowest priority for all of the resources. All of the nimble low-volume experiments at these large CMs have started and failed several times over. They make their money from large orders from large companies, period.

In summary, you can trust your CM as much or as little as you want. To build a successful relationship with your CM, regardless of how much you trust them, it is important to remember that the levels of trust should correspond to the level of responsibility you want to place on yourself or your CM. The more you trust your CM, the more you need to hold them accountable. The less you trust your CM, the more you need to hold yourself accountable. The best way to ensure success is clear communication and planning ahead of time. There are different ways to go about this, but usually MSAs (and other written agreements) provide useful guidelines that can be referenced by both parties. Remember, CMs are businesses too, and they want to succeed as much as you do!

Further Reading

Embedded Systems Architecture Resources

Updated: 20190717

After a decade spent building and shipping hardware products, I became convinced that many of the problems and schedule delays I experienced could have been avoided with a little bit of planning and thought. Repeatedly, we painted ourselves into corners with code that seemed to work well initially but caused problems months later when we finally started end-to-end system testing. Serious problems led to major software rewrites, changes in the technology stack, and delayed ship dates. Even worse, as I migrated from one product team to another, I noticed that we were repeating the same basic mistakes.

I started pondering this situation. Why were we dealing with major design problems and risk areas at the end of the project instead of the beginning? How could we describe ourselves as "agile" if we weren't able to quickly adapt our programs to change? Why did none of the teams I was on use anything resembling design activity before starting to build a system?

These questions led me to a deep immersion in the topics of software architecture, systems thinking, and system design. I've applied countless lessons to our internal projects, client projects, and business development efforts. Value exploration, visual modeling, and minimalistic architecture efforts have significantly improved our work quality and derisked many projects.

"Architecture" and "design" seem to be words that send programming teams running for the hills. However, I've had multiple embedded developers share their frustrations with me - the same that started me on my journey - and expressed their interest in learning more about software architecture but not knowing where to start. So, here are all the resources I've collected on software architecture. I hope they help guide you in your own journey.

Where to Start?

There's a lot of material here! You don't need to read all of it to get started with architecture.

For general architecture exposure, I recommend picking 1-2 books from this list:

If you are focused on embedded systems, I highly recommend Real-Time Software Design for Embedded Systems. This book provides a blueprint for modeling and architecting embedded systems. You will be introduced to UML and a variety of modeling approaches that you can use when architecting an embedded system.

The next step is to actually practice! There is no need for a long, drawn-out architecture stage. Allocate 2-4 weeks for value exploration and architecture efforts before starting any new project. Perform stakeholder interviews and explore the value you expect the system to provide. Then focus on answering core questions, like:

  • What qualities and behaviors are most important?
  • What requirements do they place on the design?
  • What are the biggest risk areas?
  • How can we reduce risk?
  • What are we unsure about that might change?
  • How can we make sure to support those changes without requiring a system redesign?
  • What parts of the system will we buy, license, outsource, and build in house?

Those questions will inform the architecture effort. Model the system and begin prototyping the riskiest areas. As you develop the system, you will explore and refine the system architecture.

General Software Architecture

Before diving into embedded systems specifics, it is helpful to have a solid foundation in general software architecture techniques.

We've broken down our reading recommendations into the following categories:

What is Architecture?

Before diving into the how of architecture, it's helpful to know what it is.

Why Should We Architect?

Perhaps you're not convinced that architecture is valuable. Or perhaps you need to prepare yourself to advocate for architecture efforts on your projects. These articles will give you some insights into why we architect.

The Architect Role

These articles discuss the architect role itself, particularly the qualities and skillsets that are valuable to an architect.


We recommend the following architecture books:

These articles from around the web provide countless insights into the practice of software architecture:

Phil Koopman has a selection of lectures which are generally applicable to architecture and design:

Additionally, the slides and course notes from Hassan Gomaa are a useful introduction:

Here are talks which relate to the subject of architecture:


Here are some practical technique guides related to the architecture process, ideation, brainstorming, and value exploration.


Architecture work and documentation go hand in hand. Here are valuable resources that discuss architecture documentation:

Visual Architecture Process

These guides relate to Bredemeyer Consulting's Visual Architecture Process. They provide a practical blueprint for architecting your systems.

C4 Process

Simon Brown created the C4 architecture model, which focuses on four areas of architecture: Context, containers, components, and code. This is another practical blueprint for architecting your system.

Embedded Systems Architecture

Even just a little exposure to software architecture will reveal how deep the rabbit hole goes. We're focused on embedded systems, so here are embedded-specific resources.

Our favorite books on the subject of embedded systems architecture are:

Hassan Gomaa, a professor at George Mason University, published course notes for two courses which discuss embedded systems architecture and modeling:

Phil Koopman published the following course notes which are useful for embedded systems architects:

Safety and Critical Systems

Here are lectures, course notes, and essays related to architecting for safety and for critical systems:


Here are lectures, course notes, and essays related to architecting for security:

Systems Thinking

I would be remiss to talk about architecture without mentioning systems thinking. These two topics are intertwined: we must develop a habit of thinking about the system as a whole if we are to work at an architectural level.

Here are some of my favorite books and essays on systems thinking:

Design Patterns

Design patterns are extremely useful to learn and familiarize yourself with. These are non-obvious solutions to common scenarios and problems. For generally useful software architecture patterns, see:

Embedded systems often work well with event-driven architectures and/or state machines. For more information, see:

Embedded systems are often under tight memory constraints. A useful reference for embedded developers is:

Layered or Hexagonal architectures are common abstractions that work well for embedded systems. Here are some links on both types of design:

Here are design patterns related to safety and critical systems:

Here are anti-patterns to avoid:

Visual Modeling


UML is frequently trashed by development teams (even those with no experience using it), but I find "UML-light" to be extremely useful for documenting and modeling my systems.

These books are wonderful resources for learning and applying UML:

Here are lectures related to UML:

As far as UML tools go, there are many options. We recommend three:

  • Visual Paradigm is our tool of choice due to its support of SysML and the ability to tweak the models to support our needs
  • StarUML is a UML modeling tool recommended to us by Grady Booch, who says he uses this tool on a regular basis
  • PlantUML is a great tool which generates UML diagrams from textual descriptions, enabling you to store UML diagrams under revision control and to include them in source-code comments


If you prefer the C4 model, we recommend the following:

Who to Follow

You've already seen these names quite a bit throughout the article. I recommend keeping up with these folks:

Architecture on Embedded Artistry

We publish articles related to Architecture and Systems Thinking on this website.

Architecture Articles

Systems Thinking Articles

Books Mentioned Above

Documenting Software Architectures: Views and Beyond (2nd Edition)
By Paul Clements, Felix Bachmann, Len Bass, David Garlan, James Ivers, Reed Little, Paulo Merson, Robert Nord, Judith Stafford
Design Patterns: Elements of Reusable Object-Oriented Software
By Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
Pattern-Oriented Software Architecture Volume 1: A System of Patterns
By Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, Michael Stal

What can software organizations learn from the Boeing 737 MAX saga?

Updated: 20190524

One of the largest news stories over the past month was the grounding of Boeing 737 MAX-8 and MAX-9 aircraft after an Ethiopian Airlines crash resulted in the deaths of everyone on board. This is the second deadly crash involving a Boeing 737 MAX. A Lion Air Boeing 737 MAX-8 crashed in October 2018, also killing everyone on board. As a result of these two crashes, Boeing 737 MAX airplanes have been temporarily grounded in over 41 countries, including China, the US, and Canada. Boeing also paused delivery of these planes, although they are continuing to produce them.

I have been following the Boeing 737 MAX story closely. It serves as an interesting case study on software and systems engineering, human factors, corporate behavior, and customer service.

*Note: Both the Lion Air and Ethiopian Airlines crashes are still under investigation. Ultimately, everything you are reading about these crashes and that I discuss here is still in the realm of speculation. However, the situation is serious enough and well-enough understood that Boeing is addressing the problem immediately.*

Brief Background on the 737 MAX

Before diving into the suspected problem with the 737 MAX, I need to set the stage with some background information about the aircraft.

The Boeing 737 is the best-selling aircraft in the world, with over 15,000 planes sold. After Airbus announced an upgrade to the A320 that provided 14% better fuel economy per seat, Boeing responded with the 737 MAX. Boeing sold the 737 MAX as an "upgrade" to the famed 737 design, using larger engines for improved fuel efficiency (also by 14%). Boeing claimed that the 737 MAX operated and flew in the same way as the 737 NG, so pilots licensed to fly the 737 NG did not need additional training and simulator time for the 737 MAX.

Because Boeing increased the engine size to improve fuel efficiency, the engines needed to be positioned higher up on the plane's wings and slightly forward of the old position. Higher nose landing gear was also added to provide the same ground clearance as the 737 NG.

The larger engines and new positions destabilized the aircraft, but not under all conditions. The engine housings were designed so they do not generate lift in normal flight. However, if the airplane is in a steep pitch (e.g., takeoff or a hard turn), the engine housings generate more lift than on previous 737 models. Depending on the angle, the airplane's inertia can cause the plane to over-swing into a stall.

To address the increased stall risk, Boeing developed a software solution: the Maneuvering Characteristics Augmentation System (MCAS). No other commercial plane uses a system like the MCAS, though Boeing uses a similar MCAS system on the KC-46 Pegasus military aircraft.

The MCAS is part of the flight management computer software. The pilot and co-pilot each have their own flight computer, but only one has control at a time. The MCAS takes readings from the angle of attack (AoA) sensor to determine how the plane's nose is pointed relative to the oncoming air. The MCAS monitors airspeed, altitude, and AoA. When the MCAS determines that the angle of attack is too great, it automatically performs two actions to prevent a stall:

  1. Command the aircraft's trim system to adjust the rear stabilizer and lower the nose
  2. Push the pilot's yoke in the down direction

The movement of the rear stabilizer varies with the speed of the plane. The stabilizer moves more at slower speeds and less at higher speeds.

By default, the MCAS is active when:

  • AoA is high (ascent, steep turn)
  • Autopilot is off
  • Flaps are up

The MCAS will deactivate once:

  • The AoA measurement is below the target threshold
  • The pilot overrides the system with a manual trim setting
  • The pilot engages the CUTOUT switch, which disables automatic control of the stabilizer trim

If the pilot overrides the MCAS with trim controls, it will activate again within five seconds after the trim switches are released if the sensors still detect an AoA over the threshold. The only way to completely disable the system is to use the CUTOUT switch and take manual trim control.
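The activation and deactivation conditions described above can be sketched as a simple decision function. To be clear, this is not Boeing's implementation: the names, the threshold value, and the structure are all invented for illustration, and the real system varies its behavior with airspeed and altitude. It only models the rules stated in this article (active when AoA is high, autopilot off, flaps up; suppressed briefly after manual trim; fully disabled only by the CUTOUT switch):

```python
# Illustrative, heavily simplified model of the MCAS activation rules
# described in the text. NOT Boeing's code; threshold and field names
# are hypothetical.
from dataclasses import dataclass

AOA_THRESHOLD_DEG = 15.0    # hypothetical; the real threshold varies with flight conditions
REACTIVATION_DELAY_S = 5.0  # per Boeing's bulletin: re-engages ~5 s after trim switches release

@dataclass
class FlightState:
    aoa_deg: float                 # reading from the active side's AoA sensor only
    autopilot_on: bool
    flaps_up: bool
    stab_trim_cutout: bool         # CUTOUT switch disables automatic stabilizer trim
    seconds_since_manual_trim: float

def mcas_should_command_nose_down(s: FlightState) -> bool:
    """Return True if this simplified model would trim the nose down."""
    if s.stab_trim_cutout:                 # pilot fully disabled the system
        return False
    if s.autopilot_on or not s.flaps_up:   # only active in manual flight with flaps up
        return False
    if s.seconds_since_manual_trim < REACTIVATION_DELAY_S:
        return False                       # pilot trim input suppresses MCAS only briefly
    # Note: the single-sided sensor reading is trusted with no cross-check.
    return s.aoa_deg > AOA_THRESHOLD_DEG

# A faulty, stuck-high AoA reading keeps re-triggering nose-down commands:
state = FlightState(aoa_deg=35.0, autopilot_on=False, flaps_up=True,
                    stab_trim_cutout=False, seconds_since_manual_trim=10.0)
print(mcas_should_command_nose_down(state))  # True, even if the plane is not actually stalling
```

Note what is absent from the decision: yoke position. Pulling back on the column does not appear anywhere in the activation logic, which is exactly the point made below.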

Note this important point: Boeing designed the MCAS to not turn off in response to a pilot manually pulling the yoke. Doing so would defeat the original purpose of the MCAS, which is to prevent the pilot from inadvertently entering a stall angle.

I highlight this point because a natural reaction to a plane that is pitching downward is to pull on the yoke. You are applying a counter-force to correct for the unexpected motion. For normal autopilot trim or runaway manual trim, pulling on the yoke does what you expect and triggers trim hold sensors.

We are under the impression that the column, yoke, steering wheel, gas pedal, and brakes fully control the response of the mechanical system. This is an illusion. Modern aircraft, like most modern cars, are "fly-by-wire". Gone are the days of direct mechanical connections involving cables and hydraulic lines. Instead, most of the connections are purely electrical and typically mediated by a computer. In many ways we are being continually "guarded" by the computers that mediate these connections. It can be a terrible shock when the machine fights against you.

The Suspected Problem

The MCAS is suspected to have played a significant role in both crashes.

During Lion Air flight JT610, MCAS repeatedly forced the plane's nose down, even when the plane was not stalling. The pilots tried to correct by pointing the nose higher, but the system kept pushing it down again. This up-and-down oscillation happened 21 times before the crash occurred. The Ethiopian Airlines crash shows a similar pattern. The Ethiopian Airlines CEO said that they believed that the MCAS was active during the Ethiopian Airlines crash.

Image from the Lion Air crash preliminary report. Notice how the Automatic Trim (yellow line) was forcing the aircraft down, and the pilots countered by pointing it back up (light blue line above Automatic Trim).

If the plane wasn't actually stalling, or even close to a stall angle, why was MCAS engaged?

AoA sensors can be unreliable, which is a suggested factor in the Lion Air crash, where there was a 20-degree discrepancy in AoA sensor readings. The MCAS only reads the AoA sensor on its corresponding side of the plane. The MCAS reacts to the reading faithfully and does not cross-check the other sensor to confirm the reading. If a sensor goes haywire, the MCAS has no way of knowing.

If the MCAS was enabled erroneously, why did the pilots not disable the system?

This is where the situation becomes muddled. The likeliest explanation for the Lion Air pilots is that they had no idea that the MCAS existed, that it was active, or how they could disable it.

Remember, the MCAS is a unique piece of software among commercial airplanes; it only runs on the 737 MAX. Boeing sold and certified the 737 MAX as a minor upgrade to the 737 body, which would not require pilots to re-certify or spend time training in simulators. As a result, it seems that the existence of the MCAS was largely kept quiet.

“We do not like the fact that a new system was put on the aircraft and wasn’t disclosed to anyone or put in the manuals."

  • Jon Weaks, president of Southwest Airlines Pilots Association

"This is the first description you, as 737 pilots, have seen. It is not in the AA 737 Flight Manual Part 2, nor is there a description in the Boeing FCOM (flight crew operations manual). It will be soon."

  • Message to APA from Capt. Mike Michaelis

After the Lion Air crash, Boeing released a bulletin providing details on how the system worked and how to counteract it in case of malfunction. Boeing announced that the MCAS could move the stabilizer by 2.5 degrees. This movement limit applies separately to each MCAS activation. Boeing confirmed that the MCAS can move the stabilizer to its full downward position if the pilot does not counteract it with manual trim or cut the system out entirely. With a limit of 2.5 degrees per activation, two uncorrected cycles of the MCAS are enough to reach the full downward position.
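A back-of-the-envelope check of that claim: if each activation can move the stabilizer up to 2.5 degrees and the available nose-down travel is on the order of 5 degrees (an assumed figure for illustration; the actual stabilizer range differs and is specified in units, not degrees), then two uncorrected cycles reach the stop:

```python
# Rough arithmetic only. The 5-degree full-travel figure is an assumption
# for illustration, not a number from Boeing documentation.
import math

DEGREES_PER_ACTIVATION = 2.5
ASSUMED_FULL_TRAVEL_DEG = 5.0

cycles_needed = math.ceil(ASSUMED_FULL_TRAVEL_DEG / DEGREES_PER_ACTIVATION)
print(cycles_needed)  # 2
```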

Boeing also said that emergency procedures that applied to earlier 737 models would have corrected the problems observed in the Lion Air crash.

The Lion Air pilots likely fought against an automated system that was working against them. The system is most likely to activate at low altitudes, such as during takeoff, leaving the pilots little time to react. Their search through the technical manuals proved unsuccessful.

The Ethiopian Airlines pilots had heard about MCAS thanks to the bulletin, although one pilot commented, "we know more about the MCAS system from the media than from Boeing". Ethiopian Airlines installed one of the first simulators for the 737 MAX, but the pilot of the doomed flight had not yet received training in the simulator. All we know at this time is that the pilot reported "flight control problems" and wanted to return to the airport and that the Ethiopian Airlines crash resembles the Lion Air crash. We must wait for the preliminary report for more details.

Compounding Factors

Based on our current knowledge, the first-level analysis leads us to believe that the MCAS system was poorly designed and caused two plane crashes.

It's not quite that simple. This is a complex situation, involving many people and organizations. Other pilots have struggled against the MCAS system and safely guided their passengers to their destination.

The following contributing factors play out time and again in other systems.

Poor Documentation

As I mentioned, after the Lion Air crash, pilots complained that they were not told about the MCAS or trained in how to respond when the system engages unexpectedly. This lack of documentation or training is especially dangerous when you are fighting against an automated system and your previous training does not fully apply (recall that pulling on the yoke to hold against the trim does not work against the MCAS). Even worse, Lion Air pilots attempted to find answers in their manuals before they crashed.

Pilots take their documentation extremely seriously. Below are three reports from the Aviation Safety Reporting System (ASRS), which is run by NASA to provide pilots and crews with a way to report safety issues confidentially.

The reports highlighted below focus on the insufficiency of Boeing 737 MAX documentation. I've bolded some sentences for emphasis.

ACN 1593017


B737MAX Captain expressed concern that some systems such as the MCAS are not fully described in the aircraft Flight Manual.

Highlights from the narrative:

This description is not currently in the 737 Flight Manual Part 2, nor the Boeing FCOM, though it will be added to them soon. This communication highlights that an entire system is not described in our Flight Manual. This system is now the subject of an AD.

I think it is unconscionable that a manufacturer, the FAA, and the airlines would have pilots flying an airplane without adequately training, or even providing available resources and sufficient documentation to understand the highly complex systems that differentiate this aircraft from prior models. The fact that this airplane requires such jury rigging to fly is a red flag. Now we know the systems employed are error prone--even if the pilots aren't sure what those systems are, what redundancies are in place, and failure modes.

I am left to wonder: what else don't I know? The Flight Manual is inadequate and almost criminally insufficient. All airlines that operate the MAX must insist that Boeing incorporate ALL systems in their manuals.

ACN 1593021


B737MAX Captain reported confusion regarding switch function and display annunciations related to "poor training and even poorer documentation".

Highlights from narrative:

This is very poorly explained. I have no idea what switch the preflight is talking about, nor do I understand even now what this switch does.

I think this entire setup needs to be thoroughly explained to pilots. How can a Captain not know what switch is meant during a preflight setup? Poor training and even poorer documentation, that is how.

It is not reassuring when a light cannot be explained or understood by the pilots, even after referencing their flight manuals. It is especially concerning when every other MAINT annunciation means something bad. I envision some delayed departures as conscientious pilots try to resolve the meaning of the MAINT annunciation and which switches are referred to in the setup.

ACN 1555013


B737 MAX First Officer reported feeling unprepared for first flight in the MAX, citing inadequate training.

Highlights from narrative:

I had my first flight on the Max [to] ZZZ1. We found out we were scheduled to fly the aircraft on the way to the airport in the limo. We had a little time [to] review the essentials in the car. Otherwise we would have walked onto the plane cold.

My post flight evaluation is that we lacked the knowledge to operate the aircraft in all weather and aircraft states safely. The instrumentation is completely different - My scan was degraded, slow and labored having had no experience w/ the new ND (Navigation Display) and ADI (Attitude Director Indicator) presentations/format or functions (manipulation between the screens and systems pages were not provided in training materials. If they were, I had no recollection of that material).

We were unable to navigate to systems pages and lacked the knowledge of what systems information was available to us in the different phases of flight. Our weather radar competency was inadequate to safely navigate significant weather on that dark and stormy night. These are just a few issues that were not addressed in our training.

Even worse, it appears that the FAA's System Safety Analysis document was also incorrect:

The original Boeing document provided to the FAA included a description specifying a limit to how much the system could move the horizontal tail — a limit of 0.6 degrees, out of a physical maximum of just less than 5 degrees of nose-down movement. [...] That limit was later increased after flight tests showed that a more powerful movement of the tail was required to avert a high-speed stall, when the plane is in danger of losing lift and spiraling down.

After the Lion Air Flight 610 crash, Boeing for the first time provided to airlines details about MCAS. Boeing’s bulletin to the airlines stated that the limit of MCAS’s command was 2.5 degrees. That number was new to FAA engineers who had seen 0.6 degrees in the safety assessment.

“The FAA believed the airplane was designed to the 0.6 limit, and that’s what the foreign regulatory authorities thought, too,” said an FAA engineer. “It makes a difference in your assessment of the hazard involved.”

I understand the pilots' concern, given that the MCAS could move the tail 4x farther than stated in the official safety analysis. What else is undocumented or documented incorrectly?

Rushed Release

I would bet that all engineers are familiar with rushed releases. We cut corners, make concessions, and ignore or mask problems - all so we can release a product by a specific date. Any problems are downplayed, and those that are observed by the customer can be fixed later in a patch.

Apparently, the 737 MAX was subject to the same treatment. Here are some key highlights from the article:

  • The FAA delegates some certification and technical assessments to airplane manufacturers, citing lack of funding and resources to carry out all operations internally
    • FAA managers have final authority on what gets delegated to the manufacturer
  • Boeing was under time pressure, because development of the 737 MAX was nine months behind the new A320neo
  • FAA technical experts said in interviews that managers prodded them to speed up the process
  • FAA safety engineer who was involved with certifying the 737 MAX was quoted saying that halfway through the certification process:
    • “We were asked by management to re-evaluate what would be delegated. Management thought we had retained too much at the FAA.”
    • “There was constant pressure to re-evaluate our initial decisions. And even after we had reassessed it […] there was continued discussion by management about delegating even more items down to the Boeing Company.”
    • “There wasn’t a complete and proper review of the documents. Review was rushed to reach certain certification dates.”
  • If there wasn't time for FAA staff to complete a review, FAA managers either signed off on the documents themselves or delegated the review to Boeing
  • As a result of this rushed process, a major change slipped through the process:
    • The System Safety Analysis on MCAS claims that the horizontal tail movement is limited to 0.6 degrees
    • This number was found to be insufficient for preventing a stall in worst-case scenarios
    • The number was increased 4x to 2.5 degrees
    • The FAA was never told about this change, and FAA engineers did not learn about it until Boeing released the MCAS bulletin following the Lion Air crash

The New York Times corroborates this rushed release:

  • "The pace of the work on the 737 Max was frenetic, according to current and former employees who spoke with The New York Times."
    • “The timeline was extremely compressed,” the engineer said. “It was go, go, go.”
  • "One former designer on the team working on flight controls for the Max said the group had at times produced 16 technical drawings a week, double the normal rate."
  • "Facing tight deadlines and strict budgets, managers quickly pulled workers from other departments when someone left the Max project."
  • "Roughly six months after the project’s launch, engineers were already documenting the differences between the Max and its predecessor, meaning they already had preliminary designs for the Max — a fast turnaround, according to an engineer who worked on the project."
  • "A technician who assembles wiring on the Max said that in the first months of development, rushed designers were delivering sloppy blueprints to him. He was told that the instructions for the wiring would be cleaned up later in the process, he said."
    • "His internal assembly designs for the Max, he said, still include omissions today, like not specifying which tools to use to install a certain wire, a situation that could lead to a faulty connection. Normally such blueprints include intricate instructions."
  • "Despite the intense atmosphere, current and former employees said, they felt during the project that Boeing’s internal quality checks ensured the aircraft was safe"
  • “This program was a much more intense pressure cooker than I’ve ever been in,” he added. “The company was trying to avoid costs and trying to contain the level of change. They wanted the minimum change to simplify the training differences, minimum change to reduce costs, and to get it done quickly.”

I've worked on many fast-paced engineering projects. I've observed and personally made compromises to meet deadlines, and there are many that I disagreed with. All of these points are familiar and hit home. I was quite surprised to find that the culture that builds aircraft would be so similar to the culture that builds consumer electronics.

Delayed Software Updates

Weeks after the Lion Air crash, Boeing officials told the Southwest Airlines and American Airlines pilots' unions that they planned to have software updates available around the end of 2018.

“Boeing was going to have a software fix in the next five to six weeks,” said Michael Michaelis, the top safety official at the American Airlines pilots union and a Boeing 737 captain. “We told them, ‘Yeah, it can’t drag out.’ And well, here we are.”

The FAA told The Wall Street Journal that FAA work on the new MCAS software was delayed for five weeks by the government shutdown. However, the "enhancement" was submitted to the FAA for certification on 21 January, only four days before the shutdown ended.

The official software update was announced four months later than the initial estimate. It will still take many more months to approve and deploy.

We are all conditioned to waiting for fixes and updates. Teams are prone to giving idealistic estimates. Problems take longer than expected to diagnose, correct, and validate. Schedules are repeatedly overrun.

However, it's not going to comfort the families of those who lost their lives on Ethiopian Airlines Flight 302 that Boeing released a software fix for certification seven weeks before the fatal crash. There is a real cost to the delay of software updates, and that cost increases significantly with the impact of the issue. It is always better to take the necessary time to implement a robust design in order to avoid needing a patch at all.

Humans Were Out of the Loop

One uncomfortable computing fact remains true: humans are superior to computers at dynamically gathering and synthesizing data.

Computers can only perform actions they were already programmed to do. A computer cannot take in additional data which it wasn't already programmed to read. The MCAS was designed to use a single data point, that of the AoA sensor on the corresponding side of the plane. The initial NTSC report on the Lion Air crash tells us that a single faulty AoA sensor triggered the MCAS.

If a pilot or co-pilot noticed a strange AoA reading (such as a 20-degree difference between the left and right AoA sensors), he or she could perform a "cross check" by glancing at the reading on the other side of the plane. Additional sensors and gauges can be read to corroborate or disprove a strange AoA reading. Hell, a pilot could even look out the window to get a sense of the plane's angle. The pilots could have a discussion and collectively determine which sensor they trusted. Our brains can take in any combination of this information and confirm/disprove a sensor reading.

What is even more troubling is that the system's behavior was opaque to the pilots. According to Boeing, the MCAS is (counter-intuitively) only active in manual flight mode, and is disabled when under autopilot. MCAS controls the trim without notifying the pilots that it is doing so.

Boeing did provide two optional features that would provide more insight into the situation:

  • An AoA indicator, which displays the sensor readings
  • An AoA disagree light, which lights up if the two AoA sensors disagree

But because these were optional, many carriers did not elect to buy them.

In a fight between an unaware human pilot and the MCAS, the MCAS has a fair chance of winning. Even if the pilot counteracts the MCAS by setting manual trim, the MCAS automatically kicks back in if a high AoA reading is still detected. Combined with the fact that the MCAS could move the stabilizer 2.5 degrees per activation, it could continue to push the aircraft's nose down until the pilot's input could no longer overcome the stabilizer's force.

Because of our superiority at dynamic information synthesis, humans must maintain the ability to override or overpower an automated process. At present, nothing in the world is as skilled at dealing with complexity and chaos as the human mind.

Boeing's Response

We've pointed a lot of fingers at Boeing, so let's take a moment to review what the company is doing in response.

An MCAS software update has been announced:

Boeing has developed an MCAS software update to provide additional layers of protection if the AOA sensors provide erroneous data. The software was put through hundreds of hours of analysis, laboratory testing, verification in a simulator and two test flights, including an in-flight certification test with Federal Aviation Administration (FAA) representatives on board as observers.

The following changes will be made:

  • Flight control system will now compare inputs from both AOA sensors
  • If the sensors disagree by 5.5 degrees or more with the flaps retracted, MCAS will not activate
  • An indicator on the flight deck display will alert the pilots to AoA Disagree condition
    • This was previously a paid upgrade, but will now ship as a standard feature
  • MCAS will also be disabled if the AoA Disagree alert is displayed and the AoA readings differ by more than 10 degrees for more than 10 seconds during flight
  • If MCAS is activated in non-normal conditions, it will only provide one input for each elevated AOA event
    • There are no known or envisioned failure conditions where MCAS will provide multiple inputs.
  • MCAS can never command more stabilizer input than can be counteracted by the flight crew pulling back on the yoke.
    • The pilots will continue to always have the ability to override MCAS and manually control the airplane
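The announced gating rules can be sketched in code. The names below are mine and the structure is illustrative; only the 5.5-degree disagreement threshold and the one-input-per-event rule come from Boeing's announcement. The sketch models only when the MCAS may command trim, not the trim command itself:

```python
# Hypothetical sketch of the updated MCAS activation gating.
from dataclasses import dataclass

DISAGREE_INHIBIT_DEG = 5.5  # inhibit if sensors disagree (flaps retracted)

@dataclass
class McasState:
    fired_this_event: bool = False  # only one input per elevated-AoA event

def mcas_command_allowed(aoa_left_deg, aoa_right_deg,
                         flaps_retracted, state):
    """Return True only if the new rules permit an MCAS trim input."""
    if flaps_retracted and abs(aoa_left_deg - aoa_right_deg) >= DISAGREE_INHIBIT_DEG:
        return False  # sensors disagree: MCAS will not activate
    if state.fired_this_event:
        return False  # already provided its single input for this event
    state.fired_this_event = True
    return True

# The Lion Air scenario (a 20-degree disagreement) is now inhibited:
state = McasState()
print(mcas_command_allowed(22.0, 2.0, True, state))  # False
```

Note how the flawed single-sensor assumption is now addressed in two independent ways: cross-checking the sensors, and bounding the worst case to a single input even if bad data slips through.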

In addition to the software changes, there are extensive training changes. Pilots will have to complete 21+ days of instructor-led academics and simulator training. Computer-based training will be made available to all 737 MAX pilots, which includes the MCAS functionality, associated crew procedures, and related software changes. Pilots will also be required to review the new documents:

  • Flight Crew Operations Manual Bulletin
  • Updated Speed Trim Fail Non-Normal Checklist
  • Revised Quick Reference Handbook

Boeing and the FAA participated in an evaluation of the software and a 12 March test flight. Boeing will now work on getting the update approved for installation by the various airworthiness authorities around the world. I expect this to be a long road to approval after Boeing and the FAA destroyed their store of trust.

All of these actions seem correct to me as an engineer and systems builder. But I am crestfallen that they weren't included in the initial release.

Is This the Result of Bad Software?

It's very tempting to label the 737 MAX crashes as "caused by software." At some level, this is true. However, the MCAS appears to be a software patch applied to a larger systems problem (and a hastily assembled patch at that).

Let's walk through the chain that appears to have led us here:

  1. Fuel is expensive, and we want more efficient engines to reduce that burden
  2. Airbus was improving their aircraft, which placed pressure on Boeing to respond with their own improved platform
    1. The timeline was largely dictated by Airbus, not the time Boeing engineers needed to complete the project
  3. Boeing wanted to stick to the 737 platform for a variety of reasons:
    1. Faster time to market
    2. Lower cost for producing and certifying a new plane
    3. Pilot familiarity, leading to reduced training requirements for airlines
  4. Boeing sold the 737 MAX to airlines on the ideals of increased fuel efficiency, platform familiarity, and lower upgrade costs
  5. Bigger engines did not fit on the existing 737 platform, so modifications were needed:
    1. Move the engines forward
    2. Mount the engines higher
    3. Increase the height of the front landing gear
  6. These modifications changed the aerodynamics of the airplane, which should have changed certification requirements and required more training
  7. Instead Boeing created the MCAS to address the aerodynamic impact of the new design
  8. Boeing downplayed the MCAS system, which resulted in:
    1. Improper/insufficient certification
    2. Insufficient documentation
    3. Pilots received no training for handling the new 737 MAX

This is a systems engineering problem created by the company's design goals. Boeing's guiding light was to reuse the 737 platform so they could keep up with Airbus and minimize training requirements. Redesigning the airplane was entirely out of the question because it would give Airbus a significant time advantage and necessitate expensive training. To meet the design goals and avoid an expensive hardware change, Boeing created the MCAS as a software band-aid.

This scenario is quite familiar to me. As a firmware engineer, applying software workarounds for silicon or hardware design flaws is a major part of my work. Fixing hardware is "expensive" in terms of both time and money. At some point it's too late to change the hardware (or so I've been repeatedly told). The schedule drives the decision to move forward with known hardware design flaws.

The next line is predictable: "The problem will just have to be fixed in software." But software fixes do not always work. When the software workaround fails, we seem to forget that we were already attempting to hide a problem.

I am not alone in the view that this is not a "software problem". Trevor Sumner wrote an excellent Twitter thread summarizing the thoughts of Dave Kammeyer. Trevor's take extends beyond the Boeing analysis and includes non-software factors leading to the Lion Air crash (re-formatted for easier reading):

On both ill-fated flights, there was a:

  • Sensor problem. The AoA vane on the 737MAX appears to not be very reliable and gave wildly wrong readings. On #LionAir, this was compounded by a:
  • Maintenance practices problem. The previous crew had experienced the same problem and didn't record the problem in the maintenance logbook. This was compounded by a:
  • Pilot training problem. On LionAir, pilots were never even told about the MCAS, and by the time of the Ethiopian flight, there was an emergency AD issued, but no one had done sim training on this failure. This was compounded by an:
  • Economic problem. Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light, which lets pilots know that this problem was happening. Both 737MAXes that crashed were delivered without this option. No 737MAX with this option has ever crashed. All of this was compounded by a:
  • Pilot expertise problem. If the pilots had correctly and quickly identified the problem and run the stab trim runaway checklist, they would not have crashed.

His closing point is austere (emphasis mine):

Nowhere in here is there a software problem. The computers & software performed their jobs according to spec without error. The specification was just shitty. Now the quickest way for Boeing to solve this mess is to call up the software guys to come up with another band-aid.

I've watched the "fix it in software" cycle play out repeatedly when developing iPhones. Should we be surprised that the same happens for an airplane? What would prevent it: the idea of a safety culture? Can you ever be truly safe when you are optimizing for time-to-market and reduced costs?

After the resulting deaths, loss in market cap, and destruction of trust, one must wonder if Boeing will ever realize the cost and time savings they hoped the software fix would provide.

Note: We should leave open the possibility that there is a compounding software issue at play, since there are ASRS reports which indicate problems that occurred with autopilot on, a scenario where MCAS is supposed to be inactive.

Lessons We Can Apply to Our Systems

A complex system operated in an unexpected manner, and 346 people are dead as a result of two tragic and catastrophic accidents. The lives cannot be restored, but systems and software engineers can learn as much as possible from this case so that such deaths are prevented in the future.

These are the lessons that I've learned from this investigation so far:

You Cannot Bend Complex Systems To Your Will

Boeing took an existing complex system and tried to change that system to force a specific outcome. Systems thinkers everywhere are cringing at this, because all changes to complex systems have unintended consequences.

Donella Meadows said in "Dancing with Systems":

But self-organizing, nonlinear, feedback systems are inherently unpredictable. They are not controllable. They are understandable only in the most general way. The goal of foreseeing the future exactly and preparing for it perfectly is unrealizable. The idea of making a complex system do just what you want it to do can be achieved only temporarily, at best. We can never fully understand our world, not in the way our reductionistic science has led us to expect. Our science itself, from quantum theory to the mathematics of chaos, leads us into irreducible uncertainty. For any objective other than the most trivial, we can’t optimize; we don’t even know what to optimize. We can’t keep track of everything. We can’t find a proper, sustainable relationship to nature, each other, or the institutions we create, if we try to do it from the role of omniscient conqueror.

Meadows continues:

Systems can’t be controlled, but they can be designed and redesigned. We can’t surge forward with certainty into a world of no surprises, but we can expect surprises and learn from them and even profit from them. We can’t impose our will upon a system. We can listen to what the system tells us, and discover how its properties and our values can work together to bring forth something much better than could ever be produced by our will alone.

These thoughts are echoed by Dr. Russ Ackoff in a short talk titled "Beyond Continual Improvement". The points he makes in those brief fifteen minutes echoed repeatedly in my head while writing this essay.

A system is not the sum of the behavior of its parts, it is a product of their interactions. The performance of a system depends on how the parts fit, not how they act taken separately.

Boeing changed a few individual parts of the plane and expected the overall performance to be improved. But the effect on the overall system was more complex than the changes led them to expect.

When you get rid of something you don’t want (remove a defect), you are not guaranteed to have it replaced with what you do want.

We are all familiar with the experience of fixing a bug, only to have a new bug (or several) appear as a result of our fix.

Finding and removing defects is not a way to improve the overall quality or performance of a system.

The larger engines on the 737 airframe resulted in undesirable flight characteristics (excessive upward pitch at steep AoA). Boeing responded by attempting to address this defect with the MCAS. It's clear that the MCAS does not unilaterally improve the overall quality or performance of the aircraft.

What aspects of your system are you trying to force? Perhaps you can broaden your perspective and look at different approaches. The answer will reveal itself if you listen, though you might have to head in a different direction than you originally intended.

Where You are Aiming is the Most Important Thing

There is an idea that I've been holding in the forefront of my mind: nothing has more of an impact on where you will eventually end up than where you are aiming. Setting the right aim is the most important thing.

It seems to me that Boeing's aim was to keep up with Airbus, leading to an aggressive time-to-market. They also wanted to minimize changes to ease certification and ensure that pilots did not need to receive new training. Those are the principles that appear to have guided their actions. Safety was still a concern, but that is not what the organization, system, or schedule focused on.

Dr. Ackoff echoes this idea in "Beyond Continual Improvement":

Basic principle: an improvement program must be directed at what you want, not at what you don’t want

At one level, we can say that Boeing wanted a new aircraft with improved fuel efficiency to compete with Airbus.

At another level, what Boeing wanted was to design a new aircraft with improved fuel efficiency, but in such a way as to not require a new airframe design, to not require a timeline that delayed them significantly with regards to the Airbus launch, and to not require pilots to receive training on the new airplane.

Boeing seems to have focused heavily on the things they did not want out of the improved design.

If you stick to the base level of desire (wanting a new aircraft with improved fuel efficiency), it seems that the system needed to be largely redesigned with a new airframe to support larger engines.

Your company’s aim is a truly powerful force. Your organization is headed in only that direction.

Ask yourselves often: is it the proper aim?

Treat Documentation as a First Class Citizen

If other people will use your product, you need to treat documentation as a first class citizen. Useful and comprehensive documentation and training is extremely important to your users and the engineers and managers that come after you.

Pilots are fanatical about their documentation, as well they should be. There is clear and documented outrage that details were kept from them.

In this case, improved documentation would have led to better understanding of the system forces at work. Improved documentation alone could have potentially saved hundreds of lives.

We try to hold back because we think our users don't need (or can't handle) the details:

One high-ranking Boeing official said the company had decided against disclosing more details to cockpit crews due to concerns about inundating average pilots with too much information - and significantly more technical data - than they needed or could digest.

Software teams often take this view of their users. Perhaps it is simply a rationalization for not wanting to put the effort into creating and maintaining documentation. How can we predict what information people need to know? What is too technical, and what is enough information? Won't the details change as the system evolves? How will we keep it maintained?

When we leave out documentation or fudge the explanations of how things work, we hinder our users. What could your users accomplish with your system if they had a full understanding of how it worked? I guarantee they can handle and achieve much more than you expect.

Software teams also hinder themselves when they neglect documentation. When we document, we are acting as explorers, mapping uncharted territory. New team members can learn how the system is designed. Ideas for simplification will jump out at you. You'll start thinking about novel ways to use your software and the edge cases that will be encountered. Poorly understood system aspects are suddenly obvious - "here be dragons".

It's a popular adage: if you can't explain something in simple terms, you don't understand it. And if you don't explain something, nobody else has a chance of understanding it.

Keep Humans in the Loop

I stated earlier that humans must maintain the ability to override or overpower an automated process. Because of our superiority at dynamic information collection and synthesis, we can improvise and make novel decisions in response to new situations. A computer, which has been preprogrammed to read from a limited amount of information and perform a set of specific responses, is not (yet) capable of improvising.

“What we have here is a ‘failure of the intended function,’ going back to your recent piece [on SOTIF — Safety of the Intended Functionality],” Barnden said. “A plane shouldn’t fight the pilot and fly into the ground. This is happening after decades of R&D into aviation automation, cockpit design and human factors research in planes.”

System designers and programmers are not all-knowing. Make sure that humans are kept in the loop - let them override your automated processes. Perhaps they know better after all.

Testing Doesn't Mean You Are Safe

Phil Koopman recently wrote about a concept he calls The Insufficient Testing Pitfall:

Testing less than the target failure rate doesn't prove you are safe. In fact you probably need to test for about 10x the target failure rate to be reasonably sure you've met it. For life critical systems this means too much testing to be feasible.

No doubt about it: the airplane and software were tested. Probably significantly. Certainly in simulators and in test flights. But it seems that Boeing did not test the system enough to encounter these problems. And even if they did - what other problems would still be missed?

We need a plan for proving that our software works safely. Testing is not enough.

Could This Happen in Your Organization?

It's easy for us to read about the Boeing 737 MAX saga, or other similar human-caused disasters, and think that we would never have walked down the path that led there. I implore you to have sympathy and understanding. Humans committed those actions. You are also human. You (and the organizations you are a part of) are capable of the same actions, for the same reasons. Keep the possibility of catastrophe in mind when you are tempted to let standards slide.

All of this is familiar to me as an engineer. I've worked on many fast-paced engineering projects. I've observed and personally made compromises to meet deadlines: some I proposed myself, and others that I disagreed with. I've seen these compromises work out, and I've seen them fail spectacularly. I got lucky. I don't work on safety-critical software, and I have never watched people die at the hands of my systems. I have deep sympathy for the engineers who will be forever plagued by their creation.

After the Lion Air crash, Boeing offered trauma counseling to engineers who had worked on the plane. “People in my group are devastated by this,” said Mr. Renzelmann, the former Boeing technical engineer. “It’s a heavy burden.”

We must also remember that nobody at Boeing wanted to trade human lives for increased profits. All human organizations - families, companies, industries, governments - are complex systems and have a life of their own. The organization can make and execute a decision which none of the participants truly want, such as shipping a compromised product or prioritizing profits over safety.

What I see with Boeing is an organization that made the same kind of decisions that I regularly see made at every organization I've been a part of. Like at all of these other organizations, they did not escape the consequences of their decisions. The difference for Boeing is that they were playing for bigger stakes, and the result of their misplaced bet is more painful.

There was no villainous CEO who forced his minions to compromise the product. Nor was there an entire organization whose individuals collectively decided to disregard safety. The organization rallied around the goals of time-to-market and minimizing required pilot training. Momentum and inertia kept the company marching toward their aim, even if individuals disagreed. And perhaps nobody explicitly noticed that safety was de-prioritized as a result.

I want to repeat this: Boeing made the same decisions that are being made everywhere else.

We all have a duty to aim higher.


Our creations are never the result of a single mind.

I want to thank Rozi Harris and Stephen Smith for reviewing early drafts of this essay. Their feedback, conversation, and exploration of the topics at hand have been extremely helpful. Many of their discussion points were incorporated into the essay.

Thanks to Nicole Radziwil for reviewing the article and making edits and corrections.

Thank you to the hard-working journalists and aviation fanatics who have published brilliant coverage and analysis of the 737 MAX saga. I know only a fraction of what others know about the problems discussed herein.

I also want to thank all of my colleagues who stood beside me over the years. It takes a monumental effort to build something new, and it rarely works out. We should all be amazed at our combined human triumph.

The lessons I present are hard-won, collectively generated, and the result of long debates. I hope the next generation of creators can use them to move beyond our current capabilities.