Minimize Time to Defect Discovery

24 October 2022 by Phillip Johnston • Last updated 26 February 2024

A core axiom of our software engineering philosophy is that we want to minimize time to defect discovery. We have observed that the longer an error remains in a program, the harder it is to fix. The best time to fix an error is immediately after you introduced it. You have the full context loaded into your mind. You understand what you’re trying to do, why you just wrote the code the way you wrote it, and what the target behavior is. The defect is often obvious – and …

Monitoring Memory Usage

24 October 2022 by Phillip Johnston

Memory is usually viewed as a limited resource in embedded systems. As a result, monitoring memory usage is a common analysis and quality enforcement activity. This is often done both to manage resources and to prevent critical errors, such as stack overflows, heap exhaustion, and heap fragmentation.

Stack Usage

A linker can usually check whether your global variables will fit into the available RAM, but it cannot tell how much stack you need. This means that there is guesswork involved in sizing stacks. With limited memory, we are also forced to tune our …
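One common way to replace some of that guesswork with measurement is stack painting: fill the stack region with a known pattern, run the system, then count how much of the pattern survived. The excerpt above does not name this technique, so treat the following as an illustrative sketch only. It models the stack as a Python `bytearray`; the names `FILL`, `paint`, and `high_water_mark`, the 1024-byte size, and the downward-growing layout are all assumptions made for the example.

```python
FILL = 0xA5  # arbitrary paint pattern (assumption; any recognizable byte works)


def paint(stack: bytearray) -> None:
    """Fill the whole stack region with the paint pattern before use."""
    stack[:] = bytes([FILL]) * len(stack)


def high_water_mark(stack: bytearray) -> int:
    """Return the peak number of bytes used.

    Assumes the stack grows downward from the end of the buffer toward
    index 0, so untouched paint sits at the low indices.
    """
    untouched = 0
    for byte in stack:
        if byte != FILL:
            break
        untouched += 1
    return len(stack) - untouched


# Model a 1024-byte stack whose deepest call chain touched 300 bytes.
stack = bytearray(1024)
paint(stack)
stack[-300:] = bytes(300)  # simulated usage overwrites the paint with zeros

peak = high_water_mark(stack)
headroom = len(stack) - peak
print(f"peak usage: {peak} bytes, headroom: {headroom} bytes")
```

A measurement like this only reflects the code paths that actually ran, and it can under-report if the program happens to write the paint value at its deepest point, so it informs stack sizing rather than proving it.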

What to Do After Fixing a Bug

10 October 2022 by Phillip Johnston

There is an often-overlooked step in the debugging process: following up after the fix is made. Many developers see the value in truly verifying the fix, but we can often do even more than that. Once a bug is fixed, here are additional activities that should take place:

Create a regression test to ensure that a) the problem is caught, and b) the test passes with the fix in place. This will ensure that the same problem doesn’t return in the future.

Identify why this case wasn’t tested (what was overlooked?), and expand other …
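The two conditions on a regression test, that it catches the problem and that it passes with the fix in place, can be checked directly by running the same test against the code before and after the fix. The sketch below uses an invented bug (an `average` function that crashed on an empty list); every name in it is hypothetical and chosen only for illustration.

```python
def average_before_fix(samples):
    """Hypothetical pre-fix code: raises ZeroDivisionError on an empty list."""
    return sum(samples) / len(samples)


def average(samples):
    """Hypothetical fixed code: an empty list averages to 0.0."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def regression_test_empty_input(impl) -> bool:
    """Return True if impl handles the input from the bug report."""
    try:
        return impl([]) == 0.0
    except ZeroDivisionError:
        return False


# a) the test catches the problem when the fix is absent
caught_on_old_code = not regression_test_empty_input(average_before_fix)
# b) the test passes with the fix in place
passes_on_fixed_code = regression_test_empty_input(average)

print(caught_on_old_code, passes_on_fixed_code)
```

A test that passes against both versions would not be guarding against the original bug, which is why the check against the pre-fix code is worth doing once before the test is committed.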

Do Not Gate Debugging on Local Reproduction of an Issue

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

One of our debugging rules is to Reproduce the Problem. However, it is extremely important to understand that failing to reproduce a problem does not mean that it doesn’t exist!

Q: How many engineers does it take to change a lightbulb?
A: None, they all say “the lightbulb in my office works”

We’ve all experienced this with software we use. The person on the other end can’t reproduce it, so they close the issue or stop looking into it. The problem is that we still have the …

Don’t Forget the Basic Sanity Checks

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

When trying to debug a problem, we recommend checking the basics before you get too wrapped up in the debugging process. Sometimes we can spin our wheels trying to debug a problem, but our lack of detachment means that we’re missing an obvious basic problem:

Is it powered on? (“check the plug”)
Is it running the right version?
Are your changes actually being built and included in the binary?
Does the system have valid configuration settings?

This idea is closely related to Quit Thinking and Look. You need to step …

Keep a Debug Audit Trail

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

Perhaps our most important advice for debugging a problem is to keep an audit trail. When you’re stuck on a problem, it is extremely easy to go in circles, testing the same four changes over and over again. You can also waste time by getting input from others, going back to test cases that you already tested but have since forgotten about. Keeping an audit trail is just like instrumenting the software or the hardware, but this time you’re instrumenting your debug process. As you investigate a problem, write …

Change One Thing at a Time

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

A useful operational rule, especially if you are attempting to employ a scientific mindset, is to change only one thing at a time. It doesn’t matter if you are debugging a problem, implementing new code, or following a test-driven development (TDD) process – changing one thing at a time is the sanest way to work. The reason is that you are minimizing the surface area for introducing a problem. If something goes wrong and you’ve only changed one thing, you immediately know the source of the problem. If you’ve …

Perform a Binary Search to Narrow the Problem Space

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

When trying to debug a problem, practice the essential strategy of divide and conquer to reduce the potential problem space. The best method for this is a binary search pattern. Start dividing the system into halves, and figure out where the problem is. This is a repeated process – once you have isolated the problem to one half of the system, divide that half again and continue narrowing down the search. Ideally, this search strategy will result in you pinpointing the source of the problem. Even in the worst …
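The halving process can be sketched in code for the case where the problem space is an ordered sequence, such as a series of builds or commits. This is a minimal sketch under stated assumptions: the function `find_first_bad`, the build names, and the `shows_problem` check are hypothetical, and it assumes every candidate before the culprit is good while the culprit and everything after it are bad.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is true.

    Assumes the candidates are ordered good, ..., good, bad, ..., bad
    and that the last candidate exhibits the problem.
    """
    if not candidates or not is_bad(candidates[-1]):
        raise ValueError("the last candidate must exhibit the problem")
    lo, hi = 0, len(candidates) - 1  # invariant: candidates[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return candidates[lo]


# Hypothetical example: 100 builds, and the problem first appears in build 73.
builds = [f"build-{n}" for n in range(1, 101)]
checked = []


def shows_problem(build: str) -> bool:
    checked.append(build)  # record each check to count how many were needed
    return int(build.split("-")[1]) >= 73


culprit = find_first_bad(builds, shows_problem)
print(culprit, "found after", len(checked), "checks")
```

Each check discards half of the remaining candidates, so the number of checks grows with the logarithm of the search space rather than its size. The same pattern applies when the halves are subsystems or code paths instead of builds, although the ordering assumption then has to be established by hand.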

Quit Thinking and Look

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

Perhaps the most important rule when debugging a problem is to quit thinking and look. This is the reminder we most often give to other engineers. You can tell that someone is thinking, not looking, whenever you hear the phrase “that can’t happen”. If someone utters that phrase, they must immediately lose speaking privileges. It is obvious that the problem can happen, because it did!

“That can’t happen” is a statement made by someone who has only thought about why something can’t happen and hasn’t looked at …

If You Didn’t Fix It, It Isn’t Fixed

6 October 2022 by Phillip Johnston • Last updated 10 October 2022

When debugging a problem, it is important to take the time to confirm that the problem is actually fixed. This is why it is especially helpful to reproduce the problem locally. You can take the fix out, make sure it’s broken again, put the fix back in, and make sure it’s fixed again. You should do this a few times, as well as stress test the system with the fix in place, to make sure that the problem is actually fixed. If you cannot reproduce the problem locally, …