Code Reviews

GitNStats: A Git History Analyzer to Help Identify Code Hotspots

GitNStats is a cross-platform git history analyzer that identifies frequently updated files within a git repository. High churn can be used as a proxy for identifying files which may have poor implementation quality, lack tests, or are missing a layer of abstraction.

Below I will provide basic instructions for getting and using GitNStats. We'll also look at two of my projects to review high-churn files and their git history. By reviewing the history of these files, we can identify potential problem areas, refactoring projects, and development process improvements.

Table of Contents:

  1. Getting GitNStats
  2. Usage
  3. Client Project Analysis
  4. Jenkins Pipeline Library Analysis
  5. Further Reading

Getting GitNStats

The best place to download the software is the repository's Releases page. Pre-packaged 64-bit releases are provided for OSX 10.12, Ubuntu 14.04, Ubuntu 16.04, and Windows.

To install GitNStats:

  1. Download one of the pre-packaged releases.
  2. Create a home for GitNStats, such as within /usr/local/share or your home directory.
  3. Unzip the release package to the target directory.
  4. Link the gitnstats binary to a location in your path, such as /usr/local/bin or /bin.
    1. Alternatively, you can add the target directory to your PATH variable.

Example workflow included in the README:

# Download release (replace version and runtime accordingly)
cd ~/Downloads
wget <archive-for-your-platform.zip>

# Create directory to keep package
mkdir -p ~/bin/gitnstats

# unzip
unzip osx.10.12-x64.zip -d ~/bin/gitnstats

# Create symlink
ln -s /Users/rubberduck/bin/gitnstats/gitnstats /usr/local/bin/gitnstats

Usage

The primary way to use gitnstats is simply to run it in a repository without arguments. You will see the repository path, the branch, and a list of files paired with their commit counts.

$ gitnstats

Repository: /Users/pjohnston/src/ea/templates
Branch: master

Commits    Path
3    oss_docs/CONTRIBUTING.md
3    oss_docs/PULL_REQUEST_TEMPLATE_CCC.md
3    oss_docs/PULL_REQUEST_TEMPLATE.md
3    oss_docs/ISSUE_TEMPLATE.md
2    oss_docs/CODE_OF_CONDUCT.md
1    README_template.md
1    PULL_REQUEST_TEMPLATE_example.md
1    PULL_REQUEST_TEMPLATE_CCC.md
1    Jenkinsfile
1    ISSUE_TEMPLATE_example.md
1    CONTRIBUTING_template.md
1    CODE_OF_CONDUCT_template.md
1    CI.jenkinsfile
1    .github/PULL_REQUEST_TEMPLATE.md
1    .github/ISSUE_TEMPLATE.md
1    oss_docs/README.md
1    jenkins/Jenkinsfile
1    jenkins/CI.jenkinsfile

You can also supply the repository path as a command-line argument, allowing you to invoke gitnstats from outside of a repository:

~$ gitnstats /Users/pjohnston/src/ea/templates
Repository: /Users/pjohnston/src/ea/templates
Branch: master

…

You can specify a branch name to analyze using the -b or --branch arguments:

$ gitnstats -b avoid-failing-when-delete-a-branch
Repository: /Users/pjohnston/src/ea/scm-sync-configuration-plugin
Branch: avoid-failing-when-delete-a-branch

…

You can also limit the analysis to commits made after a certain date using the -d or --date arguments:

$ gitnstats -d 1/1/18
Repository: /Users/pjohnston/src/ea/embedded-framework
Branch: master

Commits    Path
8    docs/development/libraries.md
5    docs/development/tools.md
4    docs/architecture/architecture.md
3    docs/development/testing.md
2    docs/development/quality.md

Those are the basic operations supported by gitnstats, and they can be combined:

$ gitnstats ~/src/ea/libc -b pj/stdlib-test -d 10/30/17
Repository: /Users/pjohnston/src/ea/libc
Branch: pj/stdlib-test

Commits    Path
1    src/stdlib/strtof.c
1    src/stdlib/strtod.c
1    src/gdtoa
1    premake5.lua
1    .gitmodules
1    src/stdlib/strtoll.c
1    src/stdlib/strtol.c

For further instruction, refer to gitnstats --help.

Client Project Analysis

I recently worked on a short-term project for a client, so let's take a look at that project and see how the file churn maps to problems I encountered along the way.

10:38:13 (master) power-system-fw$ gitnstats
Repository: /Users/pjohnston/src/projects/power-system-fw
Branch: master

Commits    Path
34    src/lib/powerctrl/powerctrl.c
34    src/main.c
33    Makefile
29    README.md
26    src/lib/commctrl/commctrl.c
19    src/_config.h
18    src/drivers/i2c/i2c_slave.c
17    src/drivers/can/can.c
13    src/lib/powerctrl/powerctrl.h
13    src/drivers/bmr456/bmr456.c
11    src/drivers/gpio/gpio_interrupt_handler.c
11    src/lib/commctrl/commctrl.h
10    src/drivers/i2c/i2c.c

Eight files have been changed a significant number of times, and the top three files were changed roughly three times as often as the files below the top ten.

That's a pretty huge gap, so let's look at the history to see what's going on with our top three files (a simple way to inspect any single file's history is shown after this list):

  • main.c was updated every time a new library or driver was added and required initialization.
    • The abort and error handling functions are included in main.c and received multiple functionality updates (stopping threads, sending a UART message, displaying an LED error code)
      • These handlers should be split into a different file
    • Static functions received Doxygen updates in separate commits - I can clearly be better about documenting WHILE writing a function
  • powerctrl.c is the library which provides power control abstractions and power-state management
    • Timing parameters have been updated multiple times after validation efforts
      • These values should be configurable and moved into _config.h - churn should happen there
    • Due to timing problems, the library was overhauled to add in a thread which managed power state changes
      • Significantly less churn happens after this change
    • As new parts and drivers were brought up, they were added into the power control library individually
  • Makefile was updated every time a new source file was created.
    • Significant churn happened when bringing up the project on Linux, as differences between gcc versions and case-sensitive file systems identified a series of changes that needed to be made
      • These changes weren't made on a branch, but instead committed and tested with a new build on the build server.
      • This is terrible development practice on my end. I should have been testing locally in a VM or by using a branch.
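
To dig deeper into any single hotspot, plain git can reconstruct a file's change history. For example, using one of the paths from the listing above:

# Show every commit that touched a given file, following renames
$ git log --oneline --follow -- src/main.c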

By looking at the statistics, I can uncover some design work and refactoring efforts that will improve the project. I also see the results of some expedient choices I made, resulting in terrible development practices and unnecessary file churn. Now these facts are logged in git history forever.

What About Recent Changes?

The project was officially delivered on 6/1/18, so let's see what modifications have been made after client feedback:

$ gitnstats -d 6/2/18
Repository: /Users/pjohnston/src/projects/power-system-fw
Branch: master

Commits    Path
1    src/drivers/gpio/gpio_interrupt_handler.c
1    src/lib/powerctrl/powerctrl.c

Not too bad after all, though both gpio_interrupt_handler.c and powerctrl.c are in the high-commit list in the overall history analysis. If these files continue to show churn, I know I need to spend more time thinking about their structure and interfaces.

Jenkins Pipeline Library Analysis

The Jenkins Pipeline Library is an open-source library for use by Jenkins multi-branch pipeline projects. I use this library internally to support complex Jenkins behaviors, as well as with some client Jenkins implementations.

Let's see what the highest-churn files for this project are:

10:41:59 (master) jenkins-pipeline-lib$ gitnstats
Repository: /Users/pjohnston/src/ea/jenkins-pipeline-lib
Branch: master

Commits    Path
15    vars/sendNotifications.groovy
11    vars/gitTagPreBuild.groovy
10    vars/slackNotify.groovy
5    vars/gitTagCleanup.groovy
4    vars/gitTagSuccess.groovy
4    vars/setGithubStatus.groovy
4    vars/emailNotify.groovy
4    vars/gitBranchName.groovy

…

Wow, the top three files have each been edited ten or more times.

Clearly there is a problem, which is made even worse by the fact that sendNotifications.groovy was eventually split into two separate functions: slackNotify.groovy and emailNotify.groovy. The fact that sendNotifications.groovy was managing two separate notification paths caused the initial churn on that file, and certainly resulted in overly complex logic. Splitting the file into two separate functions was A Good Thing.

Diving into the slackNotify.groovy changes, I can see that I was very thoughtless in my initial implementation and commit strategy.

Two commits were actual feature extensions:

  1. Add an option to use blueOcean URLs for slack notifications
  2. Improve output for builds with no changes or first-builds: The commit that was built will be indicated in the message

The rest of the changes were fixes for formatting errors, typos, and other easily identified mistakes.

There are some clear lessons here:

  1. I can identify and address problematic files long before they accumulate 25 total changes (the combined count for sendNotifications.groovy and slackNotify.groovy)
  2. To avoid high churn on a file, follow good development processes. Expediency creates terrible history and higher-than-necessary churn. I would be embarrassed to do this on a professional project, so why did I take the expedient route on a personal (and public!) project?

Further Reading

Improving Our Software With 5 Lightweight Processes You Can Adopt This Month

This article comes from the March 2018 edition of our embedded systems newsletter.

I typically use the newsletter as a way to share interesting embedded news, summaries of new parts, libraries I've been investigating, and other miscellanea. However, I thought that the topic I discussed in this newsletter was worth sharing with a wider audience.

It is my sincere hope that your team adopts at least one new process to help improve code quality.

Improving Our Software With 5 Lightweight Processes You Can Adopt This Month

It's that time of year when the Barr Group releases their yearly Embedded Systems Safety & Security Survey results. Last year's results were eye-opening, as nearly 50% of respondents reported not using static analysis and 36% reported that they do not perform code reviews. The 2018 results were no better:

  • 38% of safety-critical products don't comply with a formal safety standard
  • 43% of teams working on safety-critical products don't perform code reviews
  • 41% of teams working on safety-critical devices don't perform regression testing (54% for IoT product teams)
  • 33% of teams working on safety-critical products don't perform static analysis (49% for IoT product teams)

These numbers are even more alarming given the fact that 25% of the reported "internet-connected devices" could kill or injure people if hacked. 22% of respondents mentioned that security for connected devices wasn't even on their to-do list. We're becoming increasingly connected, but our standards for safety, testing, and verification are not keeping pace. I want to be clear: this is not acceptable.

Many teams skip crucial development processes, justifying the omission with scheduling pressure or by blaming the boss. There will always be scheduling pressure, so we must adjust our approaches or we will continue to flounder. Bugs are expensive in time, money, and morale. Debugging accounts for 50% of most project schedules. We all make mistakes. Anything we can do to keep bugs out of our code or catch them as early as possible will save money and time.

I've selected five simple processes to improve quality that your team can adopt over the next month. These processes are cheap or free, apply across languages and platforms, and best of all - they work. While each process requires a bit of time to get up and running, there is little-to-no maintenance involved in continuing to use them.

Here are five lightweight processes for improving code quality and identifying problems early:

  1. Fix all of your warnings
  2. Set up a static analysis tool for your project
  3. Measure and tackle complexity in your software
  4. Create automated code formatting rules
  5. Have your code reviewed

Fix All of Your Warnings

The first thing I do when working on a new project is fix all of the compiler warnings. It's amazing to me how developers will ignore warnings or rationalize their presence. Occasionally you even run into a team who will fight tooth and nail to prevent you from fixing them!

The compiler knows the programming language better than you ever will. You should not ignore the compiler when it is alerting you to an issue. The way you are using the language is dangerous and likely has unintentional side effects. Depending on the warning, you might be introducing undefined behavior into your software. That's not our idea of quality software.

If you have warnings in your code base, fixing them is one of the fastest ways to improve quality. You will fix bugs and flaws in your program, regardless of whether or not they are currently problematic.
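
To sketch what this looks like in practice, here is a hypothetical GCC/Clang invocation that enables a broad set of warnings and promotes them to errors, so new warnings cannot creep back into the build (the file paths are placeholders):

# Enable broad warning coverage and treat all warnings as errors
gcc -Wall -Wextra -Wpedantic -Werror -c src/main.c -o build/main.o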

Set Up Static Analysis Support

Static analysis tools provide us with even better feedback than the compiler. Your compiler will happily allow some problematic cases which are legal in the language, such as out-of-bounds pointer accesses or missing initialization values. Your analyzer will catch these problems, and will also report red flags such as unused or redundant code. When used throughout the development cycle, your static analysis tool can help you catch and prevent latent problems even earlier than your testing cycle.

Some governmental and industrial organizations are starting to require static analysis data for certification processes. Companies such as PRQA provide tools that can check for compliance with safety critical standards in a variety of industries.

There are many free static analysis tools available and their commercial counterparts are also inexpensive (most are less than $1000). At Embedded Artistry, we use Clang's static analyzer alongside clang-tidy.
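
As an illustration, here is how those two tools might be invoked on a hypothetical Makefile-based C project (the paths and flags are placeholders):

# Run Clang's static analyzer by wrapping the normal build command
scan-build make

# Run clang-tidy on a single file; compiler flags follow the --
clang-tidy src/main.c -- -Iinclude -std=c11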

Measure and Tackle Complexity

By the time you've eliminated warnings on your project and cleaned up glaring problems exposed by your static analysis tool, you've already made significant progress with software quality. The next goal is to measure complexity in your software. Because highly complex functions tend to be hard to understand, test, and maintain, these functions are prime candidates for refactoring and simplification.

By using a metric to measure complexity, we have a quantitative way to evaluate our code and identify pieces that need special attention. We can see how our changes are impacting the code base over time and trigger automatic alerts and reviews whenever a threshold is exceeded. We can focus our code reviews on functions with high complexity scores, making sure they get the most out of our limited attention. Metrics aren't perfect, but they increase our insight into our software quality.

These are the simplest and most popular metrics for measuring code complexity:

  • Lines of code (LOC): a count of the non-blank, non-comment source lines in a function, module, or project
  • McCabe cyclomatic complexity (MCC): provides a complexity score based on the number of branches (e.g. conditional statements)
  • Strict cyclomatic complexity (SCC, CC2): expands MCC by considering the number of conditions within each branch, which provides an approximation for the number of test cases needed for full coverage

Free tools are available to calculate complexity metrics for C/C++. We currently use Lizard at Embedded Artistry.
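
For example, Lizard is installed through pip and pointed at a source tree; the complexity threshold below is an arbitrary value chosen for illustration:

# Install Lizard, then flag functions whose cyclomatic complexity exceeds 10
pip install lizard
lizard -C 10 src/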

Create Auto-formatting Rules

Automated code formatting might seem like a strange recommendation to put into the top five, but it serves three purposes:

  1. Automated formatting reduces a programmer's cognitive load by eliminating an entire category of details and decisions they need to keep in mind
  2. Automated formatting improves the quality of our peer code reviews (the next recommendation) by eliminating arguments about style
  3. Automated formatting is the first step toward implementing and enforcing a coding standard

Every team that I've encountered with a written style guide inevitably ignores those guidelines, and multiple programming styles run rampant. Instead of relying on developers to constantly keep an arbitrary set of rules in mind, we can automate the process to make it simple and impersonal. At the very least, it's worth eliminating the pointless, time-wasting arguments that cause friction within our teams.

We use clang-format on Embedded Artistry projects. Uncrustify and Astyle are other popular code formatting tools.
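
To sketch the clang-format workflow: generate a configuration file once, commit it to the repository, then apply it to your sources (the preset and file globs below are illustrative):

# Generate a starting configuration from a built-in preset
clang-format -style=llvm -dump-config > .clang-format

# Reformat files in place using the repository's .clang-format rules
clang-format -style=file -i src/*.c src/*.h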

Have Your Code Reviewed

Writers accept the fact that first drafts are generally garbage and need heavy editing. Before I send this newsletter out into the world, it has usually gone through 2-3 self-editing sessions and 1-2 peer reviews. Along the way, the newsletter is trimmed and restructured. The result is a much better product than the initial draft.

Yet, for some reason, programmers seem to think that perfect code is produced on the first try. The 2018 Barr Group survey results showed that 54% of IoT product teams don't perform regular code reviews. The survey results also show that a painfully scary 43% of teams working on safety-critical software don't perform regular reviews.

Perfect code on the first try might be possible if you're a prodigious programmer. But remember: even Hemingway had an editor. A second set of eyes can identify flaws that you missed in your first pass. Another developer may have different experiences that provide insight into the merits or risks of your approach. The architect on your team probably has input on how a module should interface with other pieces of the program. The "ego effect" also comes into play: knowing that our code will be presented and reviewed by another human can dramatically improve the overall quality. We will spend time cleaning up and checking the logic before putting code up for judgment.

Code reviews can waste time and become unproductive if poorly implemented. Best Practices for Peer Code Review provides some excellent tips for getting started. Notably, a lightweight review process is more efficient and practical than long, in-depth reviews with multiple developers. Even performing reviews on only 20-33% of the submissions provides benefits due to the "ego effect". While 20% may seem low, remember that we are aiming for achievable: reviewing 20% of the source code is definitely better than none.

I highly recommend implementing peer code reviews after setting up automated code-formatting. This helps constrain code review discussions by preventing them from devolving into style nit-picking. If you've set up static analysis, make sure the tools are used prior to code reviews.

The Keys to Success When Adopting New Processes

When adopting new processes, it's important to focus on one at a time. Adopting new processes in stages ensures that you have time to correctly implement each new technique before moving on to the next one. By implementing too many changes at once, you are likely to overwhelm your team and evoke a mutiny.

If you're leading a team, it helps to find someone who is excited and can help you champion the idea. Empower that person so they can demonstrate the benefits of the new process to your team. Back them up when there is pushback. Change is always hard. Expect the resistance, but don't let it stop you.

The key to making new processes stick is to make them as automated as possible. There is never a case where it isn't worth the time it takes to automate a development process. Automation ensures that the process is easy to follow and always happens, rather than trusting individual contributors to remember to follow a process. Automation also makes the process less personal. The rules are clearly defined and are being enforced by a tool. Depersonalization helps us view the situation dispassionately, rather than as an attack on our abilities.
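
As one example of this kind of automation, a git pre-commit hook can apply the formatting rules discussed above before every commit. A minimal sketch, assuming clang-format is installed and the script is saved as .git/hooks/pre-commit:

#!/bin/sh
# Reformat any staged C source files, then re-stage them
for file in $(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(c|h)$'); do
    clang-format -i "$file"
    git add "$file"
done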

To recap, when you implement new processes:

  1. Adopt one new process at a time
  2. Empower a process champion on your team
  3. Automate!

A GitHub Pull Request Template for the CCC Process

Updated: 20190302

Previously, I shared a Code Change Control (CCC) commit template. The CCC commit template provides much more rigor and documentation than a normal commit message. When you need stability and strict testing, such as when attempting to stabilize for a customer release, the CCC process helps ensure that the changes we include are thoroughly tested and justified.

An alternative approach to using a commit template is to use a GitHub pull request template. When a new PR is created, the description box is auto-populated with the template contents.

When your team enters a critical development stage, you can switch from the standard PR template to the more rigorous CCC PR template shown below.

Table of Contents:

  1. CCC PR Template
  2. Example Template File
  3. Further Reading

CCC PR Template

The objective of the CCC process is to ensure that every proposed change truly needs to be included and that it has been properly tested.

Developers should provide the background and justification for their change. The impact to the customer and any potential performance impacts should be noted. Tests should be thoroughly described and all test conditions (e.g. hardware and software versions) noted. Each change should have two reviewers to ensure that the change is high quality.

This template requires extra rigor that can be quite draining. I recommend enforcing the CCC template only during critical stages of development. For normal development cycles, consider using my basic GitHub pull request template.

# Description

Please include a summary of the change and which issue is fixed. Please provide the motivation for why this change is necessary at this stage of the product development cycle.

Fixes # (issue)

## Customer Impact

Please describe any customer facing impact of this change. This can be positive or negative impact.

## Performance Impact

Please describe any relevant performance impact of this change. This can be positive or negative impact. How did you characterize/test the performance impact?

# How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so others can reproduce. Please also list any relevant details for your test configuration.

- [ ] Test A
- [ ] Test B

**Hardware Configuration**:
- [ ] Dev Board
- [ ] Test rack
- [ ] DVT Boards
- [ ] Production Boards

**Software Configuration**:
* Operating System:
* Software version:
* Branch:
* Toolchain version:
* SDK version:

# Reviews

Please identify two developers to review this change.

- [ ] @personA
- [ ] @personB

# Checklist:

- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules

Example Template File

You can find an example implementation in the templates GitHub repository. The PULL_REQUEST_TEMPLATE_CCC.md file contains the example template shown above. To use this file in your own project, copy it into your project and remove _CCC from the template name.
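
For instance, to adopt the CCC template as a repository's active pull request template (GitHub reads PR templates from the repository root, docs/, or .github/; the .github/ location is shown here):

# Copy the template to the location GitHub checks for PR templates
mkdir -p .github
cp PULL_REQUEST_TEMPLATE_CCC.md .github/PULL_REQUEST_TEMPLATE.md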

Further Reading

Change Log

  • 20190302:
    • Improved grammar
    • Updated links to refer to template repository
    • Added table of contents
