GitNStats: A Git History Analyzer to Help Identify Code Hotspots

GitNStats is a cross-platform git history analyzer that identifies the files in a git repository which are updated most frequently. High churn can serve as a proxy for files that may be poorly implemented, lack tests, or be missing a layer of abstraction.

Below I will provide basic instructions for getting and using GitNStats. We'll also look at two of my projects to review high-churn files and their git history. By reviewing the history of these files, we can identify potential problem areas, refactoring projects, and development process improvements.

Table of Contents:

  1. Getting GitNStats
  2. Usage
  3. Client Project Analysis
  4. Jenkins Pipeline Library Analysis
  5. Further Reading

Getting GitNStats

The best place to download the software is the repository's Releases page. Pre-packaged 64-bit releases are provided for OSX 10.12, Ubuntu 14.04, Ubuntu 16.04, and Windows.

To install GitNStats:

  1. Download one of the pre-packaged releases
  2. Create a home for GitNStats, such as within /usr/local/share or your home directory.
  3. Unzip the release package to the target directory
  4. Link the gitnstats binary to a location in your path, such as /usr/local/bin or /bin.
    1. Alternatively, you can add the target directory to your PATH variable

Example workflow included in the README:

# Download release (replace version and runtime accordingly)
cd ~/Downloads
wget <archive-for-your-platform.zip>

# Create directory to keep package
mkdir -p ~/bin/gitnstats

# unzip
unzip osx.10.12-x64.zip -d ~/bin/gitnstats

# Create symlink (adjust the source path if you unzipped elsewhere)
ln -s ~/bin/gitnstats/gitnstats /usr/local/bin/gitnstats

Usage

The simplest way to use gitnstats is to run it inside a repository without arguments. It prints the repository path, the branch, and the number of commits that touched each file, sorted from most to least changed.

$ gitnstats

Repository: /Users/pjohnston/src/ea/templates
Branch: master

Commits    Path
3    oss_docs/CONTRIBUTING.md
3    oss_docs/PULL_REQUEST_TEMPLATE_CCC.md
3    oss_docs/PULL_REQUEST_TEMPLATE.md
3    oss_docs/ISSUE_TEMPLATE.md
2    oss_docs/CODE_OF_CONDUCT.md
1    README_template.md
1    PULL_REQUEST_TEMPLATE_example.md
1    PULL_REQUEST_TEMPLATE_CCC.md
1    Jenkinsfile
1    ISSUE_TEMPLATE_example.md
1    CONTRIBUTING_template.md
1    CODE_OF_CONDUCT_template.md
1    CI.jenkinsfile
1    .github/PULL_REQUEST_TEMPLATE.md
1    .github/ISSUE_TEMPLATE.md
1    oss_docs/README.md
1    jenkins/Jenkinsfile
1    jenkins/CI.jenkinsfile

You can also supply the repository path as a command-line argument, allowing you to invoke gitnstats from outside of a repository:

~$ gitnstats /Users/pjohnston/src/ea/templates
Repository: /Users/pjohnston/src/ea/templates
Branch: master

…

You can specify the branch to analyze with the -b or --branch option:

$ gitnstats -b avoid-failing-when-delete-a-branch
Repository: /Users/pjohnston/src/ea/scm-sync-configuration-plugin
Branch: avoid-failing-when-delete-a-branch

…

You can also limit the analysis to commits made after a certain date with the -d or --date option:

$ gitnstats -d 1/1/18
Repository: /Users/pjohnston/src/ea/embedded-framework
Branch: master

Commits    Path
8    docs/development/libraries.md
5    docs/development/tools.md
4    docs/architecture/architecture.md
3    docs/development/testing.md
2    docs/development/quality.md

Those are the basic operations supported by gitnstats, and they can be combined:

$ gitnstats ~/src/ea/libc -b pj/stdlib-test -d 10/30/17
Repository: /Users/pjohnston/src/ea/libc
Branch: pj/stdlib-test

Commits    Path
1    src/stdlib/strtof.c
1    src/stdlib/strtod.c
1    src/gdtoa
1    premake5.lua
1    .gitmodules
1    src/stdlib/strtoll.c
1    src/stdlib/strtol.c

For further instructions, refer to gitnstats --help.

Client Project Analysis

I recently worked on a short-term project for a client, so let's take a look at that project and see how the file churn maps to problems I encountered along the way.

10:38:13 (master) power-system-fw$ gitnstats
Repository: /Users/pjohnston/src/projects/power-system-fw
Branch: master

Commits    Path
34    src/lib/powerctrl/powerctrl.c
34    src/main.c
33    Makefile
29    README.md
26    src/lib/commctrl/commctrl.c
19    src/_config.h
18    src/drivers/i2c/i2c_slave.c
17    src/drivers/can/can.c
13    src/lib/powerctrl/powerctrl.h
13    src/drivers/bmr456/bmr456.c
11    src/drivers/gpio/gpio_interrupt_handler.c
11    src/lib/commctrl/commctrl.h
10    src/drivers/i2c/i2c.c

Eight files have been changed a significant number of times, and the top three files were each changed roughly three times as often as the files below the top 10.

That's a pretty huge gap, so let's look at the history to see what's going on with our top three files:

  • main.c was updated every time a new library or driver was added and required initialization.
    • The abort and error handling functions are included in main.c and received multiple functionality updates (stopping threads, sending a UART message, LED error code)
      • These handlers should be split into a different file
    • Static functions received doxygen updates in separate commits - I can clearly be better about documenting WHILE writing a function
  • powerctrl.c is the library which provides power control abstractions and power-state management
    • Timing parameters have been updated multiple times after validation efforts
      • These values should be configurable and moved into _config.h - churn should happen there
    • Due to timing problems, the library was overhauled to add in a thread which managed power state changes
      • Significantly less churn happens after this change
    • As new parts and drivers were brought up, they were added into the power control library individually
  • Makefile was updated every time a new source file was created.
    • Significant churn happened when bringing up the project on Linux, as differences between gcc versions and case-sensitive file systems identified a series of changes that needed to be made
      • These changes weren't made on a branch, but instead committed and tested with a new build on the build server.
      • This is terrible development practice on my end. I should have been testing locally in a VM or by using a branch.

By looking at the statistics, I can uncover some design work and refactoring efforts that will improve the project. I also see the results of some expedient choices I made, resulting in terrible development practices and unnecessary file churn. Now these facts are logged in git history forever.

What About Recent Changes?

The project was officially delivered on 6/1/18, so let's see what modifications have been made after client feedback:

$ gitnstats -d 6/2/18
Repository: /Users/pjohnston/src/projects/power-system-fw
Branch: master

Commits    Path
1    src/drivers/gpio/gpio_interrupt_handler.c
1    src/lib/powerctrl/powerctrl.c

Not too bad after all, though both gpio_interrupt_handler.c and powerctrl.c are in the high-commit list in the overall history analysis. If these libraries continue to show edits, I know I need to spend more time thinking about the structure and interfaces of these files.

Jenkins Pipeline Library Analysis

The Jenkins Pipeline Library is an open-source library for use by Jenkins multi-branch pipeline projects. I use this library internally to support complex Jenkins behaviors, as well as with some client Jenkins implementations.

Let's see what the highest-churn files for this project are:

10:41:59 (master) jenkins-pipeline-lib$ gitnstats
Repository: /Users/pjohnston/src/ea/jenkins-pipeline-lib
Branch: master

Commits    Path
15    vars/sendNotifications.groovy
11    vars/gitTagPreBuild.groovy
10    vars/slackNotify.groovy
5    vars/gitTagCleanup.groovy
4    vars/gitTagSuccess.groovy
4    vars/setGithubStatus.groovy
4    vars/emailNotify.groovy
4    vars/gitBranchName.groovy

…

Wow, the top three files have each been edited 10 or more times.

Clearly there is a problem, which is made even worse by the fact that sendNotifications.groovy was split off into two separate functions: slackNotify.groovy and emailNotify.groovy. Managing two separate notification paths in sendNotifications.groovy was the cause of the initial churn on that file, and it certainly led to overly complex logic. Splitting the file into two separate functions was A Good Thing.

Diving into the slackNotify.groovy changes, I can see that I was very thoughtless in my initial implementation and committing strategy.

Two commits were actual feature extensions:

  1. Add an option to use blueOcean URLs for slack notifications
  2. Improve output for builds with no changes or first-builds: The commit that was built will be indicated in the message

The rest of the changes were formatting errors, typos, and other fixes for easily-identified errors.

There are some clear lessons here:

  1. I can identify and address problematic files long before 25 total changes (sendNotifications.groovy + slackNotify.groovy)
  2. To avoid high churn on a file, follow good development processes. Expediency creates terrible history and higher-than-necessary churn. I would be embarrassed to do this on a professional project, so why did I take the expedient route on a personal (and public!) project?

Further Reading

modm: Modular Object-Oriented Development for Microcontrollers

modm (Modular Object-oriented Development for Microcontrollers) is a C++14 framework built by Niklas Hauser and Fabian Greif. The modm project uses vendor-provided chip data with a library builder, enabling modm to automatically generate startup code, chip-specific drivers, external drivers, and BSPs for your platform. Since modm provides a portable HAL, you can migrate your software from one processor to another supported processor with little effort.

modm provides a framework which is suitable for bare-metal systems ranging from the ATtiny to a 32-bit ARM Cortex-M. The HAL features no memory allocations, low RAM consumption, and lightweight stack-less threads. The framework also provides useful algorithms suitable for bare-metal systems, as well as drivers for a wide variety of SPI and I2C peripherals. Multitasking is supported through protothreads, a stackless threading implementation targeted for memory constrained systems - each task only requires 2 bytes!

modm is well-tested, featuring 64 test groups with 343 test cases and over 4000 assertions. While the HAL is not fully tested in an automated manner, a variety of example hardware projects are regularly checked by the CI server.

The modm framework currently supports ~1350 AVR and ARM Cortex-M microcontrollers from Atmel, ST, and NXP. If you are using a processor from one of those vendors, modm can provide your team with a stable foundation of drivers and the advantage of being able to quickly migrate your software to another processor.

Simulating Open-Drain GPIO in Software

Today, it's rare to find a modern microcontroller that does not support configurable GPIO. We can easily take these configuration options for granted, especially when interacting with circuits and communication buses (e.g. I2C) that require GPIO to be configured in open-drain mode.

As embedded developers we do not always get to work with the latest-and-greatest parts. Sometimes we end up working with tiny or cheap processors (e.g. AVR), and other times we need to support a legacy part.

Fortunately, it's quite simple to recreate open-drain GPIO behavior in software.

Table of Contents:

  1. Open-Drain Support in Software
    1. Defining Types
    2. Setting Pin States
    3. Reading Pin Values
    4. Initial Configuration
  2. Further Reading

Open-Drain Support in Software

To enable open-drain support, we'll need to support three primary operations:

  1. Actively drive output low
  2. Put the pin in input mode with a pull-up (logical 1)
  3. Put the pin in high-impedance mode (floating input with no pull-up or pull-down enabled)

If the open-drain circuit has an external pull-up resistor, operation #2 is not necessary. However, in situations where there is not an external pull-up resistor, you will need to rely on the microcontroller's internal pull-ups.

We will create simple wrapper functions to enable these modes. I'll be using pseudo-code for these examples, since GPIO interfaces vary widely.

Defining Types

First, rather than using a plain 0 or 1 value, we'll define a custom type to describe the three possible states:

typedef enum
{
    OD_LOW = 0,
    OD_HIGH = 1,
    OD_HIGH_Z = 2
} opendrain_state_t;

As an alternative to the less-descriptive bool, we'll also create a helper type that can be used for initial configuration:

typedef enum
{
    OD_CONFIG_NO_PULLUP = 0,
    OD_CONFIG_PULLUP = 1,
} opendrain_config_t;

Setting Pin States

Now that we have our states defined, we can create a function to manage an open-drain pin:

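void setOpenDrainPin(void* port, unsigned pin, opendrain_state_t state)
{
    switch(state)
    {
        case OD_LOW:
            // Clear the output latch before enabling the driver so the
            // pin is never actively driven high, even momentarily
            setOutput(port, pin, 0);
            setPinMode(port, pin, OUTPUT);
            break;
        case OD_HIGH:
            // Release the line and let the internal pull-up raise it
            setPinMode(port, pin, INPUT_PULLUP);
            break;
        case OD_HIGH_Z:
            // No pull-up in hi-z mode
            setPinMode(port, pin, INPUT);
            break;
    }
}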
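
As one concrete illustration of the pseudo-code above, here is a minimal sketch that assumes an AVR-style port: a data-direction bit selects input or output, and in input mode the output bit enables the internal pull-up. The sim_port_t struct and the odWrite function are hypothetical names, and the registers are plain variables so the sketch can run on a host. On real hardware they would be the memory-mapped port registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    OD_LOW = 0,
    OD_HIGH = 1,
    OD_HIGH_Z = 2
} opendrain_state_t;

/* Hypothetical stand-in for one AVR-style port (a DDRx/PORTx pair). */
typedef struct
{
    volatile uint8_t ddr;  /* 1 = output, 0 = input */
    volatile uint8_t port; /* output level, or pull-up enable when input */
} sim_port_t;

void odWrite(sim_port_t* p, unsigned pin, opendrain_state_t state)
{
    assert(p != NULL && pin < 8);
    const uint8_t mask = (uint8_t)(1u << pin);

    switch(state)
    {
        case OD_LOW:
            /* Clear the output bit first so the pin never drives high. */
            p->port &= (uint8_t)~mask;
            p->ddr |= mask;
            break;
        case OD_HIGH:
            /* Release the line, then enable the internal pull-up. */
            p->ddr &= (uint8_t)~mask;
            p->port |= mask;
            break;
        case OD_HIGH_Z:
            /* Input with the pull-up disabled. */
            p->ddr &= (uint8_t)~mask;
            p->port &= (uint8_t)~mask;
            break;
    }
}
```

Starting from any of the three states, no step leaves both the direction bit and the output bit set, so the pin is never actively driven high during a transition.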

We can also create helper functions:

void setOpenDrainHigh(void* port, unsigned pin)
{
    setOpenDrainPin(port, pin, OD_HIGH);
}

void setOpenDrainLow(void* port, unsigned pin)
{
    setOpenDrainPin(port, pin, OD_LOW);
}

void setOpenDrainHiZ(void* port, unsigned pin)
{
    setOpenDrainPin(port, pin, OD_HIGH_Z);
}

Reading Pin Values

To read the state of an open-drain pin, no special behavior is necessary. Simply follow your normal procedures for reading the value of the pin.
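
One consequence worth keeping in mind: the value you read back reflects the shared line, not the state you last requested. A released pin reads low whenever another device holds the line low, which is how an I2C master detects clock stretching. The following is a minimal host-side sketch of that behavior. The lineLevel function and the LINE_* names are hypothetical and only model the bus; they are not part of any GPIO API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    OD_LOW = 0,
    OD_HIGH = 1,
    OD_HIGH_Z = 2
} opendrain_state_t;

typedef enum
{
    LINE_LOW = 0,
    LINE_HIGH = 1,
    LINE_FLOATING = 2
} line_level_t;

/* Resolve the level of a shared open-drain line from the states of
 * every pin attached to it (hypothetical bus model). */
line_level_t lineLevel(const opendrain_state_t* pins, size_t count,
    int external_pullup)
{
    assert(pins != NULL || count == 0);
    int pulled_up = external_pullup;

    for(size_t i = 0; i < count; i++)
    {
        if(pins[i] == OD_LOW)
        {
            return LINE_LOW; /* any pin driving low wins */
        }
        if(pins[i] == OD_HIGH)
        {
            pulled_up = 1; /* an internal pull-up is enabled */
        }
    }

    return pulled_up ? LINE_HIGH : LINE_FLOATING;
}
```

With two pins on the line, one in OD_HIGH and the other in OD_LOW, the model resolves to LINE_LOW, so the first device would read 0 even though it requested a logical 1.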

Initial Configuration

We can then create a function which we will use to configure our open-drain pins:

void configureOpenDrainPin(void* port, unsigned pin, 
    opendrain_config_t config)
{
    switch(config)
    {
        case OD_CONFIG_NO_PULLUP:
            setOpenDrainHiZ(port, pin);
            break;
        case OD_CONFIG_PULLUP:
            setOpenDrainHigh(port, pin);
            break;
    }
}

This allows you to configure your pin in a straightforward way that is easier for other users to understand.

You could also expand support to limit the operations of the setOpenDrainPin function based on the initial configuration of that pin.
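
One way to sketch that restriction, assuming a small table that remembers how each pin was configured: a pin may be driven low at any time, but it may only be released into the state chosen at configuration time. This is just one possible policy. The table, MAX_OD_PINS, odConfigure, odSet, and odGet are all hypothetical, and the sketch only tracks the state that would be applied rather than touching real GPIO.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OD_LOW = 0, OD_HIGH = 1, OD_HIGH_Z = 2 } opendrain_state_t;
typedef enum { OD_CONFIG_NO_PULLUP = 0, OD_CONFIG_PULLUP = 1 } opendrain_config_t;

#define MAX_OD_PINS 8u /* hypothetical: number of pins tracked */

static opendrain_config_t pin_config[MAX_OD_PINS];
static opendrain_state_t pin_state[MAX_OD_PINS];

/* Hypothetical policy: OD_HIGH needs a pull-up configuration,
 * OD_HIGH_Z needs a no-pull-up configuration, OD_LOW is always fine. */
static bool odStateAllowed(opendrain_config_t config, opendrain_state_t state)
{
    switch(state)
    {
        case OD_LOW:
            return true;
        case OD_HIGH:
            return config == OD_CONFIG_PULLUP;
        case OD_HIGH_Z:
            return config == OD_CONFIG_NO_PULLUP;
    }
    return false;
}

void odConfigure(unsigned pin, opendrain_config_t config)
{
    assert(pin < MAX_OD_PINS);
    pin_config[pin] = config;
    pin_state[pin] = (config == OD_CONFIG_PULLUP) ? OD_HIGH : OD_HIGH_Z;
    /* A real port would call setOpenDrainPin() here. */
}

bool odSet(unsigned pin, opendrain_state_t state)
{
    assert(pin < MAX_OD_PINS);
    if(!odStateAllowed(pin_config[pin], state))
    {
        return false; /* rejected: the pin is left unchanged */
    }
    pin_state[pin] = state;
    /* A real port would call setOpenDrainPin() here. */
    return true;
}

opendrain_state_t odGet(unsigned pin)
{
    assert(pin < MAX_OD_PINS);
    return pin_state[pin];
}
```

Returning a bool lets the caller notice a rejected request. Silently mapping OD_HIGH to OD_HIGH_Z on a pin without a pull-up would be an equally reasonable choice.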

Further Reading