GitHub

Seeing Intermittent GitHub Clone Failures on Jenkins? Check Your Repo Size

One of my clients noticed occasional build failures while using Jenkins. It was a strange situation: their builds would suddenly hit a burst of failures with no apparent change. I have been using the same Jenkins setup internally for the past year and have never observed such behavior.

Their software is built for three different configurations from the same repository. To support these configurations, the build server runs three nightly builds and three continuous integration (CI) builds. Nightly builds run from scratch, including the clone step. CI builds reuse an existing environment where possible (e.g., CI for master), but perform a fresh clone when a new PR is being built.

While digging into the failures, I noticed that they coincided with multiple PRs being submitted within a short period of time. Since every failure was a git clone timeout, I suspected GitHub throttling.

At first I thought we were making too many API requests, but we were well within GitHub's generous limit. I then noticed that their repository was 245 MB, and I became worried about GitHub throttling our downloads. Each new PR triggers three CI builds, and each of those builds downloads the full 245 MB repository onto its build server. If multiple PRs are submitted in a short span of time, I could easily see GitHub cutting off our bits.
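To put rough numbers on such a burst, the sketch below multiplies out the figures above. It assumes every CI build for a new PR performs a full clone of the 245 MB repository with no workspace reuse; the PR counts in the loop are hypothetical.

```python
# Rough estimate of GitHub clone traffic when several PRs arrive together.
# Assumption: every CI build for a new PR does a full clone (no workspace reuse).
REPO_MB = 245          # repository size from the text
CI_BUILDS_PER_PR = 3   # one CI build per configuration


def burst_download_mb(new_prs: int, repo_mb: int = REPO_MB,
                      builds_per_pr: int = CI_BUILDS_PER_PR) -> int:
    """Total megabytes cloned when `new_prs` pull requests are opened close together."""
    return new_prs * builds_per_pr * repo_mb


if __name__ == "__main__":
    for prs in (1, 2, 4):  # hypothetical burst sizes
        print(f"{prs} new PR(s): {burst_download_mb(prs)} MB")
```

Under these assumptions, four PRs opened close together already add up to 2940 MB, almost 3 GB of clone traffic in a short window.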

Further research led me to a GitHub issue that described a very similar situation, also caused by large repository sizes and downloads.

To combat throttling problems with large repositories, I recommend the following settings for each build:

  1. Increase the timeout for clone/checkout operations to give yourself leeway in throttling situations (30-45 minutes)
  2. Enable shallow clone with a depth of 1 to reduce download sizes

Applying these two changes eliminated the intermittent clone failures.
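Both settings live in the Jenkins job configuration. For scripts that clone the repository themselves, a minimal sketch of the plain-git equivalent might look like the following. The 45-minute value and the function names are illustrative choices, not Jenkins defaults.

```python
import subprocess
from typing import List, Optional

# Illustrative values matching the two recommendations.
CLONE_TIMEOUT_SECONDS = 45 * 60  # upper end of the 30-45 minute range
CLONE_DEPTH = 1                  # shallow clone: fetch only the latest commit


def shallow_clone_command(url: str, dest: str, branch: Optional[str] = None,
                          depth: int = CLONE_DEPTH) -> List[str]:
    """Build a `git clone` command that fetches only the last `depth` commits."""
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch is not None:
        cmd += ["--branch", branch]
    return cmd + [url, dest]


def shallow_clone(url: str, dest: str, branch: Optional[str] = None,
                  timeout: int = CLONE_TIMEOUT_SECONDS) -> None:
    """Run the clone; raises subprocess.TimeoutExpired if it exceeds `timeout` seconds."""
    subprocess.run(shallow_clone_command(url, dest, branch), check=True, timeout=timeout)
```

`--depth` also implies `--single-branch` unless `--no-single-branch` is passed, which trims the download further. With the timeout in place, a throttled clone fails with `subprocess.TimeoutExpired` after the configured window instead of waiting without bound.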


Safely Storing Secrets in Git

Updated: 2018-11-05

I've worked on many projects where sensitive information is stored in a git repository. This includes SSH keys, SSL private keys, API keys, passwords, and client secrets. This practice is easily labelled Not Good.

You might be thinking, "Why should I worry? Our GitHub repository is private, and only our team can access the secrets!"

A private repository on GitHub or GitLab may seem like a safe storage area, since access is limited to a small set of specified parties. However, many teams don't think twice about allowing third-party services to access these private repositories. Once you've enabled access, these services have permission to read your secrets. Even if a service isn't malicious, there is a chance that it has a vulnerability that could be exploited to expose your files. Or perhaps there is a configuration flaw in your build server which grants everyone on the internet read access to your projects.

Storing secrets in our repositories is alluring because of its convenience, and I recently found myself tempted by this practice. I spent time investigating solutions to the secret storage problem and came across three projects that are easy to incorporate:

  1. BlackBox
  2. git-secret
  3. git-crypt

I currently use BlackBox, as it was the first solution I discovered. However, I will be migrating to git-crypt because it offers a superior workflow with little user involvement.

BlackBox

BlackBox was created by Stack Overflow, initially to support secret storage for use by Puppet. Unlike the other two solutions I'll discuss, BlackBox supports multiple version control systems.

Protected files are encrypted with the public keys of all trusted users. If access needs to be revoked, delete the public key and re-encrypt the files.

BlackBox can encrypt and decrypt individual files. If you are using Puppet, you can also encrypt/decrypt individual strings.

I recommend BlackBox for developers who do not use git, or who work with a variety of version control systems.

git-secret

git-secret is a bash script based on gpg. git-secret allows you to encrypt individual files for storage inside of a git repository. git-secret provides commands to encrypt secret files before pushing to the server and to decrypt them for local use.

Protected files are encrypted with the public keys of all trusted users. If access needs to be revoked, delete the user's public key and re-encrypt the files; the removed user will no longer be able to decrypt them.

You can install git-secret with apt, yum, and brew.
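As a rough illustration of the git-secret workflow described above, the sketch below drives the `git secret` subcommands `init`, `tell`, `add`, `hide`, and `reveal` from Python. The `run` parameter is a hypothetical hook that exists only so the command sequence can be inspected without git-secret installed; by default each command is executed and must succeed.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Default runner: execute the command and fail loudly on a non-zero exit."""
    subprocess.run(cmd, check=True)


def protect_files(email: str, files: Sequence[str], run: Runner = run_checked) -> None:
    """Initialise git-secret, trust one GPG identity, and encrypt `files`."""
    run(["git", "secret", "init"])          # creates the .gitsecret/ directory
    run(["git", "secret", "tell", email])   # public key must already be in your GPG keyring
    run(["git", "secret", "add", *files])   # registers the files to protect
    run(["git", "secret", "hide"])          # writes encrypted <file>.secret copies


def reveal_files(run: Runner = run_checked) -> None:
    """Decrypt the protected files for local use (needs a trusted private key)."""
    run(["git", "secret", "reveal"])
```

For example, `protect_files("dev@example.com", ["config/api.key"])` run inside the repository would encrypt a hypothetical `config/api.key` for a hypothetical user.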

git-crypt

git-crypt is a program that operates similarly to git-secret. Files that should be encrypted by git-crypt are listed in the .gitattributes file, much as git-lfs tracks its files.
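To make the .gitattributes mechanism concrete, here is a hypothetical .gitattributes for git-crypt (the `filter=git-crypt diff=git-crypt` attributes follow git-crypt's documented format; the file names are made up) together with a simplified Python approximation of how git decides which paths get the filter. It ignores several gitattributes rules, so `git check-attr filter -- <path>` remains the authoritative check.

```python
import fnmatch
from typing import List, Tuple

# Hypothetical .gitattributes using git-crypt's filter/diff attributes.
GITATTRIBUTES = """\
secrets.env filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
secretdir/** filter=git-crypt diff=git-crypt
"""

Rule = Tuple[str, List[str]]


def parse_rules(text: str) -> List[Rule]:
    """Split each non-blank, non-comment line into (pattern, attributes)."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def is_encrypted(path: str, rules: List[Rule]) -> bool:
    """Approximate whether git-crypt's filter applies to `path`.

    Simplifications: patterns without a slash match the file name at any depth,
    other patterns match the whole repo-relative path via fnmatch, and a later
    matching line overrides the `filter` setting of earlier ones.
    """
    encrypted = False
    for pattern, attrs in rules:
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if not fnmatch.fnmatchcase(target, pattern):
            continue
        for attr in attrs:
            if attr == "filter=git-crypt":
                encrypted = True
            elif attr.startswith("filter=") or attr in ("-filter", "!filter"):
                encrypted = False
    return encrypted
```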

As with git-secret, access permissions can be managed by adding individual GPG keys. git-crypt also allows you to export a symmetric key, which can be provided to collaborators.

Aside from an initial unlock command that needs to be used after cloning the repository, git-crypt encryption and decryption operations happen transparently. I find this workflow to be superior to git-secret and BlackBox.
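The access options above map onto a handful of git-crypt commands: `add-gpg-user` for GPG-based access, `export-key` to produce the shareable symmetric key, and `unlock` (with or without a key file) after cloning. The sketch below wraps them in Python, for example for a CI job that unlocks a fresh clone; the function names, key path, and user ID are hypothetical.

```python
import subprocess
from typing import List, Optional


def add_gpg_user_command(gpg_user_id: str) -> List[str]:
    """Grant a collaborator access via their GPG key (the key must be in your keyring)."""
    return ["git-crypt", "add-gpg-user", gpg_user_id]


def export_key_command(key_path: str) -> List[str]:
    """Export the repository's symmetric key to `key_path` for sharing."""
    return ["git-crypt", "export-key", key_path]


def unlock_command(key_path: Optional[str] = None) -> List[str]:
    """Unlock after cloning: GPG-based when no key file is given, else symmetric."""
    cmd = ["git-crypt", "unlock"]
    if key_path is not None:
        cmd.append(key_path)
    return cmd


def unlock(repo_dir: str, key_path: Optional[str] = None) -> None:
    """Run the one-time unlock inside `repo_dir`; later checkouts decrypt transparently."""
    subprocess.run(unlock_command(key_path), cwd=repo_dir, check=True)
```

Keep an exported key file out of the repository itself; anyone holding it can decrypt every protected file.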

You can install git-crypt through brew if you are on OSX. Otherwise you can manually compile git-crypt for your system.


Change Log

  • 20181105:
    • Fixed git-crypt and git-secret links

Jenkins: Kick off a CI Build with GitHub Push Notifications

When creating a Jenkins multi-branch pipeline job, builds will be triggered based on the rules set for each job. By default, each repository is scanned on a timer (e.g. every 30min, once a day) and builds are triggered if new commits or pull requests have been made since the last scan. However, what I really want is for my continuous integration builds to get kicked off immediately after someone opens a new PR or pushes a new commit.

I was initially stumped as to why this wasn't working, since I had gone through quite a few GitHub setup steps when configuring Jenkins. My web searches didn't turn up anything obvious, but eventually I stumbled across a helpful CloudBees article describing webhook configuration. I hope this guide helps anyone else who is stumped!

In order for builds to be triggered automatically by PUSH and PULL REQUEST events, a Jenkins webhook needs to be added to each GitHub repository or organization that interacts with your build server. You (or someone helping you) will need admin permissions on that repository.

Step-by-Step Guide

For each GitHub repository or organization that you need to configure, perform the following steps:

  1. Navigate to the "Settings" tab.
  2. Select the "Webhooks" option on the left menu
  3. Click "Add Webhook"
  4. For "Payload URL":
    • Use the address for the Jenkins server instance (e.g. http://myjenkins.com)
    • Add /github-webhook/ to the end of it.
    • Make sure to include the trailing /!
      • example: http://myjenkins.com/github-webhook/
  5. Select "application/json" as the "Content type"
  6. Leave "Secret" blank (unless a secret has been created and configured in the Jenkins "Configure System -> GitHub plugin" section)
  7. Select "Let me select individual events"
    • Enable PUSH event
    • Enable Pull Request event
  8. Make sure "Active" is checked
  9. Click "Add Webhook"
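If you have many repositories to configure, the same steps can also be scripted against GitHub's REST API (`POST /repos/{owner}/{repo}/hooks`; an organization-wide hook would use `POST /orgs/{org}/hooks` instead). The sketch below uses only the standard library; the owner, repository, and token are placeholders you would supply, and the token needs admin rights on the repository. `webhook_url` strips any trailing slash from the base address so the result always ends in exactly `/github-webhook/`.

```python
import json
import urllib.request
from typing import Optional


def webhook_url(jenkins_base: str) -> str:
    """Append /github-webhook/ (including the trailing slash) to the Jenkins URL."""
    return jenkins_base.rstrip("/") + "/github-webhook/"


def webhook_payload(jenkins_base: str, secret: Optional[str] = None) -> dict:
    """Request body mirroring steps 4-8: JSON payloads, push + PR events, active."""
    config = {"url": webhook_url(jenkins_base), "content_type": "json"}
    if secret:
        config["secret"] = secret  # only if the same secret is configured in Jenkins
    return {
        "name": "web",
        "active": True,
        "events": ["push", "pull_request"],
        "config": config,
    }


def create_repo_webhook(owner: str, repo: str, token: str,
                        jenkins_base: str, secret: Optional[str] = None) -> dict:
    """POST the hook to GitHub and return the created hook as parsed JSON."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(webhook_payload(jenkins_base, secret)).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```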

Jenkins will now receive push and pull request notifications for that repository, and related builds will be automatically triggered.
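If you do fill in the "Secret" field from step 6, GitHub uses it to sign each delivery: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. The sketch below shows that check in isolation, as a receiver would perform it; the secret and body in any usage are made up.

```python
import hashlib
import hmac


def github_signature(secret: str, body: bytes) -> str:
    """Return the expected X-Hub-Signature-256 header value for `body`."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_delivery(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare in constant time so response timing does not leak the signature."""
    expected = github_signature(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

This is also why the secret must match on both sides: a mismatch makes every delivery fail verification.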
