One of my clients noticed occasional build failures while using Jenkins. It was a strange situation, as their builds would suddenly see a burst of failures with no apparent change. I have been using the same Jenkins setup internally for the past year, and I have never observed such behavior.
Their software builds for three different configurations from the same repository. To support these configurations, the build server runs three nightly builds and three continuous integration (CI) builds. Nightly builds run from scratch, including a fresh clone. CI builds reuse an existing workspace where possible (e.g. CI for master), but perform a fresh clone when building a new PR.
While digging into the failures, I noticed that they were tied to multiple PRs being submitted within a short period of time. Since each build failure manifested as a git clone timeout, I was suspicious of GitHub throttling.
At first I thought we were making too many API requests, but we were well within GitHub's generous limit. I then noticed that their repository was 245MB in size, and became worried about GitHub throttling our downloads. Each new PR triggers three CI builds, which results in 245MB downloads on each server. If multiple PRs are submitted in a short span of time, I could definitely see GitHub cutting off our bits.
Further research led me to this GitHub issue, which described a very similar situation, also caused by large repository sizes and downloads.
To combat throttling problems with large repositories, I recommend the following settings for each build:
- Increase the timeout for clone/checkout operations (e.g. to 30-45 minutes) to give yourself leeway in throttling situations
- Enable shallow clone with a depth of 1 to reduce download sizes
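For pipeline jobs, both settings can be applied in the checkout step. Here is a minimal sketch using the Jenkins Git plugin's `CloneOption` extension; the repository URL and branch are placeholders, and the timeout is in minutes:

```groovy
// Sketch: checkout with an extended clone timeout and a shallow clone.
// Assumes the Jenkins Git plugin; URL and branch are placeholders.
checkout([
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    extensions: [[
        $class: 'CloneOption',
        shallow: true,   // fetch only recent history
        depth: 1,        // a single commit is enough for a build
        timeout: 45      // minutes; leeway if GitHub slows us down
    ]],
    userRemoteConfigs: [[url: 'https://github.com/example/repo.git']]
])
```

For freestyle jobs, the same options appear in the job configuration under "Additional Behaviours" for the Git SCM.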
By applying these two changes, the intermittent clone failures were eliminated.