An Experiment: Develop in a Monorepo and Distribute to Standalone Repositories

25 January 2023 by Phillip Johnston • Last updated 27 January 2023

I am ideologically aligned with distributing code in repositories that serve a single purpose. As a software consumer, I think that smaller, single-purpose repositories and components are more approachable and easier to use.

Of course, every decision comes with its tradeoffs, and this one has caused us significant pain. The overhead in developing and maintaining many single-purpose repositories has become great enough that we have effectively halted development. This is not an ideal outcome.

In this article, I want to discuss the problems we have faced, our thoughts on an ideal model for our work, and what we’re going to do to get there.

Table of Contents:

The Problems with the Many Repos Approach
Our Ideal Process
The Current State of Affairs
Monorepo Development has Already Made Life Easier
References

The Problems with the Many Repos Approach

We now have nearly 100 repositories under management across our various organizations. After working with this many repositories, we have identified three significant problem areas:

Maintenance Burden
Easily Broken Dependencies
Diffused Attention

Maintenance Burden

As the number of repositories we manage has increased, maintenance has become increasingly expensive.

Most of our repositories share a significant amount of development infrastructure. This is great for bootstrapping projects with advanced capabilities. But it also means that updates need to be applied to every repository we manage. Distributing updates across many repositories is time-consuming work, so updates have stopped being made with the frequency that is required. As a result, there has been a lack of movement on many of our projects. At best, they have become frozen in place. But many have rotted and no longer successfully compile on our systems because of various external changes.

In the early days, we created scripts and aliases to make propagating infrastructure updates easier. This worked well when the total repository count was much smaller. But these solutions are brittle, too, and they require frequent updates of their own to remain useful. It is just as easy for these tools to become out-of-date as it is for the various repositories. This adds additional friction to the development process.
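
To make the idea of a propagation script concrete, here is a minimal sketch of what one might look like. It is not the tooling described above: the function name, the directory layout, and the notion of a single "template" directory are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def propagate(template_dir: Path, repo_dirs: list[Path], shared_files: list[str]) -> dict[str, list[str]]:
    """Copy each shared file from template_dir into every repository checkout.

    Returns a mapping of repository name -> files that actually changed,
    so repositories that were already up to date can be skipped when committing.
    All names here are hypothetical.
    """
    changed: dict[str, list[str]] = {}
    for repo in repo_dirs:
        for rel in shared_files:
            src = template_dir / rel
            dst = repo / rel
            # Skip files that are already identical to the template copy.
            if dst.exists() and dst.read_bytes() == src.read_bytes():
                continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
            changed.setdefault(repo.name, []).append(rel)
    return changed
```

Even a script this small shows where the brittleness comes from: it has to know where every checkout lives and which files count as shared, and both lists need upkeep as repositories are added or restructured.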

Perhaps this is simply another demonstration of the idea that software is a cost, not an asset. It is a significant burden to maintain this much code, especially in the current model.

Easily Broken Dependencies

Many of our single-purpose repositories have dependencies on other components within our ecosystem. This, in itself, is not a problem. But the distributed nature of the repositories makes it easy to break dependencies and not notice that you’ve done so.

You can make a change in one repository that works well and passes all CI checks. So you merge it into the main branch. But without knowing it, you’ve broken another repository. You weren’t working within that repository, so you didn’t notice any problems. In many CI setups, the CI server also didn’t run on the downstream project, because there was no pull request or new commits to trigger a build.

You won’t notice the new error until one of a few things happens:

You decide to work within that project
A pinned dependency is moved to a newer commit
The CI server builds the downstream project using the latest dependency version
A user discovers the problem and files an issue

These scenarios involve an undesirably long feedback loop for us to learn about the problem. Diligence would help ease the problem, but it is difficult to remember all dependencies across all projects. You can certainly build supporting infrastructure to make these problems easier to discover, but we’re now back to maintaining additional tools and complicating our CI processes.

It would be better if we could shorten that feedback loop. Ideally, it would be an atomic operation, and we could not commit or merge without noticing the dependency break. This way, the component and all of its downstream dependencies are updated together.
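
One form the supporting infrastructure mentioned above could take is a reverse-dependency lookup: given the repository that changed, list every repository that depends on it, directly or transitively, so CI can rebuild them. The sketch below assumes a hand-maintained dependency map; the repository names are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each repository -> repositories it depends on.
DEPENDS_ON = {
    "core": [],
    "memory": ["core"],
    "cpp-runtime": ["core", "memory"],
    "framework": ["cpp-runtime", "memory"],
    "docs-site": [],
}


def downstream_of(changed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every repository that directly or transitively depends on `changed`."""
    # Invert the map: dependency -> set of direct dependents.
    dependents: dict[str, set[str]] = {name: set() for name in depends_on}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(repo)

    # Breadth-first walk over the inverted graph.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for repo in dependents.get(current, ()):
            if repo not in seen:
                seen.add(repo)
                queue.append(repo)
    return sorted(seen)
```

Here `downstream_of("memory", DEPENDS_ON)` returns `["cpp-runtime", "framework"]`. The catch is the one already described: the map itself is one more artifact that must be kept current across every repository.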

Diffused Attention

Attention is a precious commodity. As a single developer, it is impossible for me to focus on 100 different repositories. I can only focus on one at a time – leaving the rest neglected. Issues linger, planned updates don’t happen, infrastructure falls behind, and rot sets in.

After years of working in this way, it seems to me that successful management of distributed repositories requires a team. Someone has to focus on each repository – or, at least, a small enough handful that they can successfully manage. But even teams are not enough to truly solve the problem. You still have to deal with communication and coordination between developers and the repositories they manage. The number of repositories under management can easily outpace the team size that your company can financially sustain, landing you in the same boat that we are in now.

Our Ideal Process

We need to reduce the friction when making updates across projects, shorten the feedback cycle for discovering dependency breaks, and reduce the need to diffuse focus over many repositories. After much thought, the simplest path forward appears to be “develop code in a monorepo”.

All the dependencies will be in one place, making it easy to:
- Create atomic commits to land cross-project changes at the same time
- Use one version to tag all projects, making it easier to pin distributed dependencies
- Identify and block breaking changes before we merge them
We can update shared infrastructure in a single operation
We can focus our internal development efforts on a single repository (and a single issue tracker)

However, our ideals didn’t change just because of the difficulties we’re facing. Even if we develop code in a monorepo, we still want to release projects for public consumption in single-purpose repositories. Many of these repositories have users, so we can’t just eliminate them. Hundreds of articles on our website link to these various repositories as well.

This means the ideal state is doing all of our development within a monorepo, but automatically distributing the latest changes to the appropriate individual repositories for users.
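
Deciding which individual repositories need an update after a monorepo commit can be reduced to a path lookup. The following is a minimal sketch, assuming each standalone project lives in its own subdirectory of the monorepo; the directory names and the mapping are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: monorepo subdirectory -> standalone repository name.
SPLIT_MAP = {
    "libs/memory": "memory",
    "libs/cpp-runtime": "cpp-runtime",
    "tools/build-system": "build-system",
}


def repos_to_update(changed_files: list[str], split_map: dict[str, str]) -> list[str]:
    """Return the standalone repositories touched by a set of changed paths."""
    targets: set[str] = set()
    for name in changed_files:
        parents = PurePosixPath(name).parents
        for prefix, repo in split_map.items():
            # Compare whole path components rather than string prefixes,
            # so "libs/memory-extra" is not mistaken for "libs/memory".
            if PurePosixPath(prefix) in parents:
                targets.add(repo)
    return sorted(targets)
```

The list of changed files could come from something like `git diff --name-only` between two tags. Files outside any mapped directory, such as a top-level README, belong to the monorepo only and trigger no distribution.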

The Current State of Affairs

I started collecting thoughts for this article in 2022, when I finally faced the fact that I had painted myself into a corner. But as of this article’s publication, we are already well on our way to making this model a reality.

The adventure began with shopsys/monorepo-tools, which is a set of scripts developed by a company that is using the development model proposed above. These scripts were an excellent starting point, though they did not quite work out-of-the-box for us. We have since changed them to add support for git-lfs artifacts (which we rely on heavily) and to improve performance significantly.

After dozens of false starts, we have finally created a monorepo project and are working within it. However, it is still the early days, and we do not yet have a fully working ecosystem. There are still several tasks ahead of us:

Develop a consistent Meson build file pattern that works regardless of whether we are building as a monorepo, as individual Meson projects, or as Meson subprojects
Figure out how to deal with signed commits (see “Dealing with Signed Commits When Creating and Splitting a Monorepo”: https://embeddedartistry.com/blog/2023/01/26/dealing-with-signed-commits-when-creating-and-splitting-a-monorepo/)
- Many of our repositories were initialized with a GitHub commit (e.g., to create a placeholder README and license file), or have GitHub-triggered merge commits.
- Signatures are not preserved in the current approach, as noted in the original shopsys/monorepo-tools as well as the documentation of tools like filter-branch and filter-repo
- The missing signatures have resulted in different commit hashes when exporting changes from the monorepo back to the original repositories
- Force pushing is not an option, as it would affect too many people and break all existing links on our website
Continue to develop supporting infrastructure to make distributed file updates easier
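
The hash mismatch noted in the signed-commits item follows from how Git computes commit IDs. In a default SHA-1 repository, the ID is the SHA-1 of the commit object's contents, and a signature is stored as a `gpgsig` header inside that object, so dropping the signature changes the ID. The sketch below demonstrates this with a hand-built commit body; the author details and the signature text are placeholders, not real data.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """Compute a Git object ID: SHA-1 over '<kind> <size>\\0' followed by the body."""
    header = b"%s %d\x00" % (kind, len(body))
    return hashlib.sha1(header + body).hexdigest()


def strip_signature(body: bytes) -> bytes:
    """Remove the gpgsig header (and its continuation lines) from a commit body."""
    headers, sep, message = body.partition(b"\n\n")
    kept = []
    in_signature = False
    for line in headers.split(b"\n"):
        if line.startswith(b"gpgsig "):
            in_signature = True
            continue
        # Continuation lines of a multi-line header begin with a space.
        if in_signature and line.startswith(b" "):
            continue
        in_signature = False
        kept.append(line)
    return b"\n".join(kept) + sep + message


# Placeholder commit; the tree ID is Git's well-known empty tree.
UNSIGNED = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Dev <dev@example.com> 1674600000 +0000\n"
    b"committer Example Dev <dev@example.com> 1674600000 +0000\n"
    b"\n"
    b"Initial commit\n"
)

# The same commit with a placeholder signature header added.
SIGNED = UNSIGNED.replace(
    b"\n\nInitial commit",
    b"\ngpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" \n"
    b" placeholder-signature-bytes\n"
    b" -----END PGP SIGNATURE-----\n\nInitial commit",
)

signed_id = git_object_id(b"commit", SIGNED)
stripped_id = git_object_id(b"commit", strip_signature(SIGNED))
print(signed_id != stripped_id)
```

Because every commit records its parent's ID, one changed ID propagates to all descendants, which is why the exported history no longer lines up with the original repository.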

I’ll be sharing more posts as I continue working in this way to share any insights or processes I am using. I certainly am not the only one wrestling with these kinds of issues. If any readers work similarly, I’d love to connect with you and share tips and horror stories.

Monorepo Development has Already Made Life Easier

I have resisted monorepos for much of my career. I was still cautious about taking this approach, but I finally convinced myself during my initial exploration efforts.

We use Catch2 for testing C++ code. In May 2022, they released a new major version. Catch v3.x required changes to the header includes and how the test program is built. Version 3.x provided significant build time improvements compared to v2.x, so we wanted to switch. But we never found a suitable chunk of time to go through and update the dozens of repositories using Catch2.

Once we successfully created a workable monorepo, we moved all projects to Catch2 v3.x within ten minutes. It was a straightforward and painless process, largely relying on find-and-replace and file copies. The steps were:

Update the Catch2 Meson wrap file to use the latest release
Update the reusable Catch2 build module to:
- Eliminate the placeholder main.cpp file from build targets
- Link to the new dependency provided by the Catch2 build system
Perform a find-and-replace across the whole repository to change the included header path
Build the full monorepo project and verify that all test programs are successful
Run support scripts to update build infrastructure and Meson wrap dependency files in the individual projects (for future distribution)
Make a single commit so that all the changes land together
Create a tag to mark the update
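
The find-and-replace step in the list above can be sketched as a short script. The exact include paths involved are not stated here, so the old and new header names below are assumptions: `catch2/catch.hpp` is the usual Catch2 v2 header, and `catch2/catch_test_macros.hpp` is the v3 header that provides `TEST_CASE` and the assertion macros. The function names and file suffixes are likewise illustrative.

```python
from pathlib import Path

# Assumed header paths; adjust to match the includes actually in use.
OLD_INCLUDE = "#include <catch2/catch.hpp>"
NEW_INCLUDE = "#include <catch2/catch_test_macros.hpp>"


def rewrite_includes(text: str) -> str:
    """Replace the Catch2 v2 include line with the v3 test-macros header."""
    return text.replace(OLD_INCLUDE, NEW_INCLUDE)


def migrate_tree(root: Path, suffixes: tuple[str, ...] = (".cpp", ".hpp")) -> list[Path]:
    """Rewrite includes in every matching source file under root.

    Returns the files that were modified, in sorted order.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text(encoding="utf-8")
            updated = rewrite_includes(original)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

Tests that use matchers, generators, or other optional features need additional v3 headers; including `catch2/catch_all.hpp` instead is the blunt alternative, at some cost in compile time. In a monorepo, one call such as `migrate_tree(Path("."))` covers every project at once.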

It is difficult for me to convey the sense of relief I felt when we made this update. We finally completed something that had been weighing on me for seven months – all in a matter of minutes.

References

3 Replies to “An Experiment: Develop in a Monorepo and Distribute to Standalone Repositories”

Johan Tunhag says:

15 February 2023 at 22:10

Back when you were still multi-repo, did you look at (and discard?) https://gerrit.googlesource.com/Git-repo?
Phillip Johnston says:

16 February 2023 at 08:29

I did, as well as Sourcegraph’s Batch Changes. git-repo results in a similar working style. But I do not use gerrit and my own automated workflows would be easier to manage via PRs and a monorepo.
Christian Schilling says:

7 March 2023 at 14:42

I was in pretty much the same spot a few years ago. Even in a similar field (embedded C++ @ https://esrlabs.com) and implemented a similar solution, with great success: https://github.com/josh-project/josh
A lot has happened since then and the project got developed quite a bit further. I’d be happy to chat and share some experiences. (Contact me via email)
