Dealing with Signed Commits When Creating and Splitting a Monorepo

A major roadblock on the way to developing in a monorepo and distributing to standalone repositories arose with signed commits. This article logs the problems I faced, the options for addressing them, and my thoughts on the matter. The investigation is still open, and I will update the article with the final resolution once I have one.

Table of Contents:

  1. Background
  2. Options for Rewriting History With Signed Commits
  3. First-Class Support in git and filter-repo
  4. Storing and Recreating Signatures in Commit Messages
  5. References

Background

The shopsys/monorepo-tools repository that I used as the basis of my tools noted clearly that it didn’t work with signed commits. I thought little of this when starting my monorepo efforts, since I didn’t think I was using signed commits at all.

I successfully merged my repositories together, and then later (after moving to filter-repo) I successfully split up history again for individual projects. That’s when I ran into a problem. On some projects, this splitting-and-pushing routine worked just fine. On others, the splitting process recreated the same history, but each commit had a different hash. So, whenever I tried to push, the server would (rightly) reject the push because the histories had diverged.

It confused me that the splitting process worked for some repositories, but not others. Analyzing the rewritten history tree after the split process, I found that all the proper commit values were restored, save one. The initial commit had a different hash, causing a cascading “hash failure”: every later commit records its parent’s hash, so one changed hash changes all of its descendants. A deeper look showed that this was the case for all the problematic repositories.

So, why was the initial commit different? I was using signed commits without realizing it. When I created the repositories, I had GitHub helpfully create README and LICENSE files. And, it turns out, GitHub signs all the commits made through the web UI. So not only are initial commits problematic, but any merge commits and edits committed directly from the UI will also break.
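To check whether a repository contains signed commits before merging it, something like the following could be used. This is a minimal sketch, not part of my tooling: the function names are made up for illustration, and it only looks for a gpgsig header in the raw commit object rather than verifying anything.

```python
import subprocess


def has_signature_header(raw_commit: bytes) -> bool:
    """Return True if the raw commit object carries a gpgsig header."""
    # Headers end at the first blank line; the commit message follows it.
    headers, _, _message = raw_commit.partition(b"\n\n")
    return any(line.startswith(b"gpgsig ") for line in headers.split(b"\n"))


def signed_commits(repo_path: str = ".") -> list[str]:
    """List the hashes of all commits in repo_path that have a gpgsig header."""
    hashes = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    signed = []
    for commit_hash in hashes:
        # cat-file prints the commit body exactly as it is stored and hashed.
        raw = subprocess.run(
            ["git", "-C", repo_path, "cat-file", "commit", commit_hash],
            check=True, capture_output=True,
        ).stdout
        if has_signature_header(raw):
            signed.append(commit_hash)
    return signed


if __name__ == "__main__":
    for commit_hash in signed_commits():
        print(commit_hash)
```

Run from inside a repository, this prints one hash per signed commit. An empty result suggests the split should reproduce the original hashes without any special handling.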

Here’s the proper initial commit object:

(printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git cat-file commit HEAD)
commit 661tree f374393bc68231c9107ceeca3550076938407d1a
author Phillip Johnston  1659540832 -0700
committer GitHub  1659540832 -0700
gpgsig -----BEGIN PGP SIGNATURE-----

 wsBcBAABCAAQBQJi6pVgCRBK7hj4Ov3rIwAAQIAIAAZ8wJS81BJuPXWjh63gEk/j
 mP/29+QN8KtJzUzlBJHyUv499DXCITup7DO4b6ww+E1lGDBeNOnv4ZCz2juzn9Jx
 1ySHtLtNsOHcWtwlZ6WLXKS3fXTIMAAYqM4zF2pe5qsegKKn7P24kJipROd9JTX+
 EEFGBZ2fLrHoyhr+YQOBggtapeW5b7eWtVCPueYxdkgbsSQNhV4bpGKdxWUelnG8
 Fj1KSISK84OK5UDc85KQaeUdo2B/DdRSjUzz8/3k5cWo+1hOEC9R0YqJTfgxzizp
 34przZncDJhbQ0lCHkRWIz/IagRwT/wVFVOaQlEF5kkp9XTONggG+vZnBNhDLTE=
 =x4Rx
 -----END PGP SIGNATURE-----


Initial commit%

After a rewrite, this signature information is missing:

(printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git cat-file commit HEAD)
commit 188tree f374393bc68231c9107ceeca3550076938407d1a
author Phillip Johnston  1659540832 -0700
committer GitHub  1659540832 -0700

Initial commit%
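The two dumps above differ only in the gpgsig header, but that is enough to change the hash. Git names an object by hashing a short "type size" prefix, a NUL byte, and the stored bytes, so the signature is part of what gets hashed. The sketch below illustrates this for SHA-1 repositories. The identities and the signature text are placeholders, not the real values from my repository, so the resulting hashes will not match any real commit.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 object id as git computes it: hash of '<kind> <size>\\0' + body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_commit(tree: str, parents: list[str], message: str, signature: str = "") -> bytes:
    """Assemble a raw commit body; identities and timestamps are placeholders."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines.append("author A U Thor <author@example.com> 1659540832 -0700")
    lines.append("committer GitHub <noreply@github.com> 1659540832 -0700")
    if signature:
        # Continuation lines of a multi-line header are indented by one space.
        lines.append("gpgsig " + signature.replace("\n", "\n "))
    return ("\n".join(lines) + "\n\n" + message).encode()


TREE = "f374393bc68231c9107ceeca3550076938407d1a"
FAKE_SIG = (
    "-----BEGIN PGP SIGNATURE-----\n"
    "\n"
    "not-a-real-signature\n"
    "-----END PGP SIGNATURE-----"
)

# Same tree, author, committer, and message; only the gpgsig header differs.
signed_root = git_object_id(b"commit", make_commit(TREE, [], "Initial commit\n", FAKE_SIG))
stripped_root = git_object_id(b"commit", make_commit(TREE, [], "Initial commit\n"))

# A child commit embeds its parent's id, so the difference propagates.
child_of_signed = git_object_id(
    b"commit", make_commit(TREE, [signed_root], "Second commit\n"))
child_of_stripped = git_object_id(
    b"commit", make_commit(TREE, [stripped_root], "Second commit\n"))

print("root ids differ: ", signed_root != stripped_root)
print("child ids differ:", child_of_signed != child_of_stripped)
```

Because each child records its parent’s id, one stripped signature on the initial commit is enough to give every later commit a new hash, even when those later commits are otherwise byte-identical.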

Options for Rewriting History With Signed Commits

I researched and hacked around for a few days trying to make headway on this problem. I confirmed that both filter-repo and filter-branch actively strip out signature information. The reasoning makes sense: we’re changing history, and thus the signatures are invalid because they apply to the original commit. Except in our use case, we are specifically recreating the signed commits in their original form, so the signatures would be valid for the restored commits.

If the tag has a signature attached, the signature will be stripped. It is by definition impossible to preserve signatures. The reason this is “nearly” proper, is because ideally if the tag did not change (points to the same object, has the same name, etc.) it should retain any signature. That is not the case, signatures will always be removed, buyer beware.

– filter-branch documentation

From the perspective of the monorepo, I don’t care at all about signed commits. It would be fine to strip them off. If we have signed commits in the monorepo, made through the GitHub UI, it will also be fine for signatures to be stripped off and pushed to the distributed repositories, which need no knowledge of them.

But we need those original signed commits for preserving commit history in the distributed repositories. Without signatures, we’re left with a few undesirable options:

  • Force push to the distributed repositories, resetting the history and removing the signed commits
    • This has the benefit of being a one-time event, allowing us to successfully push updates to the distribution repository in the future
    • This would break all the commit-specific links in articles on the website, which would require a significant investment to fix
    • Many of these repositories have users, and this would create significant problems:
      • Submodules would break, as previously working commits would no longer be valid
      • Dependency management schemes that reference a specific commit would break, as those commits would no longer be valid
      • Users with the repositories checked out would need to do a hard reset to the main branch on the server to get updates
  • Rebase new changes onto the original history in the distribution repository
    • We can push new commits without drama, preserving the signed commits through the original repository
    • All tags will be stripped from the commits – losing valuable versioning information in the process and making it impossible to pin dependency versions consistently
  • Mitigate the “force push problem” by preserving the original branch and commits
    • This keeps existing commit references valid for submodules and dependency management tools
    • This would be required to avoid breaking existing links from the website.
    • This increases the size of the repository and makes history more confusing

I would not hesitate to force push a single repository. But this problem occurs in most of our repositories. This makes it an unpalatable path, and one I would like to avoid.

First-Class Support in git and filter-repo

In my research, I found a link in the filter-repo issue for this problem to a set of patches for git’s fast-export and fast-import tools that would add more refined options for handling signed commits – one of which would be to include the signature verbatim. I could deal with this, as I don’t really care about signature validity for these commits in the monorepo. The only problem is that these patches have sat idle since April 2021, and Elijah Newren confirmed for me that the effort was dropped.

The first steps on this path are to verify whether this approach would work for me:

  1. Fork git, apply the existing patches, and build the tool
  2. Fork git-filter-repo, adjust the tool to use the new arguments (and add any other required support)
  3. Verify that the changes actually do what I want – preserve commit signatures and allow me to reproduce the same commit at the other end
  4. Figure out whether I can create a method with git-filter-repo that would allow me to preserve signatures without introducing invalid signatures into the monorepo

As of publishing this article, I have successfully created a proof-of-concept covering steps 1-4. I can merge histories from multiple repositories into a monorepo, force signatures to be preserved (they are then invalid in monorepo form), and recreate the original commit hashes during the splitting process. For step 4, I created a command-line switch that moves commit signatures into the commit message, from which they can be recovered later. This is described in more detail in the next section.

From here, I have three paths forward:

  • Champion the changes and get them merged into git, then git-filter-repo
    • This would allow me to abandon my fork after all changes land and are released
    • I need to make additional changes to the git code to support filter-repo’s expectations, as noted in this issue comment
    • I will need to make my required changes to filter-repo in such a way that they are palatable for general consumption
  • Continue to use my forked versions on my build server (and, perhaps, my local development machine)
    • Only the build server would need the forked tools so that it could automatically perform the split operations
    • It’s possible that I land changes in git and git-filter-repo, but not in a way that lets me eliminate my fork(s)
  • Decide it’s not worth it to continue and choose a different path forward
    • The email-based patch and feedback flow could be a nightmare that breaks me down and causes me to abandon the effort

Ideally, there would be a feasible long-term solution to this problem, since it has come up often enough for folks treading a path similar to mine. This currently seems like the most achievable path and the one I am actively pursuing.

Storing and Recreating Signatures in Commit Messages

In the Shopsys issue about this problem, one of the Shopsys team members notes:

I suppose it is theoretically possible to store all data about the signature into the commit msg during building the monorepo and then use this data to recreate exactly the same signed commit during the splitting. I haven’t looked into it yet, but it should be possible to store all raw data of the commit encoded in the commit msg (it’s possible to store 100MiB there) and during splitting restore the exact commit…

This is an intriguing option. Admittedly, I have absolutely no idea how to achieve it without modifying git or git-filter-repo. Before my forays into git modifications, I spent a few hours investigating this path but made no headway.

However, this idea inspired exploratory point 4 in the previous section, which I have now successfully prototyped. Forcing the preservation of commit signatures is not ideal, as these signatures will be invalid in the rewritten monorepo history. So I created a command-line flag for git-filter-repo that saves the original signature information in the commit message. When a signature is moved to a commit message, I rename the gpgsig prefix to original_gpgsig, making it a) clear that this is an original signature for a rewritten commit, and b) easy to find these values in the future. I then created an additional command-line switch for use during the splitting process. This switch looks for commit messages that contain original_gpgsig, extracts the signature from the message (restoring the original message), and properly restores the signature in the commit object.

This path means that I can safely perform a full monorepo merge operation while preserving signatures. I don’t know whether or not this type of change will eventually be accepted within filter-repo. But at the very least I know that I have a workable path forward with a forked filter-repo.
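The move-and-restore idea can be illustrated on raw commit bytes. This is a simplified sketch and not the code from my filter-repo fork: the function names and the exact placement of original_gpgsig at the end of the message are assumptions made for the example, and it assumes gpgsig is the last header in the commit object.

```python
# Marker used inside the commit message; the trailing-block layout is an assumption.
MARKER = b"\noriginal_gpgsig "


def stash_signature(raw_commit: bytes) -> bytes:
    """Move the gpgsig header to the end of the message as original_gpgsig."""
    headers, _, message = raw_commit.partition(b"\n\n")
    start = headers.find(b"\ngpgsig ")
    if start == -1:
        return raw_commit  # unsigned commit: nothing to move
    kept = headers[:start]
    # Assumes gpgsig is the last header, so everything after it is signature.
    signature = headers[start + len(b"\ngpgsig "):]
    return kept + b"\n\n" + message + MARKER + signature


def restore_signature(raw_commit: bytes) -> bytes:
    """Undo stash_signature: put the signature back into the headers."""
    headers, _, message = raw_commit.partition(b"\n\n")
    body, marker, signature = message.rpartition(MARKER)
    if not marker:
        return raw_commit  # no stashed signature in this message
    return headers + b"\ngpgsig " + signature + b"\n\n" + body


original = (
    b"tree f374393bc68231c9107ceeca3550076938407d1a\n"
    b"author A U Thor <author@example.com> 1659540832 -0700\n"
    b"committer GitHub <noreply@github.com> 1659540832 -0700\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" \n"
    b" not-a-real-signature\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"Initial commit"
)

stashed = stash_signature(original)    # form stored in the monorepo
restored = restore_signature(stashed)  # form recreated when splitting
print(restored == original)
```

The property that matters is the round trip: if the restored bytes are identical to the original commit object, the restored commit hashes to the original id, and the split history lines up with the distributed repository again.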

Deprecated Investigatory Path

Prior to my focus on adding the necessary support to git, I had another idea. It seemed simple on the surface, but I did not invest much time in figuring out how to achieve it.

  • Create a per-repository mapping of signed commit hashes to split-and-unsigned hashes
  • Copy the contents of the signed commit hash into the proper location
  • Rewrite the tree to have a parent of the signed version, not the unsigned version

References
