
Maintaining Git History when Spinning Out a Subdirectory

I have been going through the process of open sourcing some of our previously internal crates at Spec. At Spec we have a large monorepo, which contains lots of small, modular Rust crates. The modularity makes open sourcing quite easy, since all we need to do is copy over the crate directory, remove any internal dependencies, and Bing Bang Boom, we're done.

Initially, I was naïvely just copying files out into new repos, but of course that approach sucks because you lose the entire multi-year history. I didn't mind that so much on crates where I was the sole author, but it seemed unsportsmanlike to allow my GitHub user to take full credit for things that other people worked on.

So, I looked up how to retain the history. This post summarizes what I learned, which includes addressing some additional points of complexity due to our internal repo having been in GitLab and thus not necessarily having a direct, 1:1 mapping in the commit history with GitHub users.

Pulling in the History

It is easiest to do this on a fresh branch, so ideally you are reading this before you have copied anything in. If not, no biggie, you'll just need to do some extra rebasing.

First, we need to make a branch in the source repo that contains only the commits that touch the subdirectory we care about. The git subtree command makes this easy, but it is quite slow (5-10 minutes) on a relatively large repository (7k commits), so go ahead and get it started before you do anything else:

# In the source repo
> git subtree split -P <path_to_subdirectory> -b <source_branch_name>
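
If you want to see what the split produces before running it on the real monorepo, here is a minimal sketch on a throwaway repo. The directory layout, crate names, and the foo-only branch name are all made up for illustration, and it assumes git subtree is installed alongside git.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical monorepo with two crates.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git symbolic-ref HEAD refs/heads/main

mkdir -p crates/foo crates/bar
echo "one" > crates/foo/lib.rs
git add .
git commit -q -m "Add foo"
echo "bar" > crates/bar/lib.rs
git add .
git commit -q -m "Add bar"
echo "two" >> crates/foo/lib.rs
git add .
git commit -q -m "Update foo"

# Split out only the history that touches crates/foo.
git subtree split -P crates/foo -b foo-only >/dev/null 2>&1

# The new branch keeps the two foo commits and skips the bar one,
# and crates/foo has become the root of the tree.
git log --format=%s foo-only
git ls-tree --name-only foo-only
```

The branch created by -b lives only in the source repo until you pull it from somewhere else, so it is safe to delete and retry if you picked the wrong path.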

If your target repository is not brand new (i.e. if you already have some commits on your main branch), you will want to create an empty orphan branch. Otherwise, you can skip this step and just work on your main branch.

# In the target repo

# Create the orphan branch
> git checkout --orphan root
# Remove all tracked files
> git rm -rf .
# Remove all untracked files
> git clean -fd

Once the subtree command finishes, we just pull it on in:

# In the target repo
> git pull <filesystem_path_to_source_repo> <source_branch_name>

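If you were working on a fresh orphan branch, you will now want to rebase your main branch onto it. The --root flag tells git to replay every commit on main, rather than only the ones after a configured upstream:

> git checkout main
> git rebase --onto root --root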
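
To make the orphan branch, pull, and rebase steps concrete, here is a small end-to-end sketch using two throwaway repos. The repo names, file contents, and the split branch name are hypothetical, and the source repo simply stands in for whatever the subtree split produced. It uses git rebase --onto root --root, which is my assumption for the safest way to replay all of main onto the imported history.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Source repo: a single branch standing in for the split history.
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/split
echo "pub fn foo() {}" > lib.rs
git add .
git commit -q -m "Add foo"

# Target repo that already has a commit on main.
git init -q "$work/target"
cd "$work/target"
git symbolic-ref HEAD refs/heads/main
echo "# foo" > README.md
git add .
git commit -q -m "Add README"

# Empty orphan branch, then pull the split history into it.
git checkout -q --orphan root
git rm -rfq .
git clean -fdq
git pull -q "$work/source" split

# Replay everything on main on top of the imported history.
git checkout -q main
git rebase -q --onto root --root

# Newest first: the README commit now sits on top of the imported one.
git log --format=%s main
```

If the existing commits on main touch the same files as the imported history, expect to resolve conflicts during the rebase.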

There you go, you've now got your history, ezpz. In the next section, we'll talk about how to clean it up.

Cleaning Up the History

Likely, your monorepo commits contain stuff you don't want to go into the new repo's history, whether that be internal ticket references, information about other internal services, etc.

If you are moving from one forge to another (GitLab to GitHub in our case), you may also want to reassign authorship to the appropriate GitHub user. As an aside, I personally have largely moved off of GitHub for my own stuff, because of a longstanding lack of trust in Microsoft, but it remains unambiguously the forge with the best network effects, which is why we put our stuff there.

Anyway, first, cleaning up actual commit messages:

> git rebase -i --root

In the todo list that opens, change the action from pick to reword for any commit whose message you want to update (it is probably safest to check them all), then work through the rebase until it completes.
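
Before starting the rebase, it can help to list which commits actually mention something internal. The sketch below does that with git log on a throwaway repo. The TICKET-123 style reference is a hypothetical pattern, so swap in whatever your own tracker uses.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Author" GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway repo with one commit that leaks a ticket reference.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main

echo "a" > a.txt
git add .
git commit -q -m "Add parser"
echo "b" > b.txt
git add .
git commit -q -m "Fix overflow (TICKET-123)"
echo "c" > c.txt
git add .
git commit -q -m "Bump version"

# Print only the commits whose message matches the ticket pattern.
git log -E --grep='TICKET-[0-9]+' --format='%h %s'
```

This only narrows down where to look. Messages can leak internal details in ways no pattern will catch, so skimming the full log is still worthwhile.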

To assign authorship to the right users, you'll first want to figure out the email address each of those users has on the target forge. The easiest way I have found to do this on GitHub is to:

From there, we'll do another round of rebasing onto the root:

> git rebase -i --root

This time, for any commits you'd like to adjust the authorship of, change the rebase action to edit. Then, as you go through the rebase, it will stop on each one you selected. When it stops, you can run:

> git commit --amend --author "Firstname Lastname <email>"
> git rebase --continue
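
If there are a lot of commits to fix, the same edit, amend, and continue loop can be scripted. This is a rough sketch rather than a recommended tool: it assumes you want to remap a single old email to a single new author, and the names and example.com addresses are placeholders. GIT_SEQUENCE_EDITOR is used to flip every pick to edit without opening an editor.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export GIT_EDITOR=true   # never block waiting for an editor

# Throwaway repo with commits from two hypothetical authors.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main

echo "a" > a.txt
git add .
git commit -q -m "Add a" --author "Alice <alice@gitlab.example.com>"
echo "b" > b.txt
git add .
git commit -q -m "Add b" --author "Bob <bob@example.com>"

# Placeholder mapping from the old forge email to the new author.
old_email="alice@gitlab.example.com"
new_author="Alice <alice@users.noreply.example.com>"

# Mark every commit as "edit" in the rebase todo list.
GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick /edit /'" git rebase -q -i --root

# At each stop, amend the author only if it matches, then move on.
while [ -d "$(git rev-parse --git-path rebase-merge)" ]; do
    if [ "$(git log -1 --format=%ae)" = "$old_email" ]; then
        git commit -q --amend --no-edit --author "$new_author"
    fi
    git rebase --continue >/dev/null 2>&1
done

# Check the result: Alice is remapped, Bob is untouched.
git log --format='%an <%ae>'
```

Note that this rewrites the author field only. The committer on every replayed commit becomes whoever runs the rebase.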

And now you have a nice, clean commit history in your new repository, with authorship correctly attributed to the appropriate GitHub user.

Created: 2026-05-24

Tags: git, programming