Merge Git repositories, preserving chronological order of commits
Wednesday, May 22, 2019
Sometimes you want to split a huge monorepo into multiple smaller repositories, for example when migrating from Subversion. What if you want to do the opposite? I did.
I had a set of tiny Git repositories describing how to build Debian packages in the openSUSE Build Service. The repositories had a similar directory structure, and even the commits in each repository were alike. Eventually I figured that merging all the repositories into a single monorepo would be more efficient.
I started searching to see whether somebody had already done something like this. I found a few attempts, but none of them preserved the history the way I wanted.
Basically all the solutions I found merge the repositories with the --allow-unrelated-histories option, which allows merging branches that don't share a common ancestor commit. As a result, the history of each repository ends up trapped behind a merge commit. Here is how it would look for two repositories:
```
A  B  C  D        W  X  Y  Z
•--•--•--•   +    •--•--•--•
             |
             v
   A  B  C  D        W  X  Y  Z
   •--•--•--•        •--•--•--•
             \                 \
•--•----------•--•--------------•
```
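For comparison, the merge-based approach can be sketched with the subtree-merge recipe from the Git documentation: a merge with -s ours and --allow-unrelated-histories, followed by git read-tree --prefix, so that each repository also lands in its own subdirectory. The script builds two throwaway repositories whose names (repo1, repo2) are made up for illustration; its git log --graph shows each repository's commits reachable only through a merge commit, as in the diagram above.

```shell
#!/usr/bin/env bash
# Sketch of the merge-based approach (NOT the one used in this post), based on
# the subtree-merge recipe from the Git docs. All names are illustrative.
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Two tiny, unrelated repositories with the same layout.
for r in repo1 repo2; do
  git init -q "$r"
  git -C "$r" config user.name "Example"
  git -C "$r" config user.email "example@example.com"
  echo "$r" > "$r/README"
  git -C "$r" add README
  git -C "$r" commit -q -m "Initial commit in $r"
done

# The monorepo needs at least one commit to merge into.
git init -q monorepo
cd monorepo
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Start monorepo"

for r in repo1 repo2; do
  # Fetch the other repository's HEAD into FETCH_HEAD.
  git fetch -q "../$r" HEAD
  # Start a merge with the unrelated history, keeping our tree for now...
  git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
  # ...then put the other repository's tree under the "$r/" subdirectory.
  git read-tree --prefix="$r/" -u FETCH_HEAD
  git commit -q -m "Merge $r into $r/"
done

# Each repository's commits hang off a merge commit on the main line.
git log --graph --oneline
```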
This is inconvenient if you still want to reorder or squash some consecutive commits across multiple repositories. What I wanted instead was a single line of commits, ordered chronologically:
```
A  W  B  C  X  Y  Z  D
•--•--•--•--•--•--•--•
```
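Before changing anything, the interleaved order can be previewed by listing the commits of all clones together, sorted by author date. This is a small sketch, not part of the original procedure: the function name preview_order is made up, and it assumes the repositories are cloned side by side in one directory, as in the first step below. Sorting on the Unix timestamp (%at) makes the result independent of the timezone each commit was recorded in.

```shell
# Hypothetical helper: print "<author time as Unix epoch> <repository> <subject>"
# for every commit of every repository under directory $1, oldest first.
preview_order() {
  local dir=$1 repo name
  for repo in "$dir"/*/; do
    name=$(basename "$repo")
    # %at = author date as seconds since the epoch, %s = subject line
    git -C "$repo" log --format="%at $name %s"
  done | sort -n -k1,1
}

# Example, assuming the clones live in ~/merge-attempt/repos:
#   preview_order ~/merge-attempt/repos
```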
Here is the approach I went with:
- First we clone all repositories somewhere locally.
```shell
mkdir -p ~/merge-attempt/repos
cd ~/merge-attempt/repos
git clone git@githost:user/repo1
...
git clone git@githost:user/repo9
```
- Then we run git format-patch inside each repository to export its entire history as a set of patch files. These files are named like 0001-commit-message.patch, 0002-another-commit.patch and so on. That keeps them in chronological order within one repository, but it is not very useful when we want to iterate over the patches from all repositories at once. The obvious fix is to rename the files so that their names include the commit date and time.
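```shell
cd ~/merge-attempt/repos
for R in *; do
    pushd "$R"
    # Export the whole history; paths get the repository name as a prefix
    git format-patch --root --src-prefix="a/$R/" --dst-prefix="b/$R/"
    for P in *.patch; do
        # Take the first Date: header (the author date), convert it with GNU date
        # to ISO 8601 in this machine's local timezone, and replace ':' and '+'
        # so the result is a safe filename
        grep -m 1 '^Date: ' "$P" | \
            sed 's/^Date: \(.*\)/\1/' | \
            date -Iseconds -f - | \
            sed 's/[:+]/-/g' | \
            xargs -I{} mv "$P" {}.patch
    done
    # Collect the renamed patches in ~/merge-attempt
    mv *.patch ../..
    popd
done
```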
Here the --src-prefix and --dst-prefix arguments preconfigure the patches so that, when they are applied, the files are created in a subdirectory named after the original repository.
We should get something like this:
```
2018-09-23T23-42-02-03-00.patch
2018-12-10T13-51-56-02-00.patch
2019-04-16T20-05-20-03-00.patch
2019-04-16T20-25-31-03-00.patch
2019-04-16T20-35-56-03-00.patch
2019-04-16T20-38-16-03-00.patch
2019-04-16T20-49-06-03-00.patch
```
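- With all the patches ready, we create a new repository and apply them in filename order, which is now chronological order. Since git am records the time of applying as the committer date, we then use git filter-branch to copy each commit's author date into its committer date.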
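```shell
mkdir -p ~/merge-attempt/monorepo
cd ~/merge-attempt/monorepo
git init .
# The glob expands in name order, i.e. oldest patch first
git am ../*.patch
# git am sets the committer date to "now"; copy each author date over it
git filter-branch --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"'
```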
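A possible alternative to the separate filter-branch pass: git am has a --committer-date-is-author-date option that copies each patch's author date into the committer date as the commit is created, so the history does not need rewriting afterwards. The sketch below wraps this in a small helper; the name apply_patches is made up for illustration, and the patch paths should be absolute because git -C changes into the target directory first.

```shell
# Hypothetical helper: create a new repository in directory $1 and apply the
# remaining arguments (patch files, already in the desired order) with git am.
# --committer-date-is-author-date sets each new commit's committer date to the
# author date taken from the patch, so no filter-branch pass is needed.
apply_patches() {
  local target=$1
  shift
  mkdir -p "$target"
  git init -q "$target"
  # git -C changes directory first, so the patch paths should be absolute
  git -C "$target" am -q --committer-date-is-author-date "$@"
}

# Usage with the layout from this post:
#   apply_patches ~/merge-attempt/monorepo ~/merge-attempt/*.patch
```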
And that would be it.
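One caveat with the renaming loop: it formats dates in the local timezone of the machine running it, so when clocks are turned back at the end of daylight saving time (the listing switches between -03-00 and -02-00), an hour of local times repeats and patches from that hour can sort in the wrong order. Two commits with the same timestamp would also get the same name, and mv would silently replace one patch with the other. A possible variant, sketched here as an assumption rather than the method used above, names each file after the author date as a zero-padded Unix timestamp plus the repository name and the original patch number. The function name and the separate patches directory are hypothetical; like the original loop, it needs GNU date.

```shell
# Hypothetical helper: export every repository under directory $1 as patches
# into directory $2, named <zero-padded author epoch>-<repository>-<NNNN>.patch,
# so a plain glob lists them chronologically regardless of timezone or DST,
# and commits from the same second no longer overwrite each other.
export_patches() {
  local repos=$1 out=$2 repo name tmp patch base when epoch
  mkdir -p "$out"
  for repo in "$repos"/*/; do
    name=$(basename "$repo")
    tmp=$(mktemp -d)
    # Same prefixes as before, so files land in a per-repository subdirectory
    git -C "$repo" format-patch -q --root -o "$tmp" \
      --src-prefix="a/$name/" --dst-prefix="b/$name/"
    for patch in "$tmp"/*.patch; do
      [ -e "$patch" ] || continue   # no exportable commits in this repository
      base=$(basename "$patch")
      # Read the Date: header, stopping at the blank line that ends the headers
      when=$(sed -n -e '/^$/q' -e 's/^Date: //p' "$patch")
      # GNU date: seconds since the epoch, independent of any timezone
      epoch=$(date -d "$when" +%s)
      # ${base%%-*} keeps format-patch's 4-digit sequence number
      mv "$patch" "$out/$(printf '%012d' "$epoch")-$name-${base%%-*}.patch"
    done
    rmdir "$tmp"
  done
}

# Usage (the patches directory is an assumption):
#   export_patches ~/merge-attempt/repos ~/merge-attempt/patches
#   cd ~/merge-attempt/monorepo && git am ~/merge-attempt/patches/*.patch
```

Like the original approach, this assumes author dates increase along each repository's history; a rebased commit carrying an older author date than its parent would be sorted, and applied, before that parent.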