Migrating to Git from Subversion

I recently migrated one of our development teams from Subversion to Git. This post shares my experience and some techniques for doing a clean migration.

The migration went smoothly overall. It helped that I had planned every step beforehand, so on migration day it was simply a matter of executing them.

Pre-Migration Steps

There were a few steps I went through to prepare for the migration. I wanted to get as much done beforehand as possible, to minimize the time between asking users to stop committing to SVN and having Git up and running.

1. Generate an SVN authors file. This step creates an authors file so that Subversion users are properly mapped to Git users. I found this script here: http://technicalpickles.com/posts/creating-a-svn-authorsfile-when-migrating-from-subversion-to-git/. You then need to edit the generated file, filling in the real name and email address for each user.


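#!/usr/bin/env bash
# Run inside an SVN working copy. Collect every distinct committer name from the
# revision header lines of "svn log -q" (field 2 of "r123 | user | date").
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  # git svn expects lines of the form: svn-login = Full Name <email@address>
  echo "${author} = NAME <USER@DOMAIN>";
done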
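git svn aborts the fetch when it meets an SVN committer that is missing from the authors file, so it can be worth checking the edited file before starting the long fetch. The function below is a minimal sketch of such a check; check_authors is a hypothetical helper name, and it assumes unedited lines still contain the <USER@DOMAIN> placeholder from the technicalpickles script. Lines without an email address at all are caught by the format check anyway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report authors-file lines that were not filled in.
# Expected line format (git svn authors file): svn-login = Full Name <email@address>
# Assumption: unedited lines still contain the placeholder <USER@DOMAIN>.
check_authors() {
  local file="$1"
  awk '
    NF == 0 { next }                                   # ignore blank lines
    /<USER@DOMAIN>/ || !/^[^=]+ = .+ <[^<>@ ]+@[^<>@ ]+>$/ {
      printf "%d: %s\n", NR, $0                        # report line number and text
      bad = 1
    }
    END { exit bad }                                   # non-zero exit if anything was reported
  ' "$file"
}
```

Usage: check_authors svn-authors.txt && echo "authors file looks complete"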

2. Perform the initial checkout. Cloning an SVN repository into Git can take a long time, and there is no reason to do it on the day of the migration: you can have it done beforehand and ready to go. On migration day I simply ran 'git svn rebase' to get up to date and then started the migration, rather than doing a full new checkout.

I chose not to use the --no-metadata option. This option strips the SVN IDs (the git-svn-id: lines) out of the imported commit messages. Eventually I wanted them gone, but if you strip them at the start you lose the ability to keep updating the repository from SVN; the git svn documentation describes the option as usable only for one-shot imports. I chose to leave this until the end and remove the IDs with a filter-branch command, which rewrites the repository in one of the final stages.
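To make the trade-off concrete: with metadata kept, each imported commit message ends with a line of the form git-svn-id: <url>@<revision> <repository-uuid>, and that line ties the Git commit back to its SVN revision (git svn find-rev performs this lookup for you while the lines are present). The hypothetical svn_rev_of helper below is a small sketch that pulls the revision number out of a commit message read on stdin; the URL and UUID used in the check are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SVN revision recorded in a git-svn-id line.
# Reads a commit message on stdin, e.g.:  git log -1 --format=%B | svn_rev_of
svn_rev_of() {
  # git-svn-id: <url>@<revision> <uuid>  ->  <revision>
  sed -n 's/^git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p'
}
```

Once the IDs are stripped at the end of the migration, this mapping is gone, which is exactly why the stripping is left until the final stages.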

I also wanted to bring over only a few selected branches. Because of this, I ran git svn init instead of git svn clone and edited the branch configuration to list just the branches I wanted before fetching.

I also noticed there were some extras I did not want in the new Git repository: a second project and some documentation lived in the same Subversion repository. To strip these out, I added the --ignore-paths option.
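One caveat: --ignore-paths takes a regular expression, and an unanchored pattern like (Documentation|project2) matches any path that merely contains one of those words, for example a hypothetical file named project2_notes.txt inside project1. A cheap way to preview a pattern is to run it over a list of repository paths (for instance the output of svn ls -R saved to a file) before fetching. The sketch below uses grep -E as a stand-in; git svn evaluates the pattern as a Perl regular expression, which agrees with grep -E for simple alternations like this one. The paths in the check are hypothetical, and the exact path form git svn matches against depends on your layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which paths (one per line on stdin) a candidate
# --ignore-paths pattern would match. grep -E stands in for git svn's Perl regex.
preview_ignored() {
  local pattern="$1"
  grep -E -- "$pattern" || true   # no match is not an error here
}
```

Usage: svn ls -R https://server/path/to/projects > paths.txt, then preview_ignored '(Documentation|project2)' < paths.txt. Anchoring the pattern to the directories themselves (for example ^trunk/(Documentation|project2)/ in a trunk-based layout) avoids the accidental matches.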

The checkout ended up looking like this:


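# Initialize the repository.
# --prefix="" keeps remote refs at refs/remotes/<branch>, as used in the steps below;
# since Git 2.0, git svn defaults to --prefix=origin/.
git svn init --stdlayout --prefix="" --ignore-paths "(Documentation|project2)" https://server/path/to/projects migration2
cd migration2

# Point git svn at the authors file before we do the fetch.
# The file stays in the parent directory so it never lands in the working tree.
git config svn.authorsfile ../svn-authors.txt

# Only pull down the branches I'm interested in. In my case, a single branch named 4.5.
# Quote the value so the shell does not try to expand the * glob.
git config svn-remote.svn.branches 'branches/{4.5}:refs/remotes/*'

# Get everything from Subversion
git svn fetch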
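The brace form in the branches setting also accepts a comma-separated list, so carrying over more than one branch only changes that one config value. The hypothetical branch_refspec helper below builds the value from branch names; 5.0 is an invented second branch used purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an svn-remote.svn.branches value that selects
# only the named SVN branches, e.g. branches/{4.5,5.0}:refs/remotes/*
branch_refspec() {
  local IFS=,                      # "$*" joins the arguments with commas
  printf 'branches/{%s}:refs/remotes/*\n' "$*"
}
```

Usage: git config svn-remote.svn.branches "$(branch_refspec 4.5 5.0)". The double quotes around the substitution matter for the same reason as above: they stop the shell from treating the * as a glob.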

The Migration

On migration day I went through the following process. Some of the ideas came from this blog post.

1. Get up to date with the latest changes from Subversion. This was quite fast, since almost everything had already been cloned.


git svn rebase

2. Create local branches for the branches I wanted to preserve in the new repository.


git branch 4.5 refs/remotes/4.5
git branch dev master

3. In the Subversion repository, we had two projects under trunk (trunk/project1 and trunk/project2). In Git, I was only bringing across project1. The --ignore-paths option on the init had already excluded project2, but I now wanted to move the contents of the project1 directory up to the root of the repository. The --subdirectory-filter option of filter-branch does this: it rewrites the history of every branch so that project1 becomes the repository root.


git filter-branch --subdirectory-filter project1 -- --all

4. The next step was to remove the Subversion IDs from the commit messages. This is done with another filter-branch command. I needed the --force option because the previous filter-branch run left its backup refs under refs/original/, and filter-branch refuses to run while that backup exists.


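# --force overwrites the refs/original/ backup left by the previous run;
# "-- --all" rewrites every branch, not just the one that is checked out.
git filter-branch --force --msg-filter '
sed -e "/^git-svn-id:/d"
' -- --all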
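Before rewriting every commit, it can help to try the message filter on a sample message. The sketch below wraps the same sed expression in a hypothetical strip_svn_id function and, as an optional extra that is not part of the original command, also drops the trailing blank line that typically sits between the message body and the removed git-svn-id line. The sample message, URL and UUID are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the step 4 message filter plus trailing-blank-line cleanup.
# Reads one commit message on stdin and writes the cleaned message to stdout.
strip_svn_id() {
  sed -e '/^git-svn-id:/d' |
    awk '
      { lines[NR] = $0 }                                  # buffer the whole message
      END {
        n = NR
        while (n > 0 && lines[n] ~ /^[ \t]*$/) n--        # drop trailing blank lines
        for (i = 1; i <= n; i++) print lines[i]
      }'
}
```

filter-branch evaluates the --msg-filter string in its own shell, so a function defined in your session is not visible there; to use the extra cleanup in the rewrite, the sed and awk pipeline itself would go inside the filter string.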

5. Run Git's garbage collector and check the integrity of the object database.


git gc --prune=now
git fsck

6. Remove the SVN configuration and metadata from the repository.


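git config --remove-section svn
git config --remove-section svn-remote.svn
rm -rf .git/svn
# Delete remote refs via git: gc packed them into .git/packed-refs, out of reach of rm
git for-each-ref --format='%(refname)' refs/remotes/ | while read -r ref; do git update-ref -d "$ref"; done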
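A quick sanity check before pushing is to list the refs that are left. filter-branch keeps backups of the rewritten branches under refs/original/, and anything still under refs/remotes/ is SVN tracking state. The hypothetical leftover_refs helper below is a sketch that reads ref names on stdin (for example from git for-each-ref --format='%(refname)') and prints the ones under either prefix. Deleting the refs/original/ backups is optional, since the push in the next step only sends the branches it names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print leftover migration refs from a list of ref names.
# Usage: git for-each-ref --format='%(refname)' | leftover_refs
leftover_refs() {
  grep -E '^refs/(remotes|original)/' || true   # an empty result is fine
}
```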

7. Finally, push to the bare Git repository. I already had a bare repository set up, so all I needed to do was push the branches I wanted.


git push https://server/path/to/bare.git dev master 4.5

The only thing left was to do a clone and start working again.

Recommendations

The migration went very smoothly. My biggest recommendation to anyone moving from Subversion to Git is to plan ahead: look up how to do everything you need, put together a migration plan, and do as much of the work as you can up front, so that you can get up and running on the new repository as quickly as possible.