Category Archives: Git

Migrating to Git from Subversion

I recently have migrated one of our development teams from subversion to Git. This post is intended to share my experiences and some techniques for doing a clean migration.

The migration went smoothly overall. It helped that I had planned all of the steps of the migration out beforehand so that it was simply a matter of executing them the day of the migration.

Pre-Migration Steps

There were a few steps I went through in preparing for the migration. I wanted to get as much stuff done beforehand so I could minimize the time from when I ask users to stop committing to SVN and when I got Git up and running.

1. Generating an svn authors file. This step creates an authors file so subversion users are properly mapped to git users. I found this script here: http://technicalpickles.com/posts/creating-a-svn-authorsfile-when-migrating-from-subversion-to-git/. Obviously you need to modify the file afterwards adding in the name and email for each user.


#!/usr/bin/env bash
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
echo "${author} = NAME ";
done

2. Perform the initial checkout. Doing a clone of a SVN repository to git can take a long time. There is no reason you need to do this on the day of your SVN->Git migration. You can have this done beforehand and all set to go. The day of the migration, I simply had to do a ‘git svn rebase’ to get up to date and then start the migration rather than having to do a full new checkout.

I chose not to use the –no-metadata option. This option strips out SVN IDs from the git repository. Eventually I wanted this but if you do it first, you lose the ability to keep updating your repository. I chose to leave this until the end and remove them using a filter-branch command. This rewrites the repository in one of the final stages to remove the IDs.

Also, I wanted to only merge a few selected branches over. Because of this, I did an init command instead of a clone and modified the branches I wanted to bring across.

I also noticed there was some extras I did not want to bring to the new git repository. A second project lived in the same repository as well as documentation. To strip these out, I added the –ignore-paths option.

The checkout ended up looking like this:


# Initialize the repository
git svn init --stdlayout --ignore-paths "(Documentation|project2)" https://server/path/to/projects migration2

# Copy in the svn authors before we do the fetch
cp svn-authors.txt migration2/
git config svn.authorsfile ../svn-authors.txt

# Only pull down the branches I'm interested in. In my case, a single branch named 4.5
git config svn-remote.svn.branches branches/{4.5}:refs/remotes/*

# Get everything form subversion
git svn fetch

The Migration

On merge day I went through the following process. Some of the ideas were found on this blog post.

1. Get up to date with the latest from subversion. This was quite fast since almost everything was always cloned.


git svn rebase

2. Create local branches for the branches I want to preserve to the new repository.


git branch 4.5 refs/remotes/4.5
git branch dev master

3. In the subversion repository, we had 2 projects under trunk (trunk/project1 and trunk/project2). In git, I was only pulling across project one. The –ignore-paths on the init removed project2 but now I wanted to collapse the project1 directory to the root of the repository. This was done by the filter-branch command. It re-writes the repository removing the extra root directory.


git filter-branch --subdirectory-filter project1 -- --all

4. The next step was to remove the subversion IDs from the commit messages. This is done again with a filter-branch command. I needed the –force option since I previously ran a filter-branch without cleaning it up.


git filter-branch --msg-filter '
sed -e "/^git-svn-id:/d"
' --force

5. Run the git garbage collector and filesystem check


git gc --prune=now
git fsck

6. Remove any SVN sections from the repository


git config --remove-section svn
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/refs/remotes/* .git/logs/refs/remotes/*

7. Finally, push to the git bare repository. I already had a bare repository set up so all I needed to do was push the branches over I wanted.


git push https://server/path/to/bare.git dev master 4.5

The only thing left was to do a clone and start working again.

Recommendations

The migration went very smoothly. The biggest recommendation I have to anyone wanting to move from subversion to git is to plan ahead. Look up how to do everything you want and put together a migration plan. Do as much work as you can up front. You want to get up and running on the new repository as quickly as possible.

Version Control is like a Highway not a Tree

I’ve been doing a lot of work with Git lately and have done a lot of thinking about version control systems. I think our analogy of a ‘tree’ to represent the life-cycle of software versions is no longer relevant. Today, trees and branches do not adequately represent what version control systems are supposed to do.

Branching is Easy

All version control systems can branch fairly well. Simply creating a branch does not give you much. It is simply a copy so it is expected that it will work well.

What is the good of branching if you cannot merge.

Merging is Hard

The thing I love the most about Git is that it gets merging right. Other version control systems I’ve used can do merging but it always feels like a pain to do so.

Image Source: http://very-bored.com/pics/weirdtrees/weird-trees-8.jpg

The tree metaphor does not really fit with the concept of merging. So why do we still use it? Most of the time I see people drawing Git graphs in lanes.

A new metaphor

Source control is more like lanes on a highway. Commits (Cars) are free to move from lane to lane over time. Branching and merging have an equal weight.

Working with Git against Subversion Repositories

Despite my previous post on why Subversion is not Dead, I really do love Distributed Version Control Systems. I’ve wanted for some time to move our development teams over to Mercurial or Git.

Our team is highly distributed so we store out code in a master Subversion repository in the cloud hosted by Codesion. Recently they have offered Git hosting as well. No, we did not need this to move to git but it makes a nice place everyone can push/pull to and from without worrying about VPN connections.

We are currently in the process of migrating projects over to Git. I plan to post about that experience here after everything is working smoothly. In the interim, I have been getting a few developers, including myself using Git+SVN.

Git as a Subversion Client

Git+SVN is a capability that Git has to ‘clone’ a Subversion repository into a Git repository. Essentially, it allows you to locally get the benefits of using Git and still push and fetch from a master Subversion hub. This is a great option for people just starting out with Git and if you are in a company that requires you to use Subversion for whatever reason.

Getting started with Git+SVN is very easy. First you need to download Git and install it. To clone the subversion repository it was as easy as running this command:


git svn clone --stdlayout --username cdail https://url/to/svn/repository git-repo

If for some reason the command dies or hangs, you can resume by running a fetch. This happened to me a few times as my repository I was cloning was huge. To continue after a hang:


git svn fetch

Be warned that this may take a very long time. The first time I ran it, it took around 7 hours.

Most of the operations after this point are standard Git commands. The few exceptions are interacting with Subversion. To pull the latest changes from subversion, run:


git svn rebase

To push any changes already commited to your local Git repository to Subversion, run:


git svn dcommit

Local Branching and Git+SVN

It is possible to leverage all of the capabilities of Git locally like being able to branch easily. The trick of course is making sure everything can get back into Subversion properly when required. A simple ‘git merge’ may not be enough as Subversion requires you to keep a linear history of commits.

Rebasing is what you will need to do to rewrite your history to keep it linear. This concept may be foreign for Subversion users. I had to do quite a bit of reading myself before I really understood how it worked. I would recommend reading this which provides a good summary of rebasing.

Branching is very easy. I’ve compiled a few simple steps that should help you merge your work back into subversion after working on a branch.

Branching

To create a local branch run:

git branch featureX

You can switch between branches by running the checkout command:


git checkout master
git checkout featureX

Merging

First make sure your master branch is up to date with SVN. You can rebase directly from a branch but I chose to rebase from SVN on master only and do my local work on branches:


git checkout master
git svn rebase

The next step is to rebase your feature branch on master. This makes the history linear from master allowing you to merge properly in a way SVN will be happy with:


git checkout featureX
git rebase master

If at any point you want to see what is going, use the ‘gitk –all’ command to see a graphical view of commits and branches. Make sure everything looks linear.

Now that you are rebased, you can merge this feature to master without worrying about losing commits.


git checkout master
git merge featureX

Check in gitk to make sure you changes are all present and linear.

Now you can commit to SVN as you would normally.