Tag Archives: version control

Migrating to Git from Subversion

I recently migrated one of our development teams from Subversion to Git. This post shares my experience and some techniques for doing a clean migration.

The migration went smoothly overall. It helped that I had planned all of the steps of the migration out beforehand so that it was simply a matter of executing them the day of the migration.

Pre-Migration Steps

There were a few steps I went through in preparing for the migration. I wanted to get as much done beforehand as possible, to minimize the time between asking users to stop committing to SVN and getting Git up and running.

1. Generating an svn authors file. This step creates an authors file so subversion users are properly mapped to git users. I found this script here: http://technicalpickles.com/posts/creating-a-svn-authorsfile-when-migrating-from-subversion-to-git/. Obviously you need to modify the file afterwards adding in the name and email for each user.


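#!/usr/bin/env bash
# Collect every unique committer name from the SVN log
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  # NAME and USER@DOMAIN are placeholders to fill in by hand afterwards
  echo "${author} = NAME <USER@DOMAIN>";
done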
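The loop above relies on word splitting, so a username containing a space would be broken into several entries. A minimal variant, assuming the same `svn log -q` output format (fields separated by `|`), reads the log from standard input and trims each name instead. The function name `extract_authors` is made up for this sketch.

```shell
#!/usr/bin/env bash
# extract_authors: read `svn log -q` output on stdin and print one
# authors-file template line per unique committer.
extract_authors() {
  grep -e '^r' |
    awk -F'|' '{ gsub(/^ +| +$/, "", $2); print $2 }' |
    sort -u |
    while IFS= read -r author; do
      # NAME and USER@DOMAIN are placeholders to fill in by hand
      printf '%s = NAME <USER@DOMAIN>\n' "$author"
    done
}
```

It could be used as `svn log -q | extract_authors > svn-authors.txt`, after which each line still needs the real name and email filled in.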

2. Perform the initial checkout. Doing a clone of a SVN repository to git can take a long time. There is no reason you need to do this on the day of your SVN->Git migration. You can have this done beforehand and all set to go. The day of the migration, I simply had to do a ‘git svn rebase’ to get up to date and then start the migration rather than having to do a full new checkout.

I chose not to use the --no-metadata option. This option strips the SVN IDs (the git-svn-id lines) out of the Git repository. Eventually I wanted this, but if you do it first, you lose the ability to keep updating your repository from Subversion. I chose to leave this until the end and remove them with a filter-branch command, which rewrites the repository in one of the final stages to remove the IDs.

Also, I only wanted to bring a few selected branches over. Because of this, I ran an init command instead of a clone and then configured which branches to bring across.

I also noticed there were some extras I did not want to bring to the new Git repository: a second project and some documentation lived in the same repository. To strip these out, I added the --ignore-paths option.

The checkout ended up looking like this:


# Initialize the repository
git svn init --stdlayout --ignore-paths "(Documentation|project2)" https://server/path/to/projects migration2

# Copy in the svn authors before we do the fetch, then work inside the new repository
cp svn-authors.txt migration2/
cd migration2
git config svn.authorsfile svn-authors.txt

# Only pull down the branches I'm interested in. In my case, a single branch named 4.5
# (quoted so the shell does not try to expand the braces or the *)
git config svn-remote.svn.branches 'branches/{4.5}:refs/remotes/*'

# Get everything from subversion
git svn fetch
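Because --ignore-paths takes an unanchored regular expression, it can be worth previewing which paths it would skip before starting a long fetch. The sketch below approximates the match with `grep -E` (git svn itself uses Perl regular expressions, which should behave the same for a simple alternation like this one). The helper name `preview_ignored` and the sample paths are hypothetical.

```shell
# preview_ignored REGEX: read repository paths on stdin and print the ones
# that an --ignore-paths value of REGEX would skip.
preview_ignored() {
  grep -E -- "$1"
}

# Sample paths, invented for illustration
printf '%s\n' \
  'trunk/project1/src/Main.java' \
  'trunk/project2/build.xml' \
  'trunk/Documentation/guide.txt' |
  preview_ignored '(Documentation|project2)'
```

Only the project2 and Documentation paths are printed. Since the pattern is not anchored, it would also match a directory named project20, so a tighter expression may be needed in a repository with similar names.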

The Migration

On migration day I went through the following process. Some of the ideas were found in this blog post.

1. Get up to date with the latest from subversion. This was quite fast since almost everything was already cloned.


git svn rebase

2. Create local branches for the branches I want to preserve in the new repository.


git branch 4.5 refs/remotes/4.5
git branch dev master

3. In the subversion repository, we had 2 projects under trunk (trunk/project1 and trunk/project2). In git, I was only pulling across project1. The --ignore-paths option on the init removed project2, but now I wanted to collapse the project1 directory to the root of the repository. This was done with the filter-branch command, which rewrites the repository to remove the extra root directory.


git filter-branch --subdirectory-filter project1 -- --all

4. The next step was to remove the subversion IDs from the commit messages. This is done again with a filter-branch command. I needed the --force option since I had previously run filter-branch without cleaning up the backup it leaves behind.


# "-- --all" rewrites every branch rather than only the one checked out
git filter-branch --msg-filter '
sed -e "/^git-svn-id:/d"
' --force -- --all
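To see what the message filter does before rewriting history, the same sed expression can be run on a sample commit message. This is only a sketch: the function name `strip_svn_id` and the message text (including the URL and UUID) are invented for illustration.

```shell
# strip_svn_id: read a commit message on stdin and drop the git-svn-id line.
strip_svn_id() {
  sed -e '/^git-svn-id:/d'
}

# Sample message shaped like one produced by git svn
printf '%s\n' \
  'Bug 1234: Fixed this bug' \
  '' \
  'git-svn-id: https://server/path/to/projects/trunk@1234 00000000-0000-0000-0000-000000000000' |
  strip_svn_id
```

The subject line comes through unchanged and the git-svn-id line is gone. The blank line that preceded it is left in place by this expression.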

5. Run the git garbage collector and filesystem check


git gc --prune=now
git fsck

6. Remove any SVN sections from the repository


git config --remove-section svn
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/refs/remotes/* .git/logs/refs/remotes/*

7. Finally, push to the bare Git repository. I already had a bare repository set up, so all I needed to do was push over the branches I wanted.


git push https://server/path/to/bare.git dev master 4.5

The only thing left was to do a clone and start working again.

Recommendations

The migration went very smoothly. The biggest recommendation I have to anyone wanting to move from subversion to git is to plan ahead. Look up how to do everything you want and put together a migration plan. Do as much work as you can up front. You want to get up and running on the new repository as quickly as possible.

Version Control is like a Highway not a Tree

I’ve been doing a lot of work with Git lately and have done a lot of thinking about version control systems. I think our analogy of a ‘tree’ to represent the life-cycle of software versions is no longer relevant. Today, trees and branches do not adequately represent what version control systems are supposed to do.

Branching is Easy

All version control systems can branch fairly well. Simply creating a branch does not give you much. It is simply a copy so it is expected that it will work well.

What is the good of branching if you cannot merge?

Merging is Hard

The thing I love the most about Git is that it gets merging right. Other version control systems I’ve used can do merging but it always feels like a pain to do so.

Image Source: http://very-bored.com/pics/weirdtrees/weird-trees-8.jpg

The tree metaphor does not really fit with the concept of merging. So why do we still use it? Most of the time I see people drawing Git graphs in lanes.

A new metaphor

Source control is more like lanes on a highway. Commits (cars) are free to move from lane to lane over time, and branching and merging carry equal weight.

Working with Git against Subversion Repositories

Despite my previous post on why Subversion is not Dead, I really do love Distributed Version Control Systems. I’ve wanted for some time to move our development teams over to Mercurial or Git.

Our team is highly distributed so we store our code in a master Subversion repository in the cloud hosted by Codesion. Recently they have offered Git hosting as well. No, we did not need this to move to Git, but it makes a nice place everyone can push to and pull from without worrying about VPN connections.

We are currently in the process of migrating projects over to Git. I plan to post about that experience here after everything is working smoothly. In the interim, I have been getting a few developers, including myself, using Git+SVN.

Git as a Subversion Client

Git+SVN is a capability that Git has to ‘clone’ a Subversion repository into a Git repository. Essentially, it allows you to get the benefits of using Git locally while still pushing to and fetching from a master Subversion hub. This is a great option if you are just starting out with Git, or if your company requires you to use Subversion for whatever reason.

Getting started with Git+SVN is very easy. First you need to download Git and install it. Cloning the subversion repository was then as easy as running this command:


git svn clone --stdlayout --username cdail https://url/to/svn/repository git-repo

If for some reason the command dies or hangs, you can resume by running a fetch. This happened to me a few times as the repository I was cloning was huge. To continue after a hang:


git svn fetch

Be warned that this may take a very long time. The first time I ran it, it took around 7 hours.

Most of the operations after this point are standard Git commands. The few exceptions are interacting with Subversion. To pull the latest changes from subversion, run:


git svn rebase

To push any changes already committed to your local Git repository to Subversion, run:


git svn dcommit

Local Branching and Git+SVN

It is possible to leverage all of the capabilities of Git locally like being able to branch easily. The trick of course is making sure everything can get back into Subversion properly when required. A simple ‘git merge’ may not be enough as Subversion requires you to keep a linear history of commits.

Rebasing is what you will need to do to rewrite your history to keep it linear. This concept may be foreign for Subversion users. I had to do quite a bit of reading myself before I really understood how it worked. I would recommend reading this which provides a good summary of rebasing.

Branching is very easy. I’ve compiled a few simple steps that should help you merge your work back into subversion after working on a branch.

Branching

To create a local branch run:

git branch featureX

You can switch between branches by running the checkout command:


git checkout master
git checkout featureX

Merging

First make sure your master branch is up to date with SVN. You can rebase directly from a branch but I chose to rebase from SVN on master only and do my local work on branches:


git checkout master
git svn rebase

The next step is to rebase your feature branch on master. This makes the history linear from master allowing you to merge properly in a way SVN will be happy with:


git checkout featureX
git rebase master

If at any point you want to see what is going on, use the ‘gitk --all’ command to see a graphical view of commits and branches. Make sure everything looks linear.

Now that you are rebased, you can merge this feature to master without worrying about losing commits.


git checkout master
git merge featureX

Check in gitk to make sure your changes are all present and linear.

Now you can commit to SVN as you would normally.
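Put together, the steps in this section could be scripted roughly as follows. This is a sketch rather than the exact procedure I ran: the helper names `run` and `land_feature` and the `DRY_RUN` variable are made up here, and `git merge --ff-only` is swapped in for the plain merge so that Git refuses anything that would not keep the history linear.

```shell
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# land_feature BRANCH: bring master up to date from SVN, rebase BRANCH on
# top of it, fast-forward master to BRANCH, and send the result to SVN.
# The chain stops at the first step that fails.
land_feature() {
  local branch=$1
  run git checkout master &&
    run git svn rebase &&
    run git checkout "$branch" &&
    run git rebase master &&
    run git checkout master &&
    run git merge --ff-only "$branch" &&
    run git svn dcommit
}
```

Running `DRY_RUN=1 land_feature featureX` only prints the commands in order, which is a safe way to review the sequence before letting it touch a real repository.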

Version Control and Bug tracking Integration (with Subversion and Bugzilla)

Two of the most useful tools to a developer outside of their development environment are version control and bug tracking systems. Version control allows tracking of changes to the product and allows for branching and merging. Bug tracking systems allow for tracking issues with the product whether they be bugs or enhancements.

Even though these tools are often separate products, they have a major commonality which is the code you are working with. Often times you want to be able to see for any given bug number, what code was changed for that bug. Also, for a change in the code (in version control) you want to see if it was associated with a particular issue in the bug tracking software.

At the company I work for we use Subversion for version control and Bugzilla for bug tracking. We have some best practices around these tools to make things easier.

Version Control and Bug Tracking Best Practices

When resolving issues in the bug tracking database, our team always puts in the build number of the build that contains the fix. This way a person who is looking at the bug can know if the build they have contains the fix. Anytime our team fixes a bug we put in a comment that looks like this:


Build Fixed: 1.0.1.12354

The last number is the revision number in Subversion.

When we commit code changes to Subversion, we also include the bug number for the bug being fixed. Our commit messages always appear in this format:


Bug 1234: Fixed this bug

Subversion Tooling

Recently I came across a neat feature in Subversion that allows you to link it to a bug tracking system. Basically this allows clicking on the bug number in the subversion history view to take you directly to the bug number in the bug tracking software.

Enabling this feature is fairly simple and involves setting 2 properties in the subversion repository. They need to be set on the root folder that you check your project out from. The properties then apply to everything in that tree, but you need to check out from this root for the feature to work. These are the two properties that need to be set:

  • bugtraq:logregex – This defines a regular expression to ‘match’ bug numbers in subversion comments. For the pattern I listed above, we are using: [Bb][Uu][Gg] (\d+)
  • bugtraq:url – This defines a URL to go to when the user clicks on a bug number. The browser is launched when the number is clicked on and takes you to this URL replacing the BUGID parameter. For our bugzilla repository we are using: https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%

The following steps walk through this process of how to set this up using Tortoise SVN:

  • On the root folder of your subversion working copy, right click on the folder and click TortoiseSVN -> Properties.

  • Add each property listed above as new properties to the list.
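To illustrate how the two properties work together, the sketch below mimics what a client does with them: find the bug number in a commit message, then substitute it for %BUGID% in the URL. It is not how TortoiseSVN is implemented. The function name `bug_url` is made up, and since `grep -E` has no `\d`, `[0-9]` stands in for it.

```shell
# bug_url MESSAGE: print the tracker URL for the first bug number found in
# MESSAGE, or return non-zero if the message has no bug reference.
bug_url() {
  local template='https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%'
  local id
  # Equivalent of bugtraq:logregex, then keep only the digits
  id=$(printf '%s\n' "$1" | grep -oE '[Bb][Uu][Gg] [0-9]+' | head -n 1 | grep -oE '[0-9]+')
  [ -n "$id" ] || return 1
  # Equivalent of bugtraq:url substitution
  printf '%s\n' "$template" | sed "s/%BUGID%/$id/"
}

bug_url 'Bug 1234: Fixed this bug'
```

For the sample message this prints the show_bug.cgi URL ending in id=1234. A message without a bug reference produces no output and a non-zero exit status.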

Just like Taking out the Garbage (With Version Control)

I had a Mathematics teacher in High School who used to get very excited over factoring problems where you could simplify expressions by canceling out terms. He used to say it was just like “taking out the garbage”. Taking out the garbage was not usually a fun task but it is surprisingly satisfying to get rid of stuff that is not necessary and is just clutter. I run into this same sort of thing when developing software. I just love to “take out the garbage”.

Version Control software is essential to any project, no matter how small (That in itself is a topic for another day). Version Control software gives you the ability to take out the garbage all of the time without having to worry about losing something that is important. You can always go back to old versions of a file if you need them at a later point in time.

Delete Unused Classes

Often when you are refactoring a component or adding something you end up with a Class that was used before that is no longer relevant. Delete it. Remember that you can always get it back later if you need it again through your Version Control system. It does not matter how ‘useful’ this code was, how ‘nice’ it looks, how ‘cool’ it is. It may have been a useful utility that you might need again. You have to resist these urges to keep it around. Unless you know for a fact that you will use it again, you should get rid of it. It is not gone forever and you can get it back if you need it. This will help reduce the complexity of your code and consequently improve its readability.

Delete Unused Code, Don’t Comment It Out

I see a lot of developers who take a piece of code that is no longer used and comment out the entire section. Resist the urge to keep it inside the code. You can always get the code back through version control, so why do you want to keep a long comment block somewhere where it needs to be maintained? Another danger to this is that if you do want to use this code later on, you will likely end up removing comments around code that no longer compiles. Things change and this once working code may no longer work after you uncomment it.

The following code is a good example of this. The commented-out line called a function with 2 parameters. Notice that the current version of the function takes only one. Commented-out code blocks are not compiled, so they are not kept up to date with the rest of the code.

// We don’t need to do this anymore
// variable = someFunction(a, b)

someFunction(a) {

}

It is simply better to just delete the code block. If you need it at some point in time later, the version control software can resurrect your lost code.
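As one illustration of resurrecting deleted code, here is a minimal sketch that assumes the project lives in a Git repository (this post does not depend on any particular tool). The function name `restore_deleted` is made up. It finds the last commit that touched the path, which for a deleted file is the commit that removed it, and checks the file out from that commit's parent.

```shell
# restore_deleted PATH: bring back a file that was deleted in an earlier
# commit, as it existed just before the deletion.
restore_deleted() {
  local path=$1
  local commit
  # Last commit that touched the path (the deletion, if the file is gone)
  commit=$(git rev-list -n 1 HEAD -- "$path")
  if [ -z "$commit" ]; then
    echo "no history found for $path" >&2
    return 1
  fi
  # Take the file from the parent of that commit
  git checkout "${commit}^" -- "$path"
}
```

After `restore_deleted path/to/OldUtility.java` (a hypothetical path), the file is back in the working tree and staged, ready to be reviewed and fixed up against the current code.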

Bug #12324 Add a heading here
Bug reference numbers are not necessary

I have seen lots of code with comments around a changed block indicating the change was for a particular bug. Before long, the code has a mess of comments all over the place indicating bugs that were fixed and where. This is another case where version control software can eliminate comments that unnecessarily clutter your code. This one does require a bit more discipline though.

Always indicate the bug number being fixed and a short description of the fix when checking code into Version Control. For example, you could add a comment like “Bug 12324: Added heading to section”. Then when you look through the Version Control log, you can easily see where changes were made and what they were made for.
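The discipline can be backed up with a small check. The sketch below tests whether the first line of a commit message follows the “Bug 12324: description” format. The function name `valid_commit_message` is made up, and the exact pattern is an assumption about how strict you want to be.

```shell
# valid_commit_message MESSAGE: succeed only if the first line looks like
# "Bug 12324: some description".
valid_commit_message() {
  printf '%s\n' "$1" | head -n 1 | grep -qE '^Bug [0-9]+: .+'
}

if valid_commit_message 'Bug 12324: Added heading to section'; then
  echo 'message accepted'
fi
```

A check like this could be called from a commit hook in whichever version control system you use, so that a commit without a bug number is rejected before it reaches the log.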

If you are looking at a particular line of code and you need to know where it came from and why, you can use the “blame” feature. That will give you the person who last changed the line of code you are looking at along with the comment (which should include the bug number and fix description). For more information on this feature of version control systems, check out the “Who wrote this Crap?” (http://www.codinghorror.com/blog/archives/000992.html) article on Coding Horror.

Taking Out the Garbage

So next time you are confronted with “stuff” that is no longer used, delete it and let your version control do the work of remembering it. This will keep your code much cleaner and easier to read. It is just like taking out the garbage.