
Migrating to Git from Subversion

I recently migrated one of our development teams from Subversion to Git. This post shares my experience and some techniques for doing a clean migration.

The migration went smoothly overall. It helped that I had planned every step beforehand, so on the day of the migration it was simply a matter of executing them.

Pre-Migration Steps

There were a few steps I went through in preparing for the migration. I wanted to get as much done beforehand as possible, to minimize the time between asking users to stop committing to SVN and getting Git up and running.

1. Generating an svn authors file. This step creates an authors file so that Subversion users are properly mapped to Git users. I found this script here: http://technicalpickles.com/posts/creating-a-svn-authorsfile-when-migrating-from-subversion-to-git/. You need to edit the file afterwards to fill in the name and email for each user.


#!/usr/bin/env bash
# Collect every unique committer from the SVN log and print a template line for each
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = NAME <USER@DOMAIN>"
done
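Before starting a long fetch, it can be worth confirming that every committer has an entry in the edited file, since as far as I know git svn stops when it meets an author it cannot map. This is a minimal sketch rather than part of the original script; `missing_authors` is a helper name I made up for illustration.

```shell
#!/usr/bin/env bash
# Read author names from stdin (one per line) and print each one that has
# no line starting with "name = " in the authors file given as $1.
missing_authors() {
  local authors_file="$1" author
  while IFS= read -r author; do
    # Skip blank lines in the input list
    [ -n "$author" ] || continue
    # awk exits 0 only if some line begins with "<author> = "
    if ! awk -v name="$author" 'index($0, name " = ") == 1 { found = 1 } END { exit !found }' "$authors_file"; then
      echo "$author"
    fi
  done
}

# Possible usage from inside an SVN working copy:
#   svn log -q | awk -F'|' '/^r/ { print $2 }' | sed 's/^ *//; s/ *$//' | sort -u \
#     | missing_authors svn-authors.txt
```

An empty result means every committer in the log is covered.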

2. Perform the initial checkout. Cloning an SVN repository into Git can take a long time, and there is no reason to do this on the day of your SVN->Git migration. You can have it done beforehand and all set to go. The day of the migration, I simply had to do a ‘git svn rebase’ to get up to date and then start the migration, rather than having to do a full new checkout.

I chose not to use the --no-metadata option. This option strips the SVN IDs out of the Git repository. I wanted that eventually, but if you do it first you lose the ability to keep updating your repository from Subversion. I left the IDs in until the end and removed them with a filter-branch command, which rewrites the repository in one of the final stages.

Also, I only wanted to bring a few selected branches over. Because of this, I did an init command instead of a clone and then configured which branches to bring across.

I also noticed there were some extras I did not want to bring to the new Git repository. A second project lived in the same repository, as well as documentation. To strip these out, I added the --ignore-paths option.

The checkout ended up looking like this:


# Initialize the repository
git svn init --stdlayout --ignore-paths "(Documentation|project2)" https://server/path/to/projects migration2

# Copy in the svn authors before we do the fetch
cp svn-authors.txt migration2/
# The remaining commands run inside the new repository
cd migration2
git config svn.authorsfile ../svn-authors.txt

# Only pull down the branches I'm interested in. In my case, a single branch named 4.5
# (quoted so the shell does not try to expand the braces or the *)
git config svn-remote.svn.branches 'branches/{4.5}:refs/remotes/*'

# Get everything from subversion
git svn fetch
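Because the --ignore-paths value is a regular expression, and as far as I understand it is matched anywhere in the path unless anchored, it can help to preview which paths a pattern would skip before committing to a multi-hour fetch. The sketch below is my own illustration: `preview_ignore` is a made-up helper, and `grep -E` only approximates the Perl regex that git svn uses, which should be close enough for a simple alternation like the one above.

```shell
#!/usr/bin/env bash
# Read repository paths from stdin and report whether each one matches
# the ignore regex given as $1 (unanchored, so it can match mid-path).
preview_ignore() {
  local regex="$1" path
  while IFS= read -r path; do
    if printf '%s\n' "$path" | grep -qE -- "$regex"; then
      echo "ignore: $path"
    else
      echo "keep: $path"
    fi
  done
}

# Possible usage against the real repository:
#   svn ls -R https://server/path/to/projects | preview_ignore '(Documentation|project2)'
```

Note that a pattern like this would also skip a directory named Documentation inside project1, which may or may not be what you want.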

The Migration

On migration day I went through the following process. Some of the ideas were found on this blog post.

1. Get up to date with the latest from subversion. This was quite fast since almost everything was already cloned.


git svn rebase

2. Create local branches for the branches I want to preserve in the new repository.


git branch 4.5 refs/remotes/4.5
git branch dev master

3. In the subversion repository, we had 2 projects under trunk (trunk/project1 and trunk/project2). In git, I was only pulling across project1. The --ignore-paths option on the init removed project2, but now I wanted to collapse the project1 directory to the root of the repository. This was done with the filter-branch command. It rewrites the repository, removing the extra root directory.


git filter-branch --subdirectory-filter project1 -- --all

4. The next step was to remove the subversion IDs from the commit messages. This is done again with a filter-branch command. I needed the --force option since I had previously run a filter-branch without cleaning up after it.


git filter-branch --msg-filter '
sed -e "/^git-svn-id:/d"
' --force

5. Run the git garbage collector and filesystem check


git gc --prune=now
git fsck

6. Remove any SVN sections from the repository


git config --remove-section svn
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/refs/remotes/* .git/logs/refs/remotes/*

7. Finally, push to the bare git repository. I already had a bare repository set up, so all I needed to do was push over the branches I wanted.


git push https://server/path/to/bare.git dev master 4.5

The only thing left was to do a clone and start working again.
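To see what the message filter from step 4 does, and to check afterwards that no IDs were left behind, something like the following could be used. It is a sketch under assumptions: the sample commit message (URL, revision and UUID) is invented to mimic the trailer git svn appends, and the helper names are mine.

```shell
#!/usr/bin/env bash
# The same sed expression that step 4 passes to --msg-filter
strip_svn_id() {
  sed -e "/^git-svn-id:/d"
}

# Count the git-svn-id lines in commit message text read from stdin
count_svn_ids() {
  grep -c '^git-svn-id:' || true
}

# Hypothetical commit message in the shape git svn writes;
# the URL, revision and UUID are invented for illustration
sample_message() {
  printf '%s\n' \
    'Bug 1234: Fixed this bug' \
    '' \
    'git-svn-id: https://server/path/to/projects/trunk@1554 00000000-0000-0000-0000-000000000000'
}

sample_message | count_svn_ids                  # count before the filter
sample_message | strip_svn_id | count_svn_ids   # count after the filter

# Against the migrated repository (run from inside it):
#   git log --all --format=%B | count_svn_ids
```

A count of zero from the last command suggests the rewrite reached every branch that was pushed.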

Recommendations

The migration went very smoothly. The biggest recommendation I have to anyone wanting to move from subversion to git is to plan ahead. Look up how to do everything you want and put together a migration plan. Do as much work as you can up front. You want to get up and running on the new repository as quickly as possible.

Working with Git against Subversion Repositories

Despite my previous post on why Subversion is not Dead, I really do love Distributed Version Control Systems. I’ve wanted for some time to move our development teams over to Mercurial or Git.

Our team is highly distributed so we store our code in a master Subversion repository in the cloud hosted by Codesion. Recently they have offered Git hosting as well. No, we did not need this to move to git but it makes a nice place everyone can push/pull to and from without worrying about VPN connections.

We are currently in the process of migrating projects over to Git. I plan to post about that experience here after everything is working smoothly. In the interim, I have been getting a few developers, including myself, using Git+SVN.

Git as a Subversion Client

Git+SVN is a capability that Git has to ‘clone’ a Subversion repository into a Git repository. Essentially, it allows you to get the benefits of using Git locally while still pushing to and fetching from a master Subversion hub. This is a great option for people just starting out with Git, or if you are in a company that requires you to use Subversion for whatever reason.

Getting started with Git+SVN is very easy. First you need to download Git and install it. Cloning the subversion repository is then as easy as running this command:


git svn clone --stdlayout --username cdail https://url/to/svn/repository git-repo

If for some reason the command dies or hangs, you can resume by running a fetch. This happened to me a few times as the repository I was cloning was huge. To continue after a hang:


git svn fetch

Be warned that this may take a very long time. The first time I ran it, it took around 7 hours.

Most of the operations after this point are standard Git commands. The few exceptions are interacting with Subversion. To pull the latest changes from subversion, run:


git svn rebase

To push any changes already committed to your local Git repository to Subversion, run:


git svn dcommit

Local Branching and Git+SVN

It is possible to leverage all of the capabilities of Git locally like being able to branch easily. The trick of course is making sure everything can get back into Subversion properly when required. A simple ‘git merge’ may not be enough as Subversion requires you to keep a linear history of commits.

Rebasing is what you will need to do to rewrite your history to keep it linear. This concept may be foreign for Subversion users. I had to do quite a bit of reading myself before I really understood how it worked. I would recommend reading this which provides a good summary of rebasing.

Branching is very easy. I’ve compiled a few simple steps that should help you merge your work back into subversion after working on a branch.

Branching

To create a local branch run:

git branch featureX

You can switch between branches by running the checkout command:


git checkout master
git checkout featureX

Merging

First make sure your master branch is up to date with SVN. You can rebase directly from a branch but I chose to rebase from SVN on master only and do my local work on branches:


git checkout master
git svn rebase

The next step is to rebase your feature branch on master. This makes the history linear from master allowing you to merge properly in a way SVN will be happy with:


git checkout featureX
git rebase master

If at any point you want to see what is going on, use the ‘gitk --all’ command to see a graphical view of commits and branches. Make sure everything looks linear.

Now that you are rebased, you can merge this feature to master without worrying about losing commits.


git checkout master
git merge featureX

Check in gitk to make sure your changes are all present and linear.

Now you can commit to SVN as you would normally.
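If you would rather not eyeball gitk, the linearity check can also be scripted before running ‘git svn dcommit’. This is only a sketch: `find_merge_commits` is a helper name I made up, and the `trunk` ref in the usage comment is an assumption about how your remote branch is named.

```shell
#!/usr/bin/env bash
# Read `git rev-list --parents` output from stdin. Each line is a commit
# hash followed by its parent hashes, so more than two fields means the
# commit has several parents, i.e. it is a merge commit.
find_merge_commits() {
  awk 'NF > 2 { print $1 }'
}

# Possible usage (the ref name "trunk" is an assumption):
#   git rev-list --parents trunk..master | find_merge_commits
```

No output means the commits waiting to go to Subversion form a straight line. Running `git rev-list --merges` over the same range gives the same answer directly.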

Subversion Merge Change-log in 10 lines of Groovy

Problem. Bugs happen. The common solution to this problem is to fix the bug and release a patch. Version 1.0 has bugs, version 1.0.1 fixes those bugs.

Inevitably at some point in time you will need to put together a list of all of the changes in a release. For me, this needs to go into a format we can post on our wiki. This process can be tedious if it is a manual process. There are a few approaches to handling this. You can go against the bug tracking repository and look for what bugs were fixed for this release. This will tell you everything that should have changed. I say ‘should have’ here because you cannot know for sure if the information is 100% accurate.

The other option is to go to the version control repository for information on what has changed. This is the authoritative source of what has changed but often contains more information than what you would want in a change-log.

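In my previous post on version control I mentioned that we have best practices around the format for commit messages. All bug fixes start with the word “Bug” followed by the bug number. That makes it possible to build the change-log from the output of the svn log command, using the following options: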

  • The ‘--xml’ option is used to format the output as XML. This allows groovy to break it down easily.
  • The ‘-g’ option is used which shows log messages from other revisions that were merged onto this branch. Let’s say you have 50 bugs that are merged onto the bug fix branch all at once. This would create a single revision on the branch. Using this option includes all 50 comments from their original commit on the trunk. This detail we want in the change-log. This gives nested entries though so the code has to handle that case.
  • The ‘-r’ option is used to specify the revision range to use. In this case for a branch, we want from the previous release revision number to the current (or HEAD). For this example, let’s assume the 1.0 branch was at revision 1528.
  • The command to run then becomes:

    
    svn log -r HEAD:1528 -g --xml
    
    

    The next step is to take this XML and turn it into a change-log. I plan to paste this into a wiki page, so I prefix the lines with ‘*’ so they will appear as a bulleted list in Trac. The script also puts the revision number at the end of the line in brackets. The output should look like this:

    
     * Bug 123: Fixed some bug (1554)
     * Bug 126: Some other issue (1588)
     * Bug 322: Fixed the thing (1600)
    
    

    To generate this changelog, I wrote a groovy script. It uses the svn command to generate the changelog and uses Groovy’s XML Parsing to break it up and format it. The path to the working directory and revision number would change from release to release but the rest of the code is reusable.

    
    def handleEntry
    handleEntry = {entry->
        def message = entry.msg.text()
        if (message.size() >= 3 && message[0..2].equalsIgnoreCase("bug")) {
            println " * $message (${entry.@revision})"
        }
        entry.logentry.each(handleEntry)
    }
    
    def proc = "svn log -r HEAD:1528 -g --xml".execute(null, new File("/path/to/working/directory"))
    new XmlParser().parseText(proc.text).logentry.each(handleEntry)
    
    

    Version Control and Bug tracking Integration (with Subversion and Bugzilla)

    Two of the most useful tools to a developer outside of their development environment are version control and bug tracking systems. Version control allows tracking of changes to the product and allows for branching and merging. Bug tracking systems allow for tracking issues with the product whether they be bugs or enhancements.

    Even though these tools are often separate products, they have a major commonality which is the code you are working with. Often times you want to be able to see for any given bug number, what code was changed for that bug. Also, for a change in the code (in version control) you want to see if it was associated with a particular issue in the bug tracking software.

    At the company I work for we use Subversion for version control and Bugzilla for bug tracking. We have some best practices around these tools to make things easier.

    Version Control and Bug Tracking Best Practices

    When resolving issues in the bug tracking database, our team always puts in the build number of the build that contains the fix. This way a person who is looking at the bug can know if the build they have contains the fix. Anytime our team fixes a bug we put in a comment that looks like this:

    
    Build Fixed: 1.0.1.12354
    
    

    The last number is the revision number in Subversion.
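Since the revision is always the last dot-separated field, it is easy to pull back out of a build number when you need to look up the matching commit. A small sketch, with helper names of my own choosing:

```shell
#!/usr/bin/env bash
# Print the Subversion revision, i.e. everything after the last dot
build_revision() {
  printf '%s\n' "${1##*.}"
}

# Print the product version, i.e. everything before the last dot
build_product_version() {
  printf '%s\n' "${1%.*}"
}

build_revision "1.0.1.12354"
build_product_version "1.0.1.12354"
```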

    When we commit code changes to Subversion, we also include the bug number for the bug being fixed. Our commit messages always appear in this format:

    
    Bug 1234: Fixed this bug
    
    

    Subversion Tooling

    Recently I came across a neat feature in Subversion that allows you to link it to a bug tracking system. Basically this allows clicking on the bug number in the subversion history view to take you directly to the bug number in the bug tracking software.

    Enabling this feature is fairly simple to do and involves setting 2 properties in the subversion repository. These properties need to be set on the root folder in subversion that you would use to checkout your project from. It automatically is available for everything in that tree but you need to checkout from this root for it to work. These are the two properties that need to be set.

    • bugtraq:logregex – This defines a regular expression to ‘match’ bug numbers in subversion comments. For the pattern I listed above, we are using: [Bb][Uu][Gg] (\d+)
    • bugtraq:url – This defines a URL to go to when the user clicks on a bug number. The browser is launched when the number is clicked on and takes you to this URL, with the %BUGID% placeholder replaced by the bug number. For our bugzilla repository we are using: https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%

    The following steps walk through this process of how to set this up using Tortoise SVN:

    • On the root folder of your subversion working copy, right click on the folder and click TortoiseSVN -> Properties.

    • Add each property listed above as new properties to the list.
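To get a feel for how the two properties work together, the matching and substitution can be imitated in a few lines of shell. This is a rough sketch of the idea, not how TortoiseSVN implements it: `bug_url` is a made-up helper, and `[0-9]+` stands in for `\d+` because sed uses POSIX extended regular expressions.

```shell
#!/usr/bin/env bash
# Extract the bug number from a commit message ($1) and substitute it
# for %BUGID% in a bugtraq:url style template ($2).
# Returns 1 and prints nothing when the message has no bug number.
bug_url() {
  local message="$1" template="$2" bug_id
  bug_id=$(printf '%s\n' "$message" | sed -nE 's/.*[Bb][Uu][Gg] ([0-9]+).*/\1/p' | head -n 1)
  [ -n "$bug_id" ] || return 1
  # bug_id contains digits only, so it is safe inside the sed expression
  printf '%s\n' "$template" | sed "s/%BUGID%/${bug_id}/g"
}

bug_url 'Bug 1234: Fixed this bug' 'https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%'
```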

    Building with Maven

    I’ve decided to use Apache Maven for building the code for my new project. So far I have had a love-hate relationship with Maven. If you don’t know what maven is, the folks over at Apache say Maven is …

    “… a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project’s build, reporting and documentation from a central piece of information.”

    Unfortunately that definition is about as vague as you can get so I will explain what Maven gives me and my project. This also happens to be the list of what I love about Maven:

    • Maven enforces a standardized project structure to all modules in a project.
    • Maven handles dependencies between libraries.
    • Maven builds all of my code and creates a distribution package.

    Essentially, by using Maven I get all of these things without having to write and maintain Ant scripts, because Maven does most of the work for me. Maven is not without its problems though.

    • Often the XML you need to write ends up just as long and complicated as an Ant script that does the same thing.
    • I’ve found a lot of bugs particularly with the assembly module. It seems to include dependencies of modules that are set with the “compile” or “test” scope only.
    • The Groovy building plugin for maven does not yet work for Groovy 1.5.x.

    I still think using maven was a good choice for my project. It may have some quirks but I think that I am further ahead than if I had to build everything from scratch.

    Using Maven to Generate a Build Number

    Earlier I mentioned that I use the subversion revision number as a build number. This is actually quite easy to do from Maven. The following plugin descriptor provides you access to the subversion revision number using the ${scm.revision} property.

    
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-scm-plugin</artifactId>
            <executions>
              <execution>
                <id>getting-scm.revision</id>
                <phase>validate</phase>
                <goals>
                  <goal>update</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    
    

    Once you have access to the revision number, you can construct a build number using the product version and the revision number. I have added this build number to the Implementation-Version property of the jar manifest. The following plugin definition does this for me:

    
          <plugin>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.1</version>
            <configuration>
              <archive>
                <manifestEntries>
                  <Implementation-Version>${this.version}.${scm.revision}</Implementation-Version>
                </manifestEntries>
              </archive>
            </configuration>
          </plugin>
    
    

    The resulting manifest in the jar looks something like this:

    
    Manifest-Version: 1.0
    Archiver-Version: Plexus Archiver
    Created-By: Apache Maven
    Built-By: cdail
    Build-Jdk: 1.6.0_03
    Implementation-Version: 1.0-SNAPSHOT.93
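To check the value without writing any Java, the manifest can be read straight out of the jar from the command line. A minimal sketch, assuming `unzip` is installed; `implementation_version` is a helper name of my own and `myapp.jar` is a placeholder file name.

```shell
#!/usr/bin/env bash
# Print the Implementation-Version value from manifest text on stdin.
# Manifest lines may end in CRLF, so carriage returns are removed first.
implementation_version() {
  tr -d '\r' | awk -F': ' '$1 == "Implementation-Version" { print $2 }'
}

# Possible usage (myapp.jar is a placeholder):
#   unzip -p myapp.jar META-INF/MANIFEST.MF | implementation_version
```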
    
    

    From Java code, this build number can be easily retrieved. The following code retrieves the Implementation-Version from the manifest:

    
    Package p = getClass().getPackage();
    String version = p.getImplementationVersion();
    
    

    The result is quite elegant but it takes a bit of work to get things into place. Like with many other things to do with Maven, it takes a while to figure out how to do what you want and you need to write a bunch of XML. Once that was done, everything fell into place. It is hard to say at this point if this is better or worse than the chunk of ant script I used to use to perform the same function.

    Just like Taking out the Garbage (With Version Control)

    I had a Mathematics teacher in High School who used to get very excited over factoring problems where you could simplify expressions by canceling out terms. He used to say it was just like “taking out the garbage”. Taking out the garbage was not usually a fun task but it is surprisingly satisfying to get rid of stuff that is not necessary and is just clutter. I run into this same sort of thing when developing software. I just love to “take out the garbage”.

    Version Control software is essential to any project, no matter how small (That in itself is a topic for another day). Version Control software gives you the ability to take out the garbage all of the time without having to worry about losing something that is important. You can always go back to old versions of a file if you need them at a later point in time.

    Delete Unused Classes

    Often when you are refactoring a component or adding something you end up with a Class that was used before but is no longer relevant. Delete it. Remember that you can always get it back later through your Version Control system if you need it again. It does not matter how ‘useful’ this code was, how ‘nice’ it looks, how ‘cool’ it is. It may have been a useful utility that you might need again. You have to resist these urges to keep it around. Unless you know for a fact that you will use it again, you should get rid of it. It is not gone forever and you can get it back if you need it. This will help reduce the complexity of your code and consequently improve its readability.

    Delete Unused Code, Don’t Comment It Out

    I see a lot of developers who take a piece of code that is no longer used and comment out the entire section. Resist the urge to keep it inside the code. You can always get the code back through version control, so why do you want to keep a long comment block somewhere where it needs to be maintained? Another danger to this is that if you do want to use this code later on, you will likely end up removing comments around code that no longer compiles. Things change and this once working code may no longer work after you uncomment it.

    The following code is a good example of this. The line that was commented out called a function that takes 2 parameters. Notice that the current version of the function takes only one. Code blocks that are commented out are not compiled, so they are not kept up to date with the rest of the code.

    // We don’t need to do this anymore
    // variable = someFunction(a, b)

    someFunction(a) {

    }

    It is simply better to just delete the code block. If you need it at some point in time later, the version control software can resurrect your lost code.

    Bug #12324 Add a heading here
    Bug reference numbers are not necessary

    I have seen lots of code with comments around a changed block indicating the change was for a particular bug. Before long, the code has a mess of comments all over the place indicating bugs that were fixed and where. This is another case where version control software can eliminate comments that unnecessarily clutter your code. This one does require a bit more discipline though.

    Always indicate the bug number being fixed, along with a short description, when checking code into Version Control. For example, you could add a comment like “Bug 12324: Added heading to section”. Then when you look through the Version Control log, you can easily see where changes were made and what they were made for.

    If you are looking at a particular line of code and you need to know where it came from and why, you can use the “blame” feature. That will give you the person who last changed the line of code you are looking at along with the comment (which should include the bug number and fix description). For more information on this feature of version control systems, check out the “Who wrote this Crap?” (http://www.codinghorror.com/blog/archives/000992.html) article on Coding Horror.

    Taking Out the Garbage

    So next time you are confronted with “stuff” that is no longer used, delete it and let your version control do the work of remembering it. This will keep your code much cleaner and easier to read. It is just like taking out the garbage.