Coding Clarity

Writing simple, clear and readable code.

For a recent project I was working on, I needed to set permissions on a remote Windows share. All roads seemed to point to JCIFS as the library to do this. Unfortunately, JCIFS did not support the operations I required, so I set about seeing what it would take to add them. This is the story of my JCIFS journey.

JCIFS is not what you might expect from a typical modern open source project. What source control system do they use? Git? SVN? Surely not CVS? I was surprised to find that the answer was none. There is no source control system behind the official JCIFS releases, which stems from the fact that there is a single developer/maintainer of the codebase. The next thing I looked for was a bug tracking system. Same story: there is no bug tracking system for JCIFS either. The one thing JCIFS did have going for it was an active mailing list. Michael B Allen, the developer/maintainer of the project, was very helpful in answering my questions to get me going.

What I Needed

What I was looking for was the ability to set access control on file shares of a Windows server. I found a promising patch on the JCIFS mailing list that I thought was my answer: http://comments.gmane.org/gmane.network.samba.java/9045. It turns out that this was not exactly what I was looking for. That patch can be used to set file permissions (returned from JCIFS SmbFile.getSecurity()). What I really needed was to set permissions on the share itself (returned from JCIFS SmbFile.getShareSecurity()). The patch was a starting point, but it would need some work.
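
To illustrate the difference, here is a minimal sketch against the stock JCIFS 1.3.x API (the server, share and credentials are made up for illustration) that reads both sets of ACEs. Setting them is what the patch and my later changes add.


import jcifs.smb.ACE;
import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;

public class ShareSecurityDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical domain, user and share, for illustration only
        NtlmPasswordAuthentication auth =
                new NtlmPasswordAuthentication("DOMAIN", "user", "password");
        SmbFile share = new SmbFile("smb://server/myshare/", auth);

        // Security descriptor of the file/directory itself
        for (ACE ace : share.getSecurity(true)) {
            System.out.println("file ACE:  " + ace);
        }

        // Security descriptor of the share (this is what I needed to modify)
        for (ACE ace : share.getShareSecurity(true)) {
            System.out.println("share ACE: " + ace);
        }
    }
}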

If you have done any coding in Java that requires interoperability with Windows systems, you have probably come across JCIFS. JCIFS is an “Open Source client library that implements the CIFS/SMB networking protocol in 100% Java.” Many other Java projects, such as J-Interop, use JCIFS internally. The reason for this is that JCIFS implements a Java version of Microsoft’s flavour of DCE/RPC. Leveraging this protocol, you can call pretty much any remote procedure Microsoft has implemented. A great resource on what Microsoft offers in this area is the MSDN documentation on Microsoft Communication Protocols (MCPP).

Microsoft has two protocols that I needed to add operations for:

  • [MS-SRVS]: Server Service Remote Protocol Specification
  • [MS-SAMR]: Security Account Manager (SAM) Remote Protocol Specification (Client-to-Server)

For SRVS, I needed to implement the NetrShareSetInfo call to set the permissions I was looking for. After working through this, I realized I also needed a way to look up a user SID by name. To do that, I implemented the SAMR call SamrLookupNamesInDomain.

Implementing My Changes

Implementing changes to the DCE/RPC calls in JCIFS was not trivial to figure out. There was generated code (srvsvc.java and samr.java) that came from srvsvc.idl and samr.idl. I figured it was CORBA at first but quickly realized that this was not regular IDL. It was not even the Microsoft IDL used in the protocol specifications; it had been massaged into a format that JCIFS could work with. I spent a long time trying to find out how this IDL was compiled until I got a reply on the mailing list pointing to a blog post by Christofer Dutz. He pointed out a tool I had missed called midlc that is part of JCIFS. Unfortunately, it is not referenced on the JCIFS main website at all, other than having the download listed. Following his instructions, I was able to get midlc compiled and running.

The IDL compiler can be downloaded from http://jcifs.samba.org/src/midlc-0.6.1.tar.gz. It was originally built for Linux but compiles and runs fine on my Mac. In a nutshell, to compile it:


$ cd midlc-0.6.1/libmba-0.9.1
$ make ar
$ cd ..
$ make

Running the compiler was pretty simple as well.


./midlc -v -t jcifs -o [pathto]/srvsvc.java [pathto]/srvsvc.idl

Writing the code that uses these generated classes was fun. There were a lot of good samples, so it was not hard to get going.

Available for the future

I have made all of the work I did on JCIFS available on GitHub. Hopefully others will find it and use it.

https://github.com/chrisdail/jcifs

Edit (March 30, 2012): Updated to include original setSecurity.patch I based my work on. This has since been removed from nabble.

This past weekend I gave a tech talk on Git at Maritime DevCon 2011. The talk served mostly as an introduction to Git. It also covered the basics of how distributed version control works and why distributed version control is useful even in corporate environments with central repositories.

I have posted the slides from this talk.

I have recently migrated one of our development teams from Subversion to Git. This post shares my experiences and some techniques for doing a clean migration.

The migration went smoothly overall. It helped that I had planned all of the steps of the migration out beforehand so that it was simply a matter of executing them the day of the migration.

Pre-Migration Steps

There were a few steps I went through in preparing for the migration. I wanted to get as much done beforehand as possible to minimize the time between asking users to stop committing to SVN and having Git up and running.

1. Generating an SVN authors file. This step creates an authors file so Subversion users are properly mapped to Git users. I found the script here: http://technicalpickles.com/posts/creating-a-svn-authorsfile-when-migrating-from-subversion-to-git/. You will need to modify the file afterwards, adding in the name and email for each user.


#!/usr/bin/env bash
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
    echo "${author} = NAME <USER@DOMAIN>";
done

2. Perform the initial checkout. Cloning an SVN repository into Git can take a long time, and there is no reason this needs to happen on the day of your SVN-to-Git migration. You can have it done beforehand and all set to go. On the day of the migration, I simply had to do a ‘git svn rebase’ to get up to date and then start the migration, rather than doing a full new checkout.

I chose not to use the --no-metadata option. This option strips the SVN IDs out of the Git repository. Eventually I wanted that, but if you do it first, you lose the ability to keep updating your repository. I chose to leave this until the end and remove the IDs with a filter-branch command, which rewrites the repository in one of the final stages.

Also, I only wanted to merge a few selected branches over. Because of this, I did an init command instead of a clone and configured the branches I wanted to bring across.

I also noticed there were some extras I did not want to bring to the new Git repository: a second project lived in the same repository, as well as documentation. To strip these out, I added the --ignore-paths option.

The checkout ended up looking like this:


# Initialize the repository
git svn init --stdlayout --ignore-paths "(Documentation|project2)" https://server/path/to/projects migration2

# Copy in the svn authors and configure it from inside the new repository before we do the fetch
cp svn-authors.txt migration2/
cd migration2
git config svn.authorsfile svn-authors.txt

# Only pull down the branches I'm interested in. In my case, a single branch named 4.5
git config svn-remote.svn.branches "branches/{4.5}:refs/remotes/*"

# Get everything from subversion
git svn fetch

The Migration

On migration day I went through the following process. Some of the ideas were found in this blog post.

1. Get up to date with the latest from Subversion. This was quite fast since almost everything had already been cloned.


git svn rebase

2. Create local branches for the branches I want to preserve to the new repository.


git branch 4.5 refs/remotes/4.5
git branch dev master

3. In the Subversion repository, we had two projects under trunk (trunk/project1 and trunk/project2). In Git, I was only pulling across project1. The --ignore-paths option on the init removed project2, but now I wanted to collapse the project1 directory to the root of the repository. This was done with the filter-branch command below, which rewrites the repository, removing the extra root directory.


git filter-branch --subdirectory-filter project1 -- --all

4. The next step was to remove the Subversion IDs from the commit messages. This is done with another filter-branch command. I needed the --force option since I had previously run a filter-branch without cleaning it up.


git filter-branch --msg-filter '
sed -e "/^git-svn-id:/d"
' --force

5. Run the git garbage collector and filesystem check


git gc --prune=now
git fsck

6. Remove the SVN sections from the repository configuration, along with the leftover SVN metadata


git config --remove-section svn
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/refs/remotes/* .git/logs/refs/remotes/*

7. Finally, push to the bare Git repository. I already had a bare repository set up, so all I needed to do was push over the branches I wanted.


git push https://server/path/to/bare.git dev master 4.5

The only thing left was to do a clone and start working again.

Recommendations

The migration went very smoothly. The biggest recommendation I have for anyone wanting to move from Subversion to Git is to plan ahead. Look up how to do everything you want and put together a migration plan. Do as much work as you can up front. You want to be up and running on the new repository as quickly as possible.

I’ve been doing a lot of work with Git lately and have done a lot of thinking about version control systems. I think our analogy of a ‘tree’ to represent the life-cycle of software versions is no longer relevant. Today, trees and branches do not adequately represent what version control systems are supposed to do.

Branching is Easy

All version control systems can branch fairly well. Simply creating a branch does not give you much, though: a branch is just a copy, so it is expected to work well.

What good is branching if you cannot merge?

Merging is Hard

The thing I love the most about Git is that it gets merging right. Other version control systems I’ve used can do merging but it always feels like a pain to do so.


The tree metaphor does not really fit with the concept of merging. So why do we still use it? Most of the time I see people drawing Git graphs in lanes.

A new metaphor

Source control is more like lanes on a highway. Commits (cars) are free to move from lane to lane over time. Branching and merging carry equal weight.

Despite my previous post on why Subversion is not Dead, I really do love Distributed Version Control Systems. I’ve wanted for some time to move our development teams over to Mercurial or Git.

Our team is highly distributed, so we store our code in a master Subversion repository in the cloud, hosted by Codesion. Recently they have started offering Git hosting as well. No, we did not need this to move to Git, but it makes a nice place everyone can push to and pull from without worrying about VPN connections.

We are currently in the process of migrating projects over to Git. I plan to post about that experience here after everything is working smoothly. In the interim, I have been getting a few developers, including myself, using Git+SVN.

Git as a Subversion Client

Git+SVN is a capability that Git has to ‘clone’ a Subversion repository into a Git repository. Essentially, it allows you to get the benefits of using Git locally while still pushing to and fetching from a master Subversion hub. This is a great option for people just starting out with Git, or if you are in a company that requires you to use Subversion for whatever reason.

Getting started with Git+SVN is very easy. First, download and install Git. Then cloning the Subversion repository is as easy as running this command:


git svn clone --stdlayout --username cdail https://url/to/svn/repository git-repo

If for some reason the command dies or hangs, you can resume by running a fetch. This happened to me a few times, as the repository I was cloning was huge. To continue after a hang:


git svn fetch

Be warned that this may take a very long time. The first time I ran it, it took around 7 hours.

Most of the operations after this point are standard Git commands. The few exceptions are interacting with Subversion. To pull the latest changes from subversion, run:


git svn rebase

To push any changes already committed to your local Git repository to Subversion, run:


git svn dcommit

Local Branching and Git+SVN

It is possible to leverage all of the capabilities of Git locally, like being able to branch easily. The trick, of course, is making sure everything can get back into Subversion properly when required. A simple ‘git merge’ may not be enough, as Subversion requires you to keep a linear history of commits.

Rebasing is what you will need to do to rewrite your history and keep it linear. This concept may be foreign to Subversion users. I had to do quite a bit of reading myself before I really understood how it worked. I would recommend reading this article, which provides a good summary of rebasing.

Branching is very easy. I’ve compiled a few simple steps that should help you merge your work back into subversion after working on a branch.

Branching

To create a local branch run:

git branch featureX

You can switch between branches by running the checkout command:


git checkout master
git checkout featureX

Merging

First make sure your master branch is up to date with SVN. You can rebase directly from a branch but I chose to rebase from SVN on master only and do my local work on branches:


git checkout master
git svn rebase

The next step is to rebase your feature branch on master. This makes the history linear from master allowing you to merge properly in a way SVN will be happy with:


git checkout featureX
git rebase master

If at any point you want to see what is going on, use the ‘gitk --all’ command to see a graphical view of commits and branches. Make sure everything looks linear.

Now that you are rebased, you can merge this feature to master without worrying about losing commits.


git checkout master
git merge featureX

Check in gitk to make sure your changes are all present and the history is linear.

Now you can commit to SVN as you would normally.

I often see posts on Hacker News or proggit exclaiming that Subversion is dead and should not be used by anyone anymore. I do not agree, but not for the reasons you might think.

Before I get to the reasons why Subversion will still be around for many years to come, let me rant about the anti-subversion movement…

Anti-Subversion Rant

I have a hard time when people lump CVS and Subversion into the same group. Subversion is vastly superior to CVS in almost every way. After working with CVS for a few years, Subversion was a breath of fresh air, and people seem to forget that these days. True, distributed version control systems may be an equally large leap beyond Subversion, but that does not discount what Subversion did right.

When using Subversion with a small team, the merging and branching issues are not usually a problem. For commercial end-user software, it has some benefits: for most commercial software, a centralized model for source control is beneficial. I love how the Subversion revision number works. At my company we use it to identify the build, and it works well for pinpointing exactly what version a customer is running. Git’s 40-character IDs simply cannot be used in this way.

Subversion does have some things going for it, especially when you intend to impose a centralized model anyway. Distinct product versions (1.0, 1.1, 2.0, etc.) lend themselves to this model. If you think about it, this is how commercial software always used to be. Web-based products and continuous deployment changed this, which is why I think we have changed the way we do version control.

I still think that DVCS is the way to go and will be the future of version control. I am currently looking into how to easily transition our core product source control from Subversion to Git (using git+svn). I think the benefits outweigh the few nice things we will lose. That is a topic for another post though.

Why Subversion will never Die

Now for the real reason Subversion will not die in the foreseeable future (particularly in the enterprise world): licensing. Subversion is licensed under the Apache License, which is very commercial-friendly, as I’ve written here previously.

Alternatively consider the licensing of the various distributed version control systems:

  • Git – GPLv2
  • Mercurial – GPLv2+
  • Bazaar – GPLv2
  • Darcs – GPL

Notice a trend?

I totally understand why this is the case. The communities around these products want to protect against commercial companies forking them, enhancing them, and not providing source for the changes. If this were a client-server model, it might be acceptable to have the core server licensed under the GPL and a library for accessing it under the LGPL. This would allow commercial offerings to freely build clients for these tools.

That simply cannot work here. Because of the nature of distributed version control systems, each user has their own repository. This means all of the magic that goes into providing a version control system lives in the client the user accesses. Licensing the client under a commercial-friendly license would essentially give away the whole product. You cannot separate the ‘client’ from the ‘server’ in a distributed version control system; they are one and the same.

Perhaps it is easier to consider a case of this in action.

Embedded Version Control Support

Consider the popular text editor for Mac OS X, TextMate. First a disclaimer: I am in no way involved in TextMate and am just using it as a plausible example of the problem.

TextMate provides Subversion source control access within the editor. There is also a Java library called SVNKit for this purpose; I’ve actually used it in a product to provide Subversion access from Java.

If you look at TextMate’s feature list, you will notice that they do not offer support for Git, Mercurial, or any of the other GPL-licensed DVCSs. They do have extensions for these, but they cannot bundle support for them the way they can bundle Subversion support.

The products I work on are in a similar state. We are precluded from providing Git or Mercurial support in our product due to the licensing. It may be possible to provide this as a plugin or extension that only drives the command line. Though technically possible, many of our enterprise customers would have issues accepting this, especially when they have to download and install the GPL-licensed version control software themselves.

Conclusions

As long as the distributed version control systems stick with the GPL, they will be in exile from many enterprise environments. Perhaps some day we will have an alternative distributed version control system under the Apache License. Until then, Subversion will continue to exist in the enterprise. Especially now, with the community at Apache, Subversion will continue to grow and evolve for many years to come.

Humans have a hard time understanding the concept of ‘random’. A great example I love to use is to ask someone to quickly pick the first ‘random’ number they think of between 1 and 100 (you can do this right now). If the number were truly random, picks of 2 and 97 would be equally likely. In reality, humans are really bad at being random number generators. This becomes even more evident if you ask someone to pick two or three numbers. Most likely they will not pick numbers close together but will instead pick a few nicely spaced numbers.

When you tell someone to pick a random number, their brain automatically tries to create a normalized set of numbers. Computers are also bad at picking random numbers, but for completely different reasons.

When it comes to software development, you may be asked to build a feature with random elements to it. The classic example is ‘shuffle’ in a music player such as iTunes. If every time the player needed a new song it picked a ‘random’ one, you might find yourself listening to the same song a few times in a row, or songs from the same album back to back. The typical user reaction is ‘this shuffle is not very random’. We know this to be absurd. In fact, the song selection is very random; it is the human who misunderstands what random means. What the user actually wants is a more normalized distribution of songs rather than a truly random selection each time.

Many music players solve this problem by randomizing the order of the entire playlist instead of picking a random song each time. This produces a playlist where each song is played exactly once. To a human this feels more ‘random’ but is actually just more normalized. There are other tricks as well, like keeping “like” songs, for example those by the same artist, from playing back to back. There are lots of ways you might give a better user experience by making the “random” feature less random.
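
To make the difference concrete, here is a minimal Java sketch (the song list and playSong method are placeholders) contrasting the naive pick-a-random-song approach with shuffling the whole playlist once:


import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleDemo {
    public static void main(String[] args) {
        List<String> songs = new ArrayList<>(Arrays.asList("Song A", "Song B", "Song C", "Song D"));
        Random random = new Random();

        // Naive approach: pick a random song each time.
        // Repeats and back-to-back duplicates are likely, which users read as "not random".
        for (int i = 0; i < songs.size(); i++) {
            playSong(songs.get(random.nextInt(songs.size())));
        }

        // Playlist approach: shuffle the whole list once (Fisher-Yates under the hood),
        // then play it in order. Every song plays exactly once, which feels "more random".
        Collections.shuffle(songs, random);
        for (String song : songs) {
            playSong(song);
        }
    }

    private static void playSong(String title) {
        System.out.println("Playing: " + title); // placeholder for actual playback
    }
}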

Never take what a customer initially says at face value; you often need to dig into what they really want. Though the customer claims they want ‘random’, they probably do not. Software development is all about figuring out what the customer is really looking for, even when they ask for something completely different.

Have you ever received an XML sample without a schema that you wanted to use with JAXB or some other XML binding? It happens more often than I would like. The common response is simply not to use any XML binding at all. This is not ideal, since JAXB is so much easier to use than DOM. Yes, there are better Java (or Groovy) libraries out there for dealing with XML, but that is not the topic of discussion here.

You can always write your own schema from the sample provided, but that can take some time; often by the time you are done, you could have just used something else. Another option is to have a tool generate the schema for you.

Let us consider the following scenario. You have been provided the following XML sample.


<?xml version="1.0" encoding="UTF-8"?>
<people>
    <Person id="123">
        <name>Chris Dail</name>
        <phone>555-1111</phone>
    </Person>
</people>

The first thing you need in order to use JAXB is a schema for this XML. There is a free tool called Trang that can convert between schema types. It has a handy feature that can generate a schema from an XML sample file, which is what I am going to do here.

The XML editor I use is Oxygen XML, which has Trang built in: going to Tools->Schema Converter gives you a UI on top of it.

Using this, you take the sample and generate a schema for it. After the conversion, you end up with a schema looking something like this:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="phone"/>
      </xs:sequence>
      <xs:attribute name="id" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="phone" type="xs:NMTOKEN"/>
</xs:schema>

You can then use JAXB to generate your object model from this using the following command:


"%JAVA_HOME%\bin\xjc" -p com.chrisdail.jaxb.sample *.xsd

The result looks like this:


parsing a schema...
compiling a schema...
com\chrisdail\jaxb\sample\ObjectFactory.java
com\chrisdail\jaxb\sample\People.java
com\chrisdail\jaxb\sample\Person.java

Now you have a JAXB generated object model from just an XML sample file.
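
To round this out, here is a minimal sketch of unmarshalling the original sample with the generated classes (assuming the sample is saved as people.xml, the classes live in com.chrisdail.jaxb.sample, and the accessor names follow xjc's defaults; the generated Person class is shown further below):


import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import com.chrisdail.jaxb.sample.People;
import com.chrisdail.jaxb.sample.Person;

public class JaxbSampleDemo {
    public static void main(String[] args) throws Exception {
        // Build a context from the generated package (uses the generated ObjectFactory)
        JAXBContext context = JAXBContext.newInstance("com.chrisdail.jaxb.sample");
        Unmarshaller unmarshaller = context.createUnmarshaller();

        // people.xml is the original sample file (assumed location)
        People people = (People) unmarshaller.unmarshal(new File("people.xml"));
        Person person = people.getPerson();
        System.out.println(person.getName() + " / " + person.getPhone() + " / id=" + person.getId());
    }
}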

Here is an example of the generated Person.java class:


//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, vJAXB 2.1.10 in JDK 6 
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a> 
// Any modifications to this file will be lost upon recompilation of the source schema. 
// Generated on: 2010.08.30 at 12:31:56 PM ADT 
//


package com.chrisdail.jaxb.sample;

import java.math.BigInteger;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlSchemaType;
import javax.xml.bind.annotation.XmlType;
import javax.xml.bind.annotation.adapters.CollapsedStringAdapter;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;


/**
 * <p>Java class for anonymous complex type.
 * 
 * <p>The following schema fragment specifies the expected content contained within this class.
 * 
 * <pre>
 * &lt;complexType>
 *   &lt;complexContent>
 *     &lt;restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
 *       &lt;sequence>
 *         &lt;element ref="{}name"/>
 *         &lt;element ref="{}phone"/>
 *       &lt;/sequence>
 *       &lt;attribute name="id" use="required" type="{http://www.w3.org/2001/XMLSchema}integer" />
 *     &lt;/restriction>
 *   &lt;/complexContent>
 * &lt;/complexType>
 * </pre>
 * 
 * 
 */
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "name",
    "phone"
})
@XmlRootElement(name = "Person")
public class Person {

    @XmlElement(required = true)
    protected String name;
    @XmlElement(required = true)
    @XmlJavaTypeAdapter(CollapsedStringAdapter.class)
    @XmlSchemaType(name = "NMTOKEN")
    protected String phone;
    @XmlAttribute(required = true)
    protected BigInteger id;

    /**
     * Gets the value of the name property.
     * 
     * @return
     *     possible object is
     *     {@link String }
     *     
     */
    public String getName() {
        return name;
    }

    /**
     * Sets the value of the name property.
     * 
     * @param value
     *     allowed object is
     *     {@link String }
     *     
     */
    public void setName(String value) {
        this.name = value;
    }

    /**
     * Gets the value of the phone property.
     * 
     * @return
     *     possible object is
     *     {@link String }
     *     
     */
    public String getPhone() {
        return phone;
    }

    /**
     * Sets the value of the phone property.
     * 
     * @param value
     *     allowed object is
     *     {@link String }
     *     
     */
    public void setPhone(String value) {
        this.phone = value;
    }

    /**
     * Gets the value of the id property.
     * 
     * @return
     *     possible object is
     *     {@link BigInteger }
     *     
     */
    public BigInteger getId() {
        return id;
    }

    /**
     * Sets the value of the id property.
     * 
     * @param value
     *     allowed object is
     *     {@link BigInteger }
     *     
     */
    public void setId(BigInteger value) {
        this.id = value;
    }

}

In my previous post, I mentioned writing performance tests any time you need to optimize slow areas of code. Writing effective performance tests can be tedious in Java: every test needs the same timing logic before and after it runs. Groovy’s closures make it easy to separate the timing code from the actual test implementations. I write all my performance tests in Groovy because it simplifies the testing logic and lets me focus on what I am trying to test.

The basic structure of a performance test is to get the current time before the test, run the test, and then get the time after. In Groovy, we can use a closure to express this.


def timeit = {String message, Closure cl->
    def startTime = System.currentTimeMillis()
    cl()
    def deltaTime = System.currentTimeMillis() - startTime
    println "$message: \ttime: $deltaTime" 
}

This allows you to call a test like this:


timeit("Test 1") {
    // This would be the code you want to test
    Math.pow(2, 6)
}

If you are going to write many tests, this format is much shorter than constantly repeating the currentTimeMillis() calls in Java. Also, no heavyweight testing framework is required. The ‘message’ parameter is a convenience so the output of each test can be distinguished. The results look like this:


Test 1: 	time: 0

Right away you will notice that a time of 0 milliseconds is not very useful: the code simply ran too fast to measure. Yes, we could use nanoseconds and might get better results. What I prefer to do is run the test many, many times and take the average. This gives an average of how fast the code is and provides more repeatable numbers.

Updating the Groovy closure, we end up with the following, which runs the test 500 times by default.


def timeit = {String message, int count=500,  Closure cl->
    def startTime = System.currentTimeMillis()
    count.times { cl() }
    def deltaTime = System.currentTimeMillis() - startTime
    def average = deltaTime / count
    println "$message:\tcount: $count \ttime: $deltaTime \taverage: $average" 
}

The output of this looks like this:


Test 2:	count: 500 	time: 18 	average: 0.036

Another thing to consider in Java is discounting the first few runs. The first time Java executes a particular class, things are always slower: the Java VM has to load all of the classes for the first time, and subsequent invocations of the same code get faster. To account for this, I include a warming period in the tests. Essentially, I run the code under test a number of times before I start recording the time, discarding these slower initial runs. The closure for this looks like this:


def timeit = {String message, int count=500, Closure cl->
    // Warming period
    20.times { cl() }
    def startTime = System.currentTimeMillis()
    count.times { cl() }
    def deltaTime = System.currentTimeMillis() - startTime
    def average = deltaTime / count
    println "$message:\tcount: $count \ttime: $deltaTime \taverage: $average" 
}

The output of this looks like this:


Test 3:	count: 500 	time: 6 	average: 0.012

Another thing you might want to do is run a multi-threaded test. In Java, this would require quite a few extra classes. In Groovy, a simple modification to this closure allows it to run the test in multiple threads. Here is the new closure, along with a test call using 5 separate threads:


def timeit = {String message, int numThreads=1, int count=500, Closure cl->
    // Warming period
    20.times { cl() }
    def startTime = System.currentTimeMillis()
    count.times {
        def threads = []
        numThreads.times { threads << new Thread(cl as Runnable) }
        threads*.start()
        threads*.join()
    }
    def deltaTime = System.currentTimeMillis() - startTime
    def average = deltaTime / count
    println "$message:\tcount: $count \ttime: $deltaTime \taverage: $average" 
}

timeit("Test 4", 5) {
    Math.pow(2, 6)
}

An extra parameter to the timeit closure lets you specify the number of threads to execute concurrently. The results are the following:


Test 4:	count: 500 	time: 465 	average: 0.93

As you can see, Groovy makes it much easier to write performance tests for Java code. The closure listed above can be used in all sorts of projects to test performance. Hopefully this snippet makes your life easier when it comes to writing your own performance tests.

Performance tuning is one of those black arts in programming. It takes skill to do it properly, and people often end up optimizing the wrong things. As the great computer science wizard Donald Knuth put it: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.

I think of it in these terms: readability comes first and foremost because that leads to maintainability. If you have a performance issue, then worry about tuning performance. I am by no means saying you should completely ignore performance and brute-force everything; you need to be aware of performance and do things in a reasonably optimal way. You should simply not go out of your way to make something faster at the cost of readability.

Occasionally you will be tasked with performance tuning. Each of the last three major projects I worked on required it at some point. For each of these, there were some basic tools I used when looking for areas to optimize.

The Hunt

The first thing you must do when looking to boost performance is go on the hunt. It is important to know what is slow before you can make it fast. You will often be surprised: the thing you think is slow may not be, and something that seemed trivial may be the cause of most of the performance issues.

Before going on the hunt, you first must have the proper tools. Here are some essential tools for tracking down performance issues:

  • Profiler – A code profiler allows you to see how long your application spends doing various tasks. At my company, we use Eclipse for our Java development; it comes with a profiler as part of the testing and performance toolkit, and there are lots of commercial profilers out there that are likely much better. Pay attention to the ‘hot spots’ in your code that are executed more than others. Even if the code does not seem to spend much time in each iteration, a small boost in a hot spot can add up to a lot.
  • Poor Man’s Profiler – Sometimes you might not have a profiler, or you only want to look at a small section of code. In these cases, a few System.currentTimeMillis() calls will let you get some timings (see the sketch after this list). In the last project I optimized, the code already made extensive use of the Java Monitoring API, JAMon (http://jamonapi.sourceforge.net/). Using it has the same effect as currentTimeMillis() but gives a more refined API to work with, and it can also help in seeing how fast certain calls are.
  • Performance Test Suites – I like to write unit tests for the specific functionality I am trying to optimize. This makes it easier to profile and check the performance of one part of the code in isolation, without starting the whole application.
  • Process Viewer – Task Manager on Windows and top on Unix are invaluable tools as well. They let you watch CPU usage while running performance tests. A sure sign of a synchronization bottleneck in a multi-threaded application is a single CPU maxed out while the rest are idle. Always develop on a multi-core machine if you are writing multi-threaded applications so you can spot these issues.
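
As referenced in the Poor Man’s Profiler item above, here is a minimal sketch of that approach in plain Java (doWork is just a placeholder for the code being measured):


public class PoorMansProfiler {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        doWork();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("doWork took " + elapsed + " ms");
    }

    private static void doWork() {
        // Placeholder for the section of code being timed
        for (int i = 0; i < 1000000; i++) {
            Math.sqrt(i);
        }
    }
}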

Approaching your Target

After you have found a performance issue, it is time to attack it. You know where the issue lies, but you do not yet know its cause. There are a few things to look for.

  • Synchronization – A big performance issue I alluded to earlier is synchronization. In multi-threaded development, you sometimes need to work with shared objects, and the easy way to do this in Java is the synchronized keyword. Be careful about the scope of where it is used and keep it as narrow as possible. CPU usage is a good indicator of this problem. If this is your issue, look at a modern concurrency library such as java.util.concurrent; ConcurrentHashMap can solve many issues around synchronized maps and is much better than using Collections.synchronizedMap() (see the sketch after this list). Many synchronization issues are difficult to track down because a debugger cannot show them to you.
  • Serialization – Serialization is another big performance hit. Anywhere you go from data objects to XML, JSON, or binary, whether on disk or in memory, you pay a cost. These operations are notoriously slow but are often necessary. Make sure they are not done more often than they need to be; a cache in front of deserialization can often greatly improve performance here.
  • Nickels and Dimes – Often there is not one single issue causing all of the problems; more likely a few things add up over time. If you shave 1 millisecond off a call that is made 100,000 times, you have saved yourself 100 seconds of processing time. That can be better than shaving 50 milliseconds off a call that is made only once. This is where your profiler and performance tests help you know where the problem is.
  • Databases and Performance – If you are using a database and notice performance issues, check a few things. Make sure you are letting the database do the work with queries; most of the time the database can manipulate data faster than you can in code. Also make sure your tables have proper indexes so the queries run fast. Occasionally things can be done faster manually in code, so run performance tests before and after to compare any changes.
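
As mentioned in the Synchronization item above, here is a minimal sketch contrasting a map wrapped with Collections.synchronizedMap() and a ConcurrentHashMap (the word-count scenario is made up for illustration):


import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MapContentionDemo {
    public static void main(String[] args) {
        // Every call locks the whole map, so threads queue up behind each other
        Map<String, AtomicInteger> synchronizedCounts =
                Collections.synchronizedMap(new HashMap<String, AtomicInteger>());

        // Uses finer-grained locking internally, so concurrent access rarely blocks
        ConcurrentHashMap<String, AtomicInteger> concurrentCounts =
                new ConcurrentHashMap<String, AtomicInteger>();

        String word = "example";

        // With the synchronized map, a check-then-act sequence still needs an explicit lock to be atomic
        synchronized (synchronizedCounts) {
            AtomicInteger counter = synchronizedCounts.get(word);
            if (counter == null) {
                counter = new AtomicInteger();
                synchronizedCounts.put(word, counter);
            }
            counter.incrementAndGet();
        }

        // ConcurrentHashMap provides atomic putIfAbsent without external locking
        concurrentCounts.putIfAbsent(word, new AtomicInteger());
        concurrentCounts.get(word).incrementAndGet();

        System.out.println(synchronizedCounts.get(word) + " / " + concurrentCounts.get(word));
    }
}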

The Cleanup

After you finish your performance tuning, it is very important to re-run your performance tests. You need to prove that the changes you made had a positive effect on performance; if they did not, they were not needed and are more likely to introduce bugs than anything else. If the performance did not improve, throw out the change and return to the hunt.

Along the same lines, it is important to hunt for only one issue at a time. If you make two changes at once, it is not possible to tell which one gave the performance gain. Each change must be made in isolation so you can be sure it is required.