When to do Unit Testing

I was listening to Stack Overflow podcast #41 today. One of the topics discussed was unit testing and when it is beneficial. I agreed with some of the points but not all of them. I found the “eating your own dogfood” concept especially interesting: writing tests forces you to use the interfaces you created. I had also never considered tests as documentation, or as a spec for how the system should work.

There are lots of good reasons to do unit testing, but I have run across a few issues with it that go beyond the obvious costs of time and effort.

  • Bugs in the tests – Say a hypothetical programmer introduces one bug for every 100 lines of code he writes. Logically, he would also introduce one bug for every 100 lines of test code. Unit tests do provide a double check on your code, but a bug in a unit test can give you a false positive that your code works. A programmer who writes bad code will also write bad unit tests, so additional test coverage alone will not fix poor code quality.
  • Tests are Biased – Unit tests are written by the programmer who wrote the code. If the programmer misunderstands how the code works or should be used (even though they wrote it), the test is likely to be built around the same misunderstanding.
  • Test Failures due to design changes – Most of the time when I see unit tests fail, it is because the code under test was intentionally changed. The test is often the last piece of code to be updated (rightly or wrongly) to match the new behavior of the rest of the system.
  • Code that Must be Maintained – Tests are additional code that developers need to maintain. I try to write as little code as possible to keep it clear and simple, which makes it hard to justify writing a lot of extra code.
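To make the first two points concrete, here is a small hypothetical Python sketch (the function and both checks are invented for illustration): a leap-year function with a deliberate bug, a “biased” test that recomputes its expected value with the same flawed rule, and an independent test written from known facts.

```python
def is_leap_year(year: int) -> bool:
    # Deliberately buggy: ignores the century rules, so 1900 is wrongly a leap year.
    return year % 4 == 0


def mirrored_check() -> bool:
    """A biased test: the expected value comes from the same flawed rule."""
    return all(is_leap_year(y) == (y % 4 == 0) for y in (1900, 2000, 2008, 2009))


def independent_check() -> bool:
    """A test written from known facts instead of the implementation's logic."""
    known = {1900: False, 2000: True, 2008: True, 2009: False}
    return all(is_leap_year(y) == expected for y, expected in known.items())


print(mirrored_check())     # True: the bug slips through
print(independent_check())  # False: 1900 exposes the bug
```

The mirrored test passes no matter what the rule is, because it can only confirm the code agrees with itself.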

Writing tests requires effort. Tests take time to develop and maintain, but they do have a payoff. I try to find a balance between the “100% test coverage” people and the “we don’t need any tests” people. The following criteria are a general guide to the areas I feel are most important to test.

  • Libraries – I always make sure I write tests for common libraries or utilities. These are generally the easiest to test and are designed to be used in many places, so testing them has the biggest payoff.
  • APIs – APIs are a contract between systems. These are logical choices for tests as they should not be changing often and have strict rules.
  • Business Logic/Rules – Anytime there are business rules or logic that perform some defined function where there are clear inputs and outputs, these should be tested.
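As an example of the business-rule case, here is a minimal sketch using Python’s built-in unittest module. The rule itself (orders of 100.00 or more ship free, otherwise shipping is a flat 7.50) is hypothetical; the point is that clear inputs and outputs turn directly into assertions, including the boundary values where rules usually break.

```python
import unittest
from decimal import Decimal

FREE_SHIPPING_THRESHOLD = Decimal("100.00")  # hypothetical business rule
FLAT_SHIPPING = Decimal("7.50")


def shipping_cost(order_total: Decimal) -> Decimal:
    """Orders at or above the threshold ship free; others pay a flat fee."""
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return FLAT_SHIPPING


class ShippingCostTest(unittest.TestCase):
    def test_below_threshold_pays_flat_fee(self):
        self.assertEqual(shipping_cost(Decimal("99.99")), FLAT_SHIPPING)

    def test_threshold_itself_ships_free(self):
        self.assertEqual(shipping_cost(Decimal("100.00")), Decimal("0.00"))

    def test_negative_total_is_rejected(self):
        with self.assertRaises(ValueError):
            shipping_cost(Decimal("-1"))
```

Saved to a file, these tests run with `python -m unittest` from the same directory.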

Libraries are usually number one on my testing priority list, and I usually aim for 90% or more coverage there. I have found GUI testing to be of little benefit, since the effort is high and the reward is small. Glue code, or anything that ties different components together, is hard to unit test and is more easily covered by basic smoke tests on the complete product.

Write as Little Code as Possible

The Zune Bug

A few weeks ago I heard about an issue where the Microsoft Zune crashed and would not start up on December 31, 2008. The reason? A bug in the software that handled leap years. There are lots of articles on the original issue.

When I heard about this issue, my first thought was “How could that happen?” I could understand not handling leap years correctly, but causing a crash?

Well, if, like me, you were wondering how that could happen, you can now find out for yourself. Someone has posted the actual code that runs on the Zune; look at line 249 onward.

If you missed it, the bug is that the loop never terminates when the remaining number of days reaches exactly 366 in a leap year, which is the case on December 31, 2008. The while loop runs as long as days is greater than 365. In a leap year it only handles days being greater than 366 and never handles days being equal to 366, so nothing changes and the loop spins forever.
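The logic described above can be reproduced with a short Python transliteration. This is a sketch, not the Zune source: it assumes the day count starts at 1 on January 1, 1980, and it adds an iteration guard (`max_iterations`, my own addition) so the hang surfaces as an exception instead of freezing the program. `fixed_year` shows one way to close the gap by comparing against the length of the current year.

```python
from datetime import date


def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def buggy_year(days: int, year: int = 1980, max_iterations: int = 10_000):
    """Mirrors the reported loop; raises instead of hanging forever."""
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError(f"stuck: year={year}, days={days}")
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # Bug: when days == 366 nothing changes, so the loop spins forever.
        else:
            days -= 365
            year += 1
    return year, days


def fixed_year(days: int, year: int = 1980):
    """Subtract whole years only while days exceeds the current year's length."""
    while days > (366 if is_leap_year(year) else 365):
        days -= 366 if is_leap_year(year) else 365
        year += 1
    return year, days


# Day number of December 31, 2008, counting January 1, 1980 as day 1 (assumption).
dec_31_2008 = (date(2008, 12, 31) - date(1980, 1, 1)).days + 1

print(fixed_year(dec_31_2008))  # (2008, 366)
try:
    buggy_year(dec_31_2008)
except RuntimeError as err:
    print(err)  # stuck: year=2008, days=366
```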

Write as Little Code as Possible

The less code you write, the fewer bugs it will contain. Most languages today have built-in support for common tasks. For example, Java’s Calendar class would let a developer do what the Zune code does using only platform APIs. Unfortunately, high-level languages are usually not an option when writing code for a small embedded device.
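Python’s standard library makes the same point. Assuming a day count where day 1 is January 1, 1980 (a convention chosen for illustration, not taken from the Zune source), the whole year-walking loop collapses into a single line of date arithmetic with `datetime`, and the leap-year rules are the library’s problem rather than ours.

```python
from datetime import date, timedelta

EPOCH = date(1980, 1, 1)  # assumed origin: day 1 == January 1, 1980


def date_from_day_count(days: int) -> date:
    """Convert a 1-based day count into a calendar date; leap years are handled for us."""
    return EPOCH + timedelta(days=days - 1)


print(date_from_day_count(10593))  # 2008-12-31
print(date_from_day_count(10594))  # 2009-01-01
```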

Tips for writing as little code as possible:

  • Start with a high-level language. Preferably use an agile, dynamic language such as Groovy, Ruby or Python if possible. This of course depends largely on the requirements of the application.
  • Where possible, use the built-in platform APIs to do what you need.
  • Use open source software to fill in the gaps of platform APIs. Apache and Codehaus are great sources of open source software that are commercial friendly to ship.

Queue based on a Dictionary in Python

Two Weeks with Python

For the last few weeks I have been doing some work with Python. Most of my development work is in Java or Groovy, but for one project I needed a very lightweight language for a very small application that is called often from the command line. After a simple performance comparison between Perl and Python, I found that Python ran a bit quicker for the tasks I was doing. I was quite pleased, because I have wanted to learn Python for some time but never had a reason to.

There are a lot of good things about Python. It is very dynamic and lets me say more in a few lines than I could in something like Java. The standard library is very good: it had pretty much everything I needed to write my application without downloading a bunch of third-party dependencies, as I would in Java (e.g. Apache Commons).

The main issue was that I needed a queue for my application that was persistable to disk. This is part of a guaranteed delivery system, so the data must always be written out to disk. There are a few options in Python for persistence, all of them built on pickle, Python’s object serialization.

For performance reasons I did not want to serialize the entire queue to disk as a single object. Items will often be added to or removed from the queue, so it will be constantly growing and shrinking. I did not want to rewrite the entire file every time a simple change was made to the queue.

This is about when I found shelve, Python’s object persistence module. It provides key/value (dictionary or map style) access to data in a ‘database’, which here means a dbm, a Unix-style key/value database. I had done some work with Berkeley DB2 before, so I was familiar with the concept. This is what I wanted to use to store my data. The only problem was that it was essentially a dictionary, not a queue.
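For readers who have not used it, here is a minimal sketch of shelve in Python 3 (the file name `example-db` is arbitrary). Keys must be strings, each value is pickled individually, and with the default `writeback=False` an assignment stores only that one entry, which is the property the queue below relies on.

```python
import os
import shelve
import tempfile


def shelve_round_trip(directory: str) -> dict:
    """Write two entries, close the shelf, then reopen it and read them back."""
    path = os.path.join(directory, "example-db")  # the dbm backend may add a suffix
    with shelve.open(path) as db:
        db["1"] = {"msg": "hello"}  # keys must be str
        db["2"] = [1, 2, 3]         # values are pickled one entry at a time
    with shelve.open(path) as db:   # the data survives the close/reopen
        return {key: db[key] for key in db.keys()}


with tempfile.TemporaryDirectory() as tmp:
    print(shelve_round_trip(tmp))
```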

Dictionary Based Queue

I set out to write a simple queue in Python that stores all of its data in a dictionary. In my case I would be using a Shelf, but it could be backed by any dictionary. This allows an entry to be added to or removed from the queue without writing the entire queue to disk every time.

The data structure works like this:

  • All keys in the dictionary are numbers (stored as strings, since shelve requires string keys). The numbers typically start at 1 but can be any contiguous range. It is important that they are contiguous, or the algorithm will not work.
  • The data structure keeps track of the minimum key, which is the head of the queue, and the maximum key, which is the tail of the queue.
  • When the queue is created, it iterates over all keys in the dictionary to determine the min and max keys. This is the only call that fetches the keys from the underlying dictionary. With shelve, the keys() method is very expensive, so we want to call it as little as possible.
  • The queue is thread-safe. Locking is done around the methods that modify the queue. This was a requirement for the code using this implementation.
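The head/tail bookkeeping described in the list above can be isolated into one function. This sketch (the name `recover_bounds` is hypothetical, not part of the original class) also checks the contiguity requirement explicitly, since a gap would leave the head or tail pointing at a missing key.

```python
def recover_bounds(keys):
    """Return (min_key, max_key) for a queue stored under contiguous integer keys.

    An empty store yields (1, 0), so size = max - min + 1 == 0.
    Raises ValueError if the keys have a gap.
    """
    numbers = sorted(int(k) for k in keys)
    if not numbers:
        return 1, 0
    lo, hi = numbers[0], numbers[-1]
    if hi - lo + 1 != len(numbers):
        raise ValueError(f"keys are not contiguous: {numbers}")
    return lo, hi


print(recover_bounds(["3", "4", "5"]))  # (3, 5)
print(recover_bounds([]))               # (1, 0)
```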

Methods provided:

  • head – Gives the item at the head of the queue. Does not remove it.
  • pop – Pops the head item off the queue, removing it.
  • enqueue – Adds an item to the queue.
  • size – Returns the size of the queue.
  • sync – Syncs the shelf to disk. Called after each operation that modifies the queue. It only has an effect when the dictionary is a Shelf.

The code:


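import threading


class DictQueue:
    """FIFO queue kept in a dict-like store (e.g. a shelve.Shelf) under
    contiguous integer keys, stored as strings because shelve needs str keys."""

    def __init__(self, store):
        self.store = store
        self.lock = threading.Lock()  # one lock per queue instance
        self.min_key = 1  # key of the head item
        self.max_key = 0  # key of the tail item; max < min means empty

        # The only call to keys(); it is expensive on a shelf.
        keys = [int(k) for k in store.keys()]
        if keys:
            self.min_key = min(keys)
            self.max_key = max(keys)

    def head(self):
        """Return the head item without removing it, or None if empty."""
        with self.lock:
            return self.store.get(str(self.min_key))

    def pop(self):
        """Remove and return the head item; raises KeyError if empty."""
        with self.lock:
            item = self.store.pop(str(self.min_key))
            self.min_key += 1
            self.sync()
            return item

    def enqueue(self, value):
        """Append an item at the tail of the queue."""
        with self.lock:
            self.max_key += 1
            self.store[str(self.max_key)] = value
            self.sync()

    def size(self):
        with self.lock:
            return self.max_key + 1 - self.min_key

    def sync(self):
        """Flush to disk if the store supports it (a Shelf does, a dict does not).
        Called while the lock is held, so it must not take the lock itself."""
        sync = getattr(self.store, "sync", None)
        if sync is not None:
            sync()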

Usage:


import shelve

with shelve.open('db') as db:
    s = DictQueue(db)
    s.enqueue('a')
    s.enqueue('b')
    s.enqueue('c')

    print('size:', s.size())
    while s.size() > 0:
        print('item:', s.pop())

Getting access to SOAP Headers from an Apache CXF service implementation

For one project, I needed to access the SOAP headers of a web service call. I am using the Apache CXF services stack. There are quite a few threads on how to access SOAP headers from an interceptor, but in my case I needed the contents of the header inside the service implementation.

wsdl2java can generate a Java service class that provides access to the headers using the -exsh option. This did not work for me: even with this flag on, the headers were not added to the generated service methods. I suspect this was due to how the WSDL I had was designed, which meant that I needed to pull out the headers myself.

The easiest way to retrieve the SOAP headers is from the JAX-WS SOAPMessageContext, but getting access to one was not trivial. I added the @Resource annotation to inject the WebServiceContext. Unfortunately, the MessageContext it gave me was not a SOAPMessageContext and provided no way to access the SOAP headers.

After looking through the code for the CXF SoapMessage class, I found how it gets the headers out of the Message. I came up with the following to access the headers from my implementation class:


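import java.util.List;
import javax.annotation.Resource;
import javax.xml.ws.WebServiceContext;
import javax.xml.ws.handler.MessageContext;
import org.apache.cxf.headers.Header;
import org.apache.cxf.helpers.CastUtils;
import org.apache.cxf.jaxws.context.WrappedMessageContext;
import org.apache.cxf.message.Message;

// Injected by the JAX-WS runtime into the service implementation class.
@Resource
private WebServiceContext context;

private List<Header> getHeaders() {
    MessageContext messageContext = context.getMessageContext();
    // Only CXF's WrappedMessageContext exposes the underlying Message (instanceof also rejects null).
    if (!(messageContext instanceof WrappedMessageContext)) {
        return null;
    }

    Message message = ((WrappedMessageContext) messageContext).getWrappedMessage();
    // CXF keeps the parsed SOAP headers on the message under Header.HEADER_LIST.
    return CastUtils.cast((List<?>) message.get(Header.HEADER_LIST));
}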

This returns all of the available headers. To extract the specific header I needed with JAXB, I added the following:


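// getJaxb() is assumed to return a JAXBContext for the header types (not shown);
// unmarshal() throws JAXBException, which the enclosing method must handle or declare.
List<Header> headers = getHeaders();
if (headers != null) {
    for (Header h : headers) {
        Object o = h.getObject();

        // Header content may arrive as a DOM node; unmarshal it with JAXB.
        if (o instanceof org.w3c.dom.Node) {
            o = getJaxb().createUnmarshaller().unmarshal((org.w3c.dom.Node) o);
        }
        // Types without @XmlRootElement come back wrapped in a JAXBElement; unwrap them.
        o = javax.xml.bind.JAXBIntrospector.getValue(o);

        if (o instanceof DesiredHeaderType) {
            // Do whatever is required with the header object instance
        }
    }
}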

This way of accessing the headers turns out to be much simpler than writing an interceptor and trying to stuff the results of that into the request.

Detecting Empty XML Elements with JAXB

I use JAXB in conjunction with Apache CXF for web services. One requirement for the data model is to allow data fields to be “cleared”. This is done by sending the element with no value in the XML. Data elements of the model can also be left out completely, which means the value should not be set at all, something very different from a clear. Consider a JAXB class with one string field, one number field and one date field.


<string-value/>
<number-value/>
<date-value/>

The previous XML indicates that the string, number and date values should all be cleared. The actual values you would get in the data model are as follows:


string-value: "" (empty string)
number-value: NULL
date-value: NULL

If the elements were left out of the XML entirely, the following would be the resulting values in the data model:


string-value: NULL
number-value: NULL
date-value: NULL

For string values, you can tell that the user intends to clear the value by checking for an empty string. The problem is the number and date fields: you get NULL whether or not the XML element is present. This makes it impossible to tell the difference between an omitted element and an empty one.

The solution I came up with was to use a special number and date to denote a cleared value. I wanted to pick values that should never occur in the data model, so I chose the following:


Date: Epoch, 1970-01-01T00:00:00.0-00:00
Number: Integer.MIN_VALUE, -2147483648

I set these values as the “default value” in the XML schema. Per the schema spec, if the user sends the XML element by itself with no text value, the default value is used. If the user leaves the element out, the default value is not used. This is exactly the behaviour I want for the “clearable” fields.

On the data model class, I added the following annotations to the date and number fields respectively (in my code the values actually come from a constant):


@XmlElement(defaultValue="1970-01-01T00:00:00.0-00:00")
@XmlElement(defaultValue="-2147483648")

Now, if you have the original XML example:


<string-value/>
<number-value/>
<date-value/>

You will get the following values in the data model:


string-value: "" (empty string)
number-value: -2147483648
date-value: new Date(0) (1970-01-01T00:00:00.0-00:00)

Now we can easily differentiate between an empty XML element and leaving out the element altogether.
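The same omitted/empty/value distinction can be sketched without JAXB. The Python example below uses `xml.etree.ElementTree` as a stand-in for JAXB unmarshalling, and instead of a magic number or date it uses a dedicated sentinel object, so no real value has to be reserved. The `CLEAR` sentinel, the `<record>` wrapper element and `read_field` are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

CLEAR = object()  # sentinel: element present but empty, so clear the stored value


def read_field(parent: ET.Element, tag: str, convert=str):
    """Return None if the element is omitted, CLEAR if it is empty, else the converted text."""
    element = parent.find(tag)
    if element is None:
        return None  # omitted: leave the stored value alone
    if element.text is None or element.text.strip() == "":
        return CLEAR  # present with no content: clear the stored value
    return convert(element.text.strip())


record = ET.fromstring("<record><number-value/><string-value>abc</string-value></record>")
print(read_field(record, "number-value", int) is CLEAR)  # True
print(read_field(record, "date-value"))                  # None
print(read_field(record, "string-value"))                # abc
```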

HTTP Basic Authentication with Apache CXF Revisited

My post about HTTP Basic Authentication in Apache CXF receives a lot of traffic, so I decided to write a follow-up to address some of the comments.

I have never tried to use this with Mule but if someone has, please let me know so I can update this post.

I have uploaded the Java code for the BasicAuthAuthorizationInterceptor class. There are a few changes from the original version: this one includes a Map of authorized users and their corresponding passwords. I believe the original example was written for Apache CXF 2.0; this version works with Apache CXF 2.1.1.

In the original post, I also did not include a sample of how to use this code in a real application. The following section shows a sample of how to define the security interceptor and enable it on a simple endpoint.


<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:jaxws="http://cxf.apache.org/jaxws"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://cxf.apache.org/jaxws http://cxf.apache.org/schemas/jaxws.xsd">
    
    <bean id="securityInterceptor" class="BasicAuthAuthorizationInterceptor">
      <property name="users"> 
        <map>
          <entry key="username" value="password"/>
        </map>
      </property>
    </bean>

    <bean id="service" class="sample.Service"/>
    
    <jaxws:endpoint
      id="serviceEndpoint" 
      implementor="#service"
      address="${services.url}/Service">
      <jaxws:inInterceptors>
        <ref bean="securityInterceptor"/>
      </jaxws:inInterceptors>
    </jaxws:endpoint>
</beans>
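Stripped of the CXF plumbing, the check an interceptor like this performs against its users map can be sketched in a few lines of Python. This illustrates HTTP Basic authentication in general and is not a port of the BasicAuthAuthorizationInterceptor class: the client sends `Authorization: Basic ` followed by the base64 encoding of `username:password`, and the server decodes it and compares it with the configured map. `check_basic_auth` is a hypothetical helper name.

```python
import base64
import binascii
import hmac


def check_basic_auth(header, users: dict) -> bool:
    """Return True if an Authorization header carries valid Basic credentials."""
    # Simplified: the scheme name is matched case-sensitively here.
    if not header or not header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, sep, password = decoded.partition(":")
    if not sep or username not in users:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(password.encode("utf-8"), users[username].encode("utf-8"))


users = {"username": "password"}  # same entry as the Spring map above
token = base64.b64encode(b"username:password").decode("ascii")
print(check_basic_auth("Basic " + token, users))                  # True
print(check_basic_auth("Basic " + token, {"username": "other"}))  # False
```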

You Don’t Have to Use It

This past week was the Electronic Entertainment Expo (E3 for short). Most of the major video game console vendors, publishers and developers get together and show the press some of the new stuff happening this year.

This year Microsoft unveiled its new and improved Dashboard, the main Xbox interface used to navigate the downloads store, games, achievements, friends and media. Initial reactions to it were mixed; some people felt it dumbs things down.

New XBox 360 Dashboard

In response to the press, Microsoft’s Nelson indicated that you don’t have to use the new interface (http://xboxfocus.com/news/618-using-new-dashboard-optionable/index.html). The entire old interface is still present and can be accessed with the press of a button.

I have two fundamental problems with how they are approaching this. The first is that it violates the DRY principle: Don’t Repeat Yourself. Most of the time we talk about not repeating code, but I like to apply the principle to interfaces as well. In most cases, I cannot see a reason to create two interfaces that do exactly the same thing. Users can be confused when they are presented with multiple ways to do the same task. They want to be told how to use a feature correctly, and they expect a single answer.

The other problem with Microsoft’s approach is that keeping both interfaces avoids deciding whether the new one is actually better. I have worked on projects where new interfaces or features were proposed to deal with specific problems, and these features were clear improvements over the old way. Inevitably someone asks “Is it possible to keep the old way of doing things as well as the new?” or “Can’t we have a button to enable the old interface?” Comments like these should make you reconsider how good the change really is. The new feature is always one of the following: better than the old feature, worse than the old feature, or neither better nor worse. If it is better, adopt it. If it is worse or no better, revisit why the new approach was taken; maybe a tweak would make it better. The best option is rarely to keep both ways of doing things.

It is always best to provide a single way to do a single task. This creates a clear and consistent interface, causes less confusion for users, and makes it clear what the proper way to use the product is.

Customer Issues that are not Reproducible

A few weeks ago my Honda Civic started making an intermittent clunking noise. I am by no means an expert in automobiles, but I am fairly certain this noise was not “normal”. I told the dealer about the problem when I had the car in for maintenance, explaining that it seemed to make this noise sometimes when accelerating or decelerating.

Were they able to fix my problem? Of course not. They were not able to reproduce the problem and consequently did not fix anything. For car problems, we accept this and as a user of the car I am supposed to ignore this until it is either reproducible all the time or something major breaks. They did assure me that they did not see anything major wrong so there was no danger in driving the car.

In the world of software development, this just does not fly. If users have a problem, they expect it to be fixed regardless of whether they can reproduce it or even adequately explain it. For example, a support technician brought me a customer issue where a server product sometimes hung for no apparent reason. It only happened on one machine and worked fine everywhere else.

The customer wants two things: an explanation of what is causing the problem, and a fix if one is possible. This is not an unreasonable request; I essentially want the same thing for my car. But how can a problem be fixed if it is not reproducible?

There are no coincidences in software. If it happens once, it can happen again. I do not care if the problem was caused by planetary alignment and cosmic rays… the planets can align again and there are always cosmic rays.

Your job as a software developer or support technician is to gather as much information as possible to narrow down the problem and find its root cause. Find out when it is likely to happen. Try to come up with a set of steps that reproduces it. With enough work you should be able to figure out what caused the initial issue, and once you know the cause, you can fix it like any reproducible issue.

The Customer is Always Right

“The customer is always right” is a common business saying. In some ways the customer is always right, because they choose whether or not to buy your software. But are they always right?

I’ve been watching a lot of the television show House M.D. It is not your typical run-of-the-mill evening drama. It features a doctor named Gregory House who uses his unconventional style to solve diagnostic medical cases.

“Dr. Gregory House (Hugh Laurie) – Department Head: Department of Diagnostic Medicine. The show’s protagonist, Dr. Gregory House is a maverick diagnostician with a double specialty in infectious disease and nephrology. Dr. House is seemingly lacking in bedside manner and prefers to avoid direct contact with his patients whenever possible. Due to an infarction in his right thigh, House lost a substantial portion of the muscle in his upper leg and must use a cane to assist with walking. As a result, House is also forced to deal with constant physical pain, which he manages through a dependency on the prescription pain medication Vicodin. Although his behavior can border on antisocial or misanthropic, House is viewed as a maverick physician whose unconventional thinking and excellent instincts have afforded him a great deal of respect and an unusual level of tolerance from his colleagues and the medical world.” – http://en.wikipedia.org/wiki/List_of_House_characters

One of House’s sayings is that “everyone lies”. The show often proves this by revealing the dark side of seemingly innocent characters who hide information from the doctors because they feel it is irrelevant. Often that information is exactly what House needs to solve the case.

I would not go as far as to say that everyone lies, but I would say that most people do. In particular, customers lie. Often the lies are subtle, as in the show. Customers try to protect themselves by hiding information they feel is irrelevant, such as unrelated changes to the system that they may not realize affect your software. They also feed you misinformation based on how they think your software works.

Every computer-savvy user gets frustrated calling tech support about their DSL line and having to go through a huge list of steps to prove the problem is exactly what they said it was. Why do you have to go through this? Because you lie! The support process requires verifying that you have diagnosed the problem properly, and that requires following a long, boring series of baby steps.

So how should we handle customers who lie? We need to realize that they lie and walk them through each step. If they say “your software crashes”, walk them through the process and verify the steps they took to cause the problem. If they say “nothing has changed in the configuration”, verify it for yourself. Something could have changed without them realizing it.

Remember to always be respectful. Often customers lie without knowing it. You need to keep them happy.

The customer is always right but the customer lies.

Software Development Philosophy and Performance

Recently I have been working a lot with people I would call “Old School Developers”. They were brought up in the “glory days” of pre-internet development. I have been tasked with teaching an old dog new tricks: in this case, teaching web services development with Apache CXF and the Spring Framework to a few C++ developers with 20+ years of experience. These people are really smart and have a good understanding of software development. What they need is a paradigm shift.

When working with them, I constantly hear concerns about performance. It seems to be the number one issue on their minds. They want to achieve the best performance possible, no matter how much extra complexity it adds to the code. These concerns even influence the design, with fundamental design changes made for the sake of performance alone.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” (Knuth, Donald. Structured Programming with go to Statements. ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974, p. 268.)

Programmers are notoriously bad at knowing when to optimize code. We end up focusing on the little things when we should be concerned with broader performance issues. Is shaving off a few milliseconds a priority when you are dealing with a remote web service call over the internet?

My philosophy on software development is to focus on the most important aspects first. As you might have guessed, performance does not make that list. The most important criteria are:

  • Readability – Write code that is readable. You or someone else will appreciate it when you have to come back to it later.
  • Simplicity – Keep it simple. Only build in complexity where required and only to the degree required. You can always refactor it later.
  • Say Less – Less code is more. If you can leverage something that exists, do so.

Do not take this the wrong way. I am not trying to say here that performance is not to be considered or that it is not important. I am simply saying that it is not the most important issue. Once you have good readable and simple code, you can go back and optimize it where it needs it. If you start out with performance as a goal, you will never end up with readable and simple code. No one goes back over working and fast code with the intent to make it slower and more readable.
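In practice, “optimize it where it needs it” starts with measuring. A minimal sketch using Python’s built-in `cProfile` and `pstats` modules shows where time actually goes before any code is changed for speed; the workload functions (`slow_part`, `fast_part`, `handle_request`) are hypothetical stand-ins.

```python
import cProfile
import io
import pstats


def slow_part(n: int) -> int:
    # Hypothetical hotspot: quadratic work.
    return sum(i * j for i in range(n) for j in range(n))


def fast_part(n: int) -> int:
    return sum(range(n))


def handle_request(n: int) -> int:
    return slow_part(n) + fast_part(n)


def profile_report(n: int = 300) -> str:
    """Profile one call and return the stats table sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(handle_request, n)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
    return buffer.getvalue()


print(profile_report())
```

Only the functions that dominate the cumulative time are worth the extra complexity of optimization; everything else can stay simple.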