Performance Tuning

Performance tuning is one of the black arts of programming, and it takes skill to do it properly. People often end up optimizing the wrong things. As the great computer scientist Donald Knuth put it: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.

I think of it in these terms: readability comes first and foremost, because readable code is maintainable code. If you have a performance issue, then worry about tuning performance. I am by no means saying you should ignore performance completely and brute-force everything. You need to be aware of performance and choose reasonably efficient approaches as you go. You simply should not go out of your way to make something faster at the cost of readability.

Occasionally you will be tasked with performance tuning. Each of the last three major projects I worked on required it at some point, and for each of them I used the same basic tools to look for areas to optimize.

The Hunt

The first thing you must do when looking to boost performance is to go on the hunt: you need to know what is slow before you can make it fast. You will often be surprised. The code you suspect is slow may turn out to be fine, while something that seemed trivial may be the cause of many performance issues.

Before going on the hunt, you need the proper tools. Here are some essential tools for tracking down performance issues:

Profiler – A code profiler shows you how much time your application spends on various tasks. At my company we use Eclipse for our Java development, and it comes with a profiler as part of its testing and performance toolkit. There are plenty of commercial profilers that are likely much better. Pay attention to the ‘hot spots’ in your code, the parts that are executed far more often than the rest. Even if a hot spot does not seem to spend much time per iteration, a small improvement there can add up to a large saving.
Poor Man’s Profiler – Sometimes you do not have a profiler, or you just want to look at a small section of code. In these cases, adding a few System.currentTimeMillis() calls around the code will give you some timings. The code in the project I most recently optimized already made extensive use of JAMon, the Java Application Monitor (http://jamonapi.sourceforge.net/). It serves the same purpose as currentTimeMillis() but has a more refined API to work with, and it also helps you see how fast particular calls are.
Performance Test Suites – I often like to write unit tests for the specific functionality I am trying to optimize. This makes it easier to profile and measure the performance of one part of the code, and you can do it from a unit test rather than starting the whole application.
Process Viewer – Task Manager on Windows and top on Unix are invaluable tools as well. They let you watch CPU usage while your performance tests run. In a multi-threaded application, a sure sign of a synchronization bottleneck is a single CPU maxed out while the rest sit idle. If you are writing multi-threaded applications, always develop on a multi-core machine so you can spot these issues.
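To make the poor man’s profiler and performance test ideas concrete, here is a minimal sketch of a timing helper built on System.currentTimeMillis(). The class name PoorMansProfiler, the averageMillis() method and the StringBuilder workload in main() are all hypothetical, chosen only for illustration. The helper runs the task a few times untimed first, so the JIT compiler has a chance to kick in, and then reports the average time per run.

```java
public final class PoorMansProfiler {

    private PoorMansProfiler() {
    }

    /**
     * Hypothetical helper: runs the task `warmups` times untimed, then times
     * `iterations` runs with System.currentTimeMillis() and returns the
     * average number of milliseconds per run.
     */
    public static double averageMillis(String label, Runnable task, int warmups, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        // Untimed warm-up runs give the JIT a chance to compile the hot code.
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsed = System.currentTimeMillis() - start;
        double average = (double) elapsed / iterations;
        System.out.println(label + ": " + elapsed + " ms total, " + average + " ms/run");
        return average;
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for the code under investigation.
        Runnable appendNumbers = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            // Use the result so the JIT cannot treat the loop as dead code.
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        averageMillis("StringBuilder append", appendNumbers, 5, 100);
    }
}
```

A helper like this can be called from a unit test to check one piece of code in isolation. For very short operations, System.nanoTime() is usually a better clock than currentTimeMillis(), since it is intended for measuring elapsed time; for anything rigorous, a real profiler or a proper benchmarking harness is still the better tool.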

Approaching your Target

Once you have found a performance issue, it is time to attack it. You know where the problem lies, but not yet what is causing it. There are a few common culprits to look for.

Synchronization – A big performance issue I alluded to earlier is synchronization. In multi-threaded development, you sometimes need to work with shared objects. The easy way to do this in Java is the synchronized keyword, but be careful about where you use it and keep its scope as narrow as possible. CPU usage is a good indication of this problem. If synchronization is your problem, consider a modern concurrency library such as Java’s java.util.concurrent package. ConcurrentHashMap can solve many of the issues around synchronized maps and scales much better under contention than Collections.synchronizedMap(), which guards every call with a single lock. Synchronization issues can be difficult to track down because a debugger will not show them to you.
Serialization – Serialization is another big performance hit. Any time you convert data objects to or from XML, JSON or a binary format, whether on disk or in memory, you pay a price. These operations are notoriously slow but often necessary, so make sure they are not done more often than they need to be. A cache of deserialized objects can often greatly improve performance here.
Nickels and Dimes – Often there is no single performance issue that causes all of the problems. More likely, a few small things add up. If you shave 1 millisecond off a call that is made 100,000 times, you save 100 seconds of processing time (1 ms × 100,000 = 100,000 ms). That is far better than shaving 50 milliseconds off a call that is made only once. This is where your profiler and performance tests help you find where the time is really going.
Databases and Performance – If you are using a database and notice performance issues, check a few things. Make sure you let the database do the work with queries rather than pulling raw data into your code and processing it there; most of the time the database can filter, join and sort data faster than you can in code. Also make sure your tables have proper indexes so the queries run fast. Sometimes, though, things really are faster done manually in code, so run performance tests before and after any change to compare.
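As a sketch of the ConcurrentHashMap advice above, the hypothetical HitCounter class below counts hits per key from several threads. It assumes Java 8 or later for computeIfAbsent() and LongAdder. Instead of one lock around the whole map, ConcurrentHashMap lets threads update different keys concurrently, and LongAdder is designed to keep heavily contended increments cheap.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class HitCounter {
    // No single global lock: threads touching different keys do not block each other.
    private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();

    public void record(String key) {
        // computeIfAbsent atomically creates at most one counter per key,
        // so no explicit synchronized block is needed for check-then-create.
        hits.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    public long count(String key) {
        LongAdder adder = hits.get(key);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counter.record(i % 2 == 0 ? "even" : "odd");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // prints even=20000 odd=20000
        System.out.println("even=" + counter.count("even") + " odd=" + counter.count("odd"));
    }
}
```

With Collections.synchronizedMap(), the same check-then-create step would need an explicit synchronized (map) block to stay correct, and every thread would queue up on that one lock.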
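For the serialization point, here is a minimal sketch of a deserialization cache, assuming the same input text always produces an equivalent object and that cached objects are treated as read-only. DeserializationCache and its parse() method are hypothetical; parse() is a toy stand-in for a real XML or JSON parser. The cache is unbounded, so a production version would also need an eviction policy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class DeserializationCache<T> {
    // Maps serialized text to the object it deserializes into (unbounded).
    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> deserializer;
    private final AtomicInteger deserializations = new AtomicInteger();

    // Assumes the deserializer never returns null (ConcurrentHashMap rejects null values).
    public DeserializationCache(Function<String, T> deserializer) {
        this.deserializer = deserializer;
    }

    public T get(String serialized) {
        T cached = cache.get(serialized);
        if (cached != null) {
            return cached; // cache hit: no parsing at all
        }
        // Cache miss: parse outside any lock. Two threads may occasionally both
        // parse the same text; putIfAbsent keeps only the first result.
        deserializations.incrementAndGet();
        T fresh = deserializer.apply(serialized);
        T existing = cache.putIfAbsent(serialized, fresh);
        return existing != null ? existing : fresh;
    }

    public int deserializationCount() {
        return deserializations.get();
    }

    // Hypothetical stand-in for an expensive parser, reading "k1=v1;k2=v2".
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : text.split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        // Cached objects are shared between callers, so return a read-only view.
        return Collections.unmodifiableMap(fields);
    }

    public static void main(String[] args) {
        DeserializationCache<Map<String, String>> cache =
                new DeserializationCache<>(DeserializationCache::parse);
        for (int i = 0; i < 1_000; i++) {
            cache.get("name=Alice;role=admin");
        }
        // prints Alice parsed 1 time(s)
        System.out.println(cache.get("name=Alice;role=admin").get("name")
                + " parsed " + cache.deserializationCount() + " time(s)");
    }
}
```

Parsing deliberately happens outside any lock: occasionally doing the work twice is cheaper than making every caller wait behind a slow deserialization.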

The Cleanup

After you finish your performance tuning, it is very important to re-run your performance tests. You need to prove that each change actually improved performance. If it did not, the change was not needed and is more likely to introduce bugs than anything else, so throw it out and return to the hunt.

Along these lines, it is important to hunt for only one issue at a time. If you make two changes at once, you cannot tell which one produced the performance gain. Make each change in isolation so you can be sure every change is needed.

Chris Dail