Customer Issues that are not Reproducible

A few weeks ago my Honda Civic was experiencing an intermittent clunking noise. I am by no means an expert in automobiles but I am fairly certain this noise was not “normal”. I proceeded to let the dealer know about this problem when I had it in for maintenance. I explained how it seems to make this noise sometimes when accelerating or decelerating.

Were they able to fix my problem? Of course not. They were not able to reproduce the problem and consequently did not fix anything. For car problems, we accept this and as a user of the car I am supposed to ignore this until it is either reproducible all the time or something major breaks. They did assure me that they did not see anything major wrong so there was no danger in driving the car.

In the world of software development, this just does not fly. If the user has a problem, they expect it to be fixed regardless of if they can reproduce the problem or even adequately explain it. For example, a support technician brought a customer issue to me where a server product hangs sometimes for no reason. It only happens on one machine and it works fine everywhere else.

The customer wants two things. An explanation as to what is causing the problem and a fix if one is possible. This is not an unreasonable request. I essentially want the same thing for my car. But how can this problem be fixed if it is not reproducible?

There are no coincidences in software. If it happens once, it can happen again. I do not care if the problem was caused by planetary alignment and cosmic rays… the planets can align again and there are always cosmic rays.

Your job as software developer or support technician is to gather as much information as possible to narrow down the problem and to try and find the root of the problem. Find out when it is likely to happen. Try to come up with a reproducible set of steps that will cause it again. With enough work you should be able to figure out what caused the initial issue. Once you find out the cause, then you can fix it like any reproducible issue.