Floating Point Numbers again

By Michael Kay on February 27, 2006 at 02:18 p.m.

The quality of the Saxon product benefits enormously from the contributions of a dozen or so "power-users": as soon as I upload a new release, they run it through all their regression tests and often give me the bad news within hours.

In this case it was a performance problem with the new floating-point to string conversion routines. Wolfgang Hoschek found that one or two of the (ten thousand) XQuery XQTS tests were running hideously slowly.

What happened was this. I had decided to bring the code for floating-point to string conversion in-house, for a couple of reasons: firstly, the IKVMC implementation that I was using on the .NET platform had unacceptable bugs, and secondly, Saxon was doing some very inefficient post-processing of the result of Java's float output to make it conform to XPath rules, and I felt there were savings to be had by avoiding this.

I looked around for some reusable code to do the job, and hit on some routines written by Jack Shirazi that appeared suitable. A few tests showed that they were sometimes producing an answer that was just a little bit too high or too low as a result of rounding errors, so I added a tweak to detect this situation and correct it. Some fairly thorough testing showed that after this change, it was getting the right answers, and some quick performance checks showed acceptable performance.

However, it turned out that there were a few cases where Shirazi's code was getting the answer badly wrong. Because it was calculating the exponent and mantissa separately, using floating-point arithmetic, it would sometimes format a value such as 0.00001 not as 1.0e-5 or 0.99999e-5 but as 9.99999e-5, which is ten times too large. My "tweak" was picking up that the answer was wrong and was then applying a million small adjustments to get it right, finally delivering the correct answer after about 30 seconds.
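To make the hazard concrete, here is a minimal Java sketch of the general pattern, not Shirazi's actual code, which I am not reproducing here. The class and method names (`SplitExponentDemo`, `split`) are hypothetical. It derives the exponent from a logarithm and the mantissa from a division, two separate floating-point calculations that can disagree near a power of ten, and then reconciles them with a single normalisation step rather than with repeated small adjustments. It assumes a finite, positive input in the normal range.

```java
public class SplitExponentDemo {

    /**
     * Splits x into {mantissa, exponent} such that x is approximately
     * mantissa * 10^exponent with 1 <= mantissa < 10.
     * Assumption: x is finite, positive and not subnormal.
     */
    static double[] split(double x) {
        // Step 1: exponent from a logarithm. Rounding error in log10 can
        // leave this one too low or one too high near a power of ten.
        int exponent = (int) Math.floor(Math.log10(x));

        // Step 2: mantissa from a division, computed independently of step 1.
        double mantissa = x / Math.pow(10, exponent);

        // Step 3: reconcile the two results in one move. Without this guard
        // the mantissa can end up a whole decade away from where the
        // exponent says it should be.
        if (mantissa >= 10.0) {
            mantissa /= 10.0;
            exponent++;
        } else if (mantissa < 1.0) {
            mantissa *= 10.0;
            exponent--;
        }
        return new double[] { mantissa, exponent };
    }

    public static void main(String[] args) {
        double[] r = split(0.00001);
        System.out.println(r[0] + " x 10^" + (int) r[1]);
    }
}
```

Even with the guard, the mantissa is still only an approximation, so a sketch like this is a way to see the problem rather than a way to produce correctly rounded output.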

So: back to the drawing board. I decided to do a bit more searching of the literature, and everything seemed to point to a paper by Guy Steele and Jon White, "How to Print Floating-Point Numbers Accurately", published at the ACM SIGPLAN conference in 1990. (They had actually written it ten years earlier, but being engineers rather than academics, publishing was not high on their list of priorities.) Over the weekend I forked out $10 to download a copy of the paper and set about implementing their algorithm, and today I uploaded the resulting replacement module. In due course, it will find its way into an 8.7.1 release.
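The property that matters is that the printed decimal, when read back, gives exactly the same double, using as few digits as possible. The Java sketch below illustrates that property only. It is not the Steele and White algorithm and not the code that went into Saxon: it is a brute-force search using `java.math.BigDecimal`, and the names `RoundTripFormat`, `toScientific` and `render` are hypothetical. It assumes a finite, positive input, and in rare cases near a power of two it may use one digit more than strictly necessary.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class RoundTripFormat {

    /**
     * Returns a short scientific-notation string that parses back to
     * exactly x. Assumption: x is finite and greater than zero.
     */
    static String toScientific(double x) {
        if (!(x > 0) || Double.isInfinite(x)) {
            throw new IllegalArgumentException("finite positive values only");
        }
        // The exact binary value of x, expanded as a decimal with no rounding.
        BigDecimal exact = new BigDecimal(x);

        // Try 1, 2, ... significant digits until the rounded decimal
        // converts back to the original double. 17 digits always suffice.
        for (int digits = 1; digits <= 17; digits++) {
            BigDecimal candidate = exact.round(new MathContext(digits));
            if (candidate.doubleValue() == x) {
                return render(candidate);
            }
        }
        throw new AssertionError("unreachable: 17 digits identify any double");
    }

    /** Renders d as d.ddd followed by E and the decimal exponent. */
    static String render(BigDecimal d) {
        BigDecimal s = d.stripTrailingZeros();
        String digits = s.unscaledValue().toString();
        // Mantissa digits and exponent both come from the same exact value,
        // so they cannot drift apart.
        int exponent = digits.length() - 1 - s.scale();
        String fraction = digits.length() > 1 ? digits.substring(1) : "0";
        return digits.charAt(0) + "." + fraction + "E" + exponent;
    }

    public static void main(String[] args) {
        System.out.println(toScientific(0.00001)); // prints 1.0E-5
    }
}
```

A search like this is far too slow for production use, but it makes a convenient oracle: any faster routine can be checked against it, or simply checked for the round-trip property via `Double.parseDouble`.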

Are there lessons to be learned? I could adopt the approach of taking less risk - not changing things in case they break. I don't believe in that strategy, at least not until I get to a point where there is a release I want to advertise as a stable, gold star release. Saxon got where it is by moving fast, and very often benefiting from the contributions of "power users" like Wolfgang. One thing that is important is to get rid of bugs fast once they are found: the bugginess of the product is the number of bugs multiplied by their half-life, and both factors need to be addressed. One thing that concerns me now (especially with the .NET platform becoming available) is that it's taking increasingly long to ship a release. I need to invest more in automating the process, especially in detecting genuine test failures.