Jaa


Floating point fun

I've been an advocate of using SQL Server for more than a mere datastore for about as long as I've been working with the product.  Too many times, I've walked into development shops only to discover that they're using SQL Server no differently than they'd use Access or some other DBMS with far fewer features because they take a "least common denominator" approach -- only using features that most DBMSs have in common.  I always believed that that was a mistake and that it was usually the result of developers not taking the time to get acquainted with the facilities and features the product offers.  I've been a long-time proponent of using "what comes in the box" -- regardless of whether we're talking about SQL Server, the C++ STL, the Delphi VCL, the .NET Framework, or some other piece of the software development puzzle.  For me, it just made sense to at least look at leveraging the features you get free with the development tool(s), DBMSs, and OS platform you're using rather than needlessly duplicating them in custom code.  It's been my experience that the code coming out of places like Microsoft, Oracle, Borland, IBM, and their industry partners is generally of a higher quality than the average developer is capable of producing.  To put it succinctly, the average developer isn't likely to do as well as a large software vendor in building the plumbing and widgets the typical app requires, and may do far worse.  For me, it's just commonsense to at least take a look at using these built-in facilities before you build your own.  It's hard to do that when you don't even know about them.

For SQL Server, this means becoming intimately familiar with Transact-SQL and the many other facilities provided by the product for building applications.  It means avoiding pulling data down to the client or to the middle tier when you can and letting the server do what it does best -- rip through data via set-oriented operations.

That said, there are times when it makes sense to move things outside the server or at least consider doing so.  One of those is when performing certain types of floating point calculations.  I first documented the fact that SQL Server's Transact-SQL doesn't correct for floating point errors in The Guru's Guide to Transact-SQL.  Here's the example query I offered:

CREATE TABLE #rand
(k1 int identity,
c1 float DEFAULT (
(CASE (CAST(RAND()+.5 AS int)*-1) WHEN 0 THEN 1 ELSE -1 END)*(CONVERT(int, RAND() * 100000) % 10000)*RAND()
)
)

INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES
INSERT #rand DEFAULT VALUES

SELECT SUM(c1) FROM #rand

SELECT * INTO #rand2 FROM #rand ORDER BY c1

SELECT SUM(c1) FROM #rand2

SELECT * INTO #rand3 FROM #rand2 ORDER BY ABS(c1)

SELECT SUM(c1) FROM #rand3

 

-----------------------------------------------------
-10378.804170841649

                                                     
-----------------------------------------------------
-10378.804170841646

                                                     
-----------------------------------------------------
-10378.804170841648

This query fills a temp table with some random floating point values, then totals those values three different ways.  The only difference between the three totals is the ordering of the terms.  Languages that don't correct for floating point rounding errors are susceptible to returning different results for the same numeric computation stated in different ways -- heresy in the world of mathematics.  All I do in the example above is change the order by which I'm traversing the values.  Because Transact-SQL doesn't correct for floating point rounding errors, the three totals often differ from one another.

What can you do about this?  One option is to switch to a fixed-point type, such as decimal or numeric, like so:

CREATE TABLE #rand
(k1 int identity,
c1 decimal(10,4) DEFAULT (
(CASE (CAST(RAND()+.5 AS int)*-1) WHEN 0 THEN 1 ELSE -1 END)*(CONVERT(int, RAND() * 100000) % 10000)*RAND()
)
)

Of course, this alleviates the problem because you're no longer working with floating point values.  The problem with this is that you're also not using the processor's FPU circuitry, which will slow down the computation, perhaps dramatically.  With today's chips, you might not even notice the difference, but with more complex expressions computed over large amounts of data, I'll bet you would.

Foregoing any discussion of SQL Server 2005 features for now, your best option may be to simply move the calculation completely out of Transact-SQL into a client-side language that properly handles floating point rounding errors.  IOW, the best solution of all may be to avoid doing floating point computations of any complexity in Transact-SQL in the first place.  While I'm a stickler for using "what comes in the box", and I'm an even bigger stickler for using the right tool for the job, and, in this case, T-SQL may not be the best tool for the job.

Comments

  • Anonymous
    May 21, 2005
    I ran your script quite a few times on 2000 SP3 and SP4 beta and always got the same result for each three results.

    The other thing I don't really understand is why we have as many digits as we have. For float there is only 15 digit precision, right? So only 10378.8041708416 is of value where 10378.804170841649 as two places more than required. I assumed that in 2005 there is some change in this respect because select cast (1 as float) / cast (10 as float)
    = 0.1 rather than 0.10000000000000001 in 2000.

  • Anonymous
    October 15, 2007
    PingBack from http://suburbandestiny.com/tech/?p=291

  • Anonymous
    October 25, 2007
    I thought I’d perhaps blog about this although I don’t truly think this is of any real significance unless