Why you shouldn't compress SQL Server data and log files

Most knowledgable SQL Server DBAs know you shouldn't compress a database's data and log files, but I'll wager that few know the details behind why that's such a bad idea.  The general reply might be "Performance" -- and that's correct -- but the assumption that the performance hit you take when you compress data/log files is due to compressing/decompressing data is only part of the story.  There's a lot more to it than that.

SQL Server abstracts all of the I/O for data and log files in its UMS component.  UMS (User Mode Scheduler) allows the engine to schedule work and carry out I/O without making direct Win32 API calls.  This design permits the engine to support fibers without having separate code lines for threads and fibers, and it allows the engine to run on versions of Windows that don't support asynchronous file I/O (Win9x, WinME) without having to have separate code lines for asynchronous I/O and synchronous I/O.  You'll recall that SQL Server leverages NT's ability to carry out I/O operations asynchronously and to perform asynchronous I/O in a scatter/gather fashion.  Windows 9x/ME doesn't support any of this.  By abstracting this away in UMS, the engine merely schedules the work it needs to do, and UMS carries it out in the most efficient way possible.

One nuance of Windows' asynchronous I/O facility is that, depending on the API you use, you may or may not actually get an async operation back from the OS.  IOW, although you request an operation to be carried out asynchronously, Windows may decide to run it synchronously and may hold up your API call until the operation completes.  The OS always makes the final decision on whether an async I/O request is honored.  What happens when Windows decides not to honor an async I/O request depends on the API.  For ReadFile/WriteFile, they simply block until the operation completes and return TRUE.  You have to check their return values and respond accordingly -- you can't write code that expects the operation to complete at some point in the future if the operation actually finished immediately.  For ReadFileEx/WriteFileEx, the API returns TRUE regardless of whether the action is carried out synchronously or asynchronously.  I'm not aware of an easy way to detect that an async I/O request has been carried out synchronously using ReadFileEx or WriteFileEx. 

On Win9x/ME, UMS automatically schedules all file I/O to run synchronously.  It knows that the version of Windows on which it's running does not support async I/O, and it schedules I/O operations accordingly.  For NT, it always attempts to run file I/O operations asynchronously, but, as I've said, Windows may have other ideas.

One circumstance in which Windows never honors an async I/O request is when the file it's reading or writing is compressed.  When calling ReadFile or WriteFile against a compressed file, Windows always runs the operation synchronously, regardless of whether the caller requested an async I/O operation.  That's right:  compressing a file disables an app's ability to read or write it asynchronously.

So, not only are you paying the obvious cost of compression/decompression when you compress database data and log files, you're also paying the cost of switching from async I/O to synchronous I/O -- a change that can make a huge difference with a high-speed RDBMS such as SQL Server.  That the operations work at all in this situation is a testament to SQL Server's robustness, but the performance hit you take is rarely worth it, and that doesn't even touch on how compression affects reliability and recoverability.  Hard drives are cheap; don't compress your data and log files unless you have no other choice.

Comments

  • Anonymous
    May 11, 2005
    RePost : http://www.yeyan.cn/Database/CompressSQLServer.aspx

  • Anonymous
    June 04, 2006

    You will be surprised to know how many times this topic comes up during discussions with SQL Server...

  • Anonymous
    June 04, 2006
    This is a good summation of the age-old discussion if you should put your SQL dbs on a compressed disk...

  • Anonymous
    June 13, 2006
    Это следует знать! SQL Server 2000 не любит компрессированных файлов БД .mdf и .

  • Anonymous
    June 19, 2006
    Persone los pioneros non rabata. Great...

  • Anonymous
    June 29, 2006
    Asaspal. Memrano tu es besta. Amigo.

  • Anonymous
    July 04, 2006
    Your article is quite right, thanks.

  • Anonymous
    April 03, 2007
    that make sense....but what about if we have a very fast CPU, and a big database.....then it will be a bottle neck on the harddrive speed....so if we compress, will the we fast CPU overcome the speed of harddrive to make it faster in some situation the speed of CPU will totally over the speed of the harddrive, that will make everthing faster

  • Anonymous
    April 26, 2007
    PingBack from http://www.shawnjohnjr.idv.tw/wp/?p=115

  • Anonymous
    May 09, 2007
    Superb post, thanks again Ken. Your books are great, the Guru's Guide to T-SQL is always on my desk somewhere.

  • Anonymous
    January 19, 2009
    PingBack from http://honeyvirus.wordpress.com/2009/01/20/sql-server-on-compressed-filesystems/

  • Anonymous
    January 21, 2009
    PingBack from http://www.keyongtech.com/49904-should-not-compress-sql-database