Data Compression: Why Do we need it?
As announced in Tech-Ed 2007, data compression is a new and exciting feature targeted to be available in SQL Server 2008. This is a huge topic to be covered in one BLOG post, I have decided to break it into a series of posts, each building on the previous ones. If you are interested discussions on any specific topic on data compression, please send me a note and I will make sure that I include a discussion on it in my subsequent BLOGs.
This is the first in the series of BLOG entries for data compression feature in SQL Server 2008.
Starting with SQL Server 2008 release, SQL Server is targeted to support native data compression. It will allow you to compress a table, or an index or one or more of their partitions to save space on the disk. I will cover about the details on these compressions in later BLOG entries, but let us start with one fundamental question: Why Compression? This may very well be a non-question for our customers who have been waiting for this feature for a while. But still, it is useful to visit or re-visit the reasons why data compression is needed or useful.
One obvious reason is to save the cost of disk. While this may seem strange given that the disks are cheap. We all can buy 100s of GB disk for less than $100.00. So what all this fuss is about? First reason, for the uninitiated, is that the disks used for high-end systems are not cheap. Secondly, there is rarely a single copy of the production data. For example, if you are using high availability features like replication, log shipping or mirroring, you will have at least one more copy. Now what about test environment? There is one copy there as well. All these add up to the cost of hardware. Third, how about backups? If you have many backups of your database over time, just multiply the cost. Agreed that backup can be stored on less expensive media but still there is some cost associated with it.
Second reason is the cost of managing the data. Larger the database, it takes longer to do the backup, recovery. Similarly for running DBCC commands, rebuilding indexes, and bulk import/export. Clearly if these commands are IO bound, then if we could reduce the size of the data, they will run faster. Even better, they will have less of an impact on the concurrent workload in your system. One interesting point is that if your database is compressed, the backup will be automatically smaller. So how does this relate to backup compression, a new feature in SQL Server 2008. Well, as you will see in subsequent BLOG posts, backup compression is orthogonal to data compression and can be used together.
Third reason is memory. Don’t we all wish we had more memory on our servers? Well, if the data is compressed, you can fit more data in the same memory. So if you could compress the data 50%, then suddenly you have increased your memory 100% (i.e. you can fit double the size of the data). Is this not fantastic? Even if you have 64-bit machine with the capability for huge amount of addressable memory, the databases, for most customers, is many order of magnitude larger than the memory. So compression will benefit even for servers running on 64-bit architecture. Clearly your IO bound workloads are likely to see increase in throughput with data compression as there is less data to be read.
Hope this has created some excitement in you for the data compression feature. More in the future BLOGs.
Comments
Anonymous
September 29, 2007
PingBack from http://www.artofbam.com/wordpress/?p=3939Anonymous
September 29, 2007
very good article, i think it is helpful for me and some others.Anonymous
September 30, 2007
Dal team dello Storage Engine di SQL Server arrivano alcune notizie interessanti a proposito delle nuoveAnonymous
September 30, 2007
I always wondered why previous SQL editions didn't at least include compression for database backup and restore. I always ended up compressing my backups and was surprised at the ratios achieved. One question though, doesn't compressed data actually require /more/ memory since both the compressed data and decompressed data has to be stored at a time? Though I guess selective compression and decompression mitigates some of that to the point where it's beneficial...Anonymous
September 30, 2007
Yes, backup compression was requested before as well. We finally did it! On your other question, we only store compressed data so only 1 copy. We de-compress at a row level when the row is retrieved.Anonymous
October 01, 2007
Aren't you then having a copy of the row in compressed form and a copy of the row in decompressed form both in memory at the same time? :) Of course that still is a huge memory savings with all the other rows compressed for a single table, and the larger the table the greater the savings, so I get that..Anonymous
October 01, 2007
Like I was saying, we keep the data compressed. WE de-compress only when the row is needed. So at that time, there are two copies of the row (actually of the requested columns), one that is compressed and other that is not. Note, this is for 'that' row onlyAnonymous
October 01, 2007
Like I was saying, we keep the data compressed. WE de-compress only when the row is needed. So at that time, there are two copies of the row (actually of the requested columns), one that is compressed and other that is not. Note, this is for 'that' row onlyAnonymous
August 08, 2011
how helpful is data compression in file transmission??Anonymous
August 10, 2011
Eric: can you please describe your scenario in more detail. The data is returned to SQL client un-compressed. The data stays compressed in the persistent storage thanks SunilAnonymous
October 07, 2014
Can you please explain how the data compression going to impact with row versioning.