Freigeben über


Estimating the space savings with data compression

Like I had indicated in my previous blog, it is recommended that you estimate compression savings on the object of interest before actually enabling compression as enabling compression is an expensive operation. To show how to estimate data compression, I have created a very simplistic and definitely not realistic example to show case data compression estimates for the following table.

create table t1_big (c1 int, c2 int, c3 char(8000))

go

I will show you the compression estimates both for ROW and PAGE compression. You may recall that

· ROW compression removes unneeded bytes from the column values by storing them in variable length format.

· PAGE compression stores the repeating value only once per page and sets the pointer from the respective columns within the page.

Now load 6000 rows into this table. Since there is one row per page, the size of this table will be 48 MB (6000 * 8K = 48MB). Note that for the third column, we are inserting has same value for all the rows and that we are only inserting 60 characters even though the size of of the column is 8000.

declare @i int

select @i = 0

while (@i < 6000)

begin

insert into t1_big values (@i, @i + 6000, replicate (‘a’, 60))

            set @i = @i + 1

end

-- find the current size of the uncompressed table

EXEC sp_spaceused N't1_big'

The output of this command is as follows showing that the size of the table is 48 MB as we expect.

Name Rows Reserved Data Index_size unused

t1_big 6000 48008 KB 48000 KB 8 KB 0 KB

Now, estimate the size of compression by executing the following stored procedure. When you execute this stored procedure, you are instructing SQL Server to find compression savings with ROW compression on the table t1_big. The parameter 3 and 4 refer to index-id and partition-id. You can use this stored procedure to estimate compression savings even for an index or for a specific partition(s).

exec sp_estimate_data_compression_savings

            'dbo', 't1_big', NULL, NULL, 'ROW'

Here is the “re-arranged” output of the above command for better readability. There are few key points to note. First, the estimated space savings from ROW compression is around 99% for this example (the estimated size goes from 48MB to ½ MB). Though this kind of space savings are not realistic but it is easy to see why we got such a huge savings for the current schema. For example, in this case, the majority of space savings come from removing the trailing blanks from column C3. Second, when estimating compression savings, the SQL Server creates a sample table/index in tempdb and then compresses it with the requested compression mode to find the space savings. The stored procedure extrapolates the savings achieved on the sampled table to the original table.

object_name schema_name index_id partition_number

----------- ----------- ------- -----------------

t1_big dbo 0 1

size_with_current_compression_setting(KB)

-----------------------------------------

48008

size_with_requested_compression_setting(KB)

-------------------------------------------

472

sample_size_with_current_compression_setting(KB)

-----------------------------------------------

39648

sample_size_with_requested_compression_setting(KB)

--------------------------------------------------

392

Note, in the case of ROW compression, all SQL Server does is to remove the trailing blanks from the CHAR(8000) column type and remove extra bytes from integer values. Since we have the same value (60 characters) for all the rows for column C3, this value will need to be stored only once on the page with PAGE compression. So it is easy to see why PAGE compression will lead to huge space savings. Here is the estimate with PAGE compression.

-- estimate the PAGE compression

exec sp_estimate_data_compression_savings

            'dbo', 't1_big', NULL, NULL, 'PAGE'

Here is the output that shows the estimated compression to be

object_name schema_name index_id partition_number

----------- ---------------- ---------- -----------------

t1_big dbo 0 1

size_with_current_compression_setting(KB)

-----------------------------------------

48008

size_with_requested_compression_setting(KB)

-------------------------------------------

80

sample_size_with_current_compression_setting(KB)

-----------------------------------------------

39960

sample_size_with_requested_compression_setting(KB)

--------------------------------------------------

72

In my next blog, I will show you the estimates for a more realistic example.

Comments

  • Anonymous
    November 19, 2007
    very nice feature, indeed. One question though. Just in case, if that particular page which actually holds the value(60 characters) in this case gets corrupted or somehow damaged, what would happen to the pointers?

  • Anonymous
    November 19, 2007
    The comment has been removed

  • Anonymous
    November 19, 2007
    I attended your session at SQL Connections in Las Vegas and was excited with your presentation. As soon as I received the email with the November CTP, I downloaded it quickly so I could see how compression would work with my databases. But… did data compression not make this build? If not, when do you think it will be released?

  • Anonymous
    November 20, 2007
    thanks for your interest. data comrpession will be available starting from CTP-6.