A Unicode Compression example
Now that we have Unicode compression available in SQL Server 2008R2 as described in, let me take a simple example using AdventureWorksDW database to show you the additional compression that can be achieved on tables with one or more columns of type NCHAR or NVARCHAR.
use [AdventureWorksDW2008]
-- the table FactResellerSales is a FACT table with three columns of type
-- NVARCAHR types. Let us find the average length of each of these columns
select AVG( LEN(salesordernumber))
from FactResellerSales
This returns a value of 7. This represents number of character in the string and not the numnber of bytes.
select AVG( LEN(CarrierTrackingNUmber))
from FactResellerSales
This returns a value of 12. This represents number of character in the string and not the numnber of bytes.
select AVG( LEN(CustomerPONumber))
from FactResellerSales
This returns a value of 12. This represents number of character in the string and not the numnber of bytes.
Based on these computations, we have approximately 31 characters of NVARCHAR types. With Unicode compression, we will expect that the average length of the row will reduce further by 31 bytes (i.e. we will save 1 byte per UCS-2 representation). To measure this, we will attach the identical copy of AdventureWorks2008 database to two different instances of SQL Server, one instance of SQL 2008RTM and other instance of SQL 20088R2 and measure the average length of the row after applying the ROW compression. Before applying the ROW compression, let us measure the average length of the row in both instances using the following query
select avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorksDW2008'),
object_id('dbo.factresellersales'),
null, null, 'SAMPLED')
where object_name (object_id) like '%ResellerSales%' AND index_id =1
In both instances, the average length of the row comes out to be around 191 bytes. This should not surprise you as there are no changes to the uncompressed data. Now, let us execute the command to enable ROW compression in each of the instances using the following command
alter table dbo.factresellersales rebuild
with (
data_compression = ROW)
And, now let us find the average row length in each of the instances. Since Unicode compression is available as part of ROW compression, we will expect that the average length of the row will be around 31 bytes less in SQL2008R2. Let us run the following query again to find the average row length after the row compression
select avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorksDW2008'),
object_id('dbo.factresellersales'),
null, null, 'SAMPLED')
where object_name (object_id) like '%ResellerSales%' AND index_id =1
Here are the vallues returned
· Average Row length in SQL2008 after row compression = 132 (around 30% compression)
· Average Row length in SQL2008R2 after row compression = 100 (around 45% compression). This extra savings of 32 bytes is what we had expected.
As you can see that in this example,the unicode comrpession gave us extra 15% compression with ROW compression. We tried the Unicode compression with some in-house customer workloads and we observed the compression savings as shown in the table below. Of course, your savings will depend on your data and the schema but the key point to know is that space savings with Unicode compression can be significant if you consider tables in tera-byte range. In our testing we found that the additional impact on CPU is minimal.
ROW Compression |
ROW with UNICODE |
9% |
43% |
30% |
53.2% |
45% |
64% |
30% |
45% |
I enccourage you to download CTP2 verision (https://www.microsoft.com/sqlserver/2008/en/us/r2.aspx) of SQL2008R2 and see for yourself the kind of savings you can achieve with your workload.
Thanks
Sunil Agarwal
Comments
Anonymous
March 12, 2010
Hi Sunil, Please let me know deom where we can download AdventureWorksDW2008 database Thanks SureshAnonymous
March 12, 2010
Suresh:: Sample Databases for Microsoft SQL Server 2008R2 November CTP This release is dedicated to the sample databases that ship for Microsoft SQL Server 2008R2 November CTP. See Database Prerequisites for SQL Server 2008R2 for feature configurations required for installing the sample databases. See Installing SQL Server 2008R2 Databases for step by step installation instructions. All of the AdventureWorks sample databases for SQL Server 2008R2 are included in the AdventureWorks2008R2_NovemberCTP.exe self extracting zip file. When this zip file is run, all content is unzipped to a temporary directory and an installer application is automatically started. This application copies the database scripts and data files to the directory specified, and optionally installs sample databases. If some databases cannot be installed, a brief explanation is displayed and a link is provided for more information about how to resolve the installation issues. Thanks SunilAnonymous
March 14, 2010
Thanks Sunil for your quick turnaround.. I was able to install required database for this example.Anonymous
June 01, 2010
if compression was enabled in 2008, will the additional benefits from unicode compression be implemented when I upgrade to R2, or will I have to re-compress?Anonymous
June 01, 2010
if compression was enabled in 2008, will the additional benefits from unicode compression be implemented when I upgrade to R2, or will I have to re-compress?Anonymous
June 01, 2010
never mind - I found it at msdn.microsoft.com/en-us/library/ee240835.aspx. new rows will be compressed with the new functionality, but the object will need to be rebuilt to unicode compress the whole thing.Anonymous
June 02, 2010
John, You are right. When you upgrade to SQL2008R2, if you update unicode data in existing row or inset new row, you will get unicode compression. you will need to re-compress to get the unicode compression benefit for all rows.. thanks Sunil