Data Compression and Heaps
I think we can all agree that Data Compression is a great new functionality in SQL Server 2008. I know many of you have been experimenting with it but there was one little catch I wanted you to be aware of.
When you apply page compression to a table the pages get compressed when they are full but there is an exception as you would have expected. When you have a heap the newly allocated pages are NOT compressed until you rebuild the table (or remove and reapply compression or add and remove a clustered index) unless you are using BULK INSERT.
I have added a test script so you can see it with your own eyes.
First the clustered table:
SET NOCOUNT ON
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[myClusteredTable]') AND type in (N'U'))
DROP TABLE [dbo].[myClusteredTable]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[myClusteredTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TheDate] [date] NOT NULL,
[TheTime] [time](7) NOT NULL,
[SomeFiller] [varchar](256) NOT NULL,
CONSTRAINT [PK_myClusteredTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
--Add some records to the table with varying length
INSERT INTO dbo.myClusteredTable
VALUES (SYSDATETIME(), SYSDATETIME(), REPLICATE('W', CEILING(RAND() * 100) + 1))
GO 15000
--Check the space used
sp_spaceused myClusteredTable
GO
--Now apply compression
ALTER TABLE [dbo].[myClusteredTable] REBUILD PARTITION = ALL
WITH
(DATA_COMPRESSION = PAGE
)
GO
--Check the space used
sp_spaceused myClusteredTable
GO
--Add some more records to the table
INSERT INTO dbo.myClusteredTable (TheDate, TheTime, SomeFiller)
SELECT TheDate, TheTime, SomeFiller FROM myClusteredTable
GO
--Check the space used again
sp_spaceused myClusteredTable
GO
DROP TABLE myClusteredTable
As you can see the compression keeps going even when you insert new rows.
But what about the heap?
SET NOCOUNT ON
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[myHeap]') AND type in (N'U'))
DROP TABLE [dbo].[myHeap]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[myHeap](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TheDate] [date] NOT NULL,
[TheTime] [time](7) NOT NULL,
[SomeFiller] [varchar](256) NOT NULL,
CONSTRAINT [PK_myHeap] PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
--Add some records to the table with varying length
INSERT INTO dbo.myHeap
VALUES (SYSDATETIME(), SYSDATETIME(), REPLICATE('W', CEILING(RAND() * 100) + 1))
GO 15000
--Check the space used
sp_spaceused myHeap
GO
--Now apply compression
ALTER TABLE [dbo].[myHeap] REBUILD PARTITION = ALL
WITH
(DATA_COMPRESSION = PAGE
)
GO
--Check the space used
sp_spaceused myHeap
GO
--Add some more records to the table
INSERT INTO dbo.myHeap (TheDate, TheTime, SomeFiller)
SELECT TheDate, TheTime, SomeFiller FROM myHeap
GO
--Check the space used again
sp_spaceused myHeap
GO
--Rebuild the heap?
ALTER TABLE myHeap REBUILD
GO
--Check the space used again
sp_spaceused myHeap
GO
--What about a bulk insert?
BULK INSERT TestDB.dbo.MyHeap
FROM 'c:\temp\test.txt'
WITH
(
FORMATFILE='c:\temp\myHeap.xml',
KEEPIDENTITY,
TABLOCK
)
GO
--Check the space used again
sp_spaceused myHeap
GO
--Rebuild the heap?
ALTER TABLE myHeap REBUILD
GO
--Check the space used again
sp_spaceused myHeap
GO
--Seems compression was indeed applied with BULK INSERT, go ahead and drop the test table
DROP TABLE myHeap
This was tested on SQL Server 2008 SP1 + CU2 (build 10.0.2714).
Comments
Anonymous
June 16, 2009
PingBack from http://blog.a-foton.ru/index.php/2009/06/17/data-compression-and-heaps/Anonymous
July 10, 2009
That's important info! (although I hope there are more clustered indexed with data compression than heaps). I'm not sure if the TABLOCK is required with bulk inserts, but I remember reading that "INSERT INTO tbl WITH (TABLOCK)" also makes page compression work in heaps. Cheers, S. Neumann