DATA_COMPRESSION = COLUMNSTORE_ARCHIVE not working for partitioned table

Rajiv Kumar Tiwari 0

I created a partitioned table (24 partitions) with DISTRIBUTION = ROUND_ROBIN and a CLUSTERED COLUMNSTORE INDEX.

The table size was 300 GB, but after applying DATA_COMPRESSION = COLUMNSTORE_ARCHIVE the size increased many fold (7-8 times). Is archive compression not supported for partitioned tables in a dedicated SQL pool?


1 answer

Vinodh247 18,906 Reputation points

2024-09-13T08:49:50.5233333+00:00
Hi Rajiv Kumar Tiwari,

Thanks for reaching out to Microsoft Q&A.

In Azure Synapse Analytics dedicated SQL pools, applying 'DATA_COMPRESSION = COLUMNSTORE_ARCHIVE' to partitioned tables should be supported, but the behavior you're seeing suggests a few potential causes for the size increase:

Columnstore compression, especially 'COLUMNSTORE_ARCHIVE', depends heavily on how the data is organized into row groups. If partitioning leaves the data fragmented into many small row groups, or unevenly spread across partitions, it may not compress as efficiently. 'COLUMNSTORE_ARCHIVE' prioritizes maximum compression, but on fragmented data the per-row-group overhead might outweigh the savings and lead to larger sizes.

Data Skew or Distribution: 'ROUND_ROBIN' normally spreads rows evenly across distributions, but if the data ends up unevenly distributed across partitions, this could also negatively impact compression. A 'HASH' distribution could sometimes be more efficient, depending on the table’s usage patterns ('REPLICATE' is intended for small tables and is unlikely to suit a 300 GB table).

The compression algorithms used in 'COLUMNSTORE_ARCHIVE' can sometimes lead to larger metadata if the data isn’t easily compressible or if the process introduces too much overhead in metadata.
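To make the fragmentation point concrete, here is a rough back-of-the-envelope sketch. It assumes the standard dedicated SQL pool layout of 60 distributions and the maximum row group size of 1,048,576 rows; the 24 partitions come from the question. Every partition exists once per distribution, so the table is physically split far more finely than the partition count alone suggests.

```sql
-- Rough arithmetic only; 60 distributions and 1,048,576 rows per
-- full row group are assumptions about the standard layout.
SELECT 60 * 24 AS physical_partitions,                       -- 1440
       CAST(60 AS bigint) * 24 * 1048576
           AS rows_to_fill_one_row_group_each;               -- 1509949440
```

If the table holds far fewer rows than this per loaded batch or per partition, many row groups will be small, which tends to hurt columnstore compression.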

To resolve the issue, consider the following:

Review partitioning and distribution strategies to ensure even distribution across partitions.

Test with different compression settings, or use regular 'COLUMNSTORE' compression without 'ARCHIVE', to compare size differences.

Analyze your data for patterns that might impact compression efficiency (e.g., high variance in values).

Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.
Rajiv Kumar Tiwari 0 Reputation points

2024-09-14T04:29:55.5833333+00:00

The partitions are equally distributed, but the query that checks the space used by the partitioned table after archive compression shows an 8-fold increase, whereas archive compression on the same table without partitions saves around 33 percent of the space.

Are there any commands or steps I need to follow?

Vinodh247 18,906 Reputation points

2024-09-15T10:08:47.7433333+00:00

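Check Partition Distribution: You mentioned that the partitions are equally distributed; the following query can confirm the row count per partition. It sums row_count across all distributions, and it maps the table to its physical per-distribution objects because the node-level DMV reports physical object IDs rather than the logical table's object_id.

SELECT nps.partition_number, SUM(nps.row_count) AS row_count
FROM sys.pdw_table_mappings AS tm
JOIN sys.pdw_nodes_tables AS nt ON nt.name = tm.physical_name
JOIN sys.dm_pdw_nodes_db_partition_stats AS nps
  ON nps.object_id = nt.object_id AND nps.pdw_node_id = nt.pdw_node_id AND nps.distribution_id = nt.distribution_id
WHERE tm.object_id = OBJECT_ID('your_table_name') AND nps.index_id <= 1
GROUP BY nps.partition_number
ORDER BY nps.partition_number;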

Rebuild the Index with Compression:

Rebuilding the columnstore index with the compression option explicitly applied might help. This ensures that each partition has the COLUMNSTORE_ARCHIVE compression applied properly.

ALTER INDEX ALL ON your_table_name
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

Check Compression Impact on Each Partition:

Use the following query to inspect the size of the partitions individually:

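SELECT nps.partition_number, SUM(nps.used_page_count) * 8 / 1024 AS partition_size_MB
FROM sys.pdw_table_mappings AS tm
JOIN sys.pdw_nodes_tables AS nt ON nt.name = tm.physical_name
JOIN sys.dm_pdw_nodes_db_partition_stats AS nps
  ON nps.object_id = nt.object_id AND nps.pdw_node_id = nt.pdw_node_id AND nps.distribution_id = nt.distribution_id
WHERE tm.object_id = OBJECT_ID('your_table_name')
GROUP BY nps.partition_number
ORDER BY nps.partition_number;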

Compare this across partitions to see if certain partitions are disproportionately larger after compression.
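As a cross-check on the per-partition numbers, the table's total footprint can also be read with DBCC PDW_SHOWSPACEUSED. This is only a sketch: 'your_table_name' is the same placeholder used above, and the output is reported per distribution rather than per partition, so it helps confirm the overall size before and after compression rather than locate a specific partition.

```sql
-- Reports rows and reserved/data/index/unused space for the table,
-- one result row per distribution.
DBCC PDW_SHOWSPACEUSED('your_table_name');
```

Running it once before and once after changing the compression setting gives a like-for-like comparison that does not depend on the DMV joins.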

Check for Large Delta Stores:

If the columnstore index has a large number of rows in the delta store (the part of the index that stores data before it is compressed into the columnstore), that could lead to inefficient compression. Run this query to check the delta store:

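SELECT rg.partition_number, rg.row_group_id, rg.state_desc, rg.total_rows
FROM sys.pdw_table_mappings AS tm
JOIN sys.pdw_nodes_tables AS nt ON nt.name = tm.physical_name
JOIN sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
  ON rg.object_id = nt.object_id AND rg.pdw_node_id = nt.pdw_node_id AND rg.distribution_id = nt.distribution_id
WHERE tm.object_id = OBJECT_ID('your_table_name');

Look for row groups in the OPEN or CLOSED state with many rows; these are delta-store row groups that have not yet been compressed into columnstore format, which could indicate inefficient compression.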
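Because a large table can have thousands of row groups, a summary per partition and state may be easier to read than the row-by-row listing. The following is a minimal sketch under the same assumptions as the queries above ('your_table_name' is a placeholder, and the table is mapped to its physical per-distribution objects through sys.pdw_table_mappings and sys.pdw_nodes_tables).

```sql
-- Summarize row groups per partition and state.
-- Many COMPRESSED row groups with a low average row count, or a large
-- number of rows in OPEN/CLOSED row groups, would point to poor compression.
SELECT rg.partition_number,
       rg.state_desc,
       COUNT(*)                        AS row_groups,
       SUM(rg.total_rows)              AS total_rows,
       AVG(rg.total_rows)              AS avg_rows_per_row_group,
       SUM(rg.size_in_bytes) / 1048576 AS size_mb
FROM sys.pdw_table_mappings AS tm
JOIN sys.pdw_nodes_tables AS nt
  ON nt.name = tm.physical_name
JOIN sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
  ON rg.object_id = nt.object_id
 AND rg.pdw_node_id = nt.pdw_node_id
 AND rg.distribution_id = nt.distribution_id
WHERE tm.object_id = OBJECT_ID('your_table_name')
GROUP BY rg.partition_number, rg.state_desc
ORDER BY rg.partition_number, rg.state_desc;
```

An average well below the 1,048,576-row maximum for compressed row groups suggests the data is spread too thinly across partitions and distributions.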

Drop and Recreate the Index:

If rebuilding doesn't resolve the issue, you can try dropping and recreating the index with compression. Dropping and recreating the index might help realign the partitions for better compression.

DROP INDEX index_name ON your_table_name;

CREATE CLUSTERED COLUMNSTORE INDEX index_name ON your_table_name WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

Further Considerations

Data Skew: Even though the partitions are equally distributed by row count, check that the data itself isn't skewed within each partition (e.g., highly repetitive values or large text fields in certain partitions could affect compression efficiency).

Testing on Subset: If possible, test the behavior of COLUMNSTORE_ARCHIVE on a subset of the partitions or a similar table to ensure it compresses as expected.
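One way to run such a subset test, sketched here under the assumption that rebuilding a single partition is acceptable on your system, is to rebuild one partition with each compression setting and compare its size after each step. Partition number 1 is an arbitrary choice for illustration, and 'your_table_name' is the same placeholder used above.

```sql
-- Step 1: rebuild one partition with standard columnstore compression,
-- then record its size using the per-partition size query.
ALTER INDEX ALL ON your_table_name
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = COLUMNSTORE);

-- Step 2: rebuild the same partition with archive compression,
-- then record its size again and compare the two figures.
ALTER INDEX ALL ON your_table_name
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```

If the archive rebuild of a single partition is smaller, as expected, but the whole table still appears larger, the size query itself or leftover uncompressed row groups would be the next things to examine.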

Bhargava-MSFT 30,891 Reputation points Microsoft Employee

2024-09-17T15:33:59.5833333+00:00

Thank you Vinodh247

Hello Rajiv Kumar Tiwari,

I am checking to see if you have any further questions here.