DATA_COMPRESSION = COLUMNSTORE_ARCHIVE not working for partitioned table

Rajiv Kumar Tiwari 0 Reputation points
2024-09-13T07:33:21.6633333+00:00

I created partition table(24 partition) with DISTRIBUTION = ROUND_ROBIN,   CLUSTERED COLUMNSTORE INDEX .

Table size was 300 GB but after DATA_COMPRESSION = COLUMNSTORE_ARCHIVE the size increased many folds(7-8 times).Does partitioned table archive is not supported in dedicated SQL Pool?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,875 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Vinodh247 18,906 Reputation points
    2024-09-13T08:49:50.5233333+00:00

    Hi Rajiv Kumar Tiwari,

    Thanks for reaching out to Microsoft Q&A.

    In synapse analytics dedicated SQL pools, applying 'DATA_COMPRESSION = COLUMNSTORE_ARCHIVE' on partitioned tables should be supported, but the behavior you're seeing suggests a few potential causes for the size increase:

    • Columnstore indexes, especially with 'COLUMNSTORE_ARCHIVE' compression, rely heavily on how the data is stored and organized. If the partitioning of the table leads to fragmented or unevenly distributed data across partitions, it may not compress as efficiently. The 'COLUMNSTORE_ARCHIVE' mode prioritizes maximum compression, but for fragmented data, it might lead to larger sizes due to overhead in organizing the data blocks.
    • Data Skew or Distribution: If the 'ROUND_ROBIN' distribution results in data skew (uneven data distribution across partitions), this could also negatively impact the compression. A 'HASH' or 'REPLICATE' distribution could sometimes be more efficient, depending on the table’s usage patterns.
    • The compression algorithms used in 'COLUMNSTORE_ARCHIVE' can sometimes lead to larger metadata if the data isn’t easily compressible or if the process introduces too much overhead in metadata.

    To resolve the issue, consider the following:

    • Reviewing partitioning and distribution strategies to ensure even distribution across partitions.
    • Testing with different compression settings or using regular 'COLUMNSTORE' compression without 'ARCHIVE' to compare size differences.
    • Analyzing your data for patterns that might impact compression efficiency (ex., high variance in values)

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.