Thanks for reaching out to Microsoft Q&A.
In synapse analytics dedicated SQL pools, applying 'DATA_COMPRESSION = COLUMNSTORE_ARCHIVE' on partitioned tables should be supported, but the behavior you're seeing suggests a few potential causes for the size increase:
- Columnstore indexes, especially with 'COLUMNSTORE_ARCHIVE' compression, rely heavily on how the data is stored and organized. If the partitioning of the table leads to fragmented or unevenly distributed data across partitions, it may not compress as efficiently. The 'COLUMNSTORE_ARCHIVE' mode prioritizes maximum compression, but for fragmented data, it might lead to larger sizes due to overhead in organizing the data blocks.
- Data Skew or Distribution: If the 'ROUND_ROBIN' distribution results in data skew (uneven data distribution across partitions), this could also negatively impact the compression. A 'HASH' or 'REPLICATE' distribution could sometimes be more efficient, depending on the table’s usage patterns.
- The compression algorithms used in 'COLUMNSTORE_ARCHIVE' can sometimes lead to larger metadata if the data isn’t easily compressible or if the process introduces too much overhead in metadata.
To resolve the issue, consider the following:
- Reviewing partitioning and distribution strategies to ensure even distribution across partitions.
- Testing with different compression settings or using regular 'COLUMNSTORE' compression without 'ARCHIVE' to compare size differences.
- Analyzing your data for patterns that might impact compression efficiency (ex., high variance in values)
Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.