Hi ,
Thanks for reaching out to Microsoft Q&A.
By default in Synapse dedicated SQL pool, if you do not specify an index type when creating a table, it will typically create a clustered columnstore index. To prevent that, you must explicitly state you want a HEAP (or a different index type). Or, if you want to use hash distribution but still have no index.
- HEAP: Instructs Synapse to create the table without any index (i.e., a heap).
- DISTRIBUTION: Specifies how data is distributed (
HASH(column_name)
,ROUND_ROBIN
, orREPLICATE
).
Note: There is no global switch to turn off the default index creation; you must specify “HEAP” (or another index type) explicitly in each
CREATE TABLE
statement.
Performance and Storage: CCI vs. Heap
a) Clustered Columnstore Index (CCI)
- Compression: Often 5–10x (sometimes even more) compared to rowstore/heap tables.
- Query Performance: Can be significantly faster (10–100x) for analytical queries that scan large volumes of data, especially with aggregations across many columns.
- Recommended Use: Best for large fact tables in data warehouse scenarios where you primarily run read heavy analytics/aggregations.
b) Heap Tables
- No Automatic Compression:
- Data is simply stored in pages without an index, so you don’t typically get the compression benefits.
- Faster Write-Load in Some Cases:
- Loads can be slightly faster for staging or transient tables because no index maintenance is required during insert.
- Less Efficient for Analytics:
- Queries scanning large heap tables can become slower unless you create additional indexes or ultimately convert them to columnstore for analytical workloads.
Rule of Thumb:
- Storage Savings from CCI: ~5–10x compression vs. rowstore/heap in many real-world scenarios.
- Query Speedup with CCI: Commonly 10x faster for scan-heavy, aggregation-focused queries, and it can go above 50x or 100x in certain extreme cases.
- Practical Tips
- Load into a Heap, Then Convert:
- For large fact tables where ingestion speed matters, you can load data into a heap quickly, then create a clustered columnstore index afterward. This can be more efficient than maintaining columnstore during the initial bulk load.
- Use CCI for Analytics:
- For long-term analytical tables, leverage clustered columnstore unless you have a specific reason (unusual query patterns or smaller reference tables) to use a heap or clustered index.
- Experiment and Measure:
- Performance gains vary with data volume, cardinality, and query patterns. Always measure your specific workload in a test environment.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.