Hi ,
Thanks for reaching out to Microsoft Q&A.
Few possible explanations for why your azure synapse Link for dataverse (with Finance & Operations) is generating multiple parquet files for a specific table, even though the appendOnly option is enabled and the partition is by the year.
Possible Causes:
- High Data Volume and Throughput
- If the table experiences high freq changes or large inserts/updates, synapse link may generate multiple Parquet files per partition (year) to optimize performance and avoid excessive file size.
- This behavior ensures that writes are not bottlenecked by a single large file.
- Auto Compaction and file splitting
- Synapse Link (or the underlying ADLS mechanisms) might be splitting large data writes into multiple smaller Parquet files instead of appending to a single file.
- This happens because appending to an existing Parquet file is inefficient, and it's preferable to create new files.
- If the table is heavily updated, Synapse Link might create multiple versions of the data, leading to new files instead of appending to the existing one.
- Unlike a static partition strategy (where only new data lands in a single file), high-frequency updates can trigger multiple file generations.
- Schema Evolution / Changes
- If there are frequent schema changes (new columns being added dynamically), Synapse Link may generate new files instead of modifying existing ones.
- If a file reaches a certain size threshold (typically around 100MB to 1GB, depending on system settings), Azure automatically splits the data into smaller files.
Possible Solutions:
- Check Table Activity: Use Dataverse monitoring tools or Azure Synapse monitoring to see if this table experiences more frequent changes than others.
- Manually Optimize Files: If having fewer Parquet files is preferred, consider running a periodic merge (using Spark or Data Flow in Azure Synapse to compact small files).
- Check Append Only Behavior: Ensure that AppendOnly is actually functioning correctly for this table. You can try disabling and re-enabling Synapse Link for that specific table.
- Validate Schema Evolution: If this table undergoes schema changes more frequently than others, consider stabilizing its structure to reduce unnecessary file generation.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.