Synapse Link Dataverse F&O Multiple Files Partitions

Manuel Alejandro Estrada Doñes 20 Reputation points
2025-03-05T15:46:08.8833333+00:00

Hi Team

We have an Azure Synapse Link Dataverse with F&O, with a partition by year and the option appendOnly checked.

In all my tables I just getting one file per year that make me sense, but i have one table that for some reason is generating multiple files in one year.

Do you have any idea about this issue?

to clarify the files are parquet files.

Thanks in advance for your comments.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,228 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Vinodh247 29,361 Reputation points MVP
    2025-03-05T17:01:39.01+00:00

    Hi ,

    Thanks for reaching out to Microsoft Q&A.

    Few possible explanations for why your azure synapse Link for dataverse (with Finance & Operations) is generating multiple parquet files for a specific table, even though the appendOnly option is enabled and the partition is by the year.

    Possible Causes:

    1. High Data Volume and Throughput
      • If the table experiences high freq changes or large inserts/updates, synapse link may generate multiple Parquet files per partition (year) to optimize performance and avoid excessive file size.
      • This behavior ensures that writes are not bottlenecked by a single large file.
    2. Auto Compaction and file splitting
      • Synapse Link (or the underlying ADLS mechanisms) might be splitting large data writes into multiple smaller Parquet files instead of appending to a single file.
      • This happens because appending to an existing Parquet file is inefficient, and it's preferable to create new files.
      Background Delta Synchronization
      • If the table is heavily updated, Synapse Link might create multiple versions of the data, leading to new files instead of appending to the existing one.
        • Unlike a static partition strategy (where only new data lands in a single file), high-frequency updates can trigger multiple file generations.
    3. Schema Evolution / Changes
      • If there are frequent schema changes (new columns being added dynamically), Synapse Link may generate new files instead of modifying existing ones.
      Storage Constraints & Optimization Rules
      • If a file reaches a certain size threshold (typically around 100MB to 1GB, depending on system settings), Azure automatically splits the data into smaller files.

    Possible Solutions:

    • Check Table Activity: Use Dataverse monitoring tools or Azure Synapse monitoring to see if this table experiences more frequent changes than others.
    • Manually Optimize Files: If having fewer Parquet files is preferred, consider running a periodic merge (using Spark or Data Flow in Azure Synapse to compact small files).
    • Check Append Only Behavior: Ensure that AppendOnly is actually functioning correctly for this table. You can try disabling and re-enabling Synapse Link for that specific table.
    • Validate Schema Evolution: If this table undergoes schema changes more frequently than others, consider stabilizing its structure to reduce unnecessary file generation.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.