Hi,
Thanks for reaching out to Microsoft Q&A.
Solution Steps
- Enable Snowflake Query Partitioning:
  - Use Snowflake's SQL `COPY INTO <location>` command and parallelize the unload by specifying a `PARTITION BY` clause (a sketch follows this step).
  - In the ADF pipeline, specify a SQL query in the Source that generates partitions, for example: `SELECT * FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY <key_column>) AS rn FROM table1 t) WHERE MOD(rn, 2) = 0`. Note that the window function has to sit in a subquery, since it cannot be referenced directly in a `WHERE` clause.
  - Or partition based on data logic (e.g. date range, region, etc.).
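  For reference, a minimal sketch of a `PARTITION BY` unload on the Snowflake side, assuming a hypothetical internal stage `@my_stage` and a `REGION` column on `table1` (adjust the stage, table, and partition expression to your schema):

  ```sql
  -- Sketch only: unload table1 into one sub-folder per REGION value.
  -- @my_stage, table1, and REGION are placeholders for illustration.
  COPY INTO @my_stage/table1/
  FROM (SELECT * FROM table1)
  PARTITION BY ('region=' || region)
  FILE_FORMAT = (TYPE = PARQUET);
  ```

  Each distinct partition value is written to its own sub-folder under `@my_stage/table1/`.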
- Set ADF Dataset Parameters:
  - Create a dataset parameter in ADF for the folder path (e.g. `folderPath`) and reference it in the dataset's folder path as `@dataset().folderPath`.
  - Bind the parameter dynamically to the sink path in the Copy Activity (see the sketch after this step).
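  As a rough illustration of the binding, with `folderPath` from the step above and `outputFolder` as a hypothetical pipeline parameter that supplies the value at run time:

  ```
  Dataset parameter:             folderPath (String)
  Folder path field in dataset:  @dataset().folderPath
  Value set on the Copy sink:    @{pipeline().parameters.outputFolder}
  ```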
- Disable Single File Setting:
  - Ensure that the `Copy Behavior` in the sink dataset is set to Preserve Hierarchy or Flatten Hierarchy so that multiple files can be written.
  - In the Sink Settings of the copy activity, uncheck the "Write to single file" option (if enabled), and confirm that the `SingleFile` property is explicitly set to `false`.
- Dynamic Folder Creation for Each Table:
  - Use the ForEach activity to loop through the list of Snowflake tables.
  - Inside the loop, add a Copy Data activity with the following configuration:
    - Source: the Snowflake table for the current item.
    - Sink: Azure Blob Storage with a dynamic path, e.g. `container-name/@{item().TableName}/` (see the expression sketch after this step).
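  A sketch of the dynamic content for this loop, assuming the table list comes from a hypothetical Lookup activity named `GetTableList` whose output rows expose a `TableName` column:

  ```
  ForEach Items:                 @activity('GetTableList').output.value
  Copy sink folderPath value:    @{item().TableName}
  Resulting blob path:           container-name/<TableName>/
  ```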
- Use Partition Count in Snowflake Copy Command:
  - Snowflake's external unload can generate multiple files; control this with the `MAX_FILE_SIZE` copy option and keep `SINGLE = FALSE` (the default) so that the unload is not forced into a single file. A sketch follows this step.
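  A minimal sketch of such an unload, again assuming the hypothetical stage `@my_stage` (`MAX_FILE_SIZE` is in bytes; `SINGLE = FALSE` is the default):

  ```sql
  -- Sketch only: unload table1 as multiple compressed CSV files of roughly 50 MB each.
  COPY INTO @my_stage/table1/
  FROM table1
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  MAX_FILE_SIZE = 52428800   -- ~50 MB per output file
  SINGLE = FALSE;
  ```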
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.