Azure Data Factory pipeline to extract data from Snowflake and load it into Azure Blob Storage

Nalini Bhavaraju 20 Reputation points
2025-02-19T16:44:10.35+00:00

Hi Team,

I'm building an Azure Data Factory pipeline to extract data from Snowflake and load it into Azure Blob Storage. However, I'm facing some challenges with the data copy process and could use some assistance.

I have several tables in my Snowflake schema and need to perform a bulk copy.

Some of the tables have columns of data type TIMESTAMP_NTZ or TIMESTAMP_TZ, and I need to copy all of the data into Azure Blob Storage.

The pipeline has:

Lookup --> finds all tables in the Snowflake schema
ForEach --> loops through all the tables returned by the Lookup
Copy Activity --> has a source query that casts the timestamp-typed columns. My Copy activity is failing.
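For reference, the Lookup query and the cast in the Copy source look roughly like the sketch below. The schema, table, and column names are placeholders, the Snowflake source type may differ depending on the connector version in my datasets, and connector-specific settings (such as export settings) are omitted for brevity:

```json
{
    "name": "LookupTables",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SnowflakeSource",
            "query": "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'MY_SCHEMA'"
        },
        "dataset": {
            "referenceName": "SnowflakeSourceDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}
```

Inside the ForEach, the Copy activity's source query casts the timestamp column to text, for example:

```json
"source": {
    "type": "SnowflakeSource",
    "query": "SELECT *, TO_VARCHAR(MY_TS_COL, 'YYYY-MM-DD HH24:MI:SS.FF3') AS MY_TS_COL_STR FROM MY_SCHEMA.@{item().TABLE_NAME}"
}
```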

So, is there a way I can run the Copy activity successfully without casting, since different tables may have different columns of these data types? (FYI, I tried this route but hit a different error.) Can I implement staging instead? I'm not sure how it works; any documentation or instructions would help.

If I have to cast, how should I approach it?


I appreciate your suggestions and help.

Thank you,

Nalini.


2 answers

  1. Nalini Bhavaraju 20 Reputation points
    2025-02-19T18:06:08.06+00:00

    I was able to fix the issue and copy the data successfully by using the JSON format instead of Parquet. However, I see a small drop in the number of rows copied for very large tables. I need help understanding the reason for this drop.

    Thanks.


  2. phemanth 13,900 Reputation points Microsoft Vendor
    2025-02-20T04:30:01.63+00:00

    @Nalini Bhavaraju

    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: I'm building an Azure Data Factory pipeline to extract data from Snowflake and load it into Azure Blob Storage. However, I'm facing some challenges with the data copy process and could use some assistance.

    I have several tables in my Snowflake schema and need to perform a bulk copy.

    Some of the tables have columns of data type TIMESTAMP_NTZ or TIMESTAMP_TZ, and I need to copy all of the data into Azure Blob Storage.

    The pipeline has:

    Lookup --> finds all tables in the Snowflake schema
    ForEach --> loops through all the tables returned by the Lookup
    Copy Activity --> has a source query that casts the timestamp-typed columns. My Copy activity is failing.

    So, is there a way I can run the Copy activity successfully without casting, since different tables may have different columns of these data types? (FYI, I tried this route but hit a different error.) Can I implement staging instead? I'm not sure how it works; any documentation or instructions would help.

    If I have to cast, how should I approach it?


    I appreciate your suggestions and help.

    Solution: I was able to fix the issue and successfully copy data from Snowflake to Azure Blob Storage via an Azure Data Factory pipeline by changing the dataset format to JSON instead of Parquet.
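    For reference, a rough sketch of what the Blob sink dataset can look like after switching the format to JSON (the linked service, container, and folder names below are placeholders):

    ```json
    {
        "name": "BlobJsonSink",
        "properties": {
            "type": "Json",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "my-container",
                    "folderPath": "snowflake-export"
                }
            }
        }
    }
    ```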

    Regarding the drop in the number of records for large tables, there are a few potential reasons why this might be happening:

    • If there are data type mismatches between Snowflake and Azure Blob Storage, some records might not be copied correctly. Ensure that all data types are compatible and correctly mapped.
    • Large datasets can sometimes cause timeouts or performance issues, leading to incomplete transfers. Check the performance settings and consider increasing the timeout values in your pipeline.
    • The Copy activity has a fault-tolerance property called enableSkipIncompatibleRow. If it is set to true, rows that don't match the schema or have data issues are silently skipped, so make sure it is set according to your requirements (a rough sketch of these settings follows this list).
    • If you're using incremental data loads, ensure that the watermark columns are correctly configured. Incorrect watermark settings can lead to missing records.
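    If skipped rows turn out to be the cause, the fault-tolerance portion of the Copy activity's typeProperties looks roughly like the sketch below (source and sink settings omitted; the linked service name and path are placeholders). Redirecting skipped rows to a blob path lets you inspect exactly which rows were dropped:

    ```json
    "typeProperties": {
        "enableSkipIncompatibleRow": true,
        "redirectIncompatibleRowSettings": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "path": "copy-activity-skipped-rows"
        }
    }
    ```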

    High concurrency and parallelism settings can sometimes cause issues with large datasets. Try adjusting these settings to see if it improves the data transfer.
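    For example, the ForEach batch size and the Copy activity's parallelism can be set explicitly rather than left to the defaults. The fragments below use purely illustrative values, with the other properties of each activity omitted. On the ForEach:

    ```json
    "typeProperties": {
        "isSequential": false,
        "batchCount": 4
    }
    ```

    and on the Copy activity:

    ```json
    "typeProperties": {
        "parallelCopies": 2,
        "dataIntegrationUnits": 4
    }
    ```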

    Steps to Troubleshoot

    1. Check Logs: Review the pipeline logs to identify any errors or warnings during the data transfer.
    2. Data Validation: Compare the record counts between Snowflake and Azure Blob Storage to identify any discrepancies.
    3. Retry Mechanism: Implement a retry mechanism in your pipeline to handle transient errors (a sketch of the activity retry policy follows this list).
    4. Incremental Copy: See the tutorial "Incrementally copy multiple tables using Azure Data Factory" if you decide to move to incremental loads.
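    For the retry mechanism in step 3, the activity-level policy block can be used. A minimal sketch with illustrative values (a 12-hour timeout, three retries, 60 seconds apart):

    ```json
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 3,
        "retryIntervalInSeconds": 60
    }
    ```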

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don't forget to click "Accept Answer" and "Yes" for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

