How to Automate Large .SAV File to Parquet Conversion in Azure?

Akshay Patel 70 Reputation points
2024-12-27T10:31:15.69+00:00

We use Azure Data Lake Storage (ADLS) as our primary storage, Azure Data Factory (ADF) for data transformations, and Power BI for reporting and visualization.

I have a large .SAV file (200-300 MB, 2-4 million rows) stored in ADLS. To load the data into a SQL table, I first need to convert the .SAV file to Parquet, because ADF cannot read .SAV files directly.

I previously tried an Azure Function for this conversion, but execution times out after 10 minutes, which is not enough to process files of this size.

I'm looking for an optimized and scalable solution to automate this conversion process within the Azure ecosystem.

Key Considerations:

  1. The solution must handle large files efficiently.
  2. It should be compatible with Azure services and integrate seamlessly into a data pipeline.
  3. It should preferably avoid timeout or size limits like those in Azure Functions.

Any guidance on how to approach this or examples of similar implementations would be highly appreciated.
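
For reference, the conversion step itself could look roughly like the sketch below. It is a minimal illustration, not a tested pipeline: it assumes Python with the third-party packages `pyreadstat` and `pyarrow` installed, and the function name `sav_to_parquet`, the paths, and the chunk size are hypothetical. It reads the .SAV file in chunks so memory use stays bounded, and it reuses the schema inferred from the first chunk for all later chunks, which is an assumption that may need adjusting if column types vary across the file. Downloading from and uploading to ADLS is not shown.

```python
def sav_to_parquet(sav_path: str, parquet_path: str, chunksize: int = 200_000) -> int:
    """Stream a .SAV file into a single Parquet file; return the rows written."""
    # Third-party imports, assumed to be installed (pip install pyreadstat pyarrow).
    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyreadstat

    # Yields (DataFrame, metadata) pairs of at most `chunksize` rows each.
    chunks = pyreadstat.read_file_in_chunks(
        pyreadstat.read_sav, sav_path, chunksize=chunksize
    )

    writer = None
    schema = None  # fixed after the first chunk so every row group matches
    rows = 0
    try:
        for df, _meta in chunks:
            table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
            if writer is None:
                schema = table.schema
                writer = pq.ParquetWriter(parquet_path, schema)
            writer.write_table(table)
            rows += len(df)
    finally:
        # Close the writer even on failure so the file handle is released.
        if writer is not None:
            writer.close()
    return rows


if __name__ == "__main__":
    # Hypothetical local paths, for illustration only.
    print(sav_to_parquet("input.sav", "output.parquet"))
```

Because each chunk is written as it is read, total runtime grows with file size but peak memory does not, so the open question is mainly which Azure compute can run a script like this without the 10-minute cap.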

Azure Data Lake Storage
Azure Functions
Azure Data Factory