Loading a partitioned Parquet file into an Azure database

Iwan 65 Reputation points
2024-11-07T10:34:05.4833333+00:00

I inserted a Parquet file into an Azure database, but throughput was low, so I thought that if I partitioned the file I could load the partitions in parallel.

I partitioned the file on DefaultRating using PySpark and tried the insert again, but I can't get the settings right: the Copy activity isn't copying anything at all anymore. The partitioning step looked roughly like the sketch below.
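Here is a minimal sketch of that partitioning step, assuming the file sits in ADLS Gen2 (the storage account, container, and paths here are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the original single parquet file (path is a placeholder)
df = spark.read.parquet(
    "abfss://container@account.dfs.core.windows.net/in/Output_25_10_24.parquet"
)

# partitionBy writes a FOLDER named Output_25_10_24.parquet that contains one
# subfolder per distinct value, e.g.
#   Output_25_10_24.parquet/DefaultRating=1/part-00000-....snappy.parquet
(
    df.write.mode("overwrite")
    .partitionBy("DefaultRating")
    .parquet("abfss://container@account.dfs.core.windows.net/out/Output_25_10_24.parquet")
)
```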

Below is one of the partition folders based on DefaultRating:

[screenshot: one DefaultRating partition folder, containing snappy part files]

Below are the source dataset settings: a simple Copy data activity with a Parquet dataset that should include the snappy part files in each partition folder.

[screenshot: source dataset settings for the Copy data activity]
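From what I've read, a copy over a folder tree like this is supposed to use wildcard settings in the Copy activity source instead of a fixed file name on the dataset; something like this sketch (property names as documented for an ADLS Gen2 Parquet source; the paths are placeholders):

```json
"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "Output_25_10_24.parquet/DefaultRating=*",
        "wildcardFileName": "*.snappy.parquet"
    }
}
```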

My attempt, however, returned a "path not found" error, and when the file name was set to Output_25_10_24.parquet instead, nothing was written at all:

[screenshot: Copy activity failure with the "path not found" error]
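My guess is that partitionBy turns Output_25_10_24.parquet into a folder rather than a file, so a dataset that points at it as a single file finds nothing, which would explain both symptoms. One way to confirm the partitioned output itself is readable is to point Spark back at the top-level folder (same placeholder path as above):

```python
# Spark reads the whole partition tree when pointed at the top-level folder
# and recovers DefaultRating as a column from the subfolder names.
check = spark.read.parquet(
    "abfss://container@account.dfs.core.windows.net/out/Output_25_10_24.parquet"
)
check.groupBy("DefaultRating").count().show()
```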

Azure Synapse Analytics
