How do I split a large file into multiple files with specific number of rows in each file?

Balachandran Kannan 181 Reputation points
2021-03-22T17:45:22.733+00:00

I have a large source file that I want to split, with each output file having 10K rows. Data flow allows me to split into a set number of partitions, but there is a problem: I don't want to split the source file if it has 10K rows or fewer. Anything above 10K should be split into multiple 10K chunks.

As an example:

12K rows => produces 2 files - one with 10K and one with 2K
20K rows => produces 2 files - each with 10K
9K rows => produces 1 file
20.1K rows => produces 3 files - two 10K files and one with the remaining rows, and so on
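The rule above boils down to ceil(rows / 10,000), clamped to at least one file. A minimal Python sketch of that arithmetic (the 10K chunk size comes from the question; the function names are illustrative, not from ADF):

```python
import math

CHUNK_SIZE = 10_000  # rows per output file, per the question

def file_count(total_rows: int) -> int:
    """Number of output files for a given row count (always at least 1)."""
    return max(1, math.ceil(total_rows / CHUNK_SIZE))

def chunk_sizes(total_rows: int) -> list[int]:
    """Row counts of each output file: full 10K chunks plus any remainder."""
    full, rest = divmod(total_rows, CHUNK_SIZE)
    return [CHUNK_SIZE] * full + ([rest] if rest else [])
```

For instance, `chunk_sizes(20_100)` yields two 10K chunks and one 100-row chunk, matching the last example above.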

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MarkKromer-MSFT 5,216 Reputation points Microsoft Employee
    2021-03-22T20:47:11.843+00:00

    Use the techniques in the blog post below to build your formula for dynamically setting the partition size:

    https://kromerbigdata.com/2021/03/04/dynamic-data-flow-partitions-in-adf-synapse/

    In my example, I used a hardcoded value for the target file size, but you can use case() or iif() in the size expression to apply the rule you described above.
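    The iif()-based rule the answer suggests can be sketched in Python as follows. This is an analogue of the logic only; the actual ADF data flow expression syntax and the row-count parameter you would feed it are assumptions here, not taken from the blog post:

```python
import math

def dynamic_partition_count(row_count: int, target_rows: int = 10_000) -> int:
    """Python analogue of an ADF-style expression such as
    iif(rowCount <= target, 1, ceil(rowCount / target)):
    one partition up to the target size, otherwise enough
    partitions for ~target_rows rows each."""
    if row_count <= target_rows:
        return 1
    return math.ceil(row_count / target_rows)
```

    The same conditional shape (a single partition at or below the threshold, ceil-divided partitions above it) is what you would express with iif() or case() in the partition-size expression.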
