How do I copy multiple arrays from an Append Blob into SQL using an Azure Data Factory pipeline?

Jacob Massengill 20 Reputation points
2025-02-12T20:14:49.5833333+00:00

Hello!

I am currently having trouble setting up an API data flow in Azure Data Factory. The pipeline I have created, among other things, uses an until loop to pull 1000 records from an API connection at a time and post them into blob storage. Then, once all records have been pulled, it attempts to copy them all at once into an Azure SQL DB.


For reference, the API response looks something like:

{
    "items": [
        {<item 1>},  
        {<item 2>},  
        ...  
        {<item n>}
    ],
    ...
}

Each item is essentially a row in the resulting SQL table. During the loop, the "items" array from each pull of 1000 records is appended to the same JSON file in blob storage. I then attempt to read from that file and copy it into the SQL DB.
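
To make the resulting file layout concrete, here is a rough Python sketch of what that loop effectively does (the API URL, the offset/limit paging parameters, and the container/blob names are all hypothetical; the blob calls use the azure-storage-blob SDK rather than ADF):

    import json
    import requests
    from azure.storage.blob import BlobServiceClient

    # Hypothetical endpoint and paging; the real API caps each pull at 1000 records.
    API_URL = "https://example.com/api/records"
    PAGE_SIZE = 1000

    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = service.get_blob_client(container="staging", blob="records.json")
    if not blob.exists():
        blob.create_append_blob()

    offset = 0
    while True:
        page = requests.get(API_URL, params={"offset": offset, "limit": PAGE_SIZE}).json()
        items = page.get("items", [])
        if not items:
            break
        # Each iteration appends a complete JSON array ("[{...}, {...}]"),
        # so the blob ends up holding several arrays back to back.
        blob.append_block(json.dumps(items))
        offset += PAGE_SIZE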

The issue I'm having is that, because there are multiple separate arrays of items in the resulting JSON file, the copy activity in the pipeline ignores everything after the first 1000 records once it hits the closing bracket of the first array, which brings me to my question:

Is there an effective way to combine these arrays within the same file so that they can all be copied into my DB at once? Perhaps by excluding the array brackets somehow and adding them back in at the end around every record, or by some other method?
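
For context, a small Python illustration (sample data only, not ADF) of the parsing behavior described above, when a file holds two arrays back to back:

    import json

    # What the append blob ends up containing: complete arrays appended back to back.
    blob_text = '[{"id": 1}, {"id": 2}][{"id": 3}, {"id": 4}]'

    # A strict parse rejects the whole file because of the trailing data...
    try:
        json.loads(blob_text)
    except json.JSONDecodeError as err:
        print("strict parse fails:", err)          # "Extra data" after the first array

    # ...while a lenient reader stops at the end of the first array, which matches
    # the copy activity only picking up the first 1000 records.
    first_array, end = json.JSONDecoder().raw_decode(blob_text)
    print(first_array)       # [{'id': 1}, {'id': 2}]
    print(blob_text[end:])   # the second array is never read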

A couple of notes:

  1. The 1000 records at a time is an unfortunate limit of the API itself, so there is no way around that.
  2. I do have this process working where I just post the records into a Block Blob and copy to SQL within each iteration of the loop; however, the overhead for the copy data activity adds a lot of time to the execution of the pipeline.
Azure SQL Database
Azure Blob Storage
Azure Data Factory

Accepted answer

Amira Bedhiafi 28,536 Reputation points
2025-02-12T22:27:26.49+00:00

Just continue appending the JSON arrays to the same blob, making sure each appended array is itself well formed. After the loop finishes, the file will contain the arrays back to back:

    [{<item 1>}, {<item 2>}, ..., {<item 1000>}][{<item 1001>}, {<item 1002>}, ..., {<item 2000>}]

In your data flow, read the append blob as a single string column and use a Derived Column transformation to repair the JSON. The following expression first joins the adjacent arrays and then strips the leftover brackets:

    replace(replace(replace(yourColumn, '][', ','), '[', ''), ']', '')
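
The effect of that expression can be sanity-checked outside the data flow with a few lines of Python (sample data only; the column name and values are placeholders):

    import json

    # Two appended pages, exactly as they sit in the append blob.
    blob_text = '[{"id": 1}, {"id": 2}][{"id": 3}, {"id": 4}]'

    # Mirror of the Derived Column expression:
    #   replace(replace(replace(yourColumn, '][', ','), '[', ''), ']', '')
    bare_items = blob_text.replace("][", ",").replace("[", "").replace("]", "")

    # Wrap the comma-separated objects back into one array and confirm it parses.
    merged = "[" + bare_items + "]"
    rows = json.loads(merged)
    print(len(rows))   # 4 -> every record from every page

One caveat: the blanket '[' and ']' replacements assume the item fields themselves never contain square brackets; if they might, replacing only the '][' boundary (and keeping the outermost brackets) is the safer variant.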

Applied to the blob content above, the expression produces a bare, comma-separated list of objects:

    {<item 1>}, {<item 2>}, ..., {<item 1000>}, {<item 1001>}, {<item 1002>}, ..., {<item 2000>}

Then wrap it back into a single array:

    [{<item 1>}, {<item 2>}, ..., {<item 1000>}, {<item 1001>}, {<item 1002>}, ..., {<item 2000>}]

Finally, either sink the data flow output directly into your Azure SQL table, or write the repaired JSON back to blob storage and point your Copy Data activity at that single, valid array.
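
If you take the second route, a rough Python sketch of that repair step (container and blob names are placeholders; only the '][' boundaries are touched so item contents stay intact) could look like:

    import json
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    container = service.get_container_client("staging")

    # Read the raw append blob that holds several arrays back to back.
    raw = container.get_blob_client("records.json").download_blob().readall().decode("utf-8")

    # Join the adjacent arrays at their '][' boundaries into one array.
    merged = raw.replace("][", ",")

    # Optional sanity check before handing the file to the Copy Data activity.
    json.loads(merged)

    # Write a new, valid JSON file for the copy activity to use as its source.
    container.get_blob_client("records_merged.json").upload_blob(merged, overwrite=True)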
    