Optimize Nested Loop Processing in ADF Pipeline for Dynamic JSON Transformation

Naresh Arvapalli 0 Reputation points
2025-01-24T16:27:16.6033333+00:00

I have two JSON responses from Lookup activities in my Azure Data Factory pipeline, as shown below:

Lookup Activity 1 Response:

[
	{
	  "cra5a_columnname": "SALES_CHANNEL",
	  "cra5a_datatype": "OptionSet",
	  "cra5a_name": null,
	  "cra5a_targettable": null,
	  "cra5a_textlength": null
	},
	{
	  "cra5a_columnname": "ORG",
	  "cra5a_datatype": "OptionSet",
	  "cra5a_name": null,
	  "cra5a_targettable": null,
	  "cra5a_textlength": null
	}
]

Lookup Activity 2 Response:

[
	{
	  "variableName": "test",
	  "value": {
		"NAME": "AUBF2",
		"ORG": "Australia",
		"SALES_CHANNEL": "Field Sales Executives",
		"RESPONSIBLEUSER": null,
		"RESPONSIBLETEAM": null
	  }
	}
]

Currently, I am using two nested ForEach loops to process these JSON responses:

  1. The first loop iterates over Lookup Activity 1 response (cra5a_columnname) and passes the column name to a child pipeline.
  2. The second loop iterates over the Lookup Activity 2 response, retrieves the key-value pair corresponding to the column name, and applies some transformation logic to convert string values into integers.

The output after transformation looks like this:

[
	{
	  "variableName": "test",
	  "value": {
		"NAME": "AUBF2",
		"ORG": 4567,
		"SALES_CHANNEL": 1234,
		"RESPONSIBLEUSER": null,
		"RESPONSIBLETEAM": null
	  }
	}
]

However, this approach is impacting performance due to:

  • ~15 records in Lookup Activity 1 response.
  • ~200+ records in Lookup Activity 2 response.

Is there a way to:

  1. Extract all cra5a_columnname values from Lookup Activity 1 into an array without using the first loop?
  2. Use this array in a single loop or a dynamic method to transform the Lookup Activity 2 response in a performant manner?

Alternatively, can this be achieved using a Data Flow in ADF? The solution needs to be dynamic since:

  • The structure of Lookup Activity 1 response is consistent.
  • The keys and values in Lookup Activity 2 response can vary for each parent pipeline iteration.

I’d appreciate any suggestions for optimizing this process. Thank you!

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,165 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Sina Salam 16,446 Reputation points
    2025-01-25T02:06:46.5833333+00:00

    Hello Naresh Arvapalli,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to Optimize Nested Loop Processing in ADF Pipeline for Dynamic JSON Transformation.

    To solve the problem dynamically and optimize performance:

    Step 1: Extract cra5a_columnname into an Array

    Use a Lookup activity and a subsequent Set Variable activity to extract only the cra5a_columnname values. Use the following expression:

    @array(activity('LookupActivity1').output.value[*].cra5a_columnname)
    

    Step 2: Single Loop for Dynamic Transformation

    Use a single ForEach loop to iterate over the Lookup Activity 2 response. Within the loop, use an If Condition activity to check if the key exists in the extracted array. Apply the transformation logic dynamically. For an example pipeline configuration: ForEach Activity to Iterate over Lookup Activity 2 response.

      @activity('LookupActivity2').output.value
    

    If Condition Activity, then check if the key exists in the column names array.

      @contains(variables('ColumnNames'), item().key)
    

    Execute Pipeline: Dynamically transform the value.

    Pass item().key and item().value as parameters to a child pipeline for processing.

    Step 3: This is an alternative Data Flow Solution

    To optimize performance for large datasets, use a Data Flow in ADF:

    1. Source: Read Lookup Activity 2 response.
    2. Derived Column: Add a new column with the transformation logic:
    • Use an exists() function to check if the key matches any cra5a_columnname.
    • Apply transformations only if the key exists.
    1. Sink: Write the transformed data to the desired location. For an example of a dynamic mapping expression:
    if(exists(cra5a_columnname_array, currentColumnName), toInteger(currentColumnValue), currentColumnValue)
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.