ADF Copy Activity for JSON Files Fails to Locate Blob Despite Successful Metadata Retrieval

Denktas, Gökalp Yigit 20

I’m currently working on an Azure Data Factory (ADF) pipeline that consolidates files from multiple source containers into a single target container. The goal is to gather one file type (lod2) from various resource-based subfolders and versioned folders in source before storing them all under a consolidated lod2 folder in target.

Below is a redacted view of my Azure Storage structure:

Storage Account:

J N S S Kasyap 80 Reputation points Microsoft Vendor

2025-02-18T05:23:44.44+00:00
Hi @Denktas, Gökalp Yigit
Thank you for reaching out and posting the query.
Your Azure Data Factory (ADF) pipeline retrieves metadata successfully, but the Copy Activity fails to locate the blob and copy it to the target container.
To fix this issue:

Ensure that the source dataset's file or folder path uses the correct format, as incorrect use of wildcards or discrepancies between metadata retrieval and the actual file path can cause issues. Additionally, verify that the correct file type is specified in the dataset, especially when copying JSON files.

Sometimes, ADF fails to locate blobs if the linked service configurations are incorrect. Validate your linked service by ensuring the connection string points to the correct storage account and the authentication method (account key, managed identity, or SAS) is properly configured.

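Ensure ADF has the necessary IAM roles on the Blob Storage account (for example, Storage Blob Data Reader on the source and Storage Blob Data Contributor on the sink when using a managed identity). You can verify this through the Access Control (IAM) settings in Azure.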

Ensure the firewall is configured to allow access from the ADF integration runtime (IR). If using a self-hosted IR, verify the firewall settings to allow connections from the appropriate IP range. If using Azure IR, ensure there are no restrictions on the storage account.

Check the ADF activity logs for error messages or specific codes that indicate access issues, path resolution failures, or other underlying problems preventing the Copy Activity from working.

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Thank you.
Denktas, Gökalp Yigit 20 Reputation points

2025-02-18T12:05:53.4366667+00:00
Hi @J N S S Kasyap ,

Thanks for your response. I wanted to provide some additional context regarding the issue I’m facing with the ADF pipeline. My pipeline is able to successfully retrieve metadata using the Get Metadata activity, but the Copy Data activity fails with an error indicating that the blob is missing. Here’s a quick summary of what I’ve tried and the configuration details:

Pipeline Details

ListJSONs Activity (Get Metadata):

Dataset: lod2_json

Parameters:

region: @trim(replace(item(), '"', ''))

subfolder: "lod2_gebaeude"

Result: Successfully retrieves a list of JSON files from the expected folder.
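For readers less familiar with the ADF expression language, here is a rough Python equivalent of the region expression @trim(replace(item(), '"', '')). This is only a sketch: the raw item values are hypothetical, and the function name clean_region is made up for illustration.

```python
def clean_region(item: str) -> str:
    """Approximate the ADF expression trim(replace(item(), '"', ''))."""
    # replace(): drop every double-quote character from the item
    # trim(): strip leading and trailing whitespace
    return item.replace('"', '').strip()


# Hypothetical raw items, as they might arrive from an upstream activity
raw_items = ['"de_aa_0001"', ' "de_bb_0002" ']
regions = [clean_region(item) for item in raw_items]
print(regions)
```

The point is simply that the expression yields a bare folder name with no quotes or padding, which is what the dataset's region parameter expects.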

Debug_resolvedPath Activity (Set Variable):

Variable: resolvedPath (String)

Value: @concat('/', item(), '/', 'lod2', '/')

Output:
/de_xx_xxxx(dynamically changing)/lod2_gebaeude/

Note: This confirms that the dynamic folder name is being parsed correctly.

CopyData_Lod2 Activity (Copy Data):

Dataset: lod2_json

Parameters:

region: @trim(replace(item(), '"', ''))

subfolder: "lod2_gebaeude"

File Path: Configured via the dataset as /@{dataset().region}/@{dataset().subfolder}/ with the file name set to *.json.

Error:
ErrorCode=UserErrorSourceBlobNotExist,... path:de_xx_xxxx(dynamically changing)/lod2_gebaeude/*.json.

What I’ve Noticed

The metadata activity correctly lists files, so the dynamic parameter (region) is working as intended.

The resolved path logged (/de_xx_xxxx(dynamically changing)/lod2_gebaeude/) appears correct.

The error in the Copy Data activity shows it’s attempting to access a blob literally named *.json (i.e., it is not expanding the wildcard).
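The difference between a literal lookup and a wildcard filter can be illustrated outside ADF. The sketch below uses Python's fnmatch module on a hypothetical list of blob names; it does not reproduce ADF's internal matching, it only shows why a lookup for a blob literally named *.json comes back empty while a pattern match does not.

```python
from fnmatch import fnmatchcase

# Hypothetical blob names; the real container contents are not shown in the thread
blobs = [
    "de_xx_0001/lod2_gebaeude/a.json",
    "de_xx_0001/lod2_gebaeude/b.json",
    "de_xx_0001/lod2_gebaeude/readme.txt",
]
folder = "de_xx_0001/lod2_gebaeude/"
file_name = "*.json"


def literal_lookup(names, path):
    """Treat the path as an exact blob name, as the error message suggests happened."""
    return [name for name in names if name == path]


def wildcard_lookup(names, folder, pattern):
    """Treat the file name as a pattern applied to blobs inside the folder."""
    return [
        name
        for name in names
        if name.startswith(folder) and fnmatchcase(name[len(folder):], pattern)
    ]


literal_hits = literal_lookup(blobs, folder + file_name)
wildcard_hits = wildcard_lookup(blobs, folder, file_name)
print(literal_hits)
print(wildcard_hits)
```

The literal lookup finds nothing because no blob is actually called *.json, which is consistent with a "blob does not exist" style error.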

Important Clarifications

No Firewall/IR Issues: I don’t have any firewall or ADF Integration Runtime (IR) issues because all operations are performed within the same resource group and the same blob storage account (albeit across two different containers). Thus, configuration or network restrictions can be ruled out.

Activity Logs: The ADF activity logs don’t provide detailed errors beyond the “blob is missing” message referenced above.

Pipeline Structure: I need to avoid looping through each file individually due to per-file connection costs. Changing the approach (i.e., using a different wildcard activity) isn’t an option, as it would break other parts of the pipeline and the extraction logic for the dynamic de_ folder names.

My Request

Given the above, could you please advise on the following?

Wildcard Handling: How can I configure the Copy Data activity so that the *.json in the filename is treated as a wildcard filter rather than a literal file name, while still using the “File path in dataset” option?

Maintaining Current Pipeline Logic: I need to preserve the current structure that relies on dynamically extracting the region (the de_ folder names) and using the wildcard for file selection, without resorting to per-file looping.

I appreciate your guidance on this matter and look forward to any suggestions that can help resolve the 404 “blob is missing” error in the Copy Data activity.

Thank you for your time.

Regards,

Gökalp
Denktas, Gökalp Yigit 20 Reputation points

2025-02-18T12:08:44.96+00:00

Here is an image of the dataset configuration. This configuration works with various Set Variable, Append Variable, Get Metadata, Wait, and Until activities, but not with Copy Data. I really appreciate your guidance.
J N S S Kasyap 80 Reputation points Microsoft Vendor

2025-02-20T01:30:04.7233333+00:00

@Denktas, Gökalp Yigit
Just checking in to see whether the answer below helped. If it answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.
Denktas, Gökalp Yigit 20 Reputation points

2025-02-20T13:36:42.29+00:00

I actually sent a comment but I think it got lost. This solved the issue, thanks! Another tip for future users: nested loops using child pipelines actually make the process much easier to manage.

Accepted answer

J N S S Kasyap 80 Reputation points Microsoft Vendor

2025-02-19T07:52:43.5766667+00:00

@Denktas, Gökalp Yigit

Wildcard Handling:

This usually happens when the file path or dataset isn’t set up to handle the wildcard correctly. You need to make sure the file path in the dataset is configured properly to allow ADF to expand the wildcard and match all the .json files.
To treat *.json as a wildcard filter rather than a literal file name, Azure Data Factory requires that the dataset and Copy Data activity are configured to use dynamic expressions for both the folder path and the file pattern. Here's the approach.
To pass the file name dynamically in Azure Data Factory:
1. Modify the dataset: Add a parameter called fileName to your lod2_json dataset and set the file name field of the file path to @{dataset().fileName}.
2. Configure the Copy Data activity: In the Copy Data activity, pass *.json as the value for the fileName parameter.
3. Check the file path: Confirm that the dataset's file path keeps the dynamic folder path and uses @{dataset().fileName} for the file name, so the activity can match all .json files in the specified path.
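The steps above can be sketched as the JSON that the dataset and the Copy activity's dataset reference might contain. This is an illustration under assumptions, not the poster's actual configuration: the linked service name and container name are hypothetical placeholders, and the property layout should be checked against the JSON code view in ADF Studio. The sketch builds the structures as Python dictionaries only so they can be printed and inspected.

```python
import json

# Dataset with an extra fileName parameter (step 1 and step 3)
dataset = {
    "name": "lod2_json",
    "properties": {
        "type": "Json",
        "linkedServiceName": {
            "referenceName": "MyBlobLinkedService",  # hypothetical name
            "type": "LinkedServiceReference",
        },
        "parameters": {
            "region": {"type": "string"},
            "subfolder": {"type": "string"},
            "fileName": {"type": "string"},  # new parameter
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "source-container",  # hypothetical name
                "folderPath": {
                    "value": "@{dataset().region}/@{dataset().subfolder}",
                    "type": "Expression",
                },
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression",
                },
            }
        },
    },
}

# Dataset reference inside the Copy Data activity (step 2)
copy_source_dataset_ref = {
    "referenceName": "lod2_json",
    "type": "DatasetReference",
    "parameters": {
        "region": {
            "value": "@trim(replace(item(), '\"', ''))",
            "type": "Expression",
        },
        "subfolder": "lod2_gebaeude",
        "fileName": "*.json",  # the wildcard is passed as the parameter value
    },
}

print(json.dumps(dataset, indent=2))
print(json.dumps(copy_source_dataset_ref, indent=2))
```

Keeping region and subfolder as separate parameters leaves the existing dynamic folder logic untouched; only the file name moves from a hard-coded value to a parameter.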

Maintaining Current Pipeline Logic:

You can preserve the current pipeline structure by keeping the dynamic folder path and wildcard selection in place, while ensuring that metadata retrieval correctly identifies the files, and the Copy Data activity can still process all files matching the wildcard without looping over each file individually.
I hope this information helps. Please do let us know if you have any further queries.

1 person found this answer helpful.
Denktas, Gökalp Yigit 20 Reputation points

2025-02-19T12:11:37.7333333+00:00

Thanks! This actually worked well. To circumvent the previous pipeline logic, I created duplicate datasets for both source and sink, with inputName and outputName parameters. Then I created a nested loop that executes a child pipeline inside a ForEach activity, going through each file in each folder so that I keep the original file names. This option slows down execution since ADF does not allow parallel child pipeline executions; however, the end result is way cleaner.
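The nested-loop idea can be pictured with a short Python sketch. It is only a model of the control flow, assuming that inputName carries the source path and outputName the target path under a consolidated lod2 folder; the folder listing and the helper name child_pipeline_runs are hypothetical.

```python
# Hypothetical result of listing each region folder (e.g. via Get Metadata)
listing = {
    "de_aa_0001": ["a.json", "b.json"],
    "de_bb_0002": ["c.json"],
}


def child_pipeline_runs(listing, subfolder="lod2_gebaeude", target_folder="lod2"):
    """Build one (inputName, outputName) pair per file, keeping the original file name."""
    runs = []
    for region, files in listing.items():  # outer ForEach over region folders
        for file_name in files:  # inner loop, run by the child pipeline
            runs.append(
                {
                    "inputName": f"{region}/{subfolder}/{file_name}",
                    "outputName": f"{target_folder}/{file_name}",
                }
            )
    return runs


runs = child_pipeline_runs(listing)
for run in runs:
    print(run)
```

Each pair corresponds to one copy, which is why this approach preserves file names but costs one copy operation per file.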

For anyone wondering in the future: clearly set @{dataset().fileName} in the file name section of both the source and the sink datasets. Then, in the Copy Data activity, make sure to write *.json (or whatever file format you are using); this way you capture all the files for the copy activity. At least, this works in the current version of ADF.