Issue with JSON to CSV Conversion in ADF Pipeline

Question

To ask your question effectively on a forum, ensure your query is clear, concise, and provides all the necessary details. Here's a template you can use:

I am working on an Azure Data Factory (ADF) pipeline where I need to convert multiple JSON files stored in Azure Blob Storage into CSV format. Below is a summary of my setup and the issue I am facing:

Pipeline Setup:

GetMetadata Activity: Fetches all JSON files from the input folder using childItems.
ForEach Activity: Iterates through the list of JSON files.
Copy Activity: Converts each JSON file into a CSV file and stores it in the output folder.

Problem:

The pipeline executes successfully, but the output CSV files are Missing values from the JSON files only header are showing.

Relevant Details:

The JSON file structure is as follows:

[
    {"Name": "Aanya", "Age": "11.0", "Gender": "Female", "Weight_kg": "35.0", "Height_m": "1.48", "BMI": "15.98"},
    {"Name": "Ishita", "Age": "15.0", "Gender": "Female", "Weight_kg": "42.06", "Height_m": "1.61", "BMI": "16.23"}
]

The expected CSV output should look like:

Name, Age, Gender, Weight_kg, Height_m, BMI
Aanya, 11.0, Female, 35.0, 1.48, 15.98
Ishita, 15.0, Female, 42.06, 1.61, 16.23

ADF Sink Path Configuration:

@concat('Output/', replace(item().name, '.json', ''), '/', replace(item().name, '.json', '.csv'))

What I Have Tried:

Verified the output of the GetMetadata activity to ensure it fetches the correct file names.
Checked the ForEach loop to confirm it iterates over all files.
Inspected the JSON source and sink configurations to ensure compatibility.

Request:

Can anyone help me identify what might be causing the missing values in the output CSV? Are there specific configurations I might be overlooking in ADF?

Answer

Hi ,

Thanks for reaching out to Microsoft Q&A.

The issue of missing values in your output CSV file likely stems from how the JSON source is being parsed or mapped to the CSV sink in ADF. Here are some steps and considerations to help you resolve the problem:

Verify JSON File Structure

ADF requires that the JSON files have a consistent structure. If the JSON files do not follow a strict array format, ADF may fail to parse them correctly. Your JSON file example seems correct ([{}, {}]), so this may not be an issue unless some files deviate from this structure.

Inspect the Mapping in Copy Activity

The default mapping might not align JSON properties with CSV headers. Explicitly define the mapping in the Copy Activity:
1. Go to the Mapping tab of the Copy Activity.
  1. Select Import Schema for both the source and sink datasets.
    1. Map each JSON property to the corresponding CSV column.

Set the Correct JSON File Format

Ensure your JSON dataset settings correctly specify that the file is an array of objects:
- File format: Choose JSON and set File Pattern to Array of documents if it's a JSON array.
  - If your JSON files are not a perfect array, you might need to preprocess them using Azure Functions or Data Flows.

Adjust Sink Dataset Settings

Check the sink dataset configuration:
Ensure the File Format is set to Delimited Text (CSV).
- Define a Column Delimiter (ex, comma ,).
- Set the First Row as Header option to true.

Debugging and Validation

Use Debug mode to inspect the output after each step.
Add a Data Preview step (if you are using a Data Flow) to verify that the data is correctly transformed before writing it to the sink.

Check ADF Data Flow for Complex Scenarios

If your JSON file structure is more complex, consider using a Data Flow for the transformation:
1. Add a Source Transformation for the JSON dataset.
  1. Use a Flatten Transformation to expand nested arrays or objects.
    1. Add a Sink Transformation to output the data as a CSV file.
      1. Verify column mapping explicitly in the Sink.

Modify Output Path Configuration

Your output path configuration seems fine. However, double-check that it doesn’t inadvertently overwrite files or create an incorrect directory structure. Ensure item().name references the correct file name.

Test with Sample Files

Run the pipeline with a single JSON file and inspect the result. This will help you isolate any issues specific to the file's format or pipeline logic.

Key Configuration Checklist

Setting	Expected Value
JSON Source Dataset	File format: JSON, File pattern: Array of documents
JSON Source Dataset	File format: JSON, File pattern: Array of documents
CSV Sink Dataset	File format: Delimited text, Column delimiter: `,`, First row as header: `True`
Copy Activity Mapping	Explicit mapping of JSON properties to CSV columns
Output Path	Ensure no overwrite or incorrect structure

By carefully inspecting these areas, you should be able to resolve the issue of missing values in the CSV output. If the problem persists, share specific pipeline configurations or error messages for more targeted assistance.

Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.

Share via

Issue with JSON to CSV Conversion in ADF Pipeline

1 answer

Your answer