Issue with JSON to CSV Conversion in ADF Pipeline

lankesh Bisen 0 Reputation points
2024-11-23T17:11:41.56+00:00

To ask your question effectively on a forum, ensure your query is clear, concise, and provides all the necessary details. Here's a template you can use:


I am working on an Azure Data Factory (ADF) pipeline where I need to convert multiple JSON files stored in Azure Blob Storage into CSV format. Below is a summary of my setup and the issue I am facing:

Pipeline Setup:

  1. GetMetadata Activity: Fetches all JSON files from the input folder using childItems.
  2. ForEach Activity: Iterates through the list of JSON files.
  3. Copy Activity: Converts each JSON file into a CSV file and stores it in the output folder.

Problem:

  • The pipeline executes successfully, but the output CSV files are Missing values from the JSON files only header are showing.

Relevant Details:

The JSON file structure is as follows:

[
    {"Name": "Aanya", "Age": "11.0", "Gender": "Female", "Weight_kg": "35.0", "Height_m": "1.48", "BMI": "15.98"},
    {"Name": "Ishita", "Age": "15.0", "Gender": "Female", "Weight_kg": "42.06", "Height_m": "1.61", "BMI": "16.23"}
]

The expected CSV output should look like:

Name, Age, Gender, Weight_kg, Height_m, BMI
Aanya, 11.0, Female, 35.0, 1.48, 15.98
Ishita, 15.0, Female, 42.06, 1.61, 16.23

ADF Sink Path Configuration:

@concat('Output/', replace(item().name, '.json', ''), '/', replace(item().name, '.json', '.csv'))

What I Have Tried:

  1. Verified the output of the GetMetadata activity to ensure it fetches the correct file names.
  2. Checked the ForEach loop to confirm it iterates over all files.
  3. Inspected the JSON source and sink configurations to ensure compatibility.

Request:

Can anyone help me identify what might be causing the missing values in the output CSV? Are there specific configurations I might be overlooking in ADF?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,924 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Vinodh247 24,091 Reputation points MVP
    2024-11-24T06:48:43.06+00:00

    Hi ,

    Thanks for reaching out to Microsoft Q&A.

    The issue of missing values in your output CSV file likely stems from how the JSON source is being parsed or mapped to the CSV sink in ADF. Here are some steps and considerations to help you resolve the problem:

    1. Verify JSON File Structure
    • ADF requires that the JSON files have a consistent structure. If the JSON files do not follow a strict array format, ADF may fail to parse them correctly. Your JSON file example seems correct ([{}, {}]), so this may not be an issue unless some files deviate from this structure.
    1. Inspect the Mapping in Copy Activity
    • The default mapping might not align JSON properties with CSV headers. Explicitly define the mapping in the Copy Activity:
      1. Go to the Mapping tab of the Copy Activity.
        1. Select Import Schema for both the source and sink datasets.
          1. Map each JSON property to the corresponding CSV column.
    1. Set the Correct JSON File Format
    • Ensure your JSON dataset settings correctly specify that the file is an array of objects:
      • File format: Choose JSON and set File Pattern to Array of documents if it's a JSON array.
        • If your JSON files are not a perfect array, you might need to preprocess them using Azure Functions or Data Flows.
    1. Adjust Sink Dataset Settings
    • Check the sink dataset configuration:
    • Ensure the File Format is set to Delimited Text (CSV).
      • Define a Column Delimiter (ex, comma ,).
      • Set the First Row as Header option to true.
    1. Debugging and Validation
    • Use Debug mode to inspect the output after each step.
    • Add a Data Preview step (if you are using a Data Flow) to verify that the data is correctly transformed before writing it to the sink.
    1. Check ADF Data Flow for Complex Scenarios
    • If your JSON file structure is more complex, consider using a Data Flow for the transformation:
      1. Add a Source Transformation for the JSON dataset.
        1. Use a Flatten Transformation to expand nested arrays or objects.
          1. Add a Sink Transformation to output the data as a CSV file.
            1. Verify column mapping explicitly in the Sink.
    1. Modify Output Path Configuration
    • Your output path configuration seems fine. However, double-check that it doesn’t inadvertently overwrite files or create an incorrect directory structure. Ensure item().name references the correct file name.
    1. Test with Sample Files
    • Run the pipeline with a single JSON file and inspect the result. This will help you isolate any issues specific to the file's format or pipeline logic.

    Key Configuration Checklist

    Setting Expected Value
    JSON Source Dataset File format: JSON, File pattern: Array of documents
    JSON Source Dataset File format: JSON, File pattern: Array of documents
    CSV Sink Dataset File format: Delimited text, Column delimiter: ,, First row as header: True
    Copy Activity Mapping Explicit mapping of JSON properties to CSV columns
    Output Path Ensure no overwrite or incorrect structure

    By carefully inspecting these areas, you should be able to resolve the issue of missing values in the CSV output. If the problem persists, share specific pipeline configurations or error messages for more targeted assistance.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.