How to write a CSV file to a Folder in Azure Data Lake Gen2 Container from a Notebook in Synapse workspace

Chitti, Srinivasa 40 Reputation points
2025-02-20T12:44:33.44+00:00

I have a pipeline of Notebooks in an Azure Synapse workspace. From one Notebook, I need to save some intermediate transformation results as a CSV file to a specific folder in an Azure Data Lake Gen2 container. Can someone guide me on how to do that?

Azure Synapse Analytics

Accepted answer
  1. Marcin Policht 36,360 Reputation points MVP
    2025-02-20T12:59:29.3833333+00:00

    You should be able to implement this by using Spark in your Synapse Notebook to write the intermediate transformation results as a CSV file to Azure Data Lake Storage Gen2 (ADLS Gen2).

    1. Set up the storage account configuration. First, ensure that your Synapse workspace has access to the ADLS Gen2 container via a Linked Service or an Account Key / SAS Token / Managed Identity.

    2. Use the following code in the Synapse notebook. If you're using Apache Spark (PySpark), you can write your DataFrame (df) to a CSV file:

    from pyspark.sql import SparkSession  # the built-in `spark` session is already available in a Synapse notebook
    
    # Define your storage account name, container, and target folder
    storage_account_name = "yourstorageaccount"
    container_name = "yourcontainer"
    folder_path = "intermediate-results/"  # Folder where the CSV will be stored
    
    # Build the ADLS Gen2 path using the abfss:// scheme (Azure Blob File System driver)
    adls_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/{folder_path}"
    
    # Write the DataFrame as CSV (Spark creates a folder of part-*.csv files at this path)
    df.write.mode("overwrite").option("header", "true").csv(adls_path)
    
    print(f"Data saved to {adls_path}")
    

    3. Authentication methods. Ensure that you have access to the storage account via one of the following (a pandas-based alternative that uses the same credentials is sketched after this list):

    • Managed Identity (recommended)
      • Grant the Storage Blob Data Contributor role to the Synapse workspace's Managed Identity.
      • No need to specify credentials in the notebook.
    • Account Key (if not using Managed Identity)
        # Register the account key with the ABFS driver for this Spark session
        spark.conf.set(
            f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
            "your-storage-account-key"
        )
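
    If you prefer to land a single small CSV with pandas instead of Spark, the fsspec/adlfs integration in the Synapse runtime accepts similar credentials through storage_options; a minimal sketch, with placeholder account, container, and key values:

    import pandas as pd

    # Convert a *small* Spark DataFrame to pandas and write one named CSV file.
    # storage_options authenticates via account key here; 'sas_token' and
    # 'linked_service' keys are also supported by the Synapse pandas integration.
    df.toPandas().to_csv(
        "abfss://yourcontainer@yourstorageaccount.dfs.core.windows.net/intermediate-results/results.csv",
        index=False,
        storage_options={"account_key": "your-storage-account-key"},
    )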
      

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin


2 additional answers

  1. Alex Burlachenko 1,190 Reputation points
    2025-02-20T13:23:57.54+00:00

    Hi,

    easy stuff! Just use PySpark in your Synapse Notebook:

    df.write.format("csv").option("header", "true").save("abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/")
    

    Make sure your Synapse workspace is linked to the storage account with proper permissions (otherwise, you'll feel like an intern debugging permissions at 3am :)

    Check that the abfss:// URL is correct (typos here are pure evil, like semicolons in JavaScript).

    If you need a single output file, use df.coalesce(1) or df.repartition(1), but be careful: too much coalescing on large data can make Spark cry )))))) For big datasets, keep several partitions instead; see the sketch below.
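
    A minimal sketch of the difference (the path and partition count are illustrative):

    out = "abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/"

    # coalesce(1): merges partitions without a full shuffle -> exactly one part file
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(out)

    # repartition(8): full shuffle that redistributes large data evenly across 8 files
    df.repartition(8).write.mode("overwrite").option("header", "true").csv(out)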

    take care, alex


  2. Alex Burlachenko 1,190 Reputation points
    2025-02-20T13:32:47.37+00:00

    ....... sorry, this one can be deleted; I just tried to repost my answer...

