How to write a CSV file to a Folder in Azure Data Lake Gen2 Container from a Notebook in Synapse workspace

Chitti, Srinivasa 40 Reputation points
2025-02-20T12:44:33.44+00:00

I have a pipeline of Notebooks in an Azure Synapse workspace. From one Notebook, I need to save some intermediate transformation results as a CSV file to a specific folder in an Azure Data Lake Gen2 container. Can someone guide me on how to do that?

Azure Synapse Analytics

Accepted answer
  1. Marcin Policht 36,360 Reputation points MVP
    2025-02-20T12:59:29.3833333+00:00

    You should be able to implement this by using Spark in your Synapse Notebook to write the intermediate transformation results as a CSV file to Azure Data Lake Storage Gen2 (ADLS Gen2).

    1. Set up the storage account configuration. First, ensure that your Synapse workspace has access to the ADLS Gen2 container via a Linked Service or an Account Key / SAS Token / Managed Identity.

    2. Use the following code in the Synapse notebook. If you're using Apache Spark (PySpark), you can write your DataFrame (df) to a CSV file:

    from pyspark.sql import SparkSession  # the built-in `spark` session is already available in a Synapse notebook
    
    # Define your storage account name, container, and target folder
    storage_account_name = "yourstorageaccount"
    container_name = "yourcontainer"
    folder_path = "intermediate-results/"  # Folder where the CSV will be stored
    
    # Build the ADLS Gen2 path using the abfss:// scheme (Azure Blob File System driver)
    adls_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/{folder_path}"
    
    # Write the DataFrame as CSV (Spark creates a folder of part-*.csv files at this path)
    df.write.mode("overwrite").option("header", "true").csv(adls_path)
    
    print(f"Data saved to {adls_path}")
    

    3. Authentication methods. Ensure that you have access to the storage account via one of the following (a pandas-based alternative that uses the same credentials is sketched after this list):

    • Managed Identity (recommended)
      • Grant the Storage Blob Data Contributor role to the Synapse workspace's Managed Identity.
      • No need to specify credentials in the notebook.
    • Account Key (if not using Managed Identity)
        # Register the account key with the ABFS driver for this Spark session
        spark.conf.set(
            f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
            "your-storage-account-key"
        )
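
    If you prefer to land a single small CSV with pandas instead of Spark, the fsspec/adlfs integration in the Synapse runtime accepts similar credentials through storage_options; a minimal sketch, with placeholder account, container, and key values:

    import pandas as pd

    # Convert a *small* Spark DataFrame to pandas and write one named CSV file.
    # storage_options authenticates via account key here; 'sas_token' and
    # 'linked_service' keys are also supported by the Synapse pandas integration.
    df.toPandas().to_csv(
        "abfss://yourcontainer@yourstorageaccount.dfs.core.windows.net/intermediate-results/results.csv",
        index=False,
        storage_options={"account_key": "your-storage-account-key"},
    )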
      

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin


2 additional answers

  1. Alex Burlachenko 1,190 Reputation points
    2025-02-20T13:23:57.54+00:00

    Hi,

    easy stuff! Just use PySpark in your Synapse Notebook:

    df.write.format("csv").option("header", "true").save("abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/")
    

    Make sure your Synapse workspace is linked to the storage account with proper permissions (otherwise, you'll feel like an intern debugging permissions at 3am :)

    Check that the abfss:// URL is correct (typos here are pure evil, like semicolons in JavaScript).

    If you need a single output file, use df.coalesce(1) or df.repartition(1), but be careful: too much coalescing on large data can make Spark cry )))))) For big datasets, keep several partitions instead; see the sketch below.
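
    A minimal sketch of the difference (the path and partition count are illustrative):

    out = "abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/"

    # coalesce(1): merges partitions without a full shuffle -> exactly one part file
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(out)

    # repartition(8): full shuffle that redistributes large data evenly across 8 files
    df.repartition(8).write.mode("overwrite").option("header", "true").csv(out)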

    take care, alex


  2. Alex Burlachenko 1,190 Reputation points
    2025-02-20T13:32:47.37+00:00

    ....... sorry, this one can be deleted; I just tried to repost my answer...

