Hello Shao Peng Sun,
Welcome to the MS Q&A platform.
You can use a linked service in your PySpark script to read and write data in ADLS Gen2. Note that `storage_options` is a pandas/fsspec parameter and is not recognized by PySpark's `spark.read.parquet`; in a Synapse notebook, the linked service is instead specified through Spark configuration. You can modify your script as below to use the linked service:
```python
linked_service_name = 'your_linked_service_name'

# Route ADLS Gen2 authentication through the linked service
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type",
               "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

# read data
df = spark.read.parquet("abfss://******@bhargavasynapsegen2.dfs.core.windows.net/NYCTrip.parquet")
df.show()
```
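If you want to sanity-check the linked service from the notebook first, a minimal sketch is to resolve it through Synapse's TokenLibrary. This only applies to linked services that expose a connection string; whether yours does depends on its authentication type:

```python
# Resolve the linked service via TokenLibrary as a quick sanity check.
# linked_service_name is the same placeholder as above.
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
connection_string = token_library.getConnectionString(linked_service_name)
print(connection_string)
```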
For writing, you can authenticate with a service principal by setting the OAuth properties for the target storage account (replace the placeholders with your own values):

```python
# Configure OAuth client-credentials auth for the target storage account
spark.conf.set("fs.azure.account.auth.type.<your-storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<your-storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<your-storage-account-name>.dfs.core.windows.net", "<your-service-principal-client-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<your-storage-account-name>.dfs.core.windows.net", "<your-service-principal-client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<your-storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token")

# write
df.write.mode("overwrite").parquet("abfss://******@xxxxxxxxxxxx.dfs.core.windows.net/test")
```
Please see the screenshot below for your reference. Using this approach, I was able to read my parquet file.
I hope this helps.
If this answers your question, please consider accepting the answer by hitting Accept answer and up-voting, as it helps the community find answers to similar questions.