Hello Shao Peng Sun,
Welcome to the MS Q&A platform.
You can use a linked service in your PySpark script to read and write data in ADLS Gen2. Note that `storage_options` is a pandas/fsspec parameter and is not recognized by PySpark's `spark.read.parquet`; in a Synapse notebook, the linked service is instead specified through Spark configuration. You can modify your script as below to use the linked service:
```python
linked_service_name = 'your_linked_service_name'

# Route ADLS Gen2 authentication through the linked service
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type",
               "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

# read data
df = spark.read.parquet("abfss://******@bhargavasynapsegen2.dfs.core.windows.net/NYCTrip.parquet")
df.show()
```
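If you want to sanity-check the linked service from the notebook first, a minimal sketch is to resolve it through Synapse's TokenLibrary. This only applies to linked services that expose a connection string; whether yours does depends on its authentication type:

```python
# Resolve the linked service via TokenLibrary as a quick sanity check.
# linked_service_name is the same placeholder as above.
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
connection_string = token_library.getConnectionString(linked_service_name)
print(connection_string)
```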
For writing, you can authenticate with a service principal by setting the OAuth properties for the target storage account (replace the placeholders with your own values):

```python
# Configure OAuth client-credentials auth for the target storage account
spark.conf.set("fs.azure.account.auth.type.<your-storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<your-storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<your-storage-account-name>.dfs.core.windows.net", "<your-service-principal-client-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<your-storage-account-name>.dfs.core.windows.net", "<your-service-principal-client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<your-storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token")

# write
df.write.mode("overwrite").parquet("abfss://******@xxxxxxxxxxxx.dfs.core.windows.net/test")
```
Please see the screenshot below for your reference. Using this approach, I was able to read my parquet file.
I hope this helps.
If this answers your question, please consider accepting the answer by hitting Accept answer and up-voting, as it helps the community find answers to similar questions.