Get data from Azure SQL Database Managed Instance(Linked Service) using Synapse notebook spark pool

Aditya Singh 160 Reputation points
2024-03-07T13:32:46.88+00:00

I have a linked service from Azure SQL Database Managed Instance and I want to get the data via Synapse spark pool

In simple words I want to connect to the Azure SQL Database Managed Instance(Linked Service) and get the data via Synapse Notebook spark pool

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,066 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,271 questions
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,159 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,010 questions
{count} votes

1 answer

Sort by: Most helpful
  1. phemanth 12,320 Reputation points Microsoft Vendor
    2024-03-08T07:14:48.2966667+00:00

    @Aditya Singh

    Thanks for reaching out to Microsoft Q&A.

    Create a Linked Service for Azure SQL Database Managed Instance:

    In your Synapse workspace, set up a linked service that points to your Azure SQL Database Managed Instance. This linked service will allow you to establish a connection between Synapse and your SQL Database.

    Create a Spark Pool:

    • Now, create a Spark pool within your Synapse workspace. A Spark pool defines the compute resource requirements for Spark instances.
    • You can configure properties such as the number of nodes, node size, and scaling behavior.
    • Choose from various node sizes, ranging from small to XXLarge, based on your workload needs.
    • If you require a high degree of isolation, consider the Isolated Compute option, available with the XXXLarge node size.

    Attach the Spark Pool to Your Synapse Notebook:

    • Open your Synapse Notebook.
    • Navigate to the Compute tab.
    • Select Attached computes and choose Synapse Spark pool.
    • This step associates your notebook with the Spark pool, enabling you to execute Spark jobs against your data sources, including the Azure SQL Database Managed Instance.

    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-synapse-spark-pool?view=azureml-api-2&tabs=studio

    Write Spark Code in Your Notebook:

    1. Import the SQL Server JDBC Driver:
      Synapse Spark pools don't have the SQL Server JDBC driver pre-installed. You'll need to import it into your notebook. You can upload the driver JAR file to your Synapse workspace and reference it in your code.
    2. Create Spark DataFrame:
      Use the spark.read.jdbc function to create a Spark DataFrame from your SQL Database Managed Instance. Here's an example:
      Python
      from pyspark.sql import SparkSession
    # Replace with your linked service name
    linked_service_name = "your_linked_service_name"
    # Retrieve connection string from linked service using Synapse library
    access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
    # Build the JDBC URL
    jdbc_url = f"jdbc:sqlserver://<server_name>.database.windows.net:<port>;databaseName=<database_name>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
    # Create Spark DataFrame
    df = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", jdbc_url) \
        .option("databaseName", "<database_name>") \
        .option("accessToken", access_token) \
        .option("dbtable", "<table>") \
        .load()
    

    Replace the placeholders with:

    <server_name>: Your Azure SQL Database Managed Instance server name.

    <port>: The port number of your Managed Instance (usually 1433).

    <database_name>: The name of the database you want to access.

    <table>: The name of the table you want to read data from

    Process the DataFrame:
    Use Spark functionalities to process the DataFrame (df) as needed. You can filter, transform, or join data before further analysis.

    View or Persist Data:

    • Use display(df) to view the DataFrame contents in your notebook.
    • Use other Spark functions like write to save the data to different storage options (e.g., Data Lake Storage).

    Important Note:

    Currently, Synapse Spark pools don't directly support connecting to Azure SQL Database Managed Instances using linked services. The workaround is to use the connection string retrieved from the linked service and build the JDBC URL manually.

    For more details and alternative approaches, refer to these resources:

    Hope this helps. Do let us know if you any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.