Query on connecting SQL server from Databricks

BrianC 100 Reputation points
2025-03-03T08:13:36.2866667+00:00

Hi all,

We are evaluating methods to read and write data in SQL Server from Databricks.

I have found that there are at least two different ways to do it. One is the Spark connector for SQL Server (https://learn.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16) and the other is JDBC (https://docs.databricks.com/aws/en/connect/external-systems/jdbc).

The code we are currently using to write data to SQL Server is as follows:

df.write \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", TableName) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("truncate", "true") \
    .mode("overwrite") \
    .save()

I am not sure which method (Spark connector or JDBC) we are currently using. Could someone clarify this for us? Thanks in advance.


Accepted answer
  1. Amira Bedhiafi 29,481 Reputation points
    2025-03-03T12:39:10.1566667+00:00

    The .format("jdbc") in your snippet means you are using the generic JDBC method to connect to the database. Here is what the options do:

    • url: The JDBC connection string for your SQL Server instance (see the sketch after this list).
    • dbtable: The table in SQL Server that the data is written to.
    • driver: The JDBC driver class for SQL Server (com.microsoft.sqlserver.jdbc.SQLServerDriver).
    • truncate: A JDBC-specific option; combined with overwrite mode, it tells Spark to truncate the existing table rather than drop and recreate it.
    • mode("overwrite"): Replaces the table's existing contents with the new data.
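
    For reference, here is a minimal sketch of what the url and credential handling might look like. The server name, database name, and secret scope below are placeholders, not values from your post:

    url = (
        "jdbc:sqlserver://yourserver.database.windows.net:1433;"
        "databaseName=yourdb;"
        "encrypt=true"
    )

    # Credentials are best pulled from a Databricks secret scope rather than
    # hard-coded; the scope and key names here are hypothetical.
    user = dbutils.secrets.get(scope="my-scope", key="sql-user")
    password = dbutils.secrets.get(scope="my-scope", key="sql-password")

    You would then pass these to the writer with .option("user", user) and .option("password", password) alongside the options you already set.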

    The Spark connector for SQL Server is a different approach. It is optimized for SQL Server and provides better performance (it uses the bulk-copy APIs under the hood) and additional features compared to the generic JDBC connector. To use it, you would typically call .format("com.microsoft.sqlserver.jdbc.spark") and configure it with the connector-specific options, for example:

    
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .option("truncate", "true") \
        .mode("overwrite") \
        .save()
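
    Reading back with the connector follows the same pattern. Here is a sketch that reuses the url and TableName variables from above (add the same user/password options if you are not authenticating through the url):

    df = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .load()

    Note that this connector is not bundled with the Databricks Runtime, so the library (the com.microsoft.azure:spark-mssql-connector Maven artifact, in the variant matching your Spark/Scala version) must be installed on the cluster before .format("com.microsoft.sqlserver.jdbc.spark") will resolve.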
    

    Which one should you use?

    • If you are working exclusively with SQL Server and need better performance, consider switching to the Spark connector.
    • If you need a generic solution that works with multiple databases, or if you are already satisfied with the performance of the JDBC method, you can continue using the JDBC approach (a read sketch follows below).
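
    For completeness, a read with the generic JDBC method mirrors your write snippet; this sketch reuses the same url, TableName, and driver values:

    df = spark.read \
        .format("jdbc") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .load()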
