Query on connecting SQL server from Databricks

BrianC 100 Reputation points
2025-03-03T08:13:36.2866667+00:00

Hi all,

We are evaluating methods to read and write data in SQL Server from Databricks.

I have found that there are at least two different ways to do it. One is the Spark connector for SQL Server (https://learn.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16) and the other is JDBC (https://docs.databricks.com/aws/en/connect/external-systems/jdbc).

The code we are currently using to write data to SQL Server is as follows:

df.write \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", TableName) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("truncate", "true") \
    .mode("overwrite") \
    .save()

I am not sure which method (Spark connector or JDBC) we are currently using. Could someone clarify this for us? Thanks in advance.


Accepted answer
  1. Amira Bedhiafi 29,481 Reputation points
    2025-03-03T12:39:10.1566667+00:00

    The .format("jdbc") in your snippet means you are using the generic JDBC method to connect to the database. Here is what the options do:

    • url: The JDBC connection string for your SQL Server instance (see the sketch after this list).
    • dbtable: The table in SQL Server that the data is written to.
    • driver: The JDBC driver class for SQL Server (com.microsoft.sqlserver.jdbc.SQLServerDriver).
    • truncate: A JDBC-specific option; combined with overwrite mode, it tells Spark to truncate the existing table rather than drop and recreate it.
    • mode("overwrite"): Replaces the table's existing contents with the new data.
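
    For reference, here is a minimal sketch of what the url and credential handling might look like. The server name, database name, and secret scope below are placeholders, not values from your post:

    url = (
        "jdbc:sqlserver://yourserver.database.windows.net:1433;"
        "databaseName=yourdb;"
        "encrypt=true"
    )

    # Credentials are best pulled from a Databricks secret scope rather than
    # hard-coded; the scope and key names here are hypothetical.
    user = dbutils.secrets.get(scope="my-scope", key="sql-user")
    password = dbutils.secrets.get(scope="my-scope", key="sql-password")

    You would then pass these to the writer with .option("user", user) and .option("password", password) alongside the options you already set.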

    The Spark connector for SQL Server is a different approach. It is optimized for SQL Server and provides better performance (it uses the bulk-copy APIs under the hood) and additional features compared to the generic JDBC connector. To use it, you would typically call .format("com.microsoft.sqlserver.jdbc.spark") and configure it with the connector-specific options, for example:

    
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .option("truncate", "true") \
        .mode("overwrite") \
        .save()
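
    Reading back with the connector follows the same pattern. Here is a sketch that reuses the url and TableName variables from above (add the same user/password options if you are not authenticating through the url):

    df = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .load()

    Note that this connector is not bundled with the Databricks Runtime, so the library (the com.microsoft.azure:spark-mssql-connector Maven artifact, in the variant matching your Spark/Scala version) must be installed on the cluster before .format("com.microsoft.sqlserver.jdbc.spark") will resolve.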
    

    Which one should you use?

    • If you are working exclusively with SQL Server and need better performance, consider switching to the Spark connector.
    • If you need a generic solution that works with multiple databases, or if you are already satisfied with the performance of the JDBC method, you can continue using the JDBC approach (a read sketch follows below).
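
    For completeness, a read with the generic JDBC method mirrors your write snippet; this sketch reuses the same url, TableName, and driver values:

    df = spark.read \
        .format("jdbc") \
        .option("url", url) \
        .option("dbtable", TableName) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .load()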
