Troubleshooting Azure Databricks to On-premises MS SQL Server VPN tunnel instability

Nathalie 20 Reputation points
2025-02-14T10:35:45.9233333+00:00

We have a VPN tunnel that connects Azure Databricks to an on-premises server. However, the tunnel is unstable, and the phase 2 selector goes down daily at random times. An audit showed that this happens mostly after working hours, at times ranging from 7pm to 1am to 4am, varying by day. Once we restart the phase 2 selector, it keeps running for a couple of hours until it goes down again. What we have already tried, without success:

  • A keep-alive job in Databricks that uses the compute to send a query to the on-premises server every 10 minutes. (We experimented with higher intervals, but that didn't seem to help either.)
  • Making sure that the compute doesn't auto-delete after it is idle
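
For anyone trying to narrow down when the tunnel actually drops, a timestamped reachability probe alongside the keep-alive can help, since its results can be compared with the firewall's phase 2 logs. The following is only a minimal sketch: the function names `tcp_check` and `record_probe` are hypothetical, the address in the usage comment is a placeholder, and port 1433 assumes SQL Server is listening on its default port.

```python
import socket
from datetime import datetime, timezone


def tcp_check(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes, and timeouts.
        return False


def record_probe(check, now=None):
    """Run one probe and return a UTC-timestamped result for later correlation."""
    timestamp = now or datetime.now(timezone.utc)
    return {"utc": timestamp.isoformat(), "reachable": bool(check())}


# Example usage (placeholder address, assumed default SQL Server port):
# result = record_probe(lambda: tcp_check("10.0.0.10", 1433))
# print(result)
```

Writing each result to a table or log file gives a timeline of when the server stopped being reachable from the Databricks side.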

Could you help me with some additional checks to verify where this problem comes from? Thank you.

Azure Virtual Network
An Azure networking service that is used to provision private networks and optionally to connect to on-premises datacenters.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Praveen Bandaru 430 Reputation points Microsoft Vendor
    2025-02-18T13:24:12.55+00:00

    Hello Nathalie
    Greetings!
    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this!

    Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution.

    Please click "Accept" on this answer; original posters help the community find answers faster by identifying the correct answer.

    Issue: Troubleshooting Azure Databricks to On-premises MS SQL Server VPN tunnel instability

    Resolution: After some troubleshooting on the on-premises side, the cause appeared to be that the tunnel went down because it observed no traffic from Databricks to the on-premises server after hours. You resolved this by doing two things:

    • Changing the keep-alive job to run as a continuous job in Databricks
    • Enabling a setting on the firewall that always keeps the phase 2 tunnel active, even when no data is being transferred
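
The first of these two changes could look roughly like the loop below. This is a sketch under assumptions, not the original poster's job: `run_keepalive` and `send_query` are hypothetical names, the 30-second interval is an arbitrary choice, and `send_query` stands in for whatever lightweight query (for example a `SELECT 1`) the job sends to the on-premises SQL Server.

```python
import time


def run_keepalive(send_query, interval_s=30.0, max_iterations=None, sleep=time.sleep):
    """Call `send_query` repeatedly so the tunnel keeps seeing traffic.

    Failures are counted and reported but do not stop the loop, so the job
    resumes sending traffic as soon as the tunnel is back. Runs forever
    unless `max_iterations` is given.
    """
    ok = failed = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        try:
            send_query()
            ok += 1
        except Exception as exc:
            failed += 1
            print(f"keep-alive failed: {exc!r}")
        iteration += 1
        sleep(interval_s)
    return ok, failed
```

Passing `sleep` and `max_iterations` as parameters is only there to make the loop easy to try out without waiting; in a continuous job they would be left at their defaults.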

    Please don’t forget to close the thread by clicking "Accept the answer" wherever the information provided helps you, as this can be beneficial to other community members.

