Best practices for serverless compute
This article presents best practice recommendations for using serverless compute in your notebooks and jobs.
Following these recommendations enhances the productivity, cost efficiency, and reliability of your workloads on Azure Databricks.
Migrating workloads to serverless compute
To protect the isolation of user code, serverless compute uses Azure Databricks secure shared access mode. Because of this, some workloads require code changes to continue working on serverless compute. For a list of unsupported features, see Serverless compute limitations.
Some workloads are easier to migrate than others. Workloads that meet the following requirements are the easiest to migrate:
- The data being accessed must be stored in Unity Catalog.
- The workload should be compatible with shared access mode compute.
- The workload should be compatible with Databricks Runtime 14.3 or above.
To test whether a workload will work on serverless compute, run it on a non-serverless compute resource with shared access mode and Databricks Runtime 14.3 or above. If the run succeeds, the workload is ready for migration.
Because of the significance of this change and the current list of limitations, many workloads will not migrate seamlessly. Instead of recoding everything, Azure Databricks recommends prioritizing serverless compute compatibility as you create new workloads.
Ingesting data from external systems
Because serverless compute does not support JAR file installation, you cannot use a JDBC or ODBC driver to ingest data from an external data source.
Alternative strategies you can use for ingestion include:
- SQL-based building blocks such as COPY INTO and streaming tables.
- Auto Loader, which incrementally and efficiently processes new data files as they arrive in cloud storage. See What is Auto Loader?.
- Data ingestion partner solutions. See Connect to ingestion partners using Partner Connect.
- The add data UI to directly upload files. See Upload files to Azure Databricks.
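As a minimal sketch of the SQL-based options above, the following shows a one-time load with COPY INTO and an incremental streaming table that uses Auto Loader through `read_files`. The catalog, schema, table names, and storage path are hypothetical placeholders; substitute your own Unity Catalog objects and cloud storage location:

```sql
-- One-time (idempotent, re-runnable) load of new files into a table.
COPY INTO my_catalog.my_schema.raw_events
FROM 'abfss://container@storageaccount.dfs.core.windows.net/events/'
FILEFORMAT = JSON;

-- Incremental ingestion: a streaming table backed by Auto Loader,
-- which picks up new files as they arrive in cloud storage.
CREATE OR REFRESH STREAMING TABLE my_catalog.my_schema.raw_events_stream
AS SELECT *
FROM STREAM read_files(
  'abfss://container@storageaccount.dfs.core.windows.net/events/',
  format => 'json'
);
```

COPY INTO suits scheduled batch loads, while a streaming table keeps ingestion incremental without you tracking which files have already been processed.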
Ingestion alternatives
When using serverless compute, you can also use the following features to query your data without moving it.
- If you want to limit data duplication or guarantee that you are querying the freshest possible data, Databricks recommends using Delta Sharing. See What is Delta Sharing?.
- If you want to do ad hoc reporting and proof-of-concept work, Databricks recommends trying Lakehouse Federation. Lakehouse Federation enables syncing entire databases to Azure Databricks from external systems and is governed by Unity Catalog. See What is Lakehouse Federation?.
Try one or both of these features and see whether they satisfy your query performance requirements.
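As an illustration of the Lakehouse Federation approach, the following sketch creates a connection to an external PostgreSQL database and exposes it as a foreign catalog. The connection name, host, credentials, database, and table names are all hypothetical placeholders:

```sql
-- Define a Unity Catalog connection to the external system.
-- Use a secret for the password rather than a literal in production.
CREATE CONNECTION postgres_conn TYPE postgresql
OPTIONS (
  host 'example-host.postgres.database.azure.com',
  port '5432',
  user 'reader',
  password 'replace-with-secret'
);

-- Mirror the external database as a foreign catalog.
CREATE FOREIGN CATALOG postgres_catalog
USING CONNECTION postgres_conn
OPTIONS (database 'analytics');

-- Query the external table in place, governed by Unity Catalog.
SELECT * FROM postgres_catalog.public.orders LIMIT 10;
```

Because the data stays in the source system, this is a quick way to evaluate query performance before committing to a full ingestion pipeline.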
Monitor the cost of serverless compute
There are multiple features you can use to help you monitor the cost of serverless compute:
- Use system tables to create dashboards, set up alerts, and perform ad hoc queries. See Monitor the cost of serverless compute.
- Set up budget alerts in your account. See Use budgets to monitor account spending.
- Import a pre-configured usage dashboard. See Import a usage dashboard.
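As an example of an ad hoc query against system tables, the following sketch aggregates recent usage from the `system.billing.usage` table. The `LIKE '%SERVERLESS%'` filter on `sku_name` is an assumption for isolating serverless SKUs; check the SKU names that appear in your own account:

```sql
-- Daily usage by SKU over the last 30 days, filtered to serverless SKUs.
SELECT
  usage_date,
  sku_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - INTERVAL 30 DAYS
  AND sku_name LIKE '%SERVERLESS%'
GROUP BY usage_date, sku_name
ORDER BY usage_date;
```

A query like this can back a dashboard tile or an alert that fires when daily DBU consumption crosses a threshold.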