Compute creation cheat sheet

This article aims to provide clear and opinionated guidance for compute creation. By using the right compute types for your workflow, you can improve performance and save on costs.

| Best Practice | Impact | Docs |
| --- | --- | --- |
| If you are new to Azure Databricks, start by using general all-purpose instance types. | Selecting the appropriate instance type for the workload results in higher efficiency. | Create a cluster |
| Use shared access mode unless your required functionality isn’t supported. | Compute with shared access mode can be used by multiple users with data isolation among users. | Access modes |
| Use the latest generation instance types if there is enough availability. | The latest generation of instance types provides the best performance and latest features. | Azure instance types |
| Set your on-demand and spot-instance balance based on how quickly you need your workload to run. | Spot instances save on cost but can affect the overall run time of an operation if the spot instances are reclaimed. | Compute configuration recommendations |
| Choose the size of your nodes and the number of workers based on the types of operations your workload performs. | For example, if you expect a lot of shuffles, it can be more efficient to use a single large node instead of multiple smaller nodes. | Compute sizing considerations |
| Run vacuum on a cluster with auto-scaling set for 1-4 workers, where each worker has 8 cores. Select a driver with between 8 and 32 cores. Increase the size of the driver if you get out-of-memory (OOM) errors. | Vacuum statements happen in two phases, the second of which is driver-heavy. If you don’t use a right-sized cluster, the operation could cause a slowdown and might not succeed. | What size cluster does vacuum need?; VACUUM best practices |
| Assess whether your batch workflow would benefit from Photon. | Photon provides faster queries and reduces your total cost per workload. | Photon advantages |
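
The vacuum sizing guidance above (1-4 autoscaling workers with 8 cores each, and a driver with 8 to 32 cores that grows after OOM errors) can be sketched as a cluster specification. This is a minimal illustration, not an official template. The field names follow the Databricks Clusters API as commonly documented, but the node catalogue `NODE_CORES`, the cluster name, the default `spark_version` string, and the helper functions are all assumptions made for this example. Check which instance types and runtime versions your workspace actually offers.

```python
import json

# Assumed catalogue of Azure VM sizes and their vCPU counts (illustrative only;
# availability depends on your region and workspace).
NODE_CORES = {
    "Standard_D8ds_v5": 8,
    "Standard_D16ds_v5": 16,
    "Standard_D32ds_v5": 32,
}

WORKER_CORES = 8             # cheat sheet: each worker has 8 cores
DRIVER_CORE_RANGE = (8, 32)  # cheat sheet: driver with between 8 and 32 cores


def node_with_cores(cores):
    """Return a node type from the assumed catalogue with exactly `cores` cores."""
    for name, count in NODE_CORES.items():
        if count == cores:
            return name
    raise ValueError(f"no node type with {cores} cores in NODE_CORES")


def vacuum_cluster_spec(driver_cores=8, spark_version="15.4.x-scala2.12"):
    """Build a cluster spec for VACUUM: 1-4 autoscaling workers, sized driver."""
    low, high = DRIVER_CORE_RANGE
    if not low <= driver_cores <= high:
        raise ValueError(f"driver_cores must be between {low} and {high}")
    return {
        "cluster_name": "vacuum-maintenance",  # hypothetical name
        "spark_version": spark_version,        # placeholder runtime version
        "node_type_id": node_with_cores(WORKER_CORES),
        "driver_node_type_id": node_with_cores(driver_cores),
        "autoscale": {"min_workers": 1, "max_workers": 4},
    }


def larger_driver(spec):
    """Return a copy of `spec` with the next larger driver, capped at 32 cores.

    Intended for use after the driver hits out-of-memory errors. If the driver
    is already at the largest size in range, the spec is returned unchanged.
    """
    current = NODE_CORES[spec["driver_node_type_id"]]
    bigger = sorted(
        c for c in NODE_CORES.values() if current < c <= DRIVER_CORE_RANGE[1]
    )
    if not bigger:
        return spec
    return {**spec, "driver_node_type_id": node_with_cores(bigger[0])}


if __name__ == "__main__":
    print(json.dumps(vacuum_cluster_spec(), indent=2))
```

Starting with the smallest driver in range and stepping up only after OOM errors keeps cost low for small tables while still following the recommendation to scale the driver for the driver-heavy second phase.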