Hi @Jayaswaroop Adabala
Welcome to Microsoft Q&A platform and thanks for posting your query here.
It's great to see that you're looking for ways to optimize your solution and reduce costs. Based on the information you provided, here is a rough estimate of the savings you could achieve by switching to the proposed solution.
In your current setup, you're running a Databricks cluster with 1 driver node and 8 worker nodes, all using the Standard_DS3 instance type:
- VM Cost: $0.185/hour per node.
- DBU Cost: $0.400/hour per node.
- Total Cost per Node: $0.585/hour.
For your cluster setup with 1 driver + 8 worker nodes (9 nodes):
- Total Hourly Cluster Cost: $0.585 × 9 = $5.265/hour.
Assuming a refresh schedule of 1 hour per week:
- Weekly Cost: $5.265 × 1 hour ≈ $5.27/week.
- Monthly Cost (4 weeks): $5.265 × 4 ≈ $21.06/month.
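The arithmetic above can be reproduced with a small script, which makes it easy to re-run if your node count, rates, or refresh schedule change. This is a minimal sketch: the rates are the ones quoted above (not looked up live), the month is simplified to 4 weeks as in the estimate, and `cluster_costs` is a hypothetical helper name. `Decimal` is used so the cents round the same way as the figures above.

```python
from decimal import Decimal, ROUND_HALF_UP


def cluster_costs(vm_rate, dbu_rate, nodes, hours_per_week, weeks_per_month=4):
    """Return (hourly, weekly, monthly) cluster cost in USD.

    vm_rate and dbu_rate are per-node hourly rates; weeks_per_month=4
    matches the simplified monthly estimate used above.
    """
    per_node = vm_rate + dbu_rate          # VM + DBU cost per node per hour
    hourly = per_node * nodes              # whole cluster (driver + workers)
    weekly = hourly * hours_per_week       # cluster only runs during refresh
    monthly = weekly * weeks_per_month
    return hourly, weekly, monthly


def usd(amount):
    """Round a Decimal amount to whole cents."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 1 driver + 8 workers = 9 nodes, refreshed for 1 hour per week
    hourly, weekly, monthly = cluster_costs(
        Decimal("0.185"), Decimal("0.400"), nodes=9, hours_per_week=1
    )
    print(hourly, usd(weekly), usd(monthly))  # 5.265 5.27 21.06
```

Changing `hours_per_week` shows how quickly the cost grows with a more frequent refresh schedule.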
Besides direct costs, maintaining notebooks, staging tables, and schedulers adds development and operational effort. These costs are real but harder to quantify, because they depend on your team's time and the complexity of the pipeline.
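By moving the business logic into Power BI and querying the Hive Metastore tables directly:
- Databricks Cluster Cost: The 9-node cluster is no longer needed for this workflow, saving up to ~$21/month in direct cluster costs. Power BI still reads the Hive Metastore tables through a running Databricks cluster or SQL warehouse, so the net saving is this amount minus the cost of that compute during refresh.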
- Operational Efficiency: No need to manage notebooks, staging tables, or schedulers. Simplifies the overall architecture, saving time and reducing potential points of failure.
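To turn the ~$21/month figure into a net saving, the compute that Power BI uses during refresh has to be subtracted. A minimal sketch, assuming you look up the hourly rate of that cluster or SQL warehouse and how many hours per week it runs; `net_monthly_savings` and its parameters are hypothetical names, and the month is again simplified to 4 weeks.

```python
from decimal import Decimal, ROUND_HALF_UP

WEEKS_PER_MONTH = 4  # same simplification as the estimate above


def monthly_cost(hourly_rate, hours_per_week):
    """Monthly cost of compute billed only while it runs."""
    return Decimal(hourly_rate) * Decimal(hours_per_week) * WEEKS_PER_MONTH


def net_monthly_savings(current_hourly, current_hours_per_week,
                        replacement_hourly, replacement_hours_per_week):
    """Current pipeline cost minus the compute Power BI queries through.

    A negative result means the proposed setup costs more in compute.
    """
    saved = (monthly_cost(current_hourly, current_hours_per_week)
             - monthly_cost(replacement_hourly, replacement_hours_per_week))
    return saved.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Upper bound only: assumes the replacement compute costs nothing.
    # Substitute your actual cluster/SQL warehouse rate and runtime.
    print(net_monthly_savings(Decimal("5.265"), 1, Decimal("0"), 0))  # 21.06
```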
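It's difficult to provide an exact estimate without knowing more about your specific usage patterns and pricing model (for example, your refresh frequency and any discounted or pre-purchased Databricks pricing). Based on the figures above, the direct compute saving is modest at a 1-hour weekly refresh and grows in proportion to how often the pipeline runs; the reduced development and maintenance effort is likely to be the larger benefit of switching to the proposed solution.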
I hope this helps! Let me know if you have any other questions.