Share via


Azure Databricks: Walkthrough

Introduction

In order to create Azure Databricks workspace, please follow this tutorial if you haven’t done already.

https://social.technet.microsoft.com/wiki/contents/articles/52605.how-to-create-a-azure-databricks-workspace.aspx

Azure Databricks Workspace Home

This is the first page you can see when you sign into Azure Databricks Workspace. As in below image, it is kind of a dashboard you can click and perform various activities. In this article I’ll go through one by one.

1. Explore the quickstart tutorial

If you are using the Azure Databricks for the first time or new to Apache Spark, this would be the ideal place to start and see what kind of data engineering and analytics you can perform on Big Data using Azure Databricks

2. Import and Export Data

In here you can simply upload files in to DBFS (Databricks File System). Once you specify the path, all users will be able to access these data files using the path.

In the second tab you can browse the DBFS and at the below two buttons. Using the data files you can straightaway create Table in Notebook for analysing data.

Then again if you try to run cells you will be asking to attach the notebook into a cluster. Since you don’t have a cluster created, first you need to create a one. Will discuss this in next tutorial.

In the third tab for other Data Sources like Azure Data Lake Store, Kafka, Cassandra, Elasticsearch etc..

3. Create a Blank Notebook

By clicking this button you can create and start with a blank notebook.

Common Tasks

4 New Notebook

Again as previously you can create a New Notebook in Databricks

5 Upload Data

In here also you will be redirect to the view where you can upload/connect new data sources and move to DBFS.

6 Create Table

Which is quite similar to the Upload Data,

7 New Cluster

In this link you will be redirect to a page where you can create a new  spark cluster

8 New Job

You can create a Job in order to run a Notebook immediately or scheduled basis manner

9 New ML Flow Experiment

This is a new feature added, you can create Machine Learning Flow experiment in Databricks. You can learn more about this in here https://docs.azuredatabricks.net/applications/mlflow/index.html

 

10 Importing Library

In the notebooks when you working with coding with Python and other languages, you may require certain necessary libraries to refer. This is the place where you can import libraries

 

11 Read Documentation

To read complete azure databricks documentation, you can click this link and go through

 

Recents

It will list down the most recent files and resources you have been used. So you can quickly access those by clicking here.

Documentation

Databricks Guide:

Again you will redirect to databricks documentation.

Python:

Gentle introduction to Spark https://docs.azuredatabricks.net/_static/notebooks/gentle-introduction-to-apache-spark.html

R:

SparkR Overview https://docs.azuredatabricks.net/spark/latest/sparkr/overview.html

Scala:

Databricks for Data Scientists https://docs.azuredatabricks.net/_static/notebooks/databricks-for-data-scientists.html

SQL:

This also redirect to Databricks for Data Scientists. I believe this must redirect to SQL Guide. https://docs.azuredatabricks.net/spark/latest/spark-sql/index.html

Import Data:

This contains guideline for whole range of data sources which you can connect with Azure Databricks https://docs.azuredatabricks.net/spark/latest/data-sources/index.html

Sidebar Menu

Next will move to sidebar menu. That menu also provide quick access to your existing resources and to create new ones.

Azure Databricks

This will always redirect to workspace home.

Home

This will open up resources created under your account. Also it is a path to create new resources.

Workspace

This is also very similar to the Home, you can access existing resources and create new ones like, Notebooks, Library, Folder etc

 

Recent

Shows recent files you accessed

Data

In here you can see the Databases and tables you already created and also you can add new Data using the button at top right corner.

Clusters

Clusters button allow you to view the clusters page. It shows existing clusters, and state of cluster also you can create a new cluster using that.

 

Jobs

As above you can create a new job to schedule Notebooks

You can search workspace items using Search option.

Top Menu

PORTAL: this button will redirect you to azure portal. https://portal.azure.com/#home

User Settings

Here you can generate tokens to secure access to Databricks API instead of passwords, integrate your project with Git repo and configure notebook settings.

Admin Console

This admin panel allow you (administrator) to perform administrative tasks like, create user groups, manage access control, manage storage etc