Azure Compute: Introduction To Azure Batch
Introduction
In this post, we talk about the Azure Batch service. It was created to execute parallel batch workloads without the stress of provisioning the resources those jobs need, because Azure Batch itself creates and manages the pools of compute nodes. Imagine a large number of parallel batch workloads, each independent of the others. At first glance, it resembles another Azure service, Virtual Machine Scale Sets (VMSS).
We can start processing parallel workloads with the Azure Batch service ONLY by using its APIs and tools. With them we create and manage a pool of compute nodes, and then schedule jobs and tasks to run on it.
With the Azure Batch service we pay for the resources we use (compute and storage).
Azure Batch Use Cases
The table below lists some Azure Batch use cases, with an example showing where the Batch service fits in each case.
Use Case | Example |
Financial Data Analysis | Imagine the financial data analysis for a bank |
Software Testing | A software development team can run multiple tests of an application in parallel |
AI Solution | A good example is a face recognition service |
Azure Batch Architecture
The image below shows a sample Azure Batch architecture.
Note
A pool executes a job, and the nodes (VMs) of the pool each execute one or more tasks.
Parallel Job Processing
What the Azure Batch service does well is break a job down into several tasks. With this technique the application runs independently on each VM of the pool, and the result of each task completes one part of the work until we have the final result. The following example shows exactly how this works.
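The same split, run and combine pattern can be sketched locally in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for pool nodes, summing numbers stands in for the real work, and the names `split_job`, `run_task` and `run_job` are hypothetical, not part of any Batch API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(items, task_count):
    """Split a job's work items into at most task_count roughly equal tasks."""
    if not items:
        return []
    size = -(-len(items) // task_count)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_task(chunk):
    """Stand-in for the work one node does: here, sum a slice of numbers."""
    return sum(chunk)


def run_job(items, node_count):
    """Fan the tasks out to node_count workers, then combine the partial results."""
    tasks = split_job(items, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_results = list(pool.map(run_task, tasks))
    return sum(partial_results)


total = run_job(list(range(1, 101)), node_count=5)
print(total)
```

Each task only sees its own slice of the input, which is what lets the tasks run independently of one another.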
Azure Batch Features
Azure Batch service features are divided into three major categories:
- Pools Management
- Jobs And Tasks Management
- Batch Solutions Monitoring
Pools Management
Pool management covers Nodes, Auto-Scaling, Low Priority Nodes and Application Packages.
Nodes
Nodes fall into three categories: Cloud Service (Web and Worker Roles), Virtual Machines, and Custom Image VMs.
Auto-Scaling
In the Azure Batch service we can define parameters, based on the needs of the deployment, that enable auto-scaling.
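As a rough illustration of the kind of rule such parameters can express, the sketch below derives a target node count from the number of pending tasks. It is plain Python, not the Batch autoscale formula syntax, and the function and parameter names are hypothetical.

```python
import math


def target_node_count(pending_tasks, tasks_per_node, min_nodes, max_nodes):
    """Return how many nodes to aim for, clamped to [min_nodes, max_nodes]."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


# 17 pending tasks at 4 tasks per node need 5 nodes.
print(target_node_count(pending_tasks=17, tasks_per_node=4, min_nodes=0, max_nodes=10))
```

The upper bound is what keeps costs predictable, while the lower bound decides whether the pool may shrink to zero nodes when idle.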
Low Priority Nodes
Azure Batch also offers Low Priority VMs, which are intended for low-priority workloads. This feature is NOT for critical workloads, and it is recommended when we need to reduce the cost of the Batch workload.
Application Packages
The Application Packages feature covers the app packages that we upload to the Batch account, which are then automatically deployed to one or more nodes in the pool.
Jobs And Tasks Management
Azure Batch manages the scheduling and execution of jobs and tasks. In some cases the input of task A is the output of task B, which means that tasks can depend on one another. Another great benefit is that tasks can also run across multiple compute nodes.
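To make the dependency idea concrete, here is a minimal sketch that orders tasks so that each one runs only after the tasks it depends on. It uses Python's standard `graphlib` module (Python 3.9 or later) rather than the Batch service itself, and the task ids are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task ids; each key maps to the tasks whose output it needs.
dependencies = {
    "task_b": set(),
    "task_d": set(),
    "task_a": {"task_b"},            # task A consumes the output of task B
    "task_c": {"task_a", "task_b"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dependencies))
```

Tasks without dependencies land in the first wave and can be spread over several nodes, while dependent tasks wait for the wave before them to finish.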
Batch Solutions Monitoring
There are several tools with which we can monitor the nodes, jobs and tasks in the Azure Batch service:
- Azure Portal
- Batch Explorer (formerly Batch Labs)
- Application Insights
- Metrics Using API
Azure Batch Concepts
Service Quotas And Limits
At this point of the post, we look at the quotas and limits of the Azure Batch service. We should read the following tables carefully, because if we don't understand what the quota and limit values mean, we may run into problems with our workloads later.
Resource Quotas
Service quotas are quite important for Azure Batch workloads, because a rough design is very likely to reach these limits.
Resource | Default Limit | Maximum Limit |
Batch accounts per region per subscription | 1-3 | 50 |
Dedicated cores per Batch account | 10-100 | N/A |
Low-priority cores per Batch account | 10-100 | N/A |
Active jobs and job schedules per Batch account | 100-300 | 1000 |
Pools per Batch account | 20-100 | 500 |
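A quick way to catch such a design problem early is to compare the planned numbers with the maximum limits. The sketch below copies the maximum values from the table above; the dictionary keys and the function name are hypothetical, and the limits in a live subscription may differ.

```python
# Maximum limits copied from the table above; a live subscription may differ.
MAX_LIMITS = {
    "batch_accounts_per_region": 50,
    "active_jobs_and_job_schedules": 1000,
    "pools": 500,
}


def find_quota_violations(planned):
    """Return {resource: (planned, limit)} for every planned value above its maximum."""
    violations = {}
    for resource, count in planned.items():
        limit = MAX_LIMITS.get(resource)
        if limit is not None and count > limit:
            violations[resource] = (count, limit)
    return violations


design = {"pools": 620, "active_jobs_and_job_schedules": 300}
print(find_quota_violations(design))
```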
Pool Size Limits
The pool size is the number of nodes in the pool, where a single node is one virtual machine.
The next table shows the limits for the pool size.
Resource | Maximum Limit |
Compute nodes in inter-node communication enabled pool | |
Batch service pool allocation mode | 100 |
Batch subscription pool allocation mode | 80 |
Compute nodes in | |
Dedicated nodes | 2000 |
Low-priority nodes | 1000 |
Note
We can choose between dedicated nodes, which are high-priority VMs, and low-priority nodes. Of course, there are some limitations, which can be found in this section of the post.
Other Limits
All the other limits relate to the details of the Azure Batch workloads.
Resource | Maximum Limit |
Concurrent tasks per compute node | 4 x number of node cores |
Applications per Batch account | 20 |
Application packages per application | 40 |
Maximum task lifetime | 180 days |
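The "4 x number of node cores" rule is easy to work through with numbers. The sketch below applies it to a hypothetical pool; the function names are illustrative, and the factor of 4 is taken from the table above.

```python
def max_concurrent_tasks(node_cores):
    """Upper bound on concurrent tasks for one node: 4 x number of node cores."""
    return 4 * node_cores


def pool_task_slots(node_count, node_cores, tasks_per_node):
    """Total tasks the pool can run at once for a chosen tasks-per-node setting."""
    limit = max_concurrent_tasks(node_cores)
    if tasks_per_node > limit:
        raise ValueError(f"tasks_per_node {tasks_per_node} exceeds the limit of {limit}")
    return node_count * tasks_per_node


# A 2-core node may run up to 8 tasks at once, so 5 such nodes offer 40 slots.
print(max_concurrent_tasks(2))
print(pool_task_slots(node_count=5, node_cores=2, tasks_per_node=8))
```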
If the workloads need a higher quota on an Azure Batch account, we can follow the directions at this link.
Supported VM Sizes
When we create an Azure Batch Pool, it is very important to select the correct VM size for the nodes of the Pool.
In the table below we can see the sizes that an Azure Batch pool DOES NOT support.
Family | Unsupported sizes |
Basic A-series | Basic_A0(A0) |
A-series | Standard_A0 |
B-series | All |
DC series | All |
Extreme memory optimized | All |
Hb-series* | All |
Hc-series* | All |
Lsv2-series* | All |
NDv2-series* | All |
NVv2-series* | All |
SAP-HANA | All |
Note
* Not currently supported, but will be supported in the future
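When pool creation is scripted, the table can be turned into a small pre-flight check. The sketch below is a snapshot of the table above and assumes that any family not listed is supported; the names are hypothetical, and the list of supported sizes may have changed since this post was written.

```python
ALL = "All"

# Snapshot of the table above: family -> unsupported sizes (or ALL).
UNSUPPORTED_SIZES = {
    "Basic A-series": {"Basic_A0"},
    "A-series": {"Standard_A0"},
    "B-series": ALL,
    "DC series": ALL,
    "Extreme memory optimized": ALL,
    "Hb-series": ALL,
    "Hc-series": ALL,
    "Lsv2-series": ALL,
    "NDv2-series": ALL,
    "NVv2-series": ALL,
    "SAP-HANA": ALL,
}


def is_size_supported(family, size):
    """Return False if the table marks the size (or its whole family) as unsupported."""
    blocked = UNSUPPORTED_SIZES.get(family)
    if blocked is None:
        return True  # assumption: families missing from the table are supported
    if blocked == ALL:
        return False
    return size not in blocked


print(is_size_supported("A-series", "Standard_A0"))
print(is_size_supported("A-series", "Standard_A1"))
```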
VM sizes that are supported for Low-Priority nodes
Family | Supported Sizes |
M-Series | Standard_M64ms |
M-Series | Standard_M128ms |
Virtual Machine Image Type
There are two types of images that we can choose between: Pre-configured and Custom. Of course, there are some differences between the two types:
Pre-configured Image | Custom Image |
The image already exists | Need to create a new one |
No need for updating & patching | Need patching & updating |
All custom software needs to be installed via pool config | No need for large changes in the pool config |
How It Works
In the following steps, we will see a quick demo of the Azure Batch Service.
Prerequisites
To proceed with the demo we must make sure that we have all of the following:
- Microsoft Azure Batch Account linked with an Azure Storage Account
- Visual Studio 2017 or .NET Core 2.1
Create The Azure Batch Account
Search for the Azure Batch Service
From the left main blade of the Azure Portal, select + Create a resource, type [Batch Service], and select Create to create the Batch Service.
Basics Tab
In the Basics tab we have to fill in a few fields and then move to the Advanced tab.
Setting | Value |
Subscription | Select a valid subscription |
Resource group | Select an existing or create a new Resource group |
Account name | Type a name for the Azure Batch account; it MUST be unique |
Location | Select a Location for the Batch Instance |
Select a Storage account | Select an existing storage account or, after the deployment completes, create a storage account and link it to the Azure Batch account. |
Advanced Tab
In the Advanced tab, we must choose a Pool allocation mode. There are two choices, Batch service and User subscription; for the purposes of the demo we select Batch service.
Batch service | The pool VMs are created using behind-the-scenes Batch service subscriptions. |
User subscription | The pool VMs are created directly in the same subscription as the Batch account. |
Check this blog post about Azure Batch capabilities for more details.
Review + Create Tab
In the Review + create tab, we just need to check that the validation passed and click Create to start the Azure Batch account deployment.
After the Azure Batch account is created, we are ready to see how it works. And that is the juicy part of this post.
Note
For this part of the demo we will use an existing project from GitHub, written by the user dlepow.
Azure Batch Sample
First, we must go to GitHub and open the "Azure-Samples/batch-dotnet-ffmpeg-tutorial" repository, by clicking here.
Open the file BatchDotnetTutorialFfmpeg.sln
As we can see, some Dependencies are missing; for that reason we select Build - Rebuild Project
We must download and install the .NET Core SDK, and then build the project (Build - Build Solution).
After the build completes successfully, the Dependencies look fine.
Download & Install Application Packages
One of the prerequisites for this Azure Batch app is to download ffmpeg 3.4 and upload it to the Azure Batch service. This can be done with the following steps:
- Download the 64-bit ffmpeg 3.4 file from here.
- Upload the zip file "ffmpeg-3.4-win64-static.zip" to the Azure Batch service, from the left menu blade Features - Applications - + Add
The Code Part
After we successfully build the solution, we must make some changes to the code.
public class Program
{
    // Update the Batch and Storage account credential strings below with the values unique to your accounts.
    // These are used when constructing connection strings for the Batch and Storage client objects.
    // Batch account credentials
    private const string BatchAccountName = "xxxxxxxxx";
    private const string BatchAccountKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==";
    private const string BatchAccountUrl = "https://xxxxxxxxxxxx.westeurope.batch.azure.com";
    // Storage account credentials
    private const string StorageAccountName = "xxxxxxxxxxxxxxxxxxxxx";
    private const string StorageAccountKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==";
    // ... (the rest of the class is unchanged)
}
The Batch and Storage Account Credentials are in the Batch Account dashboard in Azure Portal (see the image below).
Running The App
We have completed all the necessary steps: downloads, installations, configuration and so on. The next thing to do is simply to run the application.
Note
This app processes media files in parallel using the ffmpeg tool.
The first thing we see when the app runs is the following command prompt console.
Create Containers [Input] - [Output]
The Console App creates two Storage Containers (Input, Output).
Upload The Media Files
The second step is to upload the media files.
Create The Batch Pool
The app creates 5 low-priority nodes inside the Batch pool and 5 tasks that run in parallel across the nodes. In the next two images we can see the pool with its nodes in the Azure Portal.
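Each of those tasks is essentially one ffmpeg command line applied to one input file. The sketch below builds such command lines in Python purely for illustration; the sample itself is written in C#. The file names and task ids are hypothetical, and the .mp4 to .mp3 conversion and the `AZ_BATCH_APP_PACKAGE_...` variable (how a Windows node exposes an application package's install path) are assumptions based on the upstream tutorial, not something this post states.

```python
from pathlib import PureWindowsPath

APP_ID = "ffmpeg"      # application package id (assumed to match the uploaded package)
APP_VERSION = "3.4"    # application package version (assumed)


def task_command_line(input_file):
    """Build the command line for converting one media file (assumed .mp4 -> .mp3)."""
    output_file = PureWindowsPath(input_file).with_suffix(".mp3").name
    app_path = f"%AZ_BATCH_APP_PACKAGE_{APP_ID}#{APP_VERSION}%"
    return (
        f"cmd /c {app_path}\\ffmpeg-3.4-win64-static\\bin\\ffmpeg.exe "
        f"-i {input_file} {output_file}"
    )


def build_task_command_lines(input_files):
    """Return one (task_id, command_line) pair per input file."""
    return [(f"Task{i}", task_command_line(name)) for i, name in enumerate(input_files)]


# Hypothetical input files: five files give five independent tasks.
for task_id, command in build_task_command_lines([f"video{n}.mp4" for n in range(1, 6)]):
    print(task_id, command)
```

Because every command line touches only its own input and output file, the five tasks need no coordination and can be scheduled on any free node.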
The Running Tasks
In the image below we can clearly see how an Azure Batch parallel workload runs.
The Final Results
For the final step there is nothing for us to do: all 5 files have been processed and created in the Output container.
Conclusion
In this post, we made a quick intro to Azure Batch, a service that is aimed mainly at developers but can also be a useful and very important tool for other groups such as IT or, better suited to our days, DevOps.
See Also
- Batch service quotas and limits
- VM sizes for compute nodes in an Azure Batch pool
- Create an automatic scaling formula for scaling compute nodes in a Batch pool
- Use multi-instance tasks to run Message Passing Interface (MPI) applications in Batch
- Tutorial: Run a parallel workload with Azure Batch using the .NET API
- Azure Batch Forum
- Azure Batch Videos - Channel 9
- Batch Pricing