Introducing Job Preparation and Release Tasks on Batch Service
At the request of many of our customers, the Azure Batch service recently introduced job preparation and job release tasks. A job preparation task lets the user run a command line on a TVM (task virtual machine) the first time that TVM runs any task from a given job. Similarly, a job release task runs a user-specified command line once the TVM will no longer run any tasks from that job.
Installation
The current Azure Batch service already supports this feature, so no action is required on the service side. However, to use it, the client application must use the latest Azure Batch client library from NuGet (https://www.nuget.org/packages/Azure.Batch/). Version 1.3.0 is the first version that supports job preparation and release tasks.
Scenario
This feature is handy in many use cases. Here are a few.
Job-level data download. A Batch job commonly requires a shared set of input data for all of its tasks. For example, in a daily risk analysis job, the market data is specific to the job but common to all tasks in it. The market data (often several GB) should be downloaded to each TVM only once, after which every task on that TVM can use it. On the other hand, it should not be downloaded to every VM when the VM is created, because that can waste bandwidth and time. This matters most in a shared pool environment.
Job cleanup. Again, in a shared pool environment, it is sensible to delete all of a job's data once the job completes; otherwise the VM disk will eventually fill up with data from past tasks. The job release task gives the user a chance to clean up the data generated while the tasks ran.
Keeping application logs. Sometimes it is useful to keep a copy of the logs or dump files generated by a failed application. The job release task can be used to upload that data to a user-specified Azure Storage location.
Features
- Specify a command line to run before and/or after all of a job's tasks run on a VM.
- Normal Batch task capabilities, e.g., running as an elevated user, environment variables, downloading blobs from Azure Storage before the task starts, maximum running time, maximum retry count, and maximum file retention time.
- By default, the Batch service waits for the job preparation task to complete successfully before scheduling any of the job's tasks on the TVM. A job preparation task execution is considered successful if its exit code is 0. The service can be configured not to wait for the job preparation task to complete.
- By default, the job preparation task is rerun after a reboot of a TVM on which it had previously run successfully. This can be turned off.
Pool startup task and Job Preparation Task
Users can already specify a startup task on a Batch pool. The difference is that the pool startup task executes only when a VM joins the pool. With an automatic pool, where the pool is created and destroyed along with the job, the pool startup task and the job preparation task behave the same. However, when the pool's lifetime spans multiple jobs, the job preparation task is the right place for job-specific actions, while the pool startup task should be used for pool-wide actions.
Also, the pool has no counterpart to the job release task. VM cleanup or result upload must therefore be done in the job release task. Of course, cleanup is unnecessary if the VM is about to leave the pool and be reimaged.
Using Job Preparation Task
Once you have downloaded Azure Batch client library 1.3.0 or later, you can use the job preparation task. A sample is available in the Azure Batch samples repo on GitHub at https://github.com/Azure/azure-batch-samples/blob/master/CSharp/JobPrep/Program.cs#L334-341.
To set a job preparation task on a work item, the two mandatory fields are the name and the command line. You can set them as shown below.
    cloudWorkItem.JobSpecification = new JobSpecification()
    {
        JobPreparationTask = new JobPreparationTask()
        {
            Name = "jobprep",
            CommandLine = "cmd /c ping 127.0.0.1"
        }
    };
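The wait behavior listed under Features, together with a matching job release task, might be configured along the following lines. This is only a sketch under stated assumptions: the `Microsoft.Azure.Batch` namespace, the `WaitForSuccess` property, and the `JobReleaseTask` type and property are assumed to mirror the feature names and are not confirmed by the sample above, so check them against the client library version you use. The class name, the task names, and the command lines are placeholders.

```csharp
using Microsoft.Azure.Batch; // assumed namespace of the Azure.Batch NuGet package

public static class JobPrepReleaseSketch
{
    // Builds a job specification that carries both a preparation and a release task.
    public static JobSpecification Build()
    {
        return new JobSpecification()
        {
            JobPreparationTask = new JobPreparationTask()
            {
                Name = "jobprep",
                CommandLine = "cmd /c ping 127.0.0.1",
                // Assumed property name: hold the job's tasks on this TVM until the
                // preparation task exits with code 0 (described above as the default).
                WaitForSuccess = true
            },
            // Assumed type and property name for the job release task.
            JobReleaseTask = new JobReleaseTask()
            {
                Name = "jobrelease",
                // Placeholder command; a real release task would delete job data
                // or upload logs to Azure Storage.
                CommandLine = "cmd /c echo releasing job resources"
            }
        };
    }
}
```

If these names hold in your library version, the result can be assigned in the same way as in the sample: `cloudWorkItem.JobSpecification = JobPrepReleaseSketch.Build();`.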
MSDN documentation will follow soon.