AzureBatchStep Class

Creates an Azure ML Pipeline step for submitting jobs to Azure Batch.

Note: This step does not support upload/download of directories and their contents.

For an example of using AzureBatchStep, see the notebook https://aka.ms/pl-azbatch.

Inheritance: azureml.pipeline.core._azurebatch_step_base._AzureBatchStepBase -> AzureBatchStep

Constructor

AzureBatchStep(name, create_pool=False, pool_id=None, delete_batch_job_after_finish=True, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', source_directory=None, executable=None, arguments=None, inputs=None, outputs=None, allow_reuse=True, compute_target=None, version=None)

Parameters

Name	Description
name Required	str [Required] The name of the step.
create_pool	bool Indicates whether to create the pool before running the jobs. Default value: False
pool_id	str [Required] The ID of the pool where the job runs. The ID can be an existing pool, or one that will be created when the job is submitted. Default value: None
delete_batch_job_after_finish	bool Indicates whether to delete the job from Batch account after it's finished. Default value: True
delete_batch_pool_after_finish	bool Indicates whether to delete the pool after the job finishes. Default value: False
is_positive_exit_code_failure	bool Indicates whether the job fails if the task exits with a positive code. Default value: True
vm_image_urn	str The URN of the VM image used for the pool. Applies only when `create_pool` is True and the VM uses VirtualMachineConfiguration. Value format: `urn:publisher:offer:sku`. Example: `urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter`. Default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter
run_task_as_admin	bool Indicates whether the task should run with admin privileges. Default value: False
target_compute_nodes	int If `create_pool` is True, indicates how many compute nodes will be added to the pool. Default value: 1
vm_size	str If `create_pool` is True, indicates the virtual machine size of the compute nodes. Default value: standard_d1_v2
source_directory	str A local folder that contains the module binaries, executable, assemblies, etc. Default value: None
executable	str [Required] The name of the command/executable that will be executed as part of the job. Default value: None
arguments	list Arguments for the command/executable. Default value: None
inputs	list[Union[InputPortBinding, DataReference, PortDataReference, PipelineData]] A list of input port bindings. Before the job runs, a folder is created for each input. The files for each input will be copied from the storage to the respective folder on the compute node. For example, if the input name is input1, and the relative path on storage is some/relative/path/that/can/be/really/long/inputfile.txt, then the file path on the compute will be: ./input1/inputfile.txt. When the input name is longer than 32 characters, it will be truncated and appended with a unique suffix so the folder name can be created successfully on the compute target. Default value: None
outputs	list[Union[PipelineData, PipelineOutputAbstractDataset, OutputPortBinding]] A list of output port bindings. Similar to inputs, before the job runs, a folder is created for each output. The folder name will be the same as the output name. The assumption is that the job will put the output into that folder. Default value: None
allow_reuse	bool Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed. Default value: True
compute_target	BatchCompute, str [Required] A BatchCompute compute where the job runs. Default value: None
version	str An optional version tag to denote a change in functionality for the module. Default value: None
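
The folder layout described for `inputs` can be illustrated with a small sketch. It only reproduces the single-file example given in the table (`./input1/inputfile.txt`); the function name `compute_path_for_input` and the constant `MAX_FOLDER_NAME` are hypothetical and are not part of the SDK. The truncation scheme for names longer than 32 characters is not specified here, so the sketch declines to guess it.

```python
import posixpath

MAX_FOLDER_NAME = 32  # length limit stated in the `inputs` description


def compute_path_for_input(input_name, storage_relative_path):
    """Return where an input file is expected on the compute node.

    Hypothetical helper: mirrors the documented example only.
    """
    if len(input_name) > MAX_FOLDER_NAME:
        # Longer names are truncated and given a unique suffix by the service.
        # The exact scheme is not documented here, so it is not reproduced.
        raise ValueError("input name exceeds 32 characters; folder name is service-generated")
    # Only the file name survives; the storage-side directory structure is dropped.
    file_name = posixpath.basename(storage_relative_path)
    return "./" + posixpath.join(input_name, file_name)


path = compute_path_for_input(
    "input1", "some/relative/path/that/can/be/really/long/inputfile.txt"
)
print(path)  # ./input1/inputfile.txt
```

For outputs, the table states that the folder name equals the output name, so an executable would write its results under a folder such as `./outputdata/`.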

Remarks

The following example shows how to use AzureBatchStep in an Azure Machine Learning Pipeline.


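    from azureml.pipeline.steps import AzureBatchStep

    # testdata, outputdata, batch_compute, and binaries_folder are
    # assumed to be defined earlier, as in the full sample.
    step = AzureBatchStep(
        name="Azure Batch Job",
        pool_id="MyPoolName",  # Replace this with the pool name of your choice
        inputs=[testdata],
        outputs=[outputdata],
        executable="azurebatch.cmd",
        arguments=[testdata, outputdata],
        compute_target=batch_compute,
        source_directory=binaries_folder,
    )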

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb
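
The pool-related parameters are easier to read when grouped. The following is a minimal sketch, not taken from the sample notebook, of a step that creates its own pool and deletes it afterwards. The helper names `pool_creation_settings` and `make_step` are hypothetical; the keyword arguments are the constructor parameters listed above, and the node count of 2 is an arbitrary illustration.

```python
def pool_creation_settings(pool_id, nodes=2):
    """Keyword arguments asking the step to create, and later delete, its pool."""
    return {
        "create_pool": True,
        "pool_id": pool_id,  # ID of the pool to be created at submission
        "target_compute_nodes": nodes,
        "vm_size": "standard_d1_v2",  # documented default, stated explicitly
        "vm_image_urn": "urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter",
        "delete_batch_pool_after_finish": True,
    }


def make_step(batch_compute, testdata, outputdata, binaries_folder):
    """Build the step from the example above with pool creation enabled."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.pipeline.steps import AzureBatchStep

    return AzureBatchStep(
        name="Azure Batch Job",
        inputs=[testdata],
        outputs=[outputdata],
        executable="azurebatch.cmd",
        arguments=[testdata, outputdata],
        compute_target=batch_compute,
        source_directory=binaries_folder,
        **pool_creation_settings("MyPoolName"),
    )
```

Keeping the pool settings in one dictionary makes it simple to switch between an existing pool (pass only `pool_id`) and a pool created per run.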

Methods

create_node

Create a node from the AzureBatch step and add it to the specified graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to a pipeline graph that represents the workflow.

create_node(graph, default_datastore, context)

Parameters

Name	Description
graph Required	Graph The graph object to add the node to.
default_datastore Required	Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore] The default datastore.
context Required	azureml.pipeline.core._GraphContext The graph context.

Returns

Type	Description
Node	The created node.
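
Because `create_node` is called on your behalf, typical code never invokes it. A minimal sketch of the usual path, assuming a workspace object and an already configured step, is shown below; the helper name `build_pipeline` is hypothetical.

```python
def build_pipeline(workspace, step):
    """Wrap a configured step in a Pipeline.

    There is no direct call to step.create_node(): instantiating the
    pipeline with the step is what adds the step to the pipeline graph.
    """
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.pipeline.core import Pipeline

    return Pipeline(workspace=workspace, steps=[step])
```

The returned pipeline can then be submitted through an experiment in the usual way.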
