How to properly set up an MLOps pipeline in Azure Machine Learning with an event-based trigger

Benedikt Schmitt 90 Reputation points
2025-02-25T14:43:34.93+00:00

I want to correctly set up an MLOps pipeline in Azure Machine Learning for training an object detection model. For this I want to create several components that read in my data, augment my data, split it into train/val/test, train a model, etc.

I have read through the entire MLOps documentation and also worked through many of the learning paths. However, I do not feel there was a good explanation of how to set up my components' Inputs and Outputs so they can work together properly. My questions are the following:

  1. If I label my data in Azure ML and export these labels, I get an MLTable. How do I have to define my input so that I can extract my image and label data out of this? For example to rotate my images and masks a random amount for data augmentation.
  2. How do I have to write the Output so that I can use this data in the next step. Can my Output really only be uri_folder, uri_file, mltable, string, bool and int? If so, how am I supposed to pass my image and label data from one step to the other?
  3. After I have set up this pipeline I want to run it every time the Data Asset that was created by exporting my labels gets a new version. However in the UI and SDK you can only set up timer-based triggers. Is there a way to create an event-based trigger? If not, is this a planned feature? It was a feature in the SDK v1.
Azure Machine Learning

1 answer

  1. Prashanth Veeragoni 640 Reputation points Microsoft Vendor
    2025-02-26T02:00:43.2766667+00:00

    Hi Benedikt Schmitt,

    Welcome to Microsoft Q&A forum. Thank you for posting your query.

    To properly set up an MLOps pipeline in Azure Machine Learning (AML) with an event-based trigger, follow these steps:

    Your object detection pipeline will consist of multiple steps, each reading input data and producing output that the next step will consume.

    Handling Input and Output:

    Since Azure ML exports labelled data as MLTable, your pipeline components should be designed to read MLTable format correctly.

    The Input should reference the MLTable path to extract images and labels.

    The Output should be stored as either:

    • uri_folder (if passing multiple images and labels)
    • mltable (if passing structured metadata along with images)
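    As a sketch, a command component YAML with an mltable input and a uri_folder output could look like the following. The component name, script name, and environment are illustrative; substitute your own:

    ```yaml
    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    name: augment_data
    type: command
    inputs:
      input_data:
        type: mltable
    outputs:
      output_data:
        type: uri_folder
    code: ./src
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    command: >-
      python augment.py
      --input_data ${{inputs.input_data}}
      --output_data ${{outputs.output_data}}
    ```

    Azure ML mounts the mltable input and the uri_folder output as paths inside the job, so your script only ever deals with local file paths.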

    Addressing your Questions:

    1. If I label my data in Azure ML and export these labels, I get an MLTable. How do I have to define my input so that I can extract my image and label data out of this?

    MLTable is a structured format that contains metadata and references to your images and labels.

     To use this in your pipeline, define the input as an MLTable data asset in your component YAML or pipeline settings.

    Your component should read the MLTable, extract image file paths and labels, and then load the corresponding data.

    If you need to apply data augmentation, your code should:

    1. Read image paths and labels from the MLTable.
    2. Load the actual images from Azure Blob Storage (or wherever they are stored).
    3. Apply transformations (e.g., rotation, flipping, cropping).
    4. Save the augmented images and corresponding labels for the next step.
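    The steps above can be sketched as a component script. Note the column names ("image_url", "label") are assumptions — the actual schema of the exported MLTable depends on your labeling project — and the mltable/Pillow imports are only resolved inside the Azure ML job environment:

    ```python
    import argparse
    import os
    import random

    import numpy as np


    def rotate_pair(image: np.ndarray, mask: np.ndarray, k: int):
        """Rotate an image array and its mask by the same multiple of 90 degrees,
        so the labels stay aligned with the pixels."""
        return np.rot90(image, k), np.rot90(mask, k)


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--input_data", type=str)   # mltable input (mounted path)
        parser.add_argument("--output_data", type=str)  # uri_folder output (writable path)
        args = parser.parse_args()

        # These imports are only available inside the Azure ML job environment.
        import mltable
        from PIL import Image

        # Load the MLTable and materialize it as a dataframe of paths/labels.
        df = mltable.load(args.input_data).to_pandas_dataframe()
        os.makedirs(args.output_data, exist_ok=True)

        for _, row in df.iterrows():
            img = np.asarray(Image.open(row["image_url"]))  # assumed column name
            msk = np.asarray(Image.open(row["label"]))      # assumed column name
            img, msk = rotate_pair(img, msk, random.randint(1, 3))
            stem = os.path.splitext(os.path.basename(row["image_url"]))[0]
            Image.fromarray(img).save(os.path.join(args.output_data, f"{stem}.png"))
            Image.fromarray(msk).save(os.path.join(args.output_data, f"{stem}_mask.png"))


    if __name__ == "__main__":
        main()
    ```

    Rotating by multiples of 90° keeps the example dependency-free; for arbitrary-angle rotation you would apply the same interpolating transform (e.g., from Pillow or torchvision) to both image and mask.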

    2. How do I have to write the Output so that I can use this data in the next step? Can my Output really only be uri_folder, uri_file, mltable, string, bool, and int? If so, how am I supposed to pass my image and label data from one step to the other?

    Yes, Azure ML only supports uri_folder, uri_file, mltable, string, bool, and int as output types. However, you can effectively pass images and labels by structuring your outputs properly:

    If you need to pass multiple image files and labels: Use uri_folder.

    • Store all processed images and corresponding labels in a directory.
    • The next pipeline step can read from this directory and use the data.

    If you need structured metadata along with image references: Use mltable.

    • Create an MLTable from your processed images and labels.
    • The next component can consume this MLTable and process the data accordingly.

    If you need to pass a single file (e.g., a CSV with image-label mappings): Use uri_file.

    Best Practice:

    • Use uri_folder if you need direct access to images.
    • Use mltable if you need structured metadata alongside images and labels.
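    If you go the mltable route, the previous step can emit an MLTable definition next to its outputs so the next component can consume the folder as an mltable input. A minimal sketch (the CSV name and layout are illustrative):

    ```python
    import os


    def write_mltable(out_dir: str, csv_name: str = "mappings.csv") -> str:
        """Write a minimal MLTable file that points at a delimited file of
        image/label mappings stored in the same output folder."""
        content = (
            "paths:\n"
            f"  - file: ./{csv_name}\n"
            "transformations:\n"
            "  - read_delimited:\n"
            "      delimiter: ','\n"
            "      header: all_files_same_headers\n"
        )
        # The file must be named exactly "MLTable" for Azure ML to recognize it.
        with open(os.path.join(out_dir, "MLTable"), "w") as f:
            f.write(content)
        return content
    ```

    The component would call this after writing mappings.csv into its uri_folder-style output directory; declaring that output as type mltable then lets the next step load it with mltable.load.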
    3. After I have set up this pipeline, I want to run it every time the Data Asset that was created by exporting my labels gets a new version. However, in the UI and SDK, you can only set up timer-based triggers. Is there a way to create an event-based trigger? If not, is this a planned feature?

    Azure ML SDK v2 does not natively support event-based triggers, but you can use Azure Event Grid to achieve this.

    Workaround: Instead of relying on Azure ML’s built-in schedule-based triggers, set up an Event Grid subscription that fires when new data is added.

    Steps to implement event-based pipeline execution:

    Enable Event Grid on your Azure Storage account (where the MLTable is stored).

    1. Go to Azure Portal → Storage Account → Events → Create Event Subscription.
    2. Choose Blob Created as the event type (this fires when a new version of the dataset is uploaded).

    Use Azure Logic Apps or Azure Functions to trigger the ML pipeline:

    1. The Logic App/Azure Function will listen for the Blob Created event.
    2. Once detected, it will trigger the Azure ML pipeline run using the REST API or SDK.

    Ensure the pipeline uses the latest version of the data asset:

    In Azure ML, set the data asset input version to latest.

    This ensures that every pipeline run processes the newly added data.
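    The glue between Event Grid and Azure ML can be sketched as below. The "labels/" blob prefix and the pipeline.yml filename are assumptions for illustration; in a real Azure Function the handler would be wired to an Event Grid trigger binding, and the SDK imports resolve inside the Function app:

    ```python
    def is_labels_export(event_subject: str) -> bool:
        """Filter Blob Created events down to the exported-labels path.
        Event Grid subjects look like
        /blobServices/default/containers/<container>/blobs/<blob-path>."""
        return "/labels/" in event_subject


    def submit_pipeline():
        """Submit the pipeline job with SDK v2 (runs inside the Function app)."""
        from azure.identity import DefaultAzureCredential
        from azure.ai.ml import MLClient, load_job

        ml_client = MLClient(
            DefaultAzureCredential(),
            subscription_id="<subscription-id>",
            resource_group_name="<resource-group>",
            workspace_name="<workspace>",
        )
        # Pipeline definition whose data input is pinned to @latest,
        # so each run picks up the newly exported labels.
        job = load_job("pipeline.yml")
        ml_client.jobs.create_or_update(job)


    def handle_event(event_subject: str):
        """Entry point an Event Grid-triggered Azure Function would call."""
        if is_labels_export(event_subject):
            submit_pipeline()
    ```

    Filtering on the blob path keeps unrelated uploads (model checkpoints, logs) from kicking off training runs; you can also configure the same subject filter directly on the Event Grid subscription.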

    Is this a planned feature?

    Currently, event-based triggers for Azure ML pipelines are not available natively in SDK v2. This was a feature in SDK v1, and there have been discussions about reintroducing it, but no official confirmation yet. The best approach is to use Azure Event Grid + Azure Functions as a workaround.

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".

    Thank you.

