Working with a labeled dataset in Azure ML studio and consuming it to train a YOLO model

Abhishek Rajendra Jain 0 Reputation points
2025-03-05T16:35:48.5066667+00:00

I am training a YOLOv8 model in Azure ML studio and used the data labeling tool to label my dataset. However, I am having difficulty understanding how the labeled dataset, once exported in MLTable format, can be consumed for training a YOLO model. I found some reference code, but the documentation provided by Azure has many gaps, for example in azureml-examples.

  1. How do I split the dataset that I exported to MLTable? The code below does not make this clear. I ran an Automated ML job using the yolov5 model, and it can split the exported dataset, but I don't know how that is implemented.
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Training MLTable defined locally, with local data to be uploaded
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="<local-training-mltable-folder>")

# Validation MLTable defined locally, with local data to be uploaded
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="<local-validation-mltable-folder>")


# WITH REMOTE PATH: If available already in the cloud/workspace-blob-store
# my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/train")
# my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/valid")

  2. Do I need to download the images from the blob container and the annotations from the labeling tool, convert them to YOLO format, and then upload them again as a data asset in the ML workspace to train the model? This post suggests the same, but it seems a time-consuming process if the data can be consumed directly.
  3. If I have to go with option 2, how do I convert the exported annotations, which are in MLTable format, to YOLO format? If any reference code is available, please share it; it would be highly appreciated.
Azure Machine Learning

1 answer

  1. Vikram Singh 2,240 Reputation points Microsoft Employee
    2025-03-06T09:18:30.29+00:00

    Hi @Abhishek Rajendra Jain

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Let's break down the steps and your questions one by one.

    How do I split my dataset which I exported it to MLtable? below code does not give clarity on this. I did run an AutomatedML job using yolov5 model and it has the functionality of splitting the exported dataset but don't know the implementation.

    To split the dataset you exported to MLTable, you can use the Split Data component in Azure ML Studio, which divides a dataset into training and validation sets. Note that the snippet below does not perform the split itself; it only shows how the resulting training and validation MLTables are referenced as job inputs:

    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes
    
    # Define the paths to your MLTable files
    my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/train")
    my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/valid")
    

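    You can configure the Split Data component to specify the fraction of rows that goes to each output, for example 80% for training and 20% for validation. If you pass only training data to an AutoML image job, 20% of the training data is held out for validation by default, and the validation_data_size setting changes that fraction.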
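
    If you would rather do the split yourself, the idea can be sketched in plain Python. This is a minimal sketch, not an Azure ML API: it assumes the exported MLTable points at a JSONL annotation file with one image record per line, and the function name `split_jsonl` and the file names are hypothetical. Each output file would then need its own MLTable file next to it.

```python
import random
from pathlib import Path


def split_jsonl(source, train_out, valid_out, valid_fraction=0.2, seed=42):
    """Shuffle the records of a JSONL file and write train/validation files.

    Hypothetical helper: one JSON record (one image) per line is assumed.
    """
    text = Path(source).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(lines)

    n_valid = int(round(len(lines) * valid_fraction))
    valid, train = lines[:n_valid], lines[n_valid:]

    Path(train_out).write_text("".join(l + "\n" for l in train), encoding="utf-8")
    Path(valid_out).write_text("".join(l + "\n" for l in valid), encoding="utf-8")
    return len(train), len(valid)
```

    For example, `split_jsonl("annotations.jsonl", "train_annotations.jsonl", "valid_annotations.jsonl")` would keep 80% of the records for training. The split is per image and purely random, so it does not balance classes between the two sets.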

    Do I need to download the images from blob container and annotation from labeling tool then convert it into yolo format and then upload it again to data-asset in ml workspace to train the model? because this post here suggests the same, but I found this time-consuming process if we can directly consume it.

    If you need to convert the exported annotations from MLTable format to YOLO format, note that the helper scripts provided by Azure ML work in the opposite direction: they convert data from formats like Pascal VOC or COCO to JSONL, which can then be used to create an MLTable. Here’s an example of an MLTable file that reads a JSONL annotation file:

    paths:
      - file: ./train_annotations.jsonl
    transformations:
      - read_json_lines:
          encoding: utf8
          invalid_lines: error
          include_path_column: false
      - convert_column_types:
          - columns: image_url
            column_type: stream_info
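
    Because `invalid_lines: error` makes the read fail on malformed lines, it can help to check the JSONL file locally before creating the MLTable. The following is a minimal sketch under assumptions: the helper name `find_bad_lines` is hypothetical, and the required keys `image_url` and `label` are taken from the column names used in this thread.

```python
import json


def find_bad_lines(jsonl_path, required_keys=("image_url", "label")):
    """Return the 1-based numbers of lines that are not usable records.

    Hypothetical helper: a line is reported if it is not valid JSON,
    is not a JSON object, or lacks one of the required keys.
    Blank lines are skipped here rather than reported.
    """
    bad = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for number, raw in enumerate(fh, start=1):
            if not raw.strip():
                continue
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                bad.append(number)
                continue
            if not isinstance(record, dict) or any(
                key not in record for key in required_keys
            ):
                bad.append(number)
    return bad
```

    An empty list from `find_bad_lines("train_annotations.jsonl")` means every non-blank line parsed and carries both keys; it does not check that the image URLs actually resolve.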
    

    To convert to YOLO format, you will need to write a custom script that transforms the JSONL annotations into YOLO label files, because there is no built-in function for this conversion. For more details on data preparation, refer to "Set up AutoML for computer vision" in the Azure Machine Learning documentation on Microsoft Learn.
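
    Such a custom script could look roughly like the sketch below. It rests on assumptions you should verify against your own export: each JSONL record is assumed to have an `image_url` and a `label` list whose boxes carry `label`, `topX`, `topY`, `bottomX`, and `bottomY` with coordinates already normalized to the 0 to 1 range. The function names are hypothetical. YOLO label files use one `class_id x_center y_center width height` line per box, also normalized.

```python
import json
from pathlib import Path, PurePosixPath


def record_to_yolo_lines(record, class_ids):
    """Convert one JSONL record into YOLO label lines.

    class_ids maps class name -> integer id and is extended in place
    when a new class name is seen (ids follow order of first appearance).
    """
    lines = []
    for box in record["label"]:
        class_id = class_ids.setdefault(box["label"], len(class_ids))
        # Corner coordinates -> center point plus width and height.
        x_center = (box["topX"] + box["bottomX"]) / 2
        y_center = (box["topY"] + box["bottomY"]) / 2
        width = box["bottomX"] - box["topX"]
        height = box["bottomY"] - box["topY"]
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


def convert_jsonl_to_yolo(jsonl_path, labels_dir, class_ids=None):
    """Write one <image-stem>.txt label file per JSONL record."""
    class_ids = {} if class_ids is None else class_ids
    labels_dir = Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)
    with open(jsonl_path, encoding="utf-8") as fh:
        for raw in fh:
            if not raw.strip():
                continue
            record = json.loads(raw)
            # The label file is named after the image file in image_url.
            stem = PurePosixPath(record["image_url"]).stem
            lines = record_to_yolo_lines(record, class_ids)
            (labels_dir / f"{stem}.txt").write_text(
                "".join(line + "\n" for line in lines), encoding="utf-8"
            )
    return class_ids
```

    The returned mapping gives the class order to use in the training configuration. Passing a fixed `class_ids` dictionary keeps the ids stable across the training and validation files. Images that share a file name in different folders would overwrite each other's label file in this sketch.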

    If I have to go with option 2 How do it convert exported annotation which is in MLTable format to yolo format? Is there any reference code available please share it will be highly appreciated

    Regarding direct consumption of the labeled data without downloading and re-uploading: you can label your data with the Azure ML Data Labeling tool and export it directly as an MLTable. That MLTable can then be passed as-is to an AutoML image object detection job, such as the yolov5 job you mentioned. Here’s how you can set up such a training job:

    from azure.ai.ml import automl
    
    image_object_detection_job = automl.image_object_detection(
        training_data=my_training_data_input,
        validation_data=my_validation_data_input,
        target_column_name="label"
    )
    

    This setup lets the AutoML job use the labeled data directly from the cloud, with no manual conversion or re-uploading. A custom YOLOv8 training script, by contrast, still expects the labels in YOLO format.

    I hope this helps. If you have any further queries, do let us know.

    For more detailed guidance, you can refer to the official Microsoft documentation.

    If the reply was helpful, please don't forget to upvote and/or Accept the answer, this can be beneficial to other community members.

    Thank you

