Welcome to the Microsoft Q&A and thank you for posting your questions here.
Let's break down the steps and your questions one by one.
How do I split my dataset, which I exported to MLTable? The code below does not make this clear. I ran an AutomatedML job using the yolov5 model, and it has the functionality to split the exported dataset, but I don't know the implementation.
To split the dataset you exported to MLTable, you can use the Split Data component in Azure ML Studio. This component divides a dataset into two parts, for example a training set and a validation set. Once you have separate training and validation MLTables, you can reference them as job inputs like this:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
# Define the paths to your MLTable files
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/train")
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/valid")
You can configure the Split Data component to specify the percentage of data to be used for training and validation.
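If the exported MLTable is backed by a JSONL annotations file, another option is to split that file yourself and create one MLTable per split. The following is only a minimal sketch under that assumption (one JSON object per line, one line per annotated image). The function name `split_jsonl`, the 80/20 ratio, and the seed are hypothetical choices for illustration, not part of the Azure ML SDK.

```python
import random
from pathlib import Path


def split_jsonl(source, train_out, valid_out, train_fraction=0.8, seed=42):
    """Shuffle the lines of a JSONL file and write train/validation files.

    Returns the number of (train, validation) records written.
    """
    # Keep only non-empty lines; each one is assumed to describe one image.
    text = Path(source).read_text(encoding="utf8")
    lines = [line for line in text.splitlines() if line.strip()]

    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(lines)

    cut = int(len(lines) * train_fraction)
    train, valid = lines[:cut], lines[cut:]

    for out_path, chunk in ((train_out, train), (valid_out, valid)):
        out = Path(out_path)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("".join(line + "\n" for line in chunk), encoding="utf8")
    return len(train), len(valid)
```

For example, `split_jsonl("annotations.jsonl", "train/train_annotations.jsonl", "valid/valid_annotations.jsonl")` would write the two files into separate folders (the file names here are placeholders), and each folder can then get its own MLTable file.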
Do I need to download the images from the blob container and the annotations from the labeling tool, convert them into YOLO format, and then upload them again as a data asset in the ML workspace to train the model? This post here suggests so, but I find this a time-consuming process if we can consume the data directly.
Azure ML does provide helper scripts, but they convert in the other direction: from formats like Pascal VOC or COCO to JSONL, which can then be used to create an MLTable. If your annotations are in JSONL format, here is an example of an MLTable file that references them:
paths:
  - file: ./train_annotations.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
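If you keep separate JSONL files for training and validation, each folder needs its own file named MLTable. As a rough sketch, the definition above can be written next to a given JSONL file with a few lines of Python. The helper name `write_mltable` and the template constant are hypothetical and only reproduce the YAML shown above.

```python
from pathlib import Path

# Same definition as the YAML above, with the JSONL file name as a placeholder.
MLTABLE_TEMPLATE = """\
paths:
  - file: ./{jsonl_name}
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
"""


def write_mltable(folder, jsonl_name):
    """Write an MLTable file into `folder` that points at `jsonl_name`."""
    target = Path(folder) / "MLTable"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(MLTABLE_TEMPLATE.format(jsonl_name=jsonl_name), encoding="utf8")
    return target
```

Calling `write_mltable("train", "train_annotations.jsonl")` would create `train/MLTable`, which the `Input(type=AssetTypes.MLTABLE, path=...)` for that folder can then point to.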
To convert to YOLO format, you will need to write a custom script that transforms the JSONL annotations into YOLO label files. Unfortunately, there isn't a direct built-in function for this conversion, but you can refer to Set up AutoML for computer vision - Azure Machine Learning | Microsoft Learn for more details on data preparation.
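As a starting point for such a custom script, here is a minimal sketch. It assumes each JSONL line follows the AutoML object detection layout, with an `image_url` field and a `label` list whose boxes carry `label`, `topX`, `topY`, `bottomX`, and `bottomY` as normalized coordinates; please verify this against your own export. The function names are hypothetical, and assigning class indices in order of first appearance is an arbitrary choice. The script only writes YOLO label files; it does not download the images.

```python
import json
from pathlib import Path, PurePosixPath


def box_to_yolo(box, class_ids):
    """Convert one normalized corner-format box to a YOLO label line."""
    x_center = (box["topX"] + box["bottomX"]) / 2
    y_center = (box["topY"] + box["bottomY"]) / 2
    width = box["bottomX"] - box["topX"]
    height = box["bottomY"] - box["topY"]
    # Assign class indices in order of first appearance (an assumption).
    class_id = class_ids.setdefault(box["label"], len(class_ids))
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def jsonl_to_yolo(jsonl_path, labels_dir):
    """Write one YOLO .txt label file per image; return the class mapping."""
    class_ids = {}
    out_dir = Path(labels_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(jsonl_path, encoding="utf8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            # Name the label file after the image file name in the URL.
            stem = PurePosixPath(record["image_url"]).stem
            rows = [box_to_yolo(box, class_ids) for box in record["label"]]
            label_file = out_dir / f"{stem}.txt"
            label_file.write_text("".join(row + "\n" for row in rows), encoding="utf8")
    return class_ids
```

The returned dictionary maps each class name to its index, which you would need when writing the class list for your YOLO training configuration.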
If I have to go with option 2, how do I convert the exported annotations, which are in MLTable format, to YOLO format? If any reference code is available, please share it; it will be highly appreciated.
Regarding the direct consumption of labeled data without downloading and re-uploading, you can use the Azure ML Data Labeling tool to label your data and export it directly as an MLTable. This MLTable can then be used as input for training your YOLO model. Here’s how you can set up your training job:
from azure.ai.ml import automl

image_object_detection_job = automl.image_object_detection(
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label"
)
This setup allows you to use the labeled data directly from the cloud without the need for manual conversion and re-uploading.
I hope this helps. If you have any further questions, do let us know.
For more detailed guidance, you can refer to the official Microsoft documentation:
- Split Data: Component reference - Azure Machine Learning | Microsoft Learn
- Set up AutoML for computer vision - Azure Machine Learning | Microsoft Learn
- Prepare data for computer vision tasks - Azure Machine Learning | Microsoft Learn
If the reply was helpful, please don't forget to upvote and/or accept the answer, as this can be beneficial to other community members.
Thank you