Azure ML label-import COCO-structure

Benedikt Schmitt 90 Reputation points
2025-02-10T07:09:50.4933333+00:00

For anyone finding this question through Google:
The solution is not the answer that was marked as accepted, but a comment by Philip Emanuel under that answer. I'm noting this here to save people with the same problem a lot of reading.

I have a Data Labeling project in Azure ML for which I previously labeled data in CVAT. I now want to export the data from CVAT as a COCO file and import the labels into Azure ML so I can continue labeling there. However, the import fails because the structure of my COCO file is incorrect, and the problem is most likely the file paths. I have tried the following so far:

-just the image names as they appear in the data asset

-relative path from the blobstore

-Storage URI-path for every image

After trying to import the labels, I ran into one of two problems:

-the import was successful however none of the images were labeled

-the import was successful, but whenever I tried to check the labels, the image could not be found and I got the following error:
"File not Found. Media could not be loaded. The file is missing from storage. It may have been deleted by the owner."

The problem cannot be explained by access rights.

Is there maybe an example .json-file someone can provide, showing how the data needs to be structured to import it into my Data Labeling Project? I couldn't find any examples in the documentation.

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. Sina Salam 18,201 Reputation points
    2025-02-10T11:34:23.2466667+00:00

    Hello Benedikt Schmitt,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having an issue with the COCO file structure when importing labels into Azure ML.

    Try to follow the steps below to resolve the issue:

    Step 1:

    Register your data asset correctly in Azure ML by ensuring your images are stored in a blob container (e.g., my-container/images/*.jpg).

    When registering the dataset in Azure ML:

    • Set the datastore to point to your blob container.
    • Set the path to the folder containing your images (e.g., images/).
    • This defines the "root" directory for your images in Azure ML.

    Step 2:

    In CVAT, export the annotations as COCO. By default, CVAT might use absolute paths or paths relative to its own system, so modify each file_name in the COCO file to match the relative path from your Azure ML data asset’s root. For example: if your data asset is registered at images/ and your image is in images/folder1/img.jpg, the file_name should be folder1/img.jpg.
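The path rewrite described in Step 2 could be sketched as follows. This is a minimal illustration, not an Azure ML or CVAT API: the helper name `relativize_file_names` and the sample data are hypothetical, and it assumes the exported file_name values begin with the data asset root.

```python
from pathlib import PurePosixPath


def relativize_file_names(coco: dict, asset_root: str) -> dict:
    """Return a copy of `coco` whose image file_name values are relative to `asset_root`."""
    root = PurePosixPath(asset_root.strip("/"))
    images = []
    for image in coco["images"]:
        # Normalise Windows separators and drop leading slashes from the exported path.
        path = PurePosixPath(image["file_name"].replace("\\", "/").lstrip("/"))
        try:
            relative = path.relative_to(root)
        except ValueError:
            # The path does not start with the asset root; leave it unchanged.
            relative = path
        images.append({**image, "file_name": relative.as_posix()})
    return {**coco, "images": images}


# Hypothetical export in which CVAT wrote the path including the images/ folder.
example = {
    "images": [{"id": 1, "file_name": "images/folder1/img.jpg", "width": 640, "height": 480}],
    "annotations": [],
    "categories": [],
}
fixed = relativize_file_names(example, "images/")
print(fixed["images"][0]["file_name"])
```

The function returns a new dictionary rather than editing the input, so the original export can be kept for comparison before writing the result back with `json.dump`.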

    Step 3:

    This is an example of a COCO file for Azure ML:

    ```json
    {
      "images": [
        {
          "id": 1,
          "file_name": "folder1/image1.jpg",
          "width": 640,
          "height": 480
        }
      ],
      "annotations": [
        {
          "id": 1,
          "image_id": 1,
          "category_id": 1,
          "bbox": [100, 50, 200, 300],
          "area": 60000,
          "segmentation": [],
          "iscrowd": 0
        }
      ],
      "categories": [
        {
          "id": 1,
          "name": "cat"
        }
      ]
    }
    ```

    Here file_name is relative to the Azure ML data asset root, bbox is [x, y, width, height], and area (width * height) is a required field; compute it if it is missing.

    Step 4:

    Before importing, verify that paths in the COCO file match the data asset:

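    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    from azureml.fsspec import AzureMachineLearningFileSystem  # assumes the azureml-fsspec package is installed

    ml_client = MLClient.from_config(DefaultAzureCredential())
    dataset = ml_client.data.get(name="your-data-asset-name", version="1")
    # Data.path is a single URI string for the asset root, not a list of files
    print(dataset.path)
    # List the entries under that root (assumes dataset.path is an azureml:// datastore URI)
    fs = AzureMachineLearningFileSystem(dataset.path)
    files = fs.ls()
    print(files)  # Compare these entries with the file_name values in the COCO file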
    

    Step 5:

    Use the Azure ML labeling UI to import the COCO file. Ensure you select the correct data asset during import.

    Step 6:

    For "File not found" Error: Use the SDK code above to confirm the file_name in the COCO JSON exists in the data asset’s paths.

    For "Empty Labels After Import": Validate the area field is populated (Azure ML requires it). Compute it as area = bbox[2] * bbox[3] if missing.
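The area fix mentioned above could be applied to a whole file along these lines. This is a sketch under the answer's own assumption that area is required and equals bbox[2] * bbox[3]; the helper name `fill_missing_area` and the sample annotations are hypothetical.

```python
def fill_missing_area(coco: dict) -> int:
    """Set area = bbox width * bbox height on annotations lacking it; return how many changed."""
    changed = 0
    for annotation in coco.get("annotations", []):
        if annotation.get("area") is None:
            # COCO bounding boxes are [x, y, width, height].
            _x, _y, width, height = annotation["bbox"]
            annotation["area"] = width * height
            changed += 1
    return changed


# Hypothetical annotations: the first has no area, the second already has one.
coco = {
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 300]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20], "area": 400},
    ]
}
print(fill_missing_area(coco))          # number of annotations updated
print(coco["annotations"][0]["area"])   # area computed from the bbox
```

Existing area values are left alone, so annotations whose area came from a segmentation mask are not overwritten by the bounding-box approximation.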


    OPTION 2:

    Step 1: Ensure that your Azure ML data asset is correctly registered and points to the right storage location. You can use the following code to list the contents of your registered dataset:

    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    from azureml.fsspec import AzureMachineLearningFileSystem  # assumes azureml-fsspec is installed

    ml_client = MLClient.from_config(DefaultAzureCredential())
    # Replace with your actual data asset name
    dataset_name = "your-data-asset-name"
    # Retrieve the latest version; "latest" is a label, not a version number
    dataset = ml_client.data.get(name=dataset_name, label="latest")
    # Data.path is one URI string for the asset root, so list its contents
    # through a filesystem client (assumes an azureml:// datastore URI)
    fs = AzureMachineLearningFileSystem(dataset.path)
    file_paths = fs.ls()
    print(file_paths)
    

    This will help confirm whether the dataset contains the expected image file paths.

    Step 2: Azure ML requires the file_name field in the COCO file to match the relative path within the registered dataset, so make sure it matches your dataset structure. A properly formatted COCO JSON should look like this:

    {
      "images": [
        {
          "id": 1,
          "file_name": "images/folder1/image1.jpg",
          "width": 640,
          "height": 480
        }
      ],
      "annotations": [
        {
          "id": 1,
          "image_id": 1,
          "category_id": 1,
          "bbox": [100, 50, 200, 300],
          "area": 60000,
          "segmentation": [],
          "iscrowd": 0
        }
      ],
      "categories": [
        {
          "id": 1,
          "name": "cat"
        }
      ]
    }
    

    Step 3: Use the following code snippet to check if a specific file path exists in the dataset:

    search_file = "images/folder1/image1.jpg"  # Adjust based on your dataset
    exists = any(search_file in path for path in file_paths)
    if exists:
        print(f"File {search_file} exists in dataset.")
    else:
        print(f"File {search_file} NOT found. Check dataset structure.")
    

    This helps verify if your COCO file paths match those in Azure ML.
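To check every image in the COCO file at once rather than a single path, the same idea could be extended as sketched below. The helper name `missing_file_names` and the sample data are hypothetical, and the sketch assumes a plain list of path strings such as `file_paths` from Step 1.

```python
def missing_file_names(coco: dict, asset_paths: list[str]) -> list[str]:
    """Return the COCO file_name values that match no path in the data asset listing."""
    # Compare on forward-slash paths without leading slashes.
    known = {p.replace("\\", "/").lstrip("/") for p in asset_paths}
    return [
        image["file_name"]
        for image in coco["images"]
        if image["file_name"].replace("\\", "/").lstrip("/") not in known
    ]


# Hypothetical listing and COCO file; the second file_name lacks the images/ prefix.
file_paths = ["images/folder1/image1.jpg", "images/folder1/image2.jpg"]
coco = {
    "images": [
        {"id": 1, "file_name": "images/folder1/image1.jpg"},
        {"id": 2, "file_name": "folder1/image2.jpg"},
    ]
}
missing = missing_file_names(coco, file_paths)
print(missing)
```

This version uses exact matching instead of the substring test in Step 3, so a file_name that is only a partial path is reported as missing rather than silently accepted.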

    Step 4: Once the COCO file is structured correctly, re-import the labels and validate the result.

    • Go to Azure ML Labeling UI.
    • Select the correct dataset (verify that images are loading properly).
    • Import the COCO file and check if labels appear correctly.

    The official documentation on image labeling in Azure ML is here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-image-labeling

    NOTE: If the link returns a 404 error, please find the topic in the documentation's table of contents.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.


1 additional answer

  1. Manas Mohanty 745 Reputation points Microsoft Vendor
    2025-02-13T12:16:40.9+00:00

    Hi Benedikt Schmitt

    We have noticed that you rated an answer as not helpful. We appreciate your feedback and are committed to improving your experience with the Q&A.

    It seems you are facing an issue using a CVAT-labeled dataset in your AML project.

    Ideally, data in an Azure ML labeling project is labeled within Azure ML itself: manually, with ML assistance, or by labelers from the marketplace. The annotations are stored in the same storage location as the original dataset.

    Downloading the labels as a COCO JSON file and modifying the paths to match the storage path can be tedious. It would be better to connect the Azure storage account directly to CVAT, label the dataset there, and then create a data asset from it for use in the Azure ML labeling project.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.

    Attach Cloud storage to CVAT

    Reference on creating URI_folders

    Reference on Azure ML labelling

    Please don’t forget to select Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

    Thank You.

