How to cancel a hanging Custom Vision training iteration

bjsmiley 0 Reputation points
2024-12-30T15:58:07.4566667+00:00

A Custom Vision training iteration was started 4 days ago with a 4-hour reserved budget, but it still shows as training. Retraining and exporting the project are not possible until this iteration stops hanging. What are the recommendations for canceling this training to proceed? There is a thread from a few years ago discussing a similar issue, which required direct intervention from Microsoft to resolve.

Edit:
more recent similar thread
Location: South Central US
Iteration ID: bd318e3f-59d4-4019-ba1f-9a0bfba7b1e8

Azure Machine Learning
An Azure machine learning service for building and deploying models.
3,060 questions
Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.
255 questions

1 answer

  1. hossein jalilian 9,535 Reputation points
    2024-12-30T21:40:22.7533333+00:00

    Thanks for posting your question in the Microsoft Q&A forum.

    Here are some recommendations to address this problem:

    • Wait for automatic resolution: In some cases the issue resolves itself after a while. In one reported case, a training job that was stuck for 24 hours eventually succeeded.
    • Try deleting the iteration: If possible, attempt to delete the stuck iteration using the Custom Vision API. This approach worked for some users in similar situations.
    • Export and reimport the project: As a workaround, you could try exporting your project data and importing it into a new project. This method helped some users get past stuck iterations.
    • Check for backend issues: There have been instances where backend issues caused training jobs to fail or get stuck. These are usually fixed by the Azure team, so it's worth checking whether there are any known ongoing issues.
    • Avoid certain labeling practices: In some cases, using auto-generated labels like [Auto-Generated] Other Products can cause training failures. Make sure your dataset does not use such labels.
    • Contact Azure Support: Since this issue has persisted for 4 days, it's advisable to reach out to Azure Support directly. They can check the status of your training request from the backend and potentially resolve the issue.
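
The "delete the iteration" suggestion above could be tried with a small script such as the one below. This is a minimal sketch, not a confirmed fix: it assumes the v3.3 Custom Vision Training REST route for a single iteration (`/customvision/v3.3/training/projects/{projectId}/iterations/{iterationId}`) and the `Training-Key` header. The helper names `build_iteration_request`, `send`, and `iteration_status` are hypothetical, and nothing in this thread confirms that the service accepts a delete while the iteration is still marked as training.

```python
import json
import urllib.request

# Assumption: v3.3 of the Training API; adjust to the version your resource uses.
API_VERSION = "v3.3"


def build_iteration_request(endpoint, training_key, project_id, iteration_id, method="GET"):
    """Build (but do not send) a request for one iteration.

    method="GET" reads the iteration, method="DELETE" asks the service to remove it.
    """
    url = (
        f"{endpoint.rstrip('/')}/customvision/{API_VERSION}/training/"
        f"projects/{project_id}/iterations/{iteration_id}"
    )
    return urllib.request.Request(
        url, method=method, headers={"Training-Key": training_key}
    )


def send(request, timeout=30):
    """Send the request and return (HTTP status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def iteration_status(body):
    """Extract the 'status' field from a GET iteration response body, if present."""
    return json.loads(body).get("status")
```

To use it, build a GET request first and pass it to `send` to see what status the service reports for the iteration, then build a second request with `method="DELETE"` for the same project and iteration IDs. The endpoint and training key come from the Custom Vision training resource in the Azure portal. If the delete is rejected, that response is worth including in a support ticket.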

    If this answer was helpful, please upvote it and accept it as the answer to close out the thread.
