What is the optimal training time for a dataset in Azure Custom Vision?

Alex Hotianovich 20 Reputation points
2025-02-18T19:38:23.1233333+00:00

I am using Azure Custom Vision to train a custom image classification model with a dataset of approximately 3,000 images. The platform allows me to set the training time between 1 hour and 96 hours.

Could you please advise on the following:

  1. What training duration would be optimal for a dataset of this size to achieve the best model quality?
  2. What factors (e.g., dataset complexity, image resolution, or number of tags) should I consider when choosing the training time?
  3. How does increasing the training time impact model accuracy, and is there a point of diminishing returns?

I want to ensure that the model is trained effectively without unnecessary delays. Any guidance or best practices would be greatly appreciated!

Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.

Accepted answer
  1. Prashanth Veeragoni 400 Reputation points Microsoft Vendor
    2025-02-19T06:08:47.45+00:00

    Hi Alex Hotianovich,

    Welcome to the Azure ML Q&A forum, and thank you for posting your query.

    As I understand it, you are using Azure Custom Vision to train an image classification model on a dataset of ~3,000 images, and the platform lets you set a training duration between 1 hour and 96 hours.

    Recommended Training Duration:

    Start with 2–4 hours as a baseline.

    If accuracy is low (<85%), extend to 6–12 hours for better feature learning.

    12–24 hours may offer small accuracy improvements but has diminishing returns.

    Avoid 96 hours, unless the dataset has extreme complexity (e.g., very high resolution, fine-grained classes).
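The escalation described above can be sketched as a small helper. This is only an illustration of the rule of thumb in this answer, not part of any Azure SDK: the function name `next_budget_hours`, the specific hour picked inside each range (4, 12, 24), and the decision to stop after 24 hours are all assumptions.

```python
from typing import Optional

# Budget steps (hours) picked from the ranges in the answer above.
# Which value to use inside each range is an illustrative assumption.
BUDGET_STEPS = [4, 12, 24]

# The answer treats accuracy below 85% as "low".
ACCURACY_TARGET = 0.85


def next_budget_hours(current_hours: int, accuracy: float) -> Optional[int]:
    """Return the next training budget to try, or None to stop escalating."""
    if accuracy >= ACCURACY_TARGET:
        return None  # good enough, no longer run needed
    for step in BUDGET_STEPS:
        if step > current_hours:
            return step
    # Assumption: past 24 hours, stop and revisit the dataset rather than
    # jumping straight to the 96-hour maximum.
    return None


if __name__ == "__main__":
    print(next_budget_hours(4, 0.80))   # low accuracy after the baseline run
    print(next_budget_hours(4, 0.90))   # target already met
```

Each returned value would then be used as the training budget for the next run, with accuracy re-checked on the resulting iteration.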

    Factors Influencing Training Time:

    Dataset Complexity: More diverse images (e.g., different angles, lighting conditions) need longer training.

    Number of Tags (Classes):

    <10 classes → 2–4 hours is sufficient

    50+ classes → May require 6–12+ hours

    Image Resolution: High-resolution images (1024px+) require more compute and may benefit from longer training.

    Model Type:

    Compact models (for edge deployment) train faster but may have lower accuracy.

    Standard models (for cloud inference) take longer but are more precise.
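As a rough sketch, the tag-count guidance above can be turned into a starting range for the training budget. The function name `suggested_budget_range` is hypothetical, and the middle band (10 to 49 tags) is not covered by this answer, so the value returned for it is an assumption that simply bridges the two stated ranges.

```python
from typing import Tuple


def suggested_budget_range(num_tags: int) -> Tuple[int, int]:
    """Return a (low, high) starting budget in hours for a given tag count."""
    if num_tags < 10:
        return (2, 4)    # stated above: fewer than 10 classes
    if num_tags >= 50:
        return (6, 12)   # stated above: 50+ classes may need 6-12+ hours
    # Assumption: the answer gives no figure for 10-49 tags, so this
    # bridges the two stated ranges.
    return (4, 6)


if __name__ == "__main__":
    low, high = suggested_budget_range(8)
    print(f"Start with {low}-{high} hours")
```

The other factors listed (dataset diversity, resolution, compact versus standard model) are left out because the answer gives no numbers for them; they would only shift the starting point up or down.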

    Impact of Training Time on Accuracy:

    Major accuracy gains happen in the first 2–6 hours.

    Beyond 12–24 hours, accuracy improvement is usually <1–2%.

    Training too long (>24–48 hours) risks overfitting, especially with smaller datasets.
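One way to find the point of diminishing returns for your own dataset is to record the accuracy of each run and stop once the gain from a longer budget falls below a threshold. The sketch below assumes you collect `(budget_hours, accuracy)` pairs yourself; the function name, the 1-point default threshold, and the sample numbers are made up for illustration and are not measurements.

```python
from typing import List, Tuple


def diminishing_returns_point(
    runs: List[Tuple[int, float]], min_gain: float = 0.01
) -> int:
    """Return the last budget (hours) that still improved accuracy by min_gain.

    `runs` must be sorted by budget, shortest first. A drop in accuracy
    (for example from overfitting) also counts as "no gain" and stops the scan.
    """
    best_hours, best_acc = runs[0]
    for hours, acc in runs[1:]:
        if acc - best_acc < min_gain:
            break  # longer training no longer pays off
        best_hours, best_acc = hours, acc
    return best_hours


if __name__ == "__main__":
    # Made-up example values, not real results.
    example_runs = [(2, 0.80), (6, 0.88), (12, 0.90), (24, 0.905)]
    print(diminishing_returns_point(example_runs))  # prints 12
```

In the example, going from 12 to 24 hours adds only half a percentage point, so 12 hours is reported as the sensible budget.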

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

    Thank you.
