Optimal Labeling Approach for Date Extraction in Custom Model

Remy 0 Reputation points
2025-01-28T16:05:41.45+00:00

I’m seeking guidance on the best way to label date-related data for a custom extraction model. Below is an example of the type of data I need to extract, which varies in format as shown in the table.

Key Questions:

  • Should I label Day, Month, Year, and AM/PM separately, or is it better to use a single bounding box covering all or a subset?
    • If using a single box, what sub-type should I assign? (Date > dmy?)
    • If using separate boxes, what sub-types should I assign for each field based on the variations in format?

I asked Copilot the same question twice and received completely contradictory answers:

  • First response: Label separately as Date > Not Specified
  • Second response: Label as a single bounding box as Date > dmy

Field    | Example values
Day      | 1st, 2nd, 3rd, 3, 15, 25
Month    | Dec, December, 12
Year     | 2023, 2024, 23, 24
AM / PM  | Checkbox or other label?

Thanks for your support!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. SriLakshmi C 3,415 Reputation points Microsoft External Staff
    2025-01-28T20:36:56.27+00:00

    Hello Remy,

    Greetings and welcome to Microsoft Q&A! Thanks for posting the question.

    I understand that you are looking for guidance on labeling date-related data for a custom extraction model.

    If your dataset has consistent date formats, you can use a single bounding box to label the entire date. In this case, assign the sub-type based on the format, such as Date > dmy for day-month-year formats or Date > mdy for month-day-year formats. This approach is ideal when the dataset contains uniform and predictable date structures.
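    To illustrate why the sub-type matters for a single-box label, here is a minimal Python sketch using only the standard library. It shows how the same numeric string resolves to different dates under a dmy versus an mdy reading. The function `parse_numeric_date` and the sub-type-to-pattern mapping are hypothetical illustrations, not part of the Document Intelligence API.

```python
from datetime import date, datetime

# Hypothetical mapping from a date sub-type to strptime patterns.
# A 4-digit year (%Y) is tried first, then a 2-digit year (%y).
_PATTERNS = {
    "dmy": ["%d/%m/%Y", "%d/%m/%y"],
    "mdy": ["%m/%d/%Y", "%m/%d/%y"],
    "ymd": ["%Y/%m/%d"],
}


def parse_numeric_date(text: str, subtype: str) -> date:
    """Parse a numeric date such as '03/12/2024' according to the given sub-type."""
    # Treat '-' and '.' separators the same as '/'.
    normalized = text.strip().replace("-", "/").replace(".", "/")
    for pattern in _PATTERNS[subtype]:
        try:
            return datetime.strptime(normalized, pattern).date()
        except ValueError:
            continue  # try the next pattern for this sub-type
    raise ValueError(f"{text!r} does not match sub-type {subtype!r}")


if __name__ == "__main__":
    sample = "03/12/2024"
    print(parse_numeric_date(sample, "dmy"))  # 2024-12-03
    print(parse_numeric_date(sample, "mdy"))  # 2024-03-12
```

    The same characters yield 3 December or 12 March depending on the declared order. This is why a single-box label is safest when every document uses one predictable format.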

    However, if your dataset is highly inconsistent or you need to extract specific date components, it is better to use separate bounding boxes for each part of the date. Label each component individually with specific sub-types:

    • Day: Assign Date > Day (e.g., "1st", "15", "3rd").
    • Month: Assign Date > Month (e.g., "Dec", "December", "12").
    • Year: Assign Date > Year (e.g., "2023", "23").
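    If you label the components separately, each labeled field comes back on its own, and you still need to assemble them into a full date in your own code. Below is a minimal Python sketch that does this. It assumes the day, month, and year arrive as plain strings like the examples in the question. The helper names and the assumption that two-digit years belong to the 2000s are hypothetical choices, not service behavior.

```python
import re
from datetime import date

_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"]


def normalize_day(text: str) -> int:
    """'1st', '2nd', '3rd', '15' -> integer day (ordinal suffix stripped)."""
    match = re.fullmatch(r"\s*(\d{1,2})(?:st|nd|rd|th)?\s*", text, re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized day: {text!r}")
    return int(match.group(1))


def normalize_month(text: str) -> int:
    """'Dec', 'December', '12' -> 12 (English month names assumed)."""
    value = text.strip().lower().rstrip(".")
    if value.isdigit():
        return int(value)
    for index, name in enumerate(_MONTHS, start=1):
        # Accept the full name or a prefix of at least three letters ("Dec", "Sept").
        if len(value) >= 3 and name.startswith(value):
            return index
    raise ValueError(f"unrecognized month: {text!r}")


def normalize_year(text: str, century: int = 2000) -> int:
    """'2024' -> 2024; '24' -> 2024, assuming two-digit years fall in `century`."""
    value = text.strip()
    if not value.isdigit() or len(value) not in (2, 4):
        raise ValueError(f"unrecognized year: {text!r}")
    year = int(value)
    return century + year if len(value) == 2 else year


def combine(day: str, month: str, year: str) -> date:
    """Build a date from three separately labeled fields; date() rejects invalid days."""
    return date(normalize_year(year), normalize_month(month), normalize_day(day))


if __name__ == "__main__":
    print(combine("3rd", "Dec", "24"))        # 2024-12-03
    print(combine("15", "December", "2023"))  # 2023-12-15
```

    Keeping the normalization outside the model lets the labels stay simple (one field per component) while the code absorbs the format variation shown in the question's table.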

    Choosing between these two approaches depends on the variability of your dataset and the granularity of the information you need to extract.

    I hope this helps. Do let me know if you have any further queries.


    If the response helped, please click "Accept Answer" and select "Yes" for "Was this answer helpful?".

    Thank you!

