Document Intelligence add-on capabilities

Article
12/12/2024

This content applies to: v4.0 (GA) | Previous version: v3.1 (GA)

This content applies to: v3.1 (GA) | Latest version: v4.0 (GA)

Note

Add-on capabilities are available within all models except for the Business card model.

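Capabilities

Document Intelligence supports more sophisticated, modular analysis capabilities. Use the add-on capabilities to extend the results with more features extracted from your documents. Some add-on capabilities incur an extra cost. You can enable or disable these optional capabilities depending on your document extraction scenario. To enable a capability, add its feature name to the features query string property. You can enable more than one add-on capability on a request by providing a comma-separated list of features. The following add-on capabilities are available for 2023-07-31 (GA) and later releases; the version availability table lists the releases that support each one: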

ocrHighResolution
formulas
styleFont
barcodes
languages
Searchable PDF
queryFields
keyValuePairs
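
As a minimal sketch of how the features query string property is assembled, the following Python helper joins the selected feature names with commas and appends them to an analyze URL. The helper name build_analyze_url, the placeholder endpoint, and the default API version are assumptions made for illustration; the URL pattern follows the REST examples in this article.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, model_id, features, api_version="2024-11-30"):
    """Build an analyze URL that enables the given add-on capabilities."""
    query = {"api-version": api_version}
    if features:
        # Several add-on capabilities are passed as one comma-separated value.
        query["features"] = ",".join(features)
    base = f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"
    # Keep commas unescaped so the feature list stays readable in the URL.
    return f"{base}?{urlencode(query, safe=',')}"


# Hypothetical endpoint used only to show the resulting URL.
url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.com",
    "prebuilt-layout",
    ["ocrHighResolution", "formulas"],
)
print(url)
```

Passing an empty feature list leaves the features parameter out entirely, which matches a request with no add-on capability enabled.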

Note

Not all add-on capabilities are supported by all models. For more information, see model data extraction.
Add-on capabilities are currently not supported for Microsoft Office file types.

Version availability

Add-on capability	Add-on/Free	2024-11-30 (GA)	2023-07-31 (GA)	2022-08-31 (GA)	v2.1 (GA)
Font property extraction	Add-on	✔️	✔️	N/A	N/A
Formula extraction	Add-on	✔️	✔️	N/A	N/A
High resolution extraction	Add-on	✔️	✔️	N/A	N/A
Barcode extraction	Free	✔️	✔️	N/A	N/A
Language detection	Free	✔️	✔️	N/A	N/A
Key-value pairs	Free	✔️	N/A	N/A	N/A
Query fields	Add-on*	✔️	N/A	N/A	N/A
Searchable PDF	Add-on**	✔️	N/A	N/A	N/A

* Add-on - Query fields are priced differently from the other add-on capabilities. See pricing for details.
** Add-on - Searchable PDF is available only with the Read model as an add-on capability.

Supported file formats

PDF
Images: JPEG/JPG, PNG, BMP, TIFF, HEIF

Microsoft Office files are currently not supported.

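High resolution extraction

Recognizing small text in large documents, such as engineering drawings, is a challenge. The text is often mixed with other graphical elements and varies in font, size, and orientation. The text can also be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability (passed as ocrHighResolution in the features query string property). Enabling this add-on capability improves the quality of content extraction from A1/A2/A3 documents.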

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution

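import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)

# Assumption: the endpoint and key are read from these environment variables.
endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))


def get_words(page, line):
    """Return the words on the page whose span falls inside one of the line's spans."""
    return [
        word
        for word in page.words or []
        if any(
            span.offset <= word.span.offset
            and word.span.offset + word.span.length <= span.offset + span.length
            for span in line.spans
        )
    ]


# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],  # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# Report whether the document contains handwritten content.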
if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = get_words(page, line)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{line.polygon}'"
            )

            for word in words:
                print(f"......Word '{word.content}' has a confidence of {word.confidence}")

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{selection_mark.polygon}' and has a confidence of {selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}")
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'")

View samples on GitHub.

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution

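import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import AnalysisFeature, DocumentAnalysisClient

# Assumption: the endpoint and key are read from these environment variables.
endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))


def format_polygon(polygon):
    """Format a list of points as '[x, y], [x, y], ...'."""
    if not polygon:
        return "N/A"
    return ", ".join(f"[{p.x}, {p.y}]" for p in polygon)


# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.OCR_HIGH_RESOLUTION]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# Report whether the document contains handwritten content.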
if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(
        f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}"
    )

    for line_idx, line in enumerate(page.lines):
        words = line.get_words()
        print(
            f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
            f"within bounding polygon '{format_polygon(line.polygon)}'"
        )

        for word in words:
            print(
                f"......Word '{word.content}' has a confidence of {word.confidence}"
            )

    for selection_mark in page.selection_marks:
        print(
            f"Selection mark is '{selection_mark.state}' within bounding polygon "
            f"'{format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
        )

for table_idx, table in enumerate(result.tables):
    print(
        f"Table # {table_idx} has {table.row_count} rows and "
        f"{table.column_count} columns"
    )
    for region in table.bounding_regions:
        print(
            f"Table # {table_idx} location on page: {region.page_number} is {format_polygon(region.polygon)}"
        )
    for cell in table.cells:
        print(
            f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
        )
        for region in cell.bounding_regions:
            print(
                f"...content on page {region.page_number} is within bounding polygon '{format_polygon(region.polygon)}'"
            )

View samples on GitHub.

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

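Formula extraction

The ocr.formula capability extracts all identified formulas, such as mathematical equations, in the formulas collection as a top-level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula and includes the formula type as inline or display, and its LaTeX representation as value along with its polygon coordinates. Initially, formulas appear at the end of each page.

Note

The confidence score is hard-coded.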

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas

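from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)

# Assumes document_intelligence_client is an authenticated DocumentIntelligenceClient.
# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.FORMULAS],  # Specify which add-on capabilities to enable
)
result: AnalyzeResult = poller.result()

# Split the formulas detected on each page into inline and display formulas.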
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    if page.formulas:
        inline_formulas = [f for f in page.formulas if f.kind == "inline"]
        display_formulas = [f for f in page.formulas if f.kind == "display"]

        # To learn the detailed concept of "polygon" in the following content, visit: https://aka.ms/bounding-region
        print(f"Detected {len(inline_formulas)} inline formulas.")
        for formula_idx, formula in enumerate(inline_formulas):
            print(f"- Inline #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

        print(f"\nDetected {len(display_formulas)} display formulas.")
        for formula_idx, formula in enumerate(display_formulas):
            print(f"- Display #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

View samples on GitHub.

"content": ":formula:",
 "pages": [
   {
     "pageNumber": 1,
     "formulas": [
       {
         "kind": "inline",
         "value": "\\frac { \\partial a } { \\partial b }",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       },
       {
         "kind": "display",
         "value": "y = a \\times b + a \\times c",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       }
     ]
   }
 ]
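
Because each formula entry carries a kind and a LaTeX value, the collection can be turned back into LaTeX source. The sketch below works on a dictionary shaped like the JSON output above; the helper name to_latex and the choice of `$ ... $` and `\[ ... \]` delimiters are assumptions for illustration, not something the service returns.

```python
def to_latex(formula):
    """Wrap a formula's LaTeX value in delimiters that match its kind."""
    value = formula["value"]
    if formula["kind"] == "display":
        # Display formulas are set on their own line.
        return f"\\[ {value} \\]"
    # Inline formulas stay inside the running text.
    return f"$ {value} $"


# Illustrative page object mirroring the fields shown in the sample output.
page = {
    "pageNumber": 1,
    "formulas": [
        {"kind": "inline", "value": "\\frac { \\partial a } { \\partial b }"},
        {"kind": "display", "value": "y = a \\times b + a \\times c"},
    ],
}

rendered = [to_latex(formula) for formula in page.get("formulas", [])]
for line in rendered:
    print(line)
```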

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas

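from azure.ai.formrecognizer import AnalysisFeature


def format_polygon(polygon):
    """Format a list of points as '[x, y], [x, y], ...'."""
    if not polygon:
        return "N/A"
    return ", ".join(f"[{p.x}, {p.y}]" for p in polygon)


# Assumes document_analysis_client is an authenticated DocumentAnalysisClient.
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.FORMULAS]    # Specify which add-on capabilities to enable
)
result = poller.result()

# Split the formulas detected on each page into inline and display formulas.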
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    inline_formulas = [f for f in page.formulas if f.kind == "inline"]
    display_formulas = [f for f in page.formulas if f.kind == "display"]

    print(f"Detected {len(inline_formulas)} inline formulas.")
    for formula_idx, formula in enumerate(inline_formulas):
        print(f"- Inline #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

    print(f"\nDetected {len(display_formulas)} display formulas.")
    for formula_idx, formula in enumerate(display_formulas):
        print(f"- Display #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

View samples on GitHub.

 "content": ":formula:",
   "pages": [
     {
       "pageNumber": 1,
       "formulas": [
         {
           "kind": "inline",
           "value": "\\frac { \\partial a } { \\partial b }",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         },
         {
           "kind": "display",
           "value": "y = a \\times b + a \\times c",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         }
       ]
     }
   ]

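Font property extraction

The ocr.font capability extracts all font properties of the extracted text in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties: similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for the color of the text, and backgroundColor for the color of the text bounding box.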

  {your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont

from collections import defaultdict

from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)


def get_styled_text(styles, content):
    """Join the text covered by the spans of the given styles, in document order."""
    spans = sorted((span for style in styles for span in style.spans), key=lambda span: span.offset)
    return ",".join(content[span.offset : span.offset + span.length] for span in spans)


# Assumes document_intelligence_client is an authenticated DocumentIntelligenceClient.
# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)  # e.g., 'Arial, sans-serif'
font_styles = defaultdict(list)  # e.g., 'italic'
font_weights = defaultdict(list)  # e.g., 'bold'
font_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
# result.styles can be None, so fall back to an empty list.
for style in result.styles or []:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

View samples on GitHub.

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]
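
To make the relationship between a style object and its text span concrete, the following sketch resolves a few of the spans from the sample output above against the content string. It assumes that span offsets index directly into the Python string (that is, offsets are counted in Unicode code points); the helper names styled_text and font_property are illustrative only.

```python
content = "Foo bar"
styles = [
    {"similarFontFamily": "Arial, sans-serif", "spans": [{"offset": 0, "length": 3}]},
    {"fontStyle": "italic", "spans": [{"offset": 1, "length": 2}]},
    {"fontWeight": "bold", "spans": [{"offset": 2, "length": 3}]},
    {"color": "#FF0000", "spans": [{"offset": 4, "length": 2}]},
]

FONT_PROPERTIES = ("similarFontFamily", "fontStyle", "fontWeight", "color", "backgroundColor")


def styled_text(style, content):
    """Return the substrings of content covered by the style's spans."""
    return [content[s["offset"] : s["offset"] + s["length"]] for s in style["spans"]]


def font_property(style):
    """Return the (name, value) of the single font property a style object carries."""
    for name in FONT_PROPERTIES:
        if name in style:
            return name, style[name]
    return None, None


resolved = [(font_property(style), styled_text(style, content)) for style in styles]
for (name, value), texts in resolved:
    print(f"{name}={value!r} applies to {texts}")
```

Spans of different styles can overlap, as the italic and bold entries do here, because each style object describes one property independently of the others.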

  {your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont

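from collections import defaultdict

from azure.ai.formrecognizer import AnalysisFeature


def get_styled_text(styles, content):
    """Join the text covered by the spans of the given styles, in document order."""
    spans = sorted((span for style in styles for span in style.spans), key=lambda span: span.offset)
    return ",".join(content[span.offset : span.offset + span.length] for span in spans)


# Assumes document_analysis_client is an authenticated DocumentAnalysisClient.
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)   # e.g., 'Arial, sans-serif'
font_styles = defaultdict(list)             # e.g., 'italic'
font_weights = defaultdict(list)            # e.g., 'bold'
font_colors = defaultdict(list)             # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format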

if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

View samples on GitHub.

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

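Barcode property extraction

The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top-level object under content. Inside content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind and the embedded barcode content as value, along with its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded as 1.

Supported barcode types

Barcode type
QR Code
Code 39
Code 93
Code 128
UPC (UPC-A & UPC-E)
PDF417
EAN-8
EAN-13
Codabar
Databar
Databar Expanded
ITF
Data Matrix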
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes

from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)

# Assumes document_intelligence_client is an authenticated DocumentIntelligenceClient.
# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-read",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    if page.barcodes:
        print(f"Detected {len(page.barcodes)} barcodes:")
        for barcode_idx, barcode in enumerate(page.barcodes):
            print(f"- Barcode #{barcode_idx}: {barcode.value}")
            print(f"  Kind: {barcode.kind}")
            print(f"  Confidence: {barcode.confidence}")
            print(f"  Bounding regions: {barcode.polygon}")

View samples on GitHub.

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes

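from azure.ai.formrecognizer import AnalysisFeature


def format_polygon(polygon):
    """Format a list of points as '[x, y], [x, y], ...'."""
    if not polygon:
        return "N/A"
    return ", ".join(f"[{p.x}, {p.y}]" for p in polygon)


# Assumes document_analysis_client is an authenticated DocumentAnalysisClient.
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result = poller.result()
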
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    print(f"Detected {len(page.barcodes)} barcodes:")
    for barcode_idx, barcode in enumerate(page.barcodes):
        print(f"- Barcode #{barcode_idx}: {barcode.value}")
        print(f"  Kind: {barcode.kind}")
        print(f"  Confidence: {barcode.confidence}")
        print(f"  Bounding regions: {format_polygon(barcode.polygon)}")

View samples on GitHub.

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

Language detection

Adding the languages feature to the analyze request predicts the primary language detected for each text line, along with its confidence, in the languages collection under analyzeResult.

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages

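from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)

# Assumes document_intelligence_client is an authenticated DocumentIntelligenceClient.
# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.LANGUAGES]     # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# Print each detected language with its confidence and the text it covers.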
print("----Languages detected in the document----")
if result.languages:
    print(f"Detected {len(result.languages)} languages:")
    for lang_idx, lang in enumerate(result.languages):
        print(f"- Language #{lang_idx}: locale '{lang.locale}'")
        print(f"  Confidence: {lang.confidence}")
        print(
            f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'"
        )

View samples on GitHub.

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages

from azure.ai.formrecognizer import AnalysisFeature

# Assumes document_analysis_client is an authenticated DocumentAnalysisClient.
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.LANGUAGES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# Print each detected language with its confidence and the text it covers.
print("----Languages detected in the document----")
print(f"Detected {len(result.languages)} languages:")
for lang_idx, lang in enumerate(result.languages):
    print(f"- Language #{lang_idx}: locale '{lang.locale}'")
    print(f"  Confidence: {lang.confidence}")
    print(f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'")

View samples on GitHub.

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]

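Searchable PDF

The searchable PDF capability converts an analog PDF, such as a scanned-image PDF file, into a PDF with embedded text. The embedded text enables deep text search within the PDF's extracted content by overlaying the detected text entities on top of the image files.

Important

Currently, the searchable PDF capability is supported only by the Read OCR model, prebuilt-read. When using this capability, specify prebuilt-read as the modelId.
Searchable PDF is included with the 2024-11-30 (GA) prebuilt-read model at no additional cost for generating the searchable PDF output.

Use searchable PDF

To use searchable PDF, make a POST request with the Analyze operation and specify the output format as pdf: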


POST /documentModels/prebuilt-read:analyze?output=pdf
{...}
202

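Once the Analyze operation is complete, make a GET request to retrieve the Analyze operation results.

Upon successful completion, the PDF can be retrieved and downloaded as application/pdf. This operation downloads the embedded-text form of the PDF directly, instead of Base64-encoded JSON.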


// Monitor the operation until completion.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}
200
{...}

// Upon successful completion, retrieve the PDF as application/pdf.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
200 OK
Content-Type: application/pdf
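
The request sequence above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not an official client: the helper names, the documentintelligence path prefix with api-version 2024-11-30, key-based authentication through the Ocp-Apim-Subscription-Key header, the urlSource request body, and the use of the Operation-Location response header as the polling URL are all assumptions about how the request is sent.

```python
import json
import time
import urllib.request
from urllib.parse import urlsplit, urlunsplit

API_VERSION = "2024-11-30"  # assumed GA version


def analyze_pdf_url(endpoint):
    """URL for the Analyze POST request with the output format set to pdf."""
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-read:analyze?api-version={API_VERSION}&output=pdf"
    )


def pdf_result_url(operation_location):
    """Derive the .../analyzeResults/{resultId}/pdf URL from the polling URL."""
    parts = urlsplit(operation_location)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/") + "/pdf"))


def convert_to_searchable_pdf(endpoint, key, document_url, poll_seconds=2):
    """Submit a document, wait for the analysis to finish, and return the PDF bytes."""
    auth = {"Ocp-Apim-Subscription-Key": key}
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    post = urllib.request.Request(
        analyze_pdf_url(endpoint),
        data=body,
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(post) as response:  # 202 Accepted is expected
        operation_location = response.headers["Operation-Location"]

    # Monitor the operation until completion.
    while True:
        poll = urllib.request.Request(operation_location, headers=auth)
        with urllib.request.urlopen(poll) as response:
            status = json.load(response)["status"]
        if status == "succeeded":
            break
        if status == "failed":
            raise RuntimeError("Analyze operation failed")
        time.sleep(poll_seconds)

    # Retrieve the searchable PDF as application/pdf.
    download = urllib.request.Request(pdf_result_url(operation_location), headers=auth)
    with urllib.request.urlopen(download) as response:
        return response.read()
```

Calling convert_to_searchable_pdf with a resource endpoint, a key, and a document URL would return the PDF bytes, which can then be written to a file opened in binary mode.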

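Key-value pairs

In earlier API versions, the prebuilt-document model extracted key-value pairs from forms and documents. With the keyValuePairs feature added to prebuilt-layout, the layout model now produces the same results.

Key-value pairs are specific spans within the document that identify a label or key and its associated response or value. In a structured form, these pairs can be the label and the value the user entered for that field. In an unstructured document, they can be the date a contract was executed on, based on the text in a paragraph. The AI model is trained to extract identifiable keys and values based on a wide variety of document types, formats, and structures.

Keys can also exist in isolation when the model detects that a key exists with no associated value, or when processing optional fields. For example, a middle name field can be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. For documents where the same value is described in different ways, for example customer/user, the associated key is either customer or user, depending on context.

REST API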

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs
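
As a rough illustration of consuming the result, the sketch below walks a keyValuePairs array and tolerates keys that have no associated value, as described above. The sample data is invented for illustration and the helper name summarize_key_value_pairs is hypothetical; the field names (key, value, content, confidence) are assumed to follow the analyze result JSON.

```python
# Invented sample shaped like the keyValuePairs section of an analyze result.
analyze_result = {
    "keyValuePairs": [
        {"key": {"content": "First name"}, "value": {"content": "Avery"}, "confidence": 0.95},
        {"key": {"content": "Middle name"}, "confidence": 0.9},  # key detected without a value
    ]
}


def summarize_key_value_pairs(result):
    """Return (key, value) tuples; value is None when the key stands alone."""
    pairs = []
    for pair in result.get("keyValuePairs", []):
        key_text = pair["key"]["content"]
        # The value object is optional, so fall back to an empty dict.
        value_text = pair.get("value", {}).get("content")
        pairs.append((key_text, value_text))
    return pairs


pairs = summarize_key_value_pairs(analyze_result)
for key_text, value_text in pairs:
    print(f"{key_text}: {value_text if value_text is not None else '(no value)'}")
```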

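Query fields

Query fields are an add-on capability that extends the schema extracted from any prebuilt model, or defines a specific key name when the key name is variable. To use query fields, set the features to queryFields and provide a comma-separated list of field names in the queryFields property.

Document Intelligence now supports query field extraction. With query field extraction, you can add fields to the extraction process using a query request, without the need for added training.
Use query fields when you need to extend the schema of a prebuilt or custom model, or need to extract a few fields with the output of layout.
Query fields are a premium add-on capability. For best results, define the fields you want to extract using camel case or Pascal case field names for multi-word field names.
Query fields support a maximum of 20 fields per request. If the document contains a value for the field, the field and value are returned.
This release has a new implementation of the query fields capability that is priced lower than the earlier implementation and should be validated.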
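
The naming guidance and the 20-field limit can be checked before sending a request. The following is a minimal sketch that converts multi-word labels to Pascal case and builds the comma-separated queryFields value; the helper names to_pascal_case and build_query_fields are hypothetical and not part of any SDK.

```python
import re

MAX_QUERY_FIELDS = 20  # per-request limit stated in this article


def to_pascal_case(label):
    """Convert a multi-word label such as 'payment terms' to 'PaymentTerms'."""
    words = re.split(r"[^A-Za-z0-9]+", label.strip())
    # Upper-case only the first character so existing Pascal case is preserved.
    return "".join(word[:1].upper() + word[1:] for word in words if word)


def build_query_fields(labels):
    """Return the comma-separated queryFields value for the given labels."""
    fields = [to_pascal_case(label) for label in labels]
    if len(fields) > MAX_QUERY_FIELDS:
        raise ValueError(f"At most {MAX_QUERY_FIELDS} query fields are supported per request")
    return ",".join(fields)


query_fields_value = build_query_fields(["Party1", "terms of use", "payment date"])
print(query_fields_value)
```

The resulting string can be passed as the queryFields query parameter alongside features=queryFields.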

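Note

Document Intelligence Studio query field extraction is currently available with the Layout and prebuilt models in the 2024-11-30 (GA) API, with the exception of the US tax models (W2, 1098s, and 1099s models).

Query field extraction

For query field extraction, specify the fields you want to extract and Document Intelligence analyzes the document accordingly. Here's an example:

If you're processing a contract in Document Intelligence Studio, use the 2024-11-30 (GA) version.
You can pass a list of field labels such as Party1, Party2, TermsOfUse, PaymentTerms, PaymentDate, and TermEndDate as part of the analyze document request.
Document Intelligence can analyze and extract the field data and return the values in a structured JSON output.
In addition to the query fields, the response includes text, tables, selection marks, and other relevant data.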

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS

from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    DocumentAnalysisFeature,
)

# Assumes document_intelligence_client is an authenticated DocumentIntelligenceClient.
# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/invoice/simple-invoice.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.QUERY_FIELDS],    # Specify which add-on capabilities to enable.
    query_fields=["Address", "InvoiceNumber"],  # List the names of the fields to extract.
)
result: AnalyzeResult = poller.result()
print("Here are extra fields in result:\n")
if result.documents:
    for doc in result.documents:
        fields = doc.fields or {}
        # Use .get() so that a field missing from the result does not raise KeyError.
        address = fields.get("Address")
        if address:
            print(f"Address: {address.value_string}")
        invoice_number = fields.get("InvoiceNumber")
        if invoice_number:
            print(f"Invoice number: {invoice_number.value_string}")

View samples on GitHub.

Address: 1 Redmond way Suite 6000 Redmond, WA Sunnayvale, 99243
Invoice number: 34278587

Next steps

Learn more: Read model, Layout model

SDK samples: Python

Find more samples: Add-on capabilities
