Analyze - Image

Analyze the input image. The request contains either an image stream with any content type ['image/*', 'application/octet-stream'], or a JSON payload that includes a url property used to retrieve the image stream.

POST /imageanalysis:analyze?api-version=2024-02-01
With optional parameters:

POST /imageanalysis:analyze?features={features}&language={language}&model-version={model-version}&smartcrops-aspect-ratios={smartcrops-aspect-ratios}&gender-neutral-caption={gender-neutral-caption}&api-version=2024-02-01

URI Parameters

Name In Required Type Description
api-version
query True

string

Requested API version.

features
query

VisualFeature[]

The visual features requested. At least one visual feature must be specified.

gender-neutral-caption
query

boolean

Boolean flag for enabling gender-neutral captioning for caption and denseCaptions features. If this parameter is not specified, the default value is "false".

language
query

string

The desired language for output generation. If this parameter is not specified, the default value is "en". See https://aka.ms/cv-languages for a list of supported languages.

model-version
query

string

Model version.

smartcrops-aspect-ratios
query

number[]

A list of aspect ratios to use for the smartCrops feature. Each aspect ratio is calculated by dividing the target crop width by the height. Supported values are between 0.75 and 1.8 (inclusive). Multiple values should be comma-separated. If this parameter is not specified, the service returns one crop suggestion with an aspect ratio of its own choosing between 0.5 and 2.0 (inclusive).
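
As a rough illustration of how these query parameters combine, the sketch below assembles a request URL with the Python standard library. The base endpoint `https://example.invalid`, the function name `build_analyze_url`, and the choice to comma-join multiple `features` values are assumptions made for illustration; only the path, the parameter names, and the 0.75 to 1.8 range come from the table above.

```python
from urllib.parse import urlencode

API_VERSION = "2024-02-01"


def build_analyze_url(endpoint, features, language=None, model_version=None,
                      smartcrops_aspect_ratios=None, gender_neutral_caption=None):
    """Assemble the analyze URL; optional parameters are omitted when None."""
    if not features:
        raise ValueError("at least one visual feature must be specified")
    # Comma-joining several features is an assumption of this sketch.
    params = {"features": ",".join(features)}
    if language is not None:
        params["language"] = language
    if model_version is not None:
        params["model-version"] = model_version
    if smartcrops_aspect_ratios:
        for ratio in smartcrops_aspect_ratios:
            if not 0.75 <= ratio <= 1.8:
                raise ValueError(f"aspect ratio {ratio} is outside 0.75 to 1.8")
        params["smartcrops-aspect-ratios"] = ",".join(
            str(ratio) for ratio in smartcrops_aspect_ratios
        )
    if gender_neutral_caption is not None:
        params["gender-neutral-caption"] = "true" if gender_neutral_caption else "false"
    params["api-version"] = API_VERSION
    # safe="," keeps the comma separators instead of percent-encoding them.
    query = urlencode(params, safe=",")
    return f"{endpoint.rstrip('/')}/imageanalysis:analyze?{query}"
```

Rejecting out-of-range ratios on the client is a design choice of this sketch; the service may report such values with its own error.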

Request Body

Name Required Type Description
url True

string

Publicly reachable URL of an image.
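
The operation accepts either a JSON payload with a `url` property or a raw image stream. The sketch below shows one way to prepare the headers and body for each form. The helper name `build_body` is hypothetical, and sending the JSON form with `Content-Type: application/json` is an assumption based on common HTTP practice, not something this page states.

```python
import json


def build_body(image_url=None, image_bytes=None):
    """Return (headers, body) for exactly one of the two request forms."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if image_url is not None:
        # JSON form: the service retrieves the image from the given URL.
        body = json.dumps({"url": image_url}).encode("utf-8")
        return {"Content-Type": "application/json"}, body
    # Stream form: the image bytes are the request body.
    return {"Content-Type": "application/octet-stream"}, bytes(image_bytes)
```

For example, `build_body(image_url="https://microsoft.com/a")` yields the same JSON document as the sample request body shown in the examples.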

Responses

Name Type Description
200 OK

ImageAnalysisResult

Success

Other Status Codes

ErrorResponse

Error

Headers

x-ms-error-code: string

Examples

ImageAnalysis_Analyze_MaximumSet_Gen

Sample request

POST /imageanalysis:analyze?features=tags&language=hduryxtlvjjvwnmpjiojibvjy&model-version=kkblitshktun&smartcrops-aspect-ratios=&gender-neutral-caption=True&api-version=2024-02-01

{
  "url": "https://microsoft.com/a"
}

Sample response

{
  "captionResult": {
    "text": "azcggjzjuvbytsq",
    "confidence": 0
  },
  "objectsResult": {
    "values": [
      {
        "id": "iaofvdltgfjrsffgltupmo",
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 27,
          "h": 13
        },
        "tags": [
          {
            "name": "expoctetvqe",
            "confidence": 0
          }
        ]
      }
    ]
  },
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "npk",
            "boundingPolygon": [
              {
                "x": 0,
                "y": 0
              },
              {
                "x": 0,
                "y": 0
              },
              {
                "x": 0,
                "y": 0
              },
              {
                "x": 0,
                "y": 0
              }
            ],
            "words": [
              {
                "text": "wljuxeeadklupdpxgcinka",
                "boundingPolygon": [
                  {
                    "x": 0,
                    "y": 0
                  },
                  {
                    "x": 0,
                    "y": 0
                  },
                  {
                    "x": 0,
                    "y": 0
                  },
                  {
                    "x": 0,
                    "y": 0
                  }
                ],
                "confidence": 0
              }
            ]
          }
        ]
      }
    ]
  },
  "denseCaptionsResult": {
    "values": [
      {
        "text": "pqrcyrtz",
        "confidence": 0,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 27,
          "h": 13
        }
      }
    ]
  },
  "modelVersion": "hslbdtpcuyabri",
  "metadata": {
    "width": 10,
    "height": 27
  },
  "tagsResult": {
    "values": [
      {
        "name": "expoctetvqe",
        "confidence": 0
      }
    ]
  },
  "smartCropsResult": {
    "values": [
      {
        "aspectRatio": 23,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 27,
          "h": 13
        }
      }
    ]
  },
  "peopleResult": {
    "values": [
      {
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 27,
          "h": 13
        },
        "confidence": 0
      }
    ]
  }
}

ImageAnalysis_Analyze_MinimumSet_Gen

Sample request

POST /imageanalysis:analyze?api-version=2024-02-01

{
  "url": "https://www.abc.com"
}

Sample response

{
  "modelVersion": "cvhbhwpfswz",
  "metadata": {
    "width": 10,
    "height": 23
  }
}

Definitions

Name Description
BoundingBox

A bounding box for an area inside an image.

CaptionResult

A brief description of what the image depicts.

ContentTag

An entity observation in the image, along with the confidence score.

CropRegion

A region identified for smart cropping. There will be one region returned for each requested aspect ratio.

DenseCaption

A brief description of what the image depicts.

DenseCaptionsResult

A list of captions.

DetectedObject

Describes a detected object in an image.

DetectedPerson

A person detected in an image.

DetectedTextBlock

A detected text block.

DetectedTextLine

A detected text line.

DetectedTextWord

A detected word consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.

ErrorResponse

Response returned when an error occurs.

ErrorResponseDetails

Error info.

ErrorResponseInnerError

Detailed error.

ImageAnalysisResult

Describes the combined results of different types of image analysis.

ImageMetadata

The image metadata information such as height and width.

ImagePoint

An object representing a point in the image.

ImageUrl

A JSON document with a URL pointing to the publicly accessible image to be analyzed.

ObjectsResult

Describes detected objects in an image.

PeopleResult

An object describing whether the image contains people.

ReadResult

The results of a Read operation.

SmartCropsResult

Smart cropping result.

TagsResult

A list of tags with confidence level.

VisualFeature

The visual features requested. At least one visual feature must be specified.

BoundingBox

A bounding box for an area inside an image.

Name Type Description
h

integer

Height measured from the top-left point of the area, in pixels.

w

integer

Width measured from the top-left point of the area, in pixels.

x

integer

Left coordinate of the top-left point of the area, in pixels.

y

integer

Top coordinate of the top-left point of the area, in pixels.
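
Client code often needs corner coordinates and not an origin plus size. The sketch below converts a BoundingBox into a `(left, top, right, bottom)` tuple. The class and method names are hypothetical, and treating `x + w` and `y + h` as the right and bottom edges is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    x: int  # left coordinate of the top-left point, in pixels
    y: int  # top coordinate of the top-left point, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels

    @classmethod
    def from_dict(cls, data):
        """Build a BoundingBox from the JSON object in a response."""
        return cls(x=data["x"], y=data["y"], w=data["w"], h=data["h"])

    def corners(self):
        """Return (left, top, right, bottom) computed from origin and size."""
        return (self.x, self.y, self.x + self.w, self.y + self.h)
```

With the values from the sample response, `BoundingBox.from_dict({"x": 0, "y": 0, "w": 27, "h": 13}).corners()` gives `(0, 0, 27, 13)`.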

CaptionResult

A brief description of what the image depicts.

Name Type Description
confidence

number

The level of confidence the service has in the caption. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match.

text

string

The text of the caption.

ContentTag

An entity observation in the image, along with the confidence score.

Name Type Description
confidence

number

The level of confidence that the entity was observed. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match.

name

string

Name of the entity.

CropRegion

A region identified for smart cropping. There will be one region returned for each requested aspect ratio.

Name Type Description
aspectRatio

number

The aspect ratio of the crop region.

boundingBox

BoundingBox

A bounding box for an area inside an image.
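
Because an aspect ratio is the crop width divided by the height, a client can sanity-check a returned region against its reported `aspectRatio`. The sketch below does this with hypothetical helper names and an arbitrary tolerance; the input values in the usage note are made up for illustration.

```python
def measured_aspect_ratio(crop_region):
    """Compute width / height from the region's bounding box."""
    box = crop_region["boundingBox"]
    if box["h"] == 0:
        raise ValueError("bounding box height is zero")
    return box["w"] / box["h"]


def matches_reported_ratio(crop_region, tolerance=0.05):
    """Check whether the measured ratio is close to the reported aspectRatio."""
    difference = abs(measured_aspect_ratio(crop_region) - crop_region["aspectRatio"])
    return difference <= tolerance
```

For a region with `aspectRatio` 1.5 and a 300 by 200 bounding box, `measured_aspect_ratio` returns 1.5 and `matches_reported_ratio` returns `True`.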

DenseCaption

A brief description of what the image depicts.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

confidence

number

The level of confidence the service has in the caption. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match.

text

string

The text of the caption.

DenseCaptionsResult

A list of captions.

Name Type Description
values

DenseCaption[]

A list of captions.

DetectedObject

Describes a detected object in an image.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

id

string

ID of the detected object.

tags

ContentTag[]

Classification confidences of the detected object.
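
Each detected object carries a list of candidate tags with confidence scores. A minimal sketch of picking the most likely label follows; the function name `top_label` and the sample values in the usage note are hypothetical.

```python
def top_label(detected_object):
    """Return (name, confidence) of the highest-confidence tag, or None."""
    tags = detected_object.get("tags", [])
    if not tags:
        return None
    best = max(tags, key=lambda tag: tag["confidence"])
    return best["name"], best["confidence"]
```

Given tags `cat` at 0.62 and `dog` at 0.91, the function returns `("dog", 0.91)`.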

DetectedPerson

A person detected in an image.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

confidence

number

Confidence score of having observed the person in the image. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match.

DetectedTextBlock

A detected text block.

Name Type Description
lines

DetectedTextLine[]

List of text lines in the text block.

DetectedTextLine

A detected text line.

Name Type Description
boundingPolygon

ImagePoint[]

Bounding polygon of the text line.

text

string

Text content of the detected text line.

words

DetectedTextWord[]

List of words in the text line.

DetectedTextWord

A detected word consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.

Name Type Description
boundingPolygon

ImagePoint[]

Bounding polygon of the word.

confidence

number

The level of confidence that the word was detected. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match.

text

string

Text content of the word.

ErrorResponse

Response returned when an error occurs.

Name Type Description
error

ErrorResponseDetails

Error info.

ErrorResponseDetails

Error info.

Name Type Description
code

string

Error code.

details

ErrorResponseDetails[]

List of detailed errors.

innererror

ErrorResponseInnerError

Detailed error.

message

string

Error message.

target

string

Target of the error.

ErrorResponseInnerError

Detailed error.

Name Type Description
code

string

Error code.

innererror

ErrorResponseInnerError

Detailed error.

message

string

Error message.
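
ErrorResponseInnerError is recursive, so the most specific code may sit several levels down. The sketch below walks the `innererror` chain and returns the deepest code and message it finds. The function name and the sample codes in the usage note are hypothetical, not values documented for this service.

```python
def innermost_error(error_response):
    """Return (code, message) from the deepest innererror that provides them."""
    error = error_response.get("error", {})
    code = error.get("code")
    message = error.get("message")
    inner = error.get("innererror")
    while isinstance(inner, dict):
        # Keep the outer value when an inner level omits a field.
        code = inner.get("code") or code
        message = inner.get("message") or message
        inner = inner.get("innererror")
    return code, message
```

For a response whose outer code is `InvalidRequest` and whose nested inner code is `InvalidImageUrl`, the function returns the inner code together with the most specific message available.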

ImageAnalysisResult

Describes the combined results of different types of image analysis.

Name Type Description
captionResult

CaptionResult

A brief description of what the image depicts.

denseCaptionsResult

DenseCaptionsResult

A list of captions.

metadata

ImageMetadata

The image metadata information such as height and width.

modelVersion

string

Model Version.

objectsResult

ObjectsResult

Describes detected objects in an image.

peopleResult

PeopleResult

An object describing whether the image contains people.

readResult

ReadResult

The results of a Read operation.

smartCropsResult

SmartCropsResult

Smart cropping result.

tagsResult

TagsResult

A list of tags with confidence level.
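
As the minimum-set sample shows, a response may contain only `modelVersion` and `metadata`, so client code should not assume any result section is present. The sketch below, with a hypothetical function name, summarizes which sections a response contains.

```python
RESULT_FIELDS = (
    "captionResult",
    "denseCaptionsResult",
    "objectsResult",
    "peopleResult",
    "readResult",
    "smartCropsResult",
    "tagsResult",
)


def summarize(result):
    """Report the model version, image size, and which result sections exist."""
    metadata = result.get("metadata", {})
    return {
        "modelVersion": result.get("modelVersion"),
        "size": (metadata.get("width"), metadata.get("height")),
        "present": [name for name in RESULT_FIELDS if name in result],
    }
```

Applied to the minimum-set sample response, `summarize` reports a size of `(10, 23)` and an empty `present` list.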

ImageMetadata

The image metadata information such as height and width.

Name Type Description
height

integer

The height of the image in pixels.

width

integer

The width of the image in pixels.

ImagePoint

An object representing a point in the image.

Name Type Description
x

integer

The x-coordinate of this point.

y

integer

The y-coordinate of this point.
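
A bounding polygon is a list of ImagePoint objects. When an axis-aligned rectangle is easier to work with, the enclosing box can be derived as in the sketch below; the function name and the points in the usage note are made up for illustration.

```python
def polygon_bounds(points):
    """Return (min_x, min_y, max_x, max_y) enclosing all polygon points."""
    if not points:
        raise ValueError("polygon has no points")
    xs = [point["x"] for point in points]
    ys = [point["y"] for point in points]
    return min(xs), min(ys), max(xs), max(ys)
```

For the points (10, 5), (60, 8), (58, 30), and (9, 27), the enclosing box is `(9, 5, 60, 30)`.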

ImageUrl

A JSON document with a URL pointing to the publicly accessible image to be analyzed.

Name Type Description
url

string

Publicly reachable URL of an image.

ObjectsResult

Describes detected objects in an image.

Name Type Description
values

DetectedObject[]

An array of detected objects.

PeopleResult

An object describing whether the image contains people.

Name Type Description
values

DetectedPerson[]

An array of detected people.

ReadResult

The results of a Read operation.

Name Type Description
blocks

DetectedTextBlock[]

A list of text blocks.
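
The Read output nests words inside lines inside blocks. A small sketch of flattening that structure into plain text lines, and of listing words below a confidence threshold, is shown here. Both helper names and the 0.5 default threshold are assumptions of this sketch.

```python
def extract_lines(read_result):
    """Collect the text of every line, in block order then line order."""
    return [
        line["text"]
        for block in read_result.get("blocks", [])
        for line in block.get("lines", [])
    ]


def low_confidence_words(read_result, threshold=0.5):
    """Return the text of words whose confidence is below the threshold."""
    return [
        word["text"]
        for block in read_result.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
        if word["confidence"] < threshold
    ]
```

Joining the result of `extract_lines` with newlines gives a plain-text rendering of the detected text.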

SmartCropsResult

Smart cropping result.

Name Type Description
values

CropRegion[]

Recommended regions for cropping the image.

TagsResult

A list of tags with confidence level.

Name Type Description
values

ContentTag[]

A list of tags with confidence level.
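
Since every tag carries a confidence between 0.0 and 1.0, a common follow-up step is to keep only the tags above a cutoff. The sketch below does that and sorts the survivors from most to least confident; the function name and the 0.8 default cutoff are arbitrary choices for illustration.

```python
def confident_tags(tags_result, min_confidence=0.8):
    """Return tags at or above min_confidence, highest confidence first."""
    kept = [
        tag for tag in tags_result.get("values", [])
        if tag["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda tag: tag["confidence"], reverse=True)
```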

VisualFeature

The visual features requested. At least one visual feature must be specified.

Name Type Description
caption

string

A description or a caption summarizing the content of the image.

denseCaptions

string

Detailed captions providing in-depth descriptions of the image content.

objects

string

Specific objects recognized and labeled in the image.

people

string

Detection and analysis of people in the image.

read

string

Textual content extracted from the image, such as signs or labels.

smartCrops

string

Automatically generated cropped versions of the image focusing on important content.

tags

string

Visual tags representing objects detected in the image.
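
To catch misspelled feature names before a request is sent, the values in this table can be mirrored in an enumeration. The sketch below is one way to do that in Python; the class layout and the `parse_features` helper are hypothetical, and parsing a comma-separated string is an assumption about how a caller might supply the list.

```python
from enum import Enum


class VisualFeature(str, Enum):
    CAPTION = "caption"
    DENSE_CAPTIONS = "denseCaptions"
    OBJECTS = "objects"
    PEOPLE = "people"
    READ = "read"
    SMART_CROPS = "smartCrops"
    TAGS = "tags"


def parse_features(text):
    """Parse a comma-separated string; unknown names raise ValueError."""
    return [VisualFeature(part.strip()) for part in text.split(",") if part.strip()]
```

For example, `parse_features("tags, read")` returns `[VisualFeature.TAGS, VisualFeature.READ]`, while a name such as `"faces"` raises `ValueError` because it is not in the table.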