Analyze - Image
Analyze the input image. The request contains either an image stream with a content type of 'image/*' or 'application/octet-stream', or a JSON payload with a url property that is used to retrieve the image stream.
POST /imageanalysis:analyze?api-version=2024-02-01
POST /imageanalysis:analyze?features={features}&language={language}&model-version={model-version}&smartcrops-aspect-ratios={smartcrops-aspect-ratios}&gender-neutral-caption={gender-neutral-caption}&api-version=2024-02-01
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
api-version | query | True | string | Requested API version. |
features | query | | VisualFeature[] | The visual features requested. At least one visual feature must be specified. |
gender-neutral-caption | query | | boolean | Boolean flag for enabling gender-neutral captioning for the caption and denseCaptions features. If this parameter is not specified, the default value is "false". |
language | query | | string | The desired language for output generation. If this parameter is not specified, the default value is "en". See https://aka.ms/cv-languages for a list of supported languages. |
model-version | query | | string | Model version. |
smartcrops-aspect-ratios | query | | array[] | A list of aspect ratios to use for the smartCrops feature. Aspect ratios are calculated by dividing the target crop width by the height. Supported values are between 0.75 and 1.8 (inclusive). Multiple values should be comma-separated. If this parameter is not specified, the service returns one crop suggestion with an aspect ratio it sees fit between 0.5 and 2.0 (inclusive). |
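The query parameters above can be assembled programmatically. The following is a minimal sketch, not an official client: `build_analyze_query` is a hypothetical helper name, joining multiple `features` values with commas is an assumption (the table only states this for `smartcrops-aspect-ratios`), and sending the boolean as lowercase `true`/`false` is likewise an assumption.

```python
from urllib.parse import urlencode

API_VERSION = "2024-02-01"


def build_analyze_query(features, language=None, model_version=None,
                        smartcrops_aspect_ratios=None,
                        gender_neutral_caption=None):
    """Return the request path plus query string (hypothetical helper)."""
    if not features:
        raise ValueError("at least one visual feature must be specified")
    # Assumption: multiple features are comma-separated.
    params = {"features": ",".join(features)}
    if language is not None:
        params["language"] = language
    if model_version is not None:
        params["model-version"] = model_version
    if smartcrops_aspect_ratios:
        for ratio in smartcrops_aspect_ratios:
            # Supported range taken from the parameter description above.
            if not 0.75 <= ratio <= 1.8:
                raise ValueError(f"aspect ratio {ratio} is outside 0.75..1.8")
        params["smartcrops-aspect-ratios"] = ",".join(
            str(r) for r in smartcrops_aspect_ratios)
    if gender_neutral_caption is not None:
        # Assumption: lowercase boolean literals are accepted.
        params["gender-neutral-caption"] = (
            "true" if gender_neutral_caption else "false")
    params["api-version"] = API_VERSION
    # Keep commas unescaped so list values stay readable.
    return "/imageanalysis:analyze?" + urlencode(params, safe=",")
```

For example, `build_analyze_query(["tags", "smartCrops"], smartcrops_aspect_ratios=[1.0, 1.5])` produces a path whose query carries both features and both ratios, followed by `api-version=2024-02-01`.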
Request Body
Name | Required | Type | Description |
---|---|---|---|
url | True | string | Publicly reachable URL of an image. |
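The operation accepts either a JSON document with a `url` property or a raw image stream. The sketch below shows one way to prepare either form; `make_body` is a hypothetical helper, and using `application/json` as the content type for the JSON form is an assumption not stated in this reference.

```python
import json


def make_body(image_url=None, image_bytes=None):
    """Return (content_type, body_bytes) for exactly one input form."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if image_url is not None:
        # JSON payload form: {"url": "<publicly reachable image URL>"}
        body = json.dumps({"url": image_url}).encode("utf-8")
        return "application/json", body  # assumed content type
    # Image stream form: the raw bytes are sent as-is.
    return "application/octet-stream", image_bytes
```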
Responses
Name | Type | Description |
---|---|---|
200 OK | ImageAnalysisResult | Success |
Other Status Codes | ErrorResponse | Error. Headers: x-ms-error-code: string |
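A caller typically needs to separate the 200 case from every other status. The following sketch assumes the status code, headers, and body text have already been obtained from some HTTP client; `AnalyzeError` and `interpret_response` are hypothetical names, and the header dictionary is assumed to use lowercase keys.

```python
import json


class AnalyzeError(Exception):
    """Hypothetical exception carrying the service error details."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def interpret_response(status, headers, body):
    """Return the parsed result on 200, otherwise raise AnalyzeError."""
    payload = json.loads(body) if body else {}
    if status == 200:
        return payload
    error = payload.get("error", {})
    # Prefer the x-ms-error-code header, fall back to the body's error code.
    code = headers.get("x-ms-error-code") or error.get("code")
    raise AnalyzeError(status, code, error.get("message"))
```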
Examples
ImageAnalysis_Analyze_MaximumSet_Gen
Sample request
POST /imageanalysis:analyze?features=tags&language=hduryxtlvjjvwnmpjiojibvjy&model-version=kkblitshktun&smartcrops-aspect-ratios=&gender-neutral-caption=True&api-version=2024-02-01
{
  "url": "https://microsoft.com/a"
}
Sample response
{
  "captionResult": {
    "text": "azcggjzjuvbytsq",
    "confidence": 0
  },
  "objectsResult": {
    "values": [
      {
        "id": "iaofvdltgfjrsffgltupmo",
        "boundingBox": { "x": 0, "y": 0, "w": 27, "h": 13 },
        "tags": [
          { "name": "expoctetvqe", "confidence": 0 }
        ]
      }
    ]
  },
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "npk",
            "boundingPolygon": [
              { "x": 0, "y": 0 },
              { "x": 0, "y": 0 },
              { "x": 0, "y": 0 },
              { "x": 0, "y": 0 }
            ],
            "words": [
              {
                "text": "wljuxeeadklupdpxgcinka",
                "boundingPolygon": [
                  { "x": 0, "y": 0 },
                  { "x": 0, "y": 0 },
                  { "x": 0, "y": 0 },
                  { "x": 0, "y": 0 }
                ],
                "confidence": 0
              }
            ]
          }
        ]
      }
    ]
  },
  "denseCaptionsResult": {
    "values": [
      {
        "text": "pqrcyrtz",
        "confidence": 0,
        "boundingBox": { "x": 0, "y": 0, "w": 27, "h": 13 }
      }
    ]
  },
  "modelVersion": "hslbdtpcuyabri",
  "metadata": { "width": 10, "height": 27 },
  "tagsResult": {
    "values": [
      { "name": "expoctetvqe", "confidence": 0 }
    ]
  },
  "smartCropsResult": {
    "values": [
      {
        "aspectRatio": 23,
        "boundingBox": { "x": 0, "y": 0, "w": 27, "h": 13 }
      }
    ]
  },
  "peopleResult": {
    "values": [
      {
        "boundingBox": { "x": 0, "y": 0, "w": 27, "h": 13 },
        "confidence": 0
      }
    ]
  }
}
ImageAnalysis_Analyze_MinimumSet_Gen
Sample request
POST /imageanalysis:analyze?api-version=2024-02-01
{
  "url": "https://www.abc.com"
}
Sample response
{
  "modelVersion": "cvhbhwpfswz",
  "metadata": {
    "width": 10,
    "height": 23
  }
}
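As the two sample responses show, result properties such as tagsResult or captionResult may be absent, while modelVersion and metadata appear in both. A defensive reader might look like the sketch below; `summarize` is a hypothetical helper and the 0.5 default threshold is an arbitrary illustration value.

```python
def summarize(result, min_confidence=0.5):
    """Collect a few fields from an analysis result, tolerating absences."""
    metadata = result.get("metadata", {})
    tags = [
        tag["name"]
        for tag in result.get("tagsResult", {}).get("values", [])
        if tag["confidence"] >= min_confidence
    ]
    return {
        "modelVersion": result.get("modelVersion"),
        "size": (metadata.get("width"), metadata.get("height")),
        # None when the caption feature was not part of the result.
        "caption": result.get("captionResult", {}).get("text"),
        "tags": tags,
    }
```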
Definitions
Name | Description |
---|---|
BoundingBox | A bounding box for an area inside an image. |
CaptionResult | A brief description of what the image depicts. |
ContentTag | An entity observation in the image, along with the confidence score. |
CropRegion | A region identified for smart cropping. There will be one region returned for each requested aspect ratio. |
DenseCaption | A brief description of what the image depicts. |
DenseCaptionsResult | A list of captions. |
DetectedObject | Describes a detected object in an image. |
DetectedPerson | A person detected in an image. |
DetectedTextBlock | A detected text block. |
DetectedTextLine | A detected text line. |
DetectedTextWord | A detected word consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word. |
ErrorResponse | Response returned when an error occurs. |
ErrorResponseDetails | Error info. |
ErrorResponseInnerError | Detailed error. |
ImageAnalysisResult | Describes the combined results of different types of image analysis. |
ImageMetadata | The image metadata information such as height and width. |
ImagePoint | An object representing a point in the image. |
ImageUrl | A JSON document with a URL pointing to the publicly accessible image to be analyzed. |
ObjectsResult | Describes detected objects in an image. |
PeopleResult | An object describing whether the image contains people. |
ReadResult | The results of a Read operation. |
SmartCropsResult | Smart cropping result. |
TagsResult | A list of tags with confidence level. |
VisualFeature | The visual features requested. At least one visual feature must be specified. |
BoundingBox
A bounding box for an area inside an image.
Name | Type | Description |
---|---|---|
h | integer | Height measured from the top-left point of the area, in pixels. |
w | integer | Width measured from the top-left point of the area, in pixels. |
x | integer | Left-coordinate of the top left point of the area, in pixels. |
y | integer | Top-coordinate of the top left point of the area, in pixels. |
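Many imaging libraries expect a rectangle as corner coordinates rather than as an origin plus size. The conversion from the x, y, w, h fields above is simple arithmetic; `box_to_corners` is a hypothetical helper name.

```python
def box_to_corners(box):
    """Convert a BoundingBox dict to (left, top, right, bottom) in pixels."""
    left = box["x"]
    top = box["y"]
    # Width and height are measured from the top-left point.
    return (left, top, left + box["w"], top + box["h"])
```

With the bounding box from the sample response (`x` 0, `y` 0, `w` 27, `h` 13) this yields `(0, 0, 27, 13)`.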
CaptionResult
A brief description of what the image depicts.
Name | Type | Description |
---|---|---|
confidence | number | The level of confidence the service has in the caption. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match. |
text | string | The text of the caption. |
ContentTag
An entity observation in the image, along with the confidence score.
Name | Type | Description |
---|---|---|
confidence | number | The level of confidence that the entity was observed. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match. |
name | string | Name of the entity. |
CropRegion
A region identified for smart cropping. There will be one region returned for each requested aspect ratio.
Name | Type | Description |
---|---|---|
aspectRatio | number | The aspect ratio of the crop region. |
boundingBox | BoundingBox | A bounding box for an area inside an image. |
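Because one region is returned per requested aspect ratio, a caller may want to look up the region that corresponds to a particular ratio. The sketch below picks the region whose aspectRatio is numerically closest to the requested value; `crop_for_ratio` is a hypothetical helper, and matching by nearest value (rather than exact equality) is a design assumption to tolerate floating-point rounding.

```python
def crop_for_ratio(smart_crops_result, requested_ratio):
    """Return the CropRegion closest to requested_ratio, or None if empty."""
    regions = smart_crops_result.get("values", [])
    if not regions:
        return None
    return min(regions,
               key=lambda region: abs(region["aspectRatio"] - requested_ratio))
```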
DenseCaption
A brief description of what the image depicts.
Name | Type | Description |
---|---|---|
boundingBox | BoundingBox | A bounding box for an area inside an image. |
confidence | number | The level of confidence the service has in the caption. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match. |
text | string | The text of the caption. |
DenseCaptionsResult
A list of captions.
Name | Type | Description |
---|---|---|
values | DenseCaption[] | A list of captions. |
DetectedObject
Describes a detected object in an image.
Name | Type | Description |
---|---|---|
boundingBox | BoundingBox | A bounding box for an area inside an image. |
id | string | Id of the detected object. |
tags | ContentTag[] | Classification confidences of the detected object. |
DetectedPerson
A person detected in an image.
Name | Type | Description |
---|---|---|
boundingBox | BoundingBox | A bounding box for an area inside an image. |
confidence | number | Confidence score of having observed the person in the image. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match. |
DetectedTextBlock
A detected text block.
Name | Type | Description |
---|---|---|
lines | DetectedTextLine[] | List of text lines in the text block. |
DetectedTextLine
A detected text line.
Name | Type | Description |
---|---|---|
boundingPolygon | ImagePoint[] | Bounding polygon of the text line. |
text | string | Text content of the detected text line. |
words | DetectedTextWord[] | List of words in the text line. |
DetectedTextWord
A detected word consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.
Name | Type | Description |
---|---|---|
boundingPolygon | ImagePoint[] | Bounding polygon of the word. |
confidence | number | The level of confidence that the word was detected. Confidence scores span the range of 0.0 to 1.0 (inclusive), with higher values indicating a higher confidence of a match. |
text | string | Text content of the word. |
ErrorResponse
Response returned when an error occurs.
Name | Type | Description |
---|---|---|
error | ErrorResponseDetails | Error info. |
ErrorResponseDetails
Error info.
Name | Type | Description |
---|---|---|
code | string | Error code. |
details | | List of detailed errors. |
innererror | ErrorResponseInnerError | Detailed error. |
message | string | Error message. |
target | string | Target of the error. |
ErrorResponseInnerError
Detailed error.
Name | Type | Description |
---|---|---|
code | string | Error code. |
innererror | ErrorResponseInnerError | Detailed error. |
message | string | Error message. |
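Since innererror can itself contain an innererror, the most specific error code may be nested several levels deep. The sketch below walks that chain iteratively; `error_code_chain` is a hypothetical helper, and the codes used in the usage note are placeholders rather than real service codes.

```python
def error_code_chain(error):
    """Return error codes from outermost to innermost innererror."""
    codes = []
    node = error
    while isinstance(node, dict):
        if "code" in node:
            codes.append(node["code"])
        # Descend into the nested detailed error, if any.
        node = node.get("innererror")
    return codes
```

Given `{"code": "OuterCode", "innererror": {"code": "InnerCode"}}`, the function returns `["OuterCode", "InnerCode"]`, so the last element is the most specific code.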
ImageAnalysisResult
Describes the combined results of different types of image analysis.
Name | Type | Description |
---|---|---|
captionResult | CaptionResult | A brief description of what the image depicts. |
denseCaptionsResult | DenseCaptionsResult | A list of captions. |
metadata | ImageMetadata | The image metadata information such as height and width. |
modelVersion | string | Model version. |
objectsResult | ObjectsResult | Describes detected objects in an image. |
peopleResult | PeopleResult | An object describing whether the image contains people. |
readResult | ReadResult | The results of a Read operation. |
smartCropsResult | SmartCropsResult | Smart cropping result. |
tagsResult | TagsResult | A list of tags with confidence level. |
ImageMetadata
The image metadata information such as height and width.
Name | Type | Description |
---|---|---|
height | integer | The height of the image in pixels. |
width | integer | The width of the image in pixels. |
ImagePoint
An object representing a point in the image.
Name | Type | Description |
---|---|---|
x | integer | The x-coordinate of this point. |
y | integer | The y-coordinate of this point. |
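Bounding polygons for text lines and words are lists of such points. When an axis-aligned rectangle is easier to work with, the polygon can be reduced to its enclosing box in the same x, y, w, h shape used by BoundingBox. This is a sketch; `polygon_bounds` is a hypothetical helper name.

```python
def polygon_bounds(points):
    """Return the axis-aligned box enclosing a list of ImagePoint dicts."""
    if not points:
        raise ValueError("polygon has no points")
    xs = [point["x"] for point in points]
    ys = [point["y"] for point in points]
    left, top = min(xs), min(ys)
    return {"x": left, "y": top, "w": max(xs) - left, "h": max(ys) - top}
```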
ImageUrl
A JSON document with a URL pointing to the publicly accessible image to be analyzed.
Name | Type | Description |
---|---|---|
url | string | Publicly reachable URL of an image. |
ObjectsResult
Describes detected objects in an image.
Name | Type | Description |
---|---|---|
values | DetectedObject[] | An array of detected objects. |
PeopleResult
An object describing whether the image contains people.
Name | Type | Description |
---|---|---|
values | DetectedPerson[] | An array of detected people. |
ReadResult
The results of a Read operation.
Name | Type | Description |
---|---|---|
blocks | DetectedTextBlock[] | A list of text blocks. |
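The read result nests blocks, lines, and words, as the sample response illustrates. The sketch below flattens that hierarchy into plain text lines and, separately, lists words below a confidence threshold; both function names are hypothetical.

```python
def read_lines(read_result):
    """Return the text of every detected line, in document order."""
    return [
        line["text"]
        for block in read_result.get("blocks", [])
        for line in block.get("lines", [])
    ]


def low_confidence_words(read_result, threshold):
    """Return the text of words whose confidence is below threshold."""
    return [
        word["text"]
        for block in read_result.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
        if word["confidence"] < threshold
    ]
```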
SmartCropsResult
Smart cropping result.
Name | Type | Description |
---|---|---|
values | CropRegion[] | Recommended regions for cropping the image. |
TagsResult
A list of tags with confidence level.
Name | Type | Description |
---|---|---|
values | ContentTag[] | A list of tags with confidence level. |
VisualFeature
The visual features requested. At least one visual feature must be specified.
Name | Type | Description |
---|---|---|
caption | string | A description or a caption summarizing the content of the image. |
denseCaptions | string | Detailed captions providing in-depth descriptions of the image content. |
objects | string | Specific objects recognized and labeled in the image. |
people | string | Detection and analysis of people in the image. |
read | string | Textual content extracted from the image, such as signs or labels. |
smartCrops | string | Automatically generated cropped versions of the image focusing on important content. |
tags | string | Visual tags representing objects detected in the image. |
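Checking feature names on the client side before sending a request can catch typos early. The sketch below validates against the seven values listed in the table above; `VISUAL_FEATURES` and `validate_features` are hypothetical names, and treating the names as case-sensitive is an assumption.

```python
VISUAL_FEATURES = frozenset({
    "caption", "denseCaptions", "objects", "people",
    "read", "smartCrops", "tags",
})


def validate_features(features):
    """Return the requested features as a list, or raise ValueError."""
    requested = list(features)
    if not requested:
        raise ValueError("at least one visual feature must be specified")
    # Assumption: names are matched case-sensitively.
    unknown = sorted(set(requested) - VISUAL_FEATURES)
    if unknown:
        raise ValueError("unknown visual features: " + ", ".join(unknown))
    return requested
```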