Scene, shot, keyframe detection insight overview
Scene, shot, keyframe detection
Scene detection identifies when a scene changes in a video based on visual cues.
A scene depicts a single event and is composed of a series of related shots.
A shot is a series of frames distinguished by visual cues such as abrupt and gradual transitions in the color scheme of adjacent frames. The shot's metadata includes its start and end time, as well as a list of the keyframes it contains.
A keyframe is the frame that best represents its shot.
Scene, shot, and keyframe detection use cases
- Easily browse, manage, and edit your video content based on varying granularities.
- Use editorial shot type detection when editing videos into clips and trailers, or when searching for keyframes with a specific style.
Scene detection
Azure AI Video Indexer determines when a scene changes in a video based on visual cues. A scene depicts a single event and is composed of a series of consecutive shots, which are semantically related.
A scene thumbnail is the first keyframe of its underlying shot.
Azure AI Video Indexer segments a video into scenes based on color coherence across consecutive shots and retrieves the beginning and end time of each scene.
Videos must contain at least three scenes.
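Because each scene instance carries start and end timestamps, you can compute scene boundaries and durations directly from the insights JSON. The following is a minimal sketch, not an official sample: it assumes the scenes array shape shown in the example response later in this article, a placeholder file name of insights.json, and that the timestamps use the H:MM:SS.fffffff format shown there.

```python
import json
from datetime import timedelta

def parse_timestamp(value: str) -> timedelta:
    """Parse a Video Indexer timestamp such as '0:00:09.1333333'."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

# "insights.json" is a placeholder file name for the index JSON you downloaded.
with open("insights.json", encoding="utf-8") as f:
    index = json.load(f)

# Scenes may sit at the top level of a saved fragment or under videos[0].insights
# in the full index document; adjust the path to match your file.
scenes = index.get("scenes") or index["videos"][0]["insights"]["scenes"]

for scene in scenes:
    for instance in scene["instances"]:
        start = parse_timestamp(instance["start"])
        end = parse_timestamp(instance["end"])
        duration = (end - start).total_seconds()
        print(f"Scene {scene['id']}: {instance['start']} -> {instance['end']} ({duration:.1f} s)")
```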
Shot detection
Azure AI Video Indexer determines when a shot changes in the video based on visual cues, by detecting both abrupt and gradual transitions in the color scheme and other visual features of adjacent frames. The shot's metadata includes a start and end time, as well as the list of keyframes included in that shot. The shots are consecutive frames taken from the same camera at the same time.
Note
There might be a gap between shots that includes frames that are part of the transition. Therefore, these frames are not considered part of the shot.
Keyframe editorial shot type detection
The shot type is determined based on analysis of the first keyframe of each shot. Shots are identified by the scale, size, and location of the faces appearing in their first keyframe.
The shot size and scale are determined based on the distance between the camera and the faces appearing in the frame. Using these properties, Azure AI Video Indexer detects the following shot types:
- Wide: shows an entire person’s body.
- Medium: shows a person's upper-body and face.
- Close-up: mainly shows a person’s face.
- Extreme close-up: shows a person’s face filling the screen.
Shot types can also be determined by location of the subject characters with respect to the center of the frame. This property defines the following shot types in Azure AI Video Indexer:
- Left face: a person appears on the left side of the frame.
- Center face: a person appears in the central region of the frame.
- Right face: a person appears on the right side of the frame.
- Outdoor: a person appears in an outdoor setting.
- Indoor: a person appears in an indoor setting.
Additional characteristics:
- Two shots: shows two persons’ faces of medium size.
- Multiple faces: more than two persons.
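These shot types appear as entries in each shot's tags array in the index JSON (see the example response below). As a rough, unofficial sketch, the snippet below groups shot IDs by editorial tag so you can pull out, for example, all wide or medium shots; the file name is a placeholder, and the exact tag strings you filter on should be checked against the tags that appear in your own index.

```python
import json
from collections import defaultdict

# "insights.json" is a placeholder file name for the index JSON you downloaded.
with open("insights.json", encoding="utf-8") as f:
    index = json.load(f)

# The shots array may be at the top level or under videos[0].insights in the full index.
shots = index.get("shots") or index["videos"][0]["insights"]["shots"]

# Group shot IDs by editorial tag, for example "Wide" or "Medium".
shots_by_tag = defaultdict(list)
for shot in shots:
    for tag in shot.get("tags", []):
        shots_by_tag[tag].append(shot["id"])

for tag, ids in shots_by_tag.items():
    print(f"{tag}: shots {ids}")
```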
View the insight JSON with the web portal
Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.
- Select the Library tab.
- Select media you want to work with.
- Select Download, and then select Insights (JSON). The JSON file opens in a new browser tab.
- Look for the key pairs described in the example response.
Use the API
- Use the Get Video Index request. We recommend passing `&includeSummarizedInsights=false`.
- Look for the key pairs described in the example response.
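As a minimal sketch of the request using Python and the requests package: the location, account ID, video ID, and access token below are placeholders you replace with your own values, and the query string mirrors the recommendation above.

```python
import requests

# Placeholder values -- replace with your own account details.
location = "trial"
account_id = "<ACCOUNT_ID>"
video_id = "<VIDEO_ID>"
access_token = "<ACCESS_TOKEN>"

url = (
    f"https://api.videoindexer.ai/{location}/Accounts/{account_id}"
    f"/Videos/{video_id}/Index"
)
params = {
    "accessToken": access_token,
    # As recommended above, skip the summarized insights.
    "includeSummarizedInsights": "false",
}

response = requests.get(url, params=params)
response.raise_for_status()
index = response.json()

# The scenes, shots, and keyFrames collections are part of the video's insights.
insights = index["videos"][0]["insights"]
print(len(insights.get("scenes", [])), "scenes,", len(insights.get("shots", [])), "shots")
```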
Example response
"scenes": [
{
"id": 1,
"instances": [
{
"adjustedStart": "0:00:00",
"adjustedEnd": "0:00:09.1333333",
"start": "0:00:00",
"end": "0:00:09.1333333"
}
]
},
{
"id": 2,
"instances": [
{
"adjustedStart": "0:00:09.1333333",
"adjustedEnd": "0:00:10.8",
"start": "0:00:09.1333333",
"end": "0:00:10.8"
}
]
},
{
"id": 3,
"instances": [
{
"adjustedStart": "0:00:10.8",
"adjustedEnd": "0:00:26.9333333",
"start": "0:00:10.8",
"end": "0:00:26.9333333"
}
]
}...
{
"id": 31,
"instances": [
{
"adjustedStart": "0:18:45",
"adjustedEnd": "0:18:50.2",
"start": "0:18:45",
"end": "0:18:50.2"
}
]
}
],
"shots": [
{
"id": 1,
"tags": [
"Wide",
"Medium"
],
"keyFrames": [
{
"id": 1,
"instances": [
{
"thumbnailId": "60152925-0e6d-48cf-be33-aa6c00dfb334",
"adjustedStart": "0:00:00.1666667",
"adjustedEnd": "0:00:00.2",
"start": "0:00:00.1666667",
"end": "0:00:00.2"
}
]
},
{
"id": 2,
"instances": [
{
"thumbnailId": "f1a09cdf-b42b-45f5-bc69-5292d1216e50",
"adjustedStart": "0:00:00.2333333",
"adjustedEnd": "0:00:00.2666667",
"start": "0:00:00.2333333",
"end": "0:00:00.2666667"
}
]
}
],
"instances": [
{
"adjustedStart": "0:00:00",
"adjustedEnd": "0:00:01.9333333",
"start": "0:00:00",
"end": "0:00:01.9333333"
}
]
},
{
"id": 2,
"tags": [
"Medium"
],
"keyFrames": [
{
"id": 3,
"instances": [
{
"thumbnailId": "b17774d0-41cf-4174-9c41-6bc2f17c86e2",
"adjustedStart": "0:00:02",
"adjustedEnd": "0:00:02.0333333",
"start": "0:00:02",
"end": "0:00:02.0333333"
}
]
}
],
"instances": [
{
"adjustedStart": "0:00:01.9333333",
"adjustedEnd": "0:00:02.9666667",
"start": "0:00:01.9333333",
"end": "0:00:02.9666667"
}
]
}...
Download the keyframes with the API
To download each keyframe, use its thumbnail ID with the Get Video Thumbnail request.
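The sketch below (again unofficial, with placeholder account, video, token, and file-name values) walks the shots and keyFrames structure shown in the example response, collects each keyframe's thumbnail ID, and saves the images locally.

```python
import json
import requests

# Placeholder values -- replace with your own account details.
location = "trial"
account_id = "<ACCOUNT_ID>"
video_id = "<VIDEO_ID>"
access_token = "<ACCESS_TOKEN>"

# "insights.json" is a placeholder file name for the index JSON you saved earlier.
with open("insights.json", encoding="utf-8") as f:
    index = json.load(f)
shots = index.get("shots") or index["videos"][0]["insights"]["shots"]

for shot in shots:
    for keyframe in shot.get("keyFrames", []):
        for instance in keyframe["instances"]:
            thumbnail_id = instance["thumbnailId"]
            url = (
                f"https://api.videoindexer.ai/{location}/Accounts/{account_id}"
                f"/Videos/{video_id}/Thumbnails/{thumbnail_id}"
            )
            response = requests.get(url, params={"accessToken": access_token, "format": "Jpeg"})
            response.raise_for_status()
            # Save each keyframe image locally, named by shot and keyframe ID.
            with open(f"keyframe_{shot['id']}_{keyframe['id']}.jpg", "wb") as f:
                f.write(response.content)
```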
Warning
We do not recommend that you use data directly from the artifacts folder for production purposes. Artifacts are intermediate outputs of the indexing process. They are essentially raw outputs of the various AI engines that analyze the videos; the artifacts schema may change over time.
Important
It is important to read the transparency note overview for all VI features. Each insight also has transparency notes of its own:
Scene, shot, and keyframe detection notes
- The detector works best on media files that have shots and scenes within them.
- If the video is filmed with one camera that never moves, the shot segmentation works poorly, and the keyframes might not be representative.
- Keyframes are selected by taking into account the blurriness level of the frames. If most of the shot is blurry (for example, because of motion), the keyframe might also be blurry.
- Videos with poor visual quality produce poor results.
- The time of each shot, scene, and keyframe might shift by less than a second.
Scene, shot, and keyframe components
No components defined.