What is Video Analysis?

Article
03/21/2025

Video Analysis includes video-related features like Spatial Analysis and Video Retrieval.

Spatial Analysis

Important

On 30 March 2025, Azure AI Vision Spatial Analysis will be retired. Transition to Azure AI Video Indexer or an open-source solution before that date. We encourage you to switch sooner to benefit from the richer capabilities of Azure AI Video Indexer. The following table compares Azure AI Vision Spatial Analysis with Azure AI Video Indexer.

Feature	Azure AI Vision Spatial Analysis	Azure AI Video Indexer
Edge support	Yes	Yes
Object Detection	People & Vehicle detection only	Detects 1000+ objects
Audio/Speech Processing	Not supported	Supported (includes speech transcription, translation, summarization, and sentiment analysis)
Event Detection & Tracking	Supported (tracking people & vehicles, event detection)	Not yet supported at the edge; partially supported in the cloud
Azure Arc Support	Not supported	Native support
Focus Area	Visual analysis with specialized tracking	Comprehensive analysis of both audio and visual content

Until 30 March 2025, you can continue to use Azure AI Vision Spatial Analysis while you transition to Azure AI Video Indexer. After 30 March 2025, the Spatial Analysis container will no longer be supported and will stop processing new streams.

You can use Azure AI Vision Spatial Analysis to detect the presence and movements of people in video. Ingest video streams from cameras, extract insights, and generate events to be used by other systems. For example, the service can count the number of people entering a space or measure compliance with face mask and social distancing guidelines. By processing video streams from physical spaces, you can learn how people use them and maximize the space's value to your organization.

Try out the capabilities of Spatial Analysis quickly and easily in your browser by using Azure AI Vision Studio.

Try Vision Studio

People counting

This operation counts the number of people in a specific zone over time using the PersonCount operation. It generates an independent count for each frame processed without attempting to track people across frames. This operation can be used to estimate the number of people in a space or generate an alert when a person appears.

Animation showing how Spatial Analysis counts the number of people in the camera's field of view.
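
The per-frame counting described above can feed a simple "person appeared" alert in a downstream system. The following is a minimal sketch, assuming a hypothetical `FrameCount` record with a timestamp and a count; the actual event payload emitted by the PersonCount operation is not shown here, and the function name and threshold are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FrameCount:
    """Hypothetical per-frame result: a timestamp and an independent count."""
    timestamp_s: float
    person_count: int


def occupancy_alerts(frames: Iterable[FrameCount], threshold: int = 1) -> Iterator[FrameCount]:
    """Yield frames where the count rises from below the threshold to at or above it.

    Each frame carries an independent count, so this only compares
    consecutive counts; it does not track individuals across frames.
    """
    previous = 0
    for frame in frames:
        if frame.person_count >= threshold > previous:
            yield frame
        previous = frame.person_count


# Usage: counts 0, 0, 1, 2, 0, 1 at one-second intervals.
frames = [FrameCount(float(t), c) for t, c in enumerate([0, 0, 1, 2, 0, 1])]
alert_times = [f.timestamp_s for f in occupancy_alerts(frames)]
print(alert_times)  # [2.0, 5.0]
```

Alerting only on the rising edge avoids repeating the alert on every frame while the space stays occupied.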

Entrance Counting

This feature monitors how long people stay in an area or when they enter through a doorway. This monitoring can be done using the PersonCrossingPolygon or PersonCrossingLine operations. In retail scenarios, these operations can be used to measure wait times for a checkout line or engagement at a display. In other commercial building scenarios, they can measure foot traffic in a lobby or on a specific floor.

Animation showing frames of people moving in and out of a bordered space, with rectangles drawn around them.
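
To illustrate the idea behind a line-crossing check, the sketch below tests whether a person's movement between two consecutive positions crosses a configured line segment. It is a generic geometry example, not the service's implementation; the function names and the normalized coordinates are assumptions made for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]


def _side(a: Point, b: Point, p: Point) -> float:
    """Signed area: > 0 if p is left of the directed line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossed_line(line_start: Point, line_end: Point,
                 prev_pos: Point, curr_pos: Point) -> bool:
    """Return True if the move prev_pos -> curr_pos strictly crosses the segment."""
    d1 = _side(line_start, line_end, prev_pos)
    d2 = _side(line_start, line_end, curr_pos)
    d3 = _side(prev_pos, curr_pos, line_start)
    d4 = _side(prev_pos, curr_pos, line_end)
    # The endpoints of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


# Usage: a vertical doorway line at x = 0.5 spanning y from 0 to 1.
door = ((0.5, 0.0), (0.5, 1.0))
print(crossed_line(*door, (0.2, 0.5), (0.8, 0.5)))  # True
print(crossed_line(*door, (0.2, 0.5), (0.4, 0.5)))  # False
```

The sign of `d1` could additionally be used to distinguish entering from leaving, depending on which side of the line is treated as "inside".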

Social distancing and face mask detection

This feature analyzes how well people follow social distancing requirements in a space. The system uses the PersonDistance operation to automatically calibrate itself as people walk around in the space. Then it identifies when people violate a specific distance threshold (6 ft. or 10 ft.).

Animation showing how Spatial Analysis visualizes social distancing violation events, with lines between people indicating the distance.
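
The threshold check itself can be sketched as a pairwise distance comparison. This example assumes that calibrated floor positions in feet are already available for each person (the calibration step that PersonDistance performs is not modeled), and the identifiers and function name are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple


def distance_violations(positions_ft: Dict[str, Tuple[float, float]],
                        threshold_ft: float = 6.0) -> List[Tuple[str, str, float]]:
    """Return (person_a, person_b, separation) for every pair closer than the threshold."""
    violations = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(sorted(positions_ft.items()), 2):
        separation = dist(pos_a, pos_b)  # Euclidean distance in feet
        if separation < threshold_ft:
            violations.append((id_a, id_b, separation))
    return violations


# Usage: two people 5 ft apart, a third person far away.
people = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (20.0, 0.0)}
print(distance_violations(people))  # [('a', 'b', 5.0)]
```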

Spatial Analysis can also be configured to detect if a person is wearing a protective face covering such as a mask. A mask classifier can be enabled for the PersonCount, PersonCrossingLine, and PersonCrossingPolygon operations by configuring the ENABLE_FACE_MASK_CLASSIFIER parameter.

Photograph showing how Spatial Analysis classifies whether people in an elevator are wearing face masks.
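
As a rough illustration of toggling the classifier, the sketch below serializes a parameters object containing the `ENABLE_FACE_MASK_CLASSIFIER` key. Only the parameter name comes from the text above; the boolean value type, the helper name, and where this object sits in an operation's configuration are assumptions, so check the operation reference before relying on them.

```python
import json


def mask_classifier_parameters(enabled: bool) -> str:
    """Serialize a hypothetical parameters object that toggles the mask classifier."""
    # Assumption: the parameter takes a JSON boolean. Other operation
    # parameters are omitted from this sketch.
    params = {"ENABLE_FACE_MASK_CLASSIFIER": enabled}
    return json.dumps(params)


print(mask_classifier_parameters(True))  # {"ENABLE_FACE_MASK_CLASSIFIER": true}
```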

Video Retrieval

Important

On 30 June 2025, Azure AI Vision Video Retrieval will be retired. The decision to retire this feature is part of our ongoing effort to improve and simplify the features offered for video processing. Migrate to Azure AI Content Understanding and Azure AI Search to benefit from their additional capabilities.

Video processing: Video Retrieval vs Azure AI Content Understanding

Feature	Video Retrieval for video description	Azure AI Content Understanding
Video Length Supported	Optimized for short videos, up to ~3 minutes	Supports short & long videos, up to 4 hours
Frame Processing	Up to 20 frames	Batch processing, with frames sampled shot by shot across the entire video
Content Extraction Pre-Processing	Transcription	Transcription, Shot identification, Face grouping
Structured Output Support	Not supported	Supports schema-conforming structured outputs
Data types	Video supported	Video, images, documents, and speech supported
Pricing	Variable Token-based	Fixed cost per minute of video processed

To migrate to Content Understanding for video summaries and descriptions, we recommend reviewing the Azure AI Content Understanding documentation.

Video Search: Video Retrieval vs. Azure AI Search and Content Understanding

Feature	Video Retrieval for video search	Azure AI Search and Content Understanding
Visual Embedding type	Frame-based Image Embeddings	Video description text embeddings
Content Extraction Pre-Processing	Transcription, OCR	Transcription, Shot identification, Face grouping
People & Object search support	Strong support	Strong support
Action and Event support	Limited	Strong support
Customization	None	The Content Understanding analyzer can be customized through its fields and field descriptions to focus on specific content

To start building the search use case with Content Understanding, we recommend starting with this sample, which shows how to use Azure AI Search to search videos.

To avoid service disruptions, migrate by 30 June 2025.

Video Retrieval is a service that lets you create a search index, add documents (videos and images) to it, and search with natural language. Developers can define metadata schemas for each index and ingest metadata to the service to help with retrieval. Developers can also specify what features to extract from the index (vision, speech) and filter their search based on features.
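
The concepts in this paragraph (an index with a metadata schema, documents ingested with metadata, and search filtered by feature) can be pictured with a toy in-memory model. This is not the Video Retrieval API: every class and method name below is hypothetical, and plain keyword overlap stands in for the service's natural language search.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Document:
    doc_id: str
    description: str        # stand-in for content the service would extract
    features: Set[str]      # for example {"vision", "speech"}
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyIndex:
    metadata_fields: Set[str]   # the index's metadata schema (field names only)
    documents: List[Document] = field(default_factory=list)

    def add(self, doc: Document) -> None:
        """Reject documents whose metadata uses fields outside the schema."""
        unknown = set(doc.metadata) - self.metadata_fields
        if unknown:
            raise ValueError(f"metadata fields not in schema: {sorted(unknown)}")
        self.documents.append(doc)

    def search(self, query: str, feature: str) -> List[str]:
        """Return IDs of documents with the feature, ranked by keyword overlap."""
        terms = set(query.lower().split())
        scored = []
        for doc in self.documents:
            if feature not in doc.features:
                continue
            overlap = len(terms & set(doc.description.lower().split()))
            if overlap:
                scored.append((overlap, doc.doc_id))
        return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


index = ToyIndex(metadata_fields={"camera"})
index.add(Document("v1", "a dog runs in the park", {"vision"}, {"camera": "front"}))
index.add(Document("v2", "people talking about a dog", {"speech"}))
print(index.search("dog in park", feature="vision"))  # ['v1']
```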

Call the Video Retrieval APIs

Input requirements

Spatial Analysis works on videos that meet the following requirements:

The video must be in RTSP, rawvideo, MP4, FLV, or MKV format.
The video codec must be H.264, HEVC (H.265), rawvideo, VP9, or MPEG-4.

Supported formats

File format	Description
`asf`	ASF (Advanced / Active Streaming Format)
`avi`	AVI (Audio Video Interleaved)
`flv`	FLV (Flash Video)
`matroskamm`, `webm`	Matroska / WebM
`mov`,`mp4`,`m4a`,`3gp`,`3g2`,`mj2`	QuickTime / MOV

Supported video codecs

Codec	Format
`h264`	H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
`h265`	H.265 / HEVC
`libvpx-vp9`	libvpx VP9 (codec vp9)
`mpeg4`	MPEG-4 part 2

Supported audio codecs

Codec	Format
`aac`	AAC (Advanced Audio Coding)
`mp3`	MP3 (MPEG audio layer 3)
`pcm`	PCM (uncompressed)
`vorbis`	Vorbis
`wmav2`	Windows Media Audio 2
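
The three tables above can be turned into a simple pre-upload check. The sketch below copies the identifiers from the tables into sets and reports anything that is not listed; the function name is hypothetical, and obtaining the container and codec names from a file (for example with a media probing tool) is outside its scope.

```python
from typing import List, Optional

# Identifiers copied from the tables above.
SUPPORTED_FORMATS = {"asf", "avi", "flv", "matroskamm", "webm",
                     "mov", "mp4", "m4a", "3gp", "3g2", "mj2"}
SUPPORTED_VIDEO_CODECS = {"h264", "h265", "libvpx-vp9", "mpeg4"}
SUPPORTED_AUDIO_CODECS = {"aac", "mp3", "pcm", "vorbis", "wmav2"}


def unsupported_reasons(file_format: str, video_codec: str,
                        audio_codec: Optional[str] = None) -> List[str]:
    """Return human-readable reasons the input is not listed as supported."""
    reasons = []
    if file_format not in SUPPORTED_FORMATS:
        reasons.append(f"file format '{file_format}' is not listed")
    if video_codec not in SUPPORTED_VIDEO_CODECS:
        reasons.append(f"video codec '{video_codec}' is not listed")
    # A file without an audio track passes the audio check.
    if audio_codec is not None and audio_codec not in SUPPORTED_AUDIO_CODECS:
        reasons.append(f"audio codec '{audio_codec}' is not listed")
    return reasons


print(unsupported_reasons("mp4", "h264", "aac"))  # []
print(unsupported_reasons("mp4", "av1"))          # ["video codec 'av1' is not listed"]
```

The comparison is an exact string match, so names must be spelled as they appear in the tables.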

Responsible use of Spatial Analysis technology

To learn how to use Spatial Analysis technology responsibly, see the Transparency note. Microsoft's transparency notes help you understand how our AI technology works and the choices system owners can make that influence system performance and behavior. They focus on the importance of thinking about the whole system including the technology, people, and environment.

Next step

Install and run the Spatial Analysis container
