Getting started with Cognitive Services - Vision

1/17/2024

Computer Vision API

Azure's cloud-based Computer Vision API is easy and a lot of fun to use. This wiki article introduces the API and supplements the documentation already available in the Azure docs: Computer Vision Documentation.

Scenario

The sample code performs the following steps:

Given the public URL of an image, the image is retrieved.
The image is sent to the Vision API for analysis.
The image is sent to the Vision API to produce a thumbnail.

Vision API

The Computer Vision API Version 1.0 supports many activities on images. Here is a summary of the supported actions:

The first step in using the API is to create a Cognitive Services resource in an Azure subscription:

There are several APIs available and in our scenario we are interested in the Vision API:

There are two pricing tiers currently available: Free and S1 Standard. Each subscription can have one free-tier resource, as indicated in the image below:

Retrieving the image

This is simple enough thanks to HttpClient and its GetByteArrayAsync method. After the responses from the Vision API are retrieved, they are combined in a view model and returned as a JsonResult:

using (var client = new HttpClient()) 
{ 
    byte[] byteData = await client.GetByteArrayAsync(url); 
 
    var celebritiesResponse = AnalyseCelebrities(byteData); 
    var thumbnailResponse = GetThumbnail(byteData); 
 
    return new JsonResult(new Model(celebritiesResponse, thumbnailResponse));
}
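
The Model type used above is not shown in this article. A minimal sketch, assuming it only carries the two strings returned by the helper methods, could look like the following. The property names AnalysisJson and ThumbnailDataUri are hypothetical and chosen for illustration only.

```csharp
using System;

// Hypothetical view model; the property names are illustrative assumptions,
// not taken from the original sample.
public class Model
{
    public Model(string analysisJson, string thumbnailDataUri)
    {
        AnalysisJson = analysisJson ?? throw new ArgumentNullException(nameof(analysisJson));
        ThumbnailDataUri = thumbnailDataUri ?? throw new ArgumentNullException(nameof(thumbnailDataUri));
    }

    // Raw JSON string returned by the analyze call.
    public string AnalysisJson { get; }

    // Thumbnail encoded as a data URI, suitable for an <img src="..."> attribute.
    public string ThumbnailDataUri { get; }
}
```

Keeping the properties read-only means the model can only be built from a complete pair of responses, while a JSON serializer can still read both values when the JsonResult is written.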

Vision API Analyze

The Vision API analyze operation supports several capabilities, which are selected by supplying different query arguments. In this example, we post the image as a byte array and return the JSON response as a string:

private string AnalyseCelebrities(byte[] byteData)
{
    // Visual features to analyse, plus the celebrity model requested through the details parameter.
    string requestParameters = "visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=Celebrities&language=en";

    string uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?" + requestParameters;

    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(byteData))
    {
        // Replace <key> with the subscription key of the Cognitive Services resource.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<key>");

        // This example uses content type "application/octet-stream".
        // The other content types you can use are "application/json" and "multipart/form-data".
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        // Blocks until the call completes and returns the response body as raw JSON.
        HttpResponseMessage response = client.PostAsync(uri, content).Result;
        return response.Content.ReadAsStringAsync().Result;
    }
}
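
The method above blocks on .Result and does not check the HTTP status code. Blocking this way can deadlock when a synchronization context is present. As a hedged alternative, the sketch below separates building the request from sending it, awaits the call, and throws on a non-success status. The names VisionAnalyze, BuildAnalyzeRequest and AnalyseAsync are hypothetical; the endpoint and query parameters are copied from the example above.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper; not part of the original sample.
public static class VisionAnalyze
{
    // Endpoint taken from the example above; the region prefix depends on the resource.
    private const string AnalyzeUri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze";

    // Builds the POST request without sending it, so it can be inspected or unit tested.
    public static HttpRequestMessage BuildAnalyzeRequest(byte[] imageBytes, string subscriptionKey, params string[] visualFeatures)
    {
        string query = "visualFeatures=" + string.Join(",", visualFeatures) + "&details=Celebrities&language=en";
        var request = new HttpRequestMessage(HttpMethod.Post, AnalyzeUri + "?" + query);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new ByteArrayContent(imageBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }

    // Sends the request with a caller-supplied HttpClient and returns the raw JSON body.
    public static async Task<string> AnalyseAsync(HttpClient client, byte[] imageBytes, string subscriptionKey)
    {
        using (HttpRequestMessage request = BuildAnalyzeRequest(imageBytes, subscriptionKey, "Categories", "Description"))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            // Throws HttpRequestException for non-2xx responses instead of returning an error body.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Passing the HttpClient in, rather than creating one per call, lets the caller reuse a single instance across requests.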

Parsing the result is then a simple exercise using Json.Net:

var celebrities = JsonConvert.DeserializeObject<CelebritiesResponse>(celebritiesResponse);

The following is a basic class structure matching the response. Use it with care, as it was written during the preview of Cognitive Services:

public class CelebritiesResponse
{
    public CategoriesSection[] categories { get; set; }
    public AdultSection adult { get; set; }
    public TagSection[] tags { get; set; }
    public DescriptionSection description { get; set; }
    public FaceSection[] faces { get; set; }
    public ColorSection color { get; set; }
    public ImageTypeSection imageType { get; set; }
}

public class CategoriesSection
{
    public string name { get; set; }
    public double score { get; set; }
    public DetailsSection detail { get; set; }
}

public class DetailsSection
{
    public CelebritiesSection[] celebrities { get; set; }
}

public class CelebritiesSection
{
    public string name { get; set; }
    public double confidence { get; set; }
}

public class ImageTypeSection
{
    public int clipArtType { get; set; }
    public int lineDrawingType { get; set; }
}

public class ColorSection
{
    public string dominantColorForeground { get; set; }
    public string dominantColorBackground { get; set; }
    public bool isBWImg { get; set; }
}

public class FaceSection
{
    public int age { get; set; }
    public string gender { get; set; }
}

public class AdultSection
{
    public double adultScore { get; set; }
    public double racyScore { get; set; }
}

public class DescriptionSection
{
    public CaptionSection[] captions { get; set; }
}

public class TagSection
{
    public string name { get; set; }
    // Confidence is a numeric score, as in CelebritiesSection.
    public double confidence { get; set; }
}

public class CaptionSection
{
    public string text { get; set; }
    // Confidence is a numeric score, as in CelebritiesSection.
    public double confidence { get; set; }
}
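
When only a few fields are needed, the JSON can also be read without the full class structure. The sketch below is an assumption-laden alternative, not the approach used in this article: it uses JsonDocument from System.Text.Json instead of Json.Net, and it assumes the response follows the shape mirrored by the classes above (categories, then detail, then celebrities). The names CelebrityReader and NamesAbove are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; assumes the root of the response is a JSON object.
public static class CelebrityReader
{
    // Returns the names of all celebrities whose confidence is at least minConfidence.
    public static List<string> NamesAbove(string json, double minConfidence)
    {
        var names = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (!doc.RootElement.TryGetProperty("categories", out JsonElement categories)
                || categories.ValueKind != JsonValueKind.Array)
            {
                return names;
            }

            foreach (JsonElement category in categories.EnumerateArray())
            {
                // Categories without celebrity details are skipped.
                if (!category.TryGetProperty("detail", out JsonElement detail)
                    || detail.ValueKind != JsonValueKind.Object) continue;
                if (!detail.TryGetProperty("celebrities", out JsonElement celebrities)
                    || celebrities.ValueKind != JsonValueKind.Array) continue;

                foreach (JsonElement celebrity in celebrities.EnumerateArray())
                {
                    if (celebrity.GetProperty("confidence").GetDouble() >= minConfidence)
                    {
                        names.Add(celebrity.GetProperty("name").GetString());
                    }
                }
            }
        }
        return names;
    }
}
```

A call such as CelebrityReader.NamesAbove(celebritiesResponse, 0.5) would then return only the matches the service scored at 0.5 or higher. The threshold is an arbitrary example value.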

Generate Thumbnail

Generating a thumbnail is just as simple. One thing to note is that the response is converted to a base64 data URI, so the thumbnail can be embedded directly in HTML without a separate image request:

private string GetThumbnail(byte[] byteData, int width = 300, int height = 300)
{
    string uri = $"https://westus.api.cognitive.microsoft.com/vision/v1.0/generateThumbnail?width={width}&height={height}&smartCropping=true";

    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(byteData))
    {
        // Replace <key> with the subscription key of the Cognitive Services resource.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<key>");
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        HttpResponseMessage response = client.PostAsync(uri, content).Result;

        // Label the data URI with the media type the service reports,
        // falling back to image/gif when no Content-Type header is present.
        string mediaType = response.Content.Headers.ContentType?.MediaType ?? "image/gif";
        var base64 = Convert.ToBase64String(response.Content.ReadAsByteArrayAsync().Result);
        return $"data:{mediaType};base64,{base64}";
    }
}
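
A data URI is only interpreted reliably when its media type matches the bytes it carries. As a sketch under stated assumptions, the helper below infers the type from the well-known leading signature bytes of JPEG, PNG and GIF files before building the URI. The names ImageDataUri, DetectMediaType and ToDataUri are hypothetical and not part of the original sample.

```csharp
using System;

// Hypothetical helper; recognises only three common image signatures.
public static class ImageDataUri
{
    // Returns a media type based on the leading bytes, or a generic binary type if unknown.
    public static string DetectMediaType(byte[] bytes)
    {
        if (StartsWith(bytes, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (StartsWith(bytes, 0x89, 0x50, 0x4E, 0x47)) return "image/png";
        if (StartsWith(bytes, 0x47, 0x49, 0x46, 0x38)) return "image/gif"; // ASCII "GIF8"
        return "application/octet-stream";
    }

    // Builds a data URI that can be assigned to the src attribute of an <img> element.
    public static string ToDataUri(byte[] bytes)
    {
        return $"data:{DetectMediaType(bytes)};base64,{Convert.ToBase64String(bytes)}";
    }

    private static bool StartsWith(byte[] bytes, params byte[] prefix)
    {
        if (bytes == null || bytes.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (bytes[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the thumbnail method, the bytes read from the response could be passed to ImageDataUri.ToDataUri instead of formatting the string by hand.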

Summary

The Vision API is one of several powerful and easy-to-use services available in Azure. The documentation is exceptional and includes an overview, how-to guides, and quickstarts in multiple languages.

Additionally, there is a live API that can be used to call the services without writing a line of code, which makes it a great way to get started. The Computer Vision API - v1.0 allows you to specify the query parameters and subscription key (after selecting the appropriate data center):
