Getting started with Cognitive Services - Vision
Computer Vision API
Azure's cloud-based Computer Vision API is easy and a lot of fun to use. This wiki provides an introduction to using the API and supplements the documentation already available in the Azure docs: Computer Vision Documentation.
Scenario
The sample code performs the following steps:
- Given the public URL of an image, the image is retrieved.
- The image is sent to the Vision API for analysis.
- The image is sent to the Vision API to produce a thumbnail.
Vision API
The Computer Vision API version 1.0 supports many operations on images. Here is a summary of the supported actions:
- Tag images based on content.
- Categorize images.
- Identify the type and quality of images.
- Detect human faces and return their coordinates.
- Recognize domain-specific content.
- Generate descriptions of the content.
- Use optical character recognition to identify printed text found in images.
- Recognize handwritten text.
- Distinguish color schemes.
- Flag adult content.
- Crop photos to be used as thumbnails.
The first step in using the API is to create a Cognitive Services resource in an Azure subscription:
There are several APIs available; in this scenario we are interested in the Vision API:
There are two pricing tiers currently available: Free and S1 Standard. You can have one free tier instance per subscription, as indicated in the image below:
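Once the resource is created, the portal provides the subscription key that the samples below pass in the Ocp-Apim-Subscription-Key header (shown as "<key>"). Rather than hardcoding it, the key would typically be read from configuration; the following is a minimal sketch assuming an ASP.NET Core controller and an illustrative appsettings.json entry named VisionApiKey (not part of the original sample):
// appsettings.json (illustrative): { "VisionApiKey": "<key>" }
public class VisionController : Controller
{
    private readonly string _visionApiKey;

    public VisionController(IConfiguration configuration)
    {
        // "VisionApiKey" is an illustrative configuration key name, not part of the original sample.
        _visionApiKey = configuration["VisionApiKey"];
    }
}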
Retrieving the image
This is simple enough thanks to HttpClient and its GetByteArrayAsync method. After the responses from the Vision API are retrieved, they are combined into a view model and returned as a JsonResult:
using (var client = new HttpClient())
{
    byte[] byteData = await client.GetByteArrayAsync(url);
    var celebritiesResponse = AnalyseCelebrities(byteData);
    var thumbnailResponse = GetThumbnail(byteData);
    return new JsonResult(new Model(celebritiesResponse, thumbnailResponse));
}
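The Model type referenced above is not shown in the original sample; a minimal sketch, assuming it simply carries the two raw responses back to the client, could look like this:
// Hypothetical view model; the original sample does not show this type.
public class Model
{
    public Model(string celebrities, string thumbnail)
    {
        Celebrities = celebrities;
        Thumbnail = thumbnail;
    }

    public string Celebrities { get; }
    public string Thumbnail { get; }
}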
Vision API Analyze
The Vision API analyze endpoint exposes several capabilities, selected by supplying different query parameters. In this example, the image is posted as a byte array and the JSON response is returned as a string:
private string AnalyseCelebrities(byte[] byteData)
{
    string requestParameters = "visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=Celebrities&language=en";
    string uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?" + requestParameters;
    HttpResponseMessage response;
    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(byteData))
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<key>");
        // This example uses content type "application/octet-stream".
        // The other content types you can use are "application/json" and "multipart/form-data".
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response = client.PostAsync(uri, content).Result;
        return response.Content.ReadAsStringAsync().Result;
    }
}
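As the comment above notes, application/json is also an accepted content type. For reference, here is a sketch of the same call posting a public image URL as JSON instead of raw bytes; the method name is illustrative, and it assumes the same endpoint, parameters, and key (Encoding comes from System.Text):
// Illustrative variant: the analyze endpoint also accepts a JSON body of the form { "url": "<public image URL>" }.
private string AnalyseCelebritiesByUrl(string imageUrl)
{
    string requestParameters = "visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&details=Celebrities&language=en";
    string uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?" + requestParameters;
    using (var client = new HttpClient())
    using (var content = new StringContent("{\"url\":\"" + imageUrl + "\"}", Encoding.UTF8, "application/json"))
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<key>");
        var response = client.PostAsync(uri, content).Result;
        return response.Content.ReadAsStringAsync().Result;
    }
}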
Parsing the result is then a simple exercise using Json.Net:
var celebrities = JsonConvert.DeserializeObject<CelebritiesResponse>(celebritiesResponse);
The following is a basic class structure matching the response. Use it with care: it was written while Cognitive Services was in preview, so the schema may have changed since:
public class CelebritiesResponse
{
    public CategoriesSection[] categories { get; set; }
    public AdultSection adult { get; set; }
    public TagSection[] tags { get; set; }
    public DescriptionSection description { get; set; }
    public FaceSection[] faces { get; set; }
    public ColorSection color { get; set; }
    public ImageTypeSection imageType { get; set; }
}

public class CategoriesSection
{
    public string name { get; set; }
    public double score { get; set; }
    public DetailsSection detail { get; set; }
}

public class DetailsSection
{
    public CelebritiesSection[] celebrities { get; set; }
}

public class CelebritiesSection
{
    public string name { get; set; }
    public double confidence { get; set; }
}

public class ImageTypeSection
{
    public int clipArtType { get; set; }
    public int lineDrawingType { get; set; }
}

public class ColorSection
{
    public string dominantColorForeground { get; set; }
    public string dominantColorBackground { get; set; }
    public bool isBWImg { get; set; }
}

public class FaceSection
{
    public int age { get; set; }
    public string gender { get; set; }
}

public class AdultSection
{
    public double adultScore { get; set; }
    public double racyScore { get; set; }
}

public class DescriptionSection
{
    public CaptionSection[] captions { get; set; }
}

public class TagSection
{
    public string name { get; set; }
    public string confidence { get; set; }
}

public class CaptionSection
{
    public string text { get; set; }
    public string confidence { get; set; }
}
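With these classes in place, pulling out the generated caption and any recognized celebrities is straightforward. A short sketch (it assumes System.Linq; the null checks are there because not every image returns celebrity details):
var celebrities = JsonConvert.DeserializeObject<CelebritiesResponse>(celebritiesResponse);

// First caption generated for the image.
var caption = celebrities.description?.captions?.FirstOrDefault()?.text;

// Names of any celebrities found in the category details (may be empty).
var celebrityNames = (celebrities.categories ?? new CategoriesSection[0])
    .Where(c => c.detail?.celebrities != null)
    .SelectMany(c => c.detail.celebrities)
    .Select(c => c.name)
    .ToList();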
Generate Thumbnail
Generating a thumbnail is just as simple. One thing to note is that the response is kept as a base64 data URI so it can be embedded directly in HTML without a separate image request:
private string GetThumbnail(byte[] byteData, int width = 300, int height = 300)
{
    string uri = $"https://westus.api.cognitive.microsoft.com/vision/v1.0/generateThumbnail?width={width}&height={height}&smartCropping=true";
    using (var client = new HttpClient())
    using (var content = new ByteArrayContent(byteData))
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<key>");
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        var response = client.PostAsync(uri, content).Result;
        // The generateThumbnail operation returns JPEG data, so the data URI uses image/jpeg.
        var base64 = Convert.ToBase64String(response.Content.ReadAsByteArrayAsync().Result);
        return String.Format("data:image/jpeg;base64,{0}", base64);
    }
}
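Embedding the thumbnail as a data URI keeps everything in a single JSON payload, at the cost of base64 overhead. If that is not needed, an alternative is to return the raw bytes as an image response; below is a sketch assuming an ASP.NET Core controller action, where GetThumbnailBytes is a hypothetical variant of GetThumbnail that returns the byte array before the base64 conversion:
// Hypothetical controller action: serve the thumbnail as a plain JPEG response
// instead of embedding it as a base64 data URI.
public async Task<IActionResult> Thumbnail(string url)
{
    using (var client = new HttpClient())
    {
        byte[] byteData = await client.GetByteArrayAsync(url);
        // GetThumbnailBytes is assumed to be GetThumbnail without the base64 conversion,
        // i.e. it returns response.Content.ReadAsByteArrayAsync().Result directly.
        byte[] thumbnail = GetThumbnailBytes(byteData);
        return File(thumbnail, "image/jpeg");
    }
}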
Summary
The Vision API is one of several powerful and easy-to-use Cognitive Services available in Azure. The documentation is excellent and includes an overview, how-to guides, and quickstarts in multiple languages.
Additionally, there is a live API console that can be used to call the service without writing any code and is a great way to get started. The Computer Vision API - v1.0 console lets you specify the query parameters and subscription key (after selecting the appropriate data center):