Microsoft OCR Library for Windows Runtime

1/17/2024

Introduction

Microsoft OCR Library for Windows Runtime was released as a NuGet package in 2014.
It lets developers add text recognition to Windows Phone 8/8.1 and Windows 8.1 Store apps.
It was designed for flexibility and performance: it can run OCR on a wide variety of image types and includes numerous performance optimizations.
Image processing is done on the client side, so images do not have to be sent to a server for recognition.
This article shows how to get started with the Microsoft OCR Library and walks through an example that uses it in a Windows Store app.

Using the Microsoft OCR Library

Step 1: Install the NuGet package.

https://www.nuget.org/packages/Microsoft.Windows.Ocr/

Step 2: Create an instance of OcrEngine.

OcrEngine ocrEngine = new OcrEngine(OcrLanguage.English);

The code above initializes a new instance of the OcrEngine class and specifies the language to use for optical character recognition (OCR).
OcrLanguage defines the language of text for OCR to detect in the target image.

Step 3: Select which image file to use and open a random-access stream over the file.

var file = await Package.Current.InstalledLocation.GetFileAsync("g.jpg");
using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
{
    // Steps 4 to 7 run inside this block, while the stream is open.
}

Step 4: Create an instance of the image decoder.

var decoder = await BitmapDecoder.CreateAsync(stream);

The code above asynchronously creates a new BitmapDecoder and initializes it from the stream. With this overload, the bitmap codec is selected automatically from the image data.

Step 5: Get the image width and height.

uint width = decoder.PixelWidth;
uint height = decoder.PixelHeight;

Step 6: Read the pixel data from the image.

var pixels = await decoder.GetPixelDataAsync(
  BitmapPixelFormat.Bgra8,
  BitmapAlphaMode.Straight,
  new BitmapTransform(),
  ExifOrientationMode.RespectExifOrientation,
  ColorManagementMode.ColorManageToSRgb
);

The method decoder.GetPixelDataAsync takes the following parameters:

a. BitmapPixelFormat: Specifies the pixel format of pixel data. Each enumeration value defines a channel ordering, bit depth, and data type.
b. BitmapAlphaMode: Specifies the alpha mode of pixel data.
c. BitmapTransform: Contains transformations that can be applied to pixel data.
d. ExifOrientationMode: Specifies the EXIF orientation flag behavior when obtaining pixel data.
e. ColorManagementMode: Specifies the color management behavior when obtaining pixel data.
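
To make the Bgra8 pixel format concrete, here is a minimal sketch of how a pixel could be located in the byte array that DetachPixelData returns. It assumes the buffer is tightly packed (4 bytes per pixel, rows stored top to bottom with no padding). The Bgra8 class and its members are hypothetical helpers for illustration and are not part of the library.

```csharp
using System;

// Hypothetical helper, not part of the OCR library or Windows Runtime.
public static class Bgra8
{
    // Bytes per pixel in the Bgra8 format: blue, green, red, alpha.
    public const int BytesPerPixel = 4;

    // Index of the first byte (blue) of pixel (x, y).
    // Assumes rows are stored top to bottom with no padding between them.
    public static int OffsetOf(int x, int y, int width)
    {
        return (y * width + x) * BytesPerPixel;
    }

    // Converts one pixel to a 0-255 gray level using a plain channel average.
    // The alpha byte at offset + 3 is ignored.
    public static byte GrayAt(byte[] pixels, int x, int y, int width)
    {
        int i = OffsetOf(x, y, width);
        int blue = pixels[i];
        int green = pixels[i + 1];
        int red = pixels[i + 2];
        return (byte)((blue + green + red) / 3);
    }
}
```

With a buffer obtained as in Step 6, a call such as Bgra8.GrayAt(pixels.DetachPixelData(), x, y, (int)width) would read a single pixel. Note that DetachPixelData hands over ownership of the buffer, so it should be called once and the result stored.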

Step 7: Extract text from the image.

OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

The method RecognizeAsync scans the specified image for text in the language specified by the Language property.

The method returns an OcrResult object. Its Lines property exposes the collection of OcrLine objects recognized in the image.
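
Before calling RecognizeAsync, it can help to check that the image dimensions are within the range the engine accepts. The sketch below assumes limits of 40 to 2600 pixels per side for this library release; treat those numbers as an assumption and confirm them against the package documentation. OcrImageLimits is a hypothetical helper, not a library type.

```csharp
using System;

// Hypothetical helper, not part of the OCR library.
public static class OcrImageLimits
{
    // Assumed limits for this library release; verify before relying on them.
    public const uint MinDimension = 40;
    public const uint MaxDimension = 2600;

    // Returns true when both sides fall inside the assumed supported range.
    public static bool IsSupported(uint width, uint height)
    {
        return width >= MinDimension && width <= MaxDimension
            && height >= MinDimension && height <= MaxDimension;
    }
}
```

A caller could evaluate OcrImageLimits.IsSupported(width, height) right after reading the decoder dimensions and show a message instead of running OCR when it returns false.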

Step 8: Loop through the lines and retrieve the text.

string recognizedText = "";
// Check whether text is detected.
if (result.Lines != null)
{
    // Collect recognized text.
    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            recognizedText += word.Text + " ";
        }
        recognizedText += Environment.NewLine;
    }
}

Each OcrLine object contains a collection of OcrWord objects, which can be accessed through the Words property of each OcrLine.
Each OcrWord object specifies the text, size, and position information of the word in the image.
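
The position information can be used, for example, to highlight a recognized line in the image. The following sketch computes the rectangle that encloses all words of one line. WordBox is a hypothetical stand-in for the position data of an OcrWord (text plus left, top, width, and height in pixels), introduced so the example does not depend on the Windows Runtime; the mapping to the real OcrWord properties is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the text and position data carried by an OcrWord.
public class WordBox
{
    public string Text;
    public int Left, Top, Width, Height;

    public WordBox(string text, int left, int top, int width, int height)
    {
        Text = text;
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }
}

public static class LineLayout
{
    // Returns { left, top, width, height } of the smallest rectangle that
    // encloses every word, or null when the line has no words.
    public static int[] BoundingBox(IList<WordBox> words)
    {
        if (words.Count == 0) return null;

        int left = int.MaxValue, top = int.MaxValue;
        int right = int.MinValue, bottom = int.MinValue;
        foreach (var w in words)
        {
            left = Math.Min(left, w.Left);
            top = Math.Min(top, w.Top);
            right = Math.Max(right, w.Left + w.Width);
            bottom = Math.Max(bottom, w.Top + w.Height);
        }
        return new[] { left, top, right - left, bottom - top };
    }
}
```

In an app, the WordBox values would be filled from each word of an OcrLine, and the resulting rectangle could be drawn over the Image control after scaling from image pixels to display size.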

Example: Microsoft OCR Library in a Windows Store App.

The example below shows how to extract text from an image, display the text, and make the app "speak" the contents of the image.

The layout consists of the following elements:


    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}"> 
        <MediaElement Grid.Row="0" x:Name="media" AutoPlay="True"/> 
        <Button x:Name="btnSelectImage" Content="Select Image" HorizontalAlignment="Left" Height="47" Margin="110,435,0,0" VerticalAlignment="Top" Width="136" Click="btnSelectImage_Click"/> 
        <Image x:Name="img" HorizontalAlignment="Left" Height="368" Margin="44,47,0,0" VerticalAlignment="Top" Width="447"/> 
        <TextBlock x:Name="txtTrasnlatedText" HorizontalAlignment="Left" Height="368" Margin="547,47,0,0" TextWrapping="Wrap" VerticalAlignment="Top" Width="437" FontSize="30" /> 
        <Button x:Name="btnSpeak" Content="Speak!" HorizontalAlignment="Left" Height="47" Margin="266,435,0,0" VerticalAlignment="Top" Width="137" Click="btnSpeak_Click" Visibility="Collapsed"/> 
    </Grid>

Step 1: Select the image

The user selects an image with a file picker. The image is displayed and then passed to the ReadImage method.

        private async void btnSelectImage_Click(object sender, RoutedEventArgs e)
        {
            // Let the user pick a JPEG or PNG image, starting in the Pictures library.
            FileOpenPicker openPicker = new FileOpenPicker();
            openPicker.ViewMode = PickerViewMode.Thumbnail;
            openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
            openPicker.FileTypeFilter.Add(".jpg");
            openPicker.FileTypeFilter.Add(".jpeg");
            openPicker.FileTypeFilter.Add(".png");

            StorageFile file = await openPicker.PickSingleFileAsync();
            if (file != null)
            {
                // Display the selected image.
                BitmapImage image = new BitmapImage();
                IRandomAccessStream fileStream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
                image.SetSource(fileStream);
                img.Source = image;

                // Run OCR and show the recognized text.
                string text = await ReadImage(file);
                txtTrasnlatedText.Text = text;

                btnSpeak.Visibility = Visibility.Visible;
            }
            else
            {
                // The picker returns null when the user cancels.
                txtTrasnlatedText.Text = "Could not load image";
            }
        }

Step 2: Retrieve the text from the image

The method ReadImage uses the library discussed above to extract the text from the image.

        public async Task<string> ReadImage(StorageFile file)
        {
            var ocrEngine = new OcrEngine(OcrLanguage.English);

            using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
            {
                // Create image decoder.
                var decoder = await BitmapDecoder.CreateAsync(stream);

                uint width = decoder.PixelWidth;
                uint height = decoder.PixelHeight;

                // Get pixels in BGRA format.
                var pixels = await decoder.GetPixelDataAsync(
                    BitmapPixelFormat.Bgra8,
                    BitmapAlphaMode.Straight,
                    new BitmapTransform(),
                    ExifOrientationMode.RespectExifOrientation,
                    ColorManagementMode.ColorManageToSRgb);

                // Extract text from image. Note the argument order: height, then width.
                OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

                string recognizedText = "";
                // Check whether text is detected.
                if (result.Lines != null)
                {
                    // Collect recognized text.
                    foreach (var line in result.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            recognizedText += word.Text + " ";
                        }
                        recognizedText += Environment.NewLine;
                    }
                }

                return recognizedText;
            }
        }

Step 3: The "Speak!" method

This method uses speech synthesis to read aloud the text extracted from the image.

        private void btnSpeak_Click(object sender, RoutedEventArgs e)
        {
            Speak(txtTrasnlatedText.Text);
        }

        public async void Speak(string text)
        {
            // The media object for controlling and playing audio.
            MediaElement mediaElement = this.media;

            // The object for controlling the speech synthesis engine (voice).
            var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

            // Generate the audio stream from plain text.
            SpeechSynthesisStream stream = await synth.SynthesizeTextToStreamAsync(text);

            // Send the stream to the media object.
            mediaElement.SetSource(stream, stream.ContentType);
            mediaElement.Play();
        }

Code Sample

The code sample can be downloaded here.
