Now in Production: Auto-Extract API for Business Cards, Recipe URLs, and Product URLs

01/20/2015

Last month, I wrote about the exciting new auto-extraction capabilities of the OneNote API. Starting today, auto-extraction is available to all developers via the production OneNote API (https://www.onenote.com/api). We’ve also published official documentation, and you can play with these APIs from our API Reference as usual.

Business Card Extraction Improvements

We heard your feedback during the beta that business card extraction wasn’t working on certain cards. We’ve taken steps to improve this, and we’ll keep improving it over time. You can still email a OneDrive, Dropbox, or other cloud-drive link to your scanned business cards to OneNoteBizCards@microsoft.com. We’ll use any submitted images to further expand language and format coverage.

Rehash: Auto-Extraction Capabilities

Last month, we announced Business Card Scanning in Office Lens, and previously, we announced recipe and product clipping with the OneNote Web Clipper. Today, we’re releasing the same underlying auto-extraction magic for use by anyone via the OneNote API (https://www.onenote.com/api).

We've designed auto-extraction to be as easy as possible for developers to use. Just include an empty <div> tag with a few additional properties, and the OneNote API will detect whether it can extract its content onto the page and replace the <div> with a simplified rendering of the content being captured.

Here’s an example of scanning a business card image to OneNote with auto-extraction:

 <div 
    data-render-method="extract" 
    data-render-src="name:scanned-image" />

Types of Auto-Extraction Currently Available

Input Type	Content Types Supported (as of Dec 2014)
Scanned Images	Business cards
URLs	Recipes, Products

Input

Scanned Image Extraction

If your app or device captures images to the OneNote API, you can take advantage of business card scanning by including the following markup:

 <div data-render-method="extract" data-render-src="name:scanned-image" />
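For context, here is a minimal Python sketch of how the markup above might be packaged with the image in a multipart create-page request. It only builds the request body and sends nothing. The "Presentation" part name for the HTML, the JPEG content type, and the function name build_business_card_request are assumptions for illustration, not details taken from this post; check the official documentation before relying on them.

```python
import uuid
from html import escape


def build_business_card_request(image_bytes, part_name="scanned-image",
                                image_type="image/jpeg", title="Business card"):
    """Return (content_type, body) for a multipart create-page request.

    Assumption: the HTML goes in a part named "Presentation" and the image
    in a part whose name matches the name:... reference in data-render-src.
    """
    boundary = "OneNotePart" + uuid.uuid4().hex

    # The page HTML contains only the extraction <div>, as in the post.
    page = (
        "<!DOCTYPE html>\n"
        "<html><head><title>{title}</title></head><body>\n"
        '<div data-render-method="extract" data-render-src="name:{name}" />\n'
        "</body></html>\n"
    ).format(title=escape(title), name=escape(part_name))

    def part(name, content_type, payload):
        # One multipart section: headers, blank line, payload, CRLF.
        header = (
            "--{b}\r\n"
            'Content-Disposition: form-data; name="{n}"\r\n'
            "Content-Type: {t}\r\n\r\n"
        ).format(b=boundary, n=name, t=content_type)
        return header.encode("utf-8") + payload + b"\r\n"

    body = (
        part("Presentation", "text/html", page.encode("utf-8"))
        + part(part_name, image_type, image_bytes)
        + "--{b}--\r\n".format(b=boundary).encode("utf-8")
    )
    return "multipart/form-data; boundary=" + boundary, body
```

The returned content type goes in the request's Content-Type header, and the body is the POST payload. The key point is that the image part's name and the name: reference in data-render-src must match.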

URL Extraction

If your app captures web content to the OneNote API, you can take advantage of recipe and product clipping by including the following markup:

 <div data-render-method="extract" data-render-src="https://allrecipes.com/recipe/beef-stroganoff-iii/" />
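Because URL extraction needs no binary attachment, the page can be sent as plain HTML. The sketch below, using only the Python standard library, prepares such a request without sending it. The /v1.0/pages path, the bearer-token header, and the helper name build_url_extract_request are assumptions for illustration; this post only gives the base URL https://www.onenote.com/api.

```python
import urllib.request
from html import escape

# Assumed endpoint path; the post only names https://www.onenote.com/api.
PAGES_ENDPOINT = "https://www.onenote.com/api/v1.0/pages"


def build_url_extract_request(source_url, access_token, title="Clipped page"):
    """Prepare (but do not send) a create-page request for URL extraction."""
    page = (
        "<!DOCTYPE html>\n"
        "<html><head><title>{t}</title></head><body>\n"
        '<div data-render-method="extract" data-render-src="{u}" />\n'
        "</body></html>\n"
    ).format(t=escape(title), u=escape(source_url, quote=True))

    return urllib.request.Request(
        PAGES_ENDPOINT,
        data=page.encode("utf-8"),
        headers={
            "Content-Type": "text/html",
            # Assumption: OAuth bearer token obtained elsewhere.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without credentials.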

Result

Business Cards

The following business card data is recognized and extracted:

Name
Title
Organization
Phone & fax numbers (made into a tel: link)
Mailing/physical address (with a link to map it on Bing)
Email addresses (made into a mailto: link)
Websites

In addition, a vCard (.VCF file) with the extracted information is embedded in the page so OneNote users can easily import the contact details into Outlook or their phone’s contact list. The vCard is also a convenient way to recall this information from the OneNote API.
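If you do read the vCard back, a few lines of Python are enough to pull out the basic fields. This is only a sketch: the sample card below is made up, the exact properties OneNote writes are not listed in this post, and retrieving the .VCF from a page is not shown. The parser handles plain "PROPERTY;params:value" lines and folded continuation lines, and drops parameters.

```python
def parse_vcard(text):
    """Parse a simple vCard into {PROPERTY: [values]}; parameters are dropped."""
    # Unfold continuation lines: a leading space or tab continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw.strip():
            lines.append(raw)

    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a property line
        prop = name.split(";", 1)[0].upper()
        if prop in ("BEGIN", "END"):
            continue
        fields.setdefault(prop, []).append(value)
    return fields


# Hypothetical sample card, for illustration only.
sample = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "FN:Jane Example",
    "ORG:Example Corp",
    "TEL;TYPE=WORK:+1-555-0100",
    "EMAIL:jane@example.com",
    "END:VCARD",
])

card = parse_vcard(sample)
print(card["FN"], card["TEL"])
```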

Business card recognition works best for English cards right now, but we plan to improve accuracy in other languages in the coming months.

Recipes

The following recipe information is extracted:

Title
Hero image
Rating
Ingredients
Preparation Steps
Prep time
Cook time
Total time

Recipes can be extracted from many top sites such as AllRecipes.com.

Products

The following product detail information is extracted:

Title
Rating
Primary image
Description
Features
Specifications

Products can be extracted from a number of top sites such as Amazon.com, HomeDepot.com, and Sears.com.

Fallback behavior

When you use auto-extraction in your scenario, decide what should happen if the OneNote API is unable to extract anything. By default, if OneNote is unable to extract anything, it renders the image or URL onto the page as an image.

You can control the fallback behavior with data-render-fallback. Note that fallback only occurs if OneNote was unable to extract anything. If extraction was partially successful or contains inaccuracies, fallback is not invoked.

In general, we recommend including the original image or URL on the page. That way, if the OneNote API can’t extract some or all of the information, an image of the original input is always available to the user on the OneNote page. For example:

Business cards

 <div 
    data-render-method="extract" 
    data-render-src="name:scanned-image" 
    data-render-fallback="none" />
<img src="name:scanned-image" />

Recipe and Product URLs

 <div 
    data-render-method="extract" 
    data-render-src="https://allrecipes.com/recipe/beef-stroganoff-iii/" 
    data-render-fallback="none" />
<img data-render-src="https://allrecipes.com/recipe/beef-stroganoff-iii/" />

Reference for Auto-Extraction

Here’s the full syntax:

 <div 
    data-render-method="extract" 
    data-render-src="URL-to-render | name:Multipart-Message-Part-Name" 
    [data-render-fallback="render|none"] />

data-render-method is required and must be set to "extract", "extract.businesscard", "extract.recipe", or "extract.product". If your scenario is general purpose, we recommend using "extract" and letting the API automatically detect the content type. If your scenario is limited to a certain content type, you can specify an explicit content type. In certain cases, specifying an explicit type improves results.
data-render-src can either be an absolute URL or a multipart message part name referencing an image. data-render-src is required.
data-render-fallback controls what should happen if the OneNote API is unable to auto-extract content. If set to "none" and extraction fails, the <div> tag is ignored and does not result in any OneNote content being generated. If set to "render", the content is inserted in the page as an image as if <img data-render-src="…"> was used. If data-render-fallback is omitted, it defaults to "render".
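The rules above can be captured in a small helper that builds the tag and rejects values outside the documented sets. This is an illustrative Python sketch, not part of any OneNote SDK; the function name extract_div is hypothetical, and treating "absolute URL" as http:// or https:// is an assumption.

```python
from html import escape

# Values documented in the syntax reference above.
RENDER_METHODS = {"extract", "extract.businesscard", "extract.recipe", "extract.product"}
FALLBACKS = {"render", "none"}


def extract_div(src, method="extract", fallback=None):
    """Return an auto-extraction <div> tag, validating the documented values."""
    if method not in RENDER_METHODS:
        raise ValueError("unsupported data-render-method: %r" % method)
    # Assumption: an absolute URL means http:// or https://.
    if not src or not src.startswith(("name:", "http://", "https://")):
        raise ValueError("data-render-src must be an absolute URL or name:<part>")

    attrs = [("data-render-method", method), ("data-render-src", src)]
    if fallback is not None:
        if fallback not in FALLBACKS:
            raise ValueError("data-render-fallback must be 'render' or 'none'")
        attrs.append(("data-render-fallback", fallback))
    # When fallback is omitted, the attribute is left out and the API
    # default ("render") applies.

    rendered = " ".join('%s="%s"' % (k, escape(v, quote=True)) for k, v in attrs)
    return "<div " + rendered + " />"


print(extract_div("name:scanned-image", method="extract.businesscard", fallback="none"))
```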

Try It Now!

To try this now, head over to our OneNote API Console and try one of the above HTML snippets in a create page API call. Auto-extraction is now available in the production API, and we’d love to hear what you think. You can let us know by leaving a comment on this blog post.

Help make it better

Business card scanning works best on English-language business cards right now, but we plan to add support for more languages in the future. You can help our recognition algorithms get smarter:

Upload your collection of scanned business cards to a folder on OneDrive.com, Dropbox.com, or any other cloud drive.
Create a sharing link. Instructions are available for both OneDrive and Dropbox.
Email the sharing link to OneNoteBizCards@microsoft.com.

We’ll only use the images to improve our algorithms.

-Greg, Prasad, Ajitesh, Prashant, Julia, Yan, Donny, & Scott with help from Bing and Microsoft Research

Comments

Anonymous
December 08, 2017
What is the API method name to be used for extracting data from the image
