Entity Recognition cognitive skill (v2)
The Entity Recognition skill (v2) extracts entities of different types from text. This skill uses the machine learning models provided by Text Analytics in Azure AI services.
Important
The Entity Recognition skill (v2) (Microsoft.Skills.Text.EntityRecognitionSkill) is discontinued and has been replaced by Microsoft.Skills.Text.V3.EntityRecognitionSkill. Follow the recommendations in Deprecated skills to migrate to a supported skill.
Note
As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Azure AI services resource. Charges accrue when calling APIs in Azure AI services, and for image extraction as part of the document-cracking stage in Azure AI Search. There are no charges for text extraction from documents.
Execution of built-in skills is charged at the existing Azure AI services pay-as-you-go price. Image extraction pricing is described on the Azure AI Search pricing page.
@odata.type
Microsoft.Skills.Text.EntityRecognitionSkill
Data limits
The maximum size of a record should be 50,000 characters, as measured by String.Length. If you need to break up your data before sending it to the entity recognition skill, consider using the Text Split skill. If you do use the Text Split skill, set the page length to 5000 for the best performance.
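Because the limit is measured with .NET String.Length, it counts UTF-16 code units, so a character outside the Basic Multilingual Plane (for example, most emoji) counts as two. The following is a minimal sketch, assuming you want to check records and pre-split text yourself before indexing. It is not the Text Split skill's algorithm: `needs_split` and `split_into_pages` are hypothetical helpers that greedily pack whitespace-separated words into pages of at most 5000 Python code points and collapse the original whitespace to single spaces.

```python
MAX_RECORD_LENGTH = 50_000  # per-record limit stated for this skill


def utf16_length(text: str) -> int:
    """Emulate .NET String.Length, which counts UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


def needs_split(text: str) -> bool:
    """True if the record exceeds the documented limit."""
    return utf16_length(text) > MAX_RECORD_LENGTH


def split_into_pages(text: str, page_length: int = 5000) -> list[str]:
    """Hypothetical greedy splitter: pages of at most page_length code points.

    Words longer than page_length are cut into fixed-size pieces first.
    """
    pages: list[str] = []
    current: list[str] = []
    size = 0  # length of " ".join(current)
    for word in text.split():
        pieces = [word[i:i + page_length] for i in range(0, len(word), page_length)]
        for piece in pieces:
            extra = len(piece) + (1 if current else 0)  # +1 for the joining space
            if current and size + extra > page_length:
                pages.append(" ".join(current))  # flush the full page
                current, size = [], 0
                extra = len(piece)
            current.append(piece)
            size += extra
    if current:
        pages.append(" ".join(current))
    return pages
```

A greedy word-based split keeps words intact, but unlike the Text Split skill it ignores sentence boundaries, so treat it only as an illustration of the size budget.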
Skill parameters
Parameters are case-sensitive and are all optional.
Parameter name | Description |
---|---|
categories | Array of categories that should be extracted. Possible category types: "Person", "Location", "Organization", "Quantity", "Datetime", "URL", "Email". If no category is provided, all types are returned. |
defaultLanguageCode | Language code of the input text. The following languages are supported: ar, cs, da, de, en, es, fi, fr, hu, it, ja, ko, nl, no, pl, pt-BR, pt-PT, ru, sv, tr, zh-hans. Not all entity categories are supported for all languages; see the note under Skill outputs. |
minimumPrecision | A value between 0 and 1. If the confidence score (in the namedEntities output) is lower than this value, the entity is not returned. The default is 0. |
includeTypelessEntities | Set to true if you want to recognize well-known entities that don't fit the current categories. Recognized entities are returned in the entities complex output field. For example, "Windows 10" is a well-known entity (a product), but since "Products" is not a supported category, this entity would be included in the entities output field. The default is false. |
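If you generate skillset definitions from code, the parameter rules in the table translate into a few simple checks. This is a minimal sketch, assuming you assemble the definition as a Python dict and serialize it to JSON yourself; `build_entity_skill` is a hypothetical helper, and its fixed input and output wiring is illustrative only. Parameters passed as None are left out so that the service defaults described above apply.

```python
import json

# Category names as listed in the parameter table (this sketch matches them exactly).
ALLOWED_CATEGORIES = {
    "Person", "Location", "Organization", "Quantity", "Datetime", "URL", "Email",
}


def build_entity_skill(categories=None, default_language_code=None,
                       minimum_precision=None, include_typeless_entities=None):
    """Hypothetical helper: build a v2 EntityRecognitionSkill definition dict.

    Parameters left as None are omitted, so the service defaults apply.
    """
    skill = {"@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill"}
    if categories is not None:
        unknown = set(categories) - ALLOWED_CATEGORIES
        if unknown:
            raise ValueError(f"unsupported categories: {sorted(unknown)}")
        skill["categories"] = list(categories)
    if default_language_code is not None:
        skill["defaultLanguageCode"] = default_language_code
    if minimum_precision is not None:
        if not 0 <= minimum_precision <= 1:
            raise ValueError("minimumPrecision must be between 0 and 1")
        skill["minimumPrecision"] = minimum_precision
    if include_typeless_entities is not None:
        skill["includeTypelessEntities"] = include_typeless_entities
    # Hypothetical wiring: read the document text, emit namedEntities under its own name.
    skill["inputs"] = [{"name": "text", "source": "/document/content"}]
    skill["outputs"] = [{"name": "namedEntities"}]
    return skill


if __name__ == "__main__":
    print(json.dumps(build_entity_skill(categories=["Person", "Email"],
                                        minimum_precision=0.5), indent=2))
```

Rejecting "Products" locally mirrors the includeTypelessEntities example: such entities can only come back through the entities output, not through a category.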
Skill inputs
Input name | Description |
---|---|
languageCode | Optional. Default is "en". |
text | The text to analyze. |
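The two inputs map onto the `data` object of each record in the request shape shown under Sample input. This minimal sketch, assuming you want to build such a payload for testing, wraps a list of strings into that shape; `build_skill_input` is a hypothetical helper. It sets languageCode only when you pass one, leaving the "en" default listed above to apply otherwise.

```python
import json


def build_skill_input(texts, language_code=None):
    """Hypothetical helper: wrap texts in the {"values": [...]} record shape."""
    values = []
    for i, text in enumerate(texts, start=1):
        data = {"text": text}
        if language_code is not None:
            data["languageCode"] = language_code  # omitted -> documented default "en"
        values.append({"recordId": str(i), "data": data})  # recordId is a string
    return {"values": values}


if __name__ == "__main__":
    print(json.dumps(build_skill_input(
        ["Contoso corporation was founded by John Smith."], "en"), indent=2))
```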
Skill outputs
Note
Not all entity categories are supported for all languages. The "Person", "Location", and "Organization" entity category types are supported for the full list of languages above. Only de, en, es, fr, and zh-hans support extraction of "Quantity", "Datetime", "URL", and "Email" types. For more information, see Language and region support for the Text Analytics API.
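The support matrix in this note can be encoded directly, which helps predict why a category you requested comes back empty. The following is a minimal sketch, assuming exact string matches on the language codes as written in this article; `supported_categories` and `effective_categories` are hypothetical helpers. An unsupported language yields no categories, matching the warning case described at the end of this article.

```python
ALL_LANGUAGES = {
    "ar", "cs", "da", "de", "en", "es", "fi", "fr", "hu", "it", "ja", "ko",
    "nl", "no", "pl", "pt-BR", "pt-PT", "ru", "sv", "tr", "zh-hans",
}
CORE = {"Person", "Location", "Organization"}        # all listed languages
EXTENDED = {"Quantity", "Datetime", "URL", "Email"}  # subset of languages only
EXTENDED_LANGUAGES = {"de", "en", "es", "fr", "zh-hans"}


def supported_categories(language_code: str) -> set[str]:
    """Categories this note says can be extracted for a language code."""
    if language_code not in ALL_LANGUAGES:
        return set()  # unsupported language: no entities are extracted
    if language_code in EXTENDED_LANGUAGES:
        return CORE | EXTENDED
    return set(CORE)


def effective_categories(requested, language_code: str) -> set[str]:
    """Requested categories that can actually be returned for this language."""
    available = supported_categories(language_code)
    if requested is None:  # no categories parameter means all types
        return available
    return available & set(requested)
```

For example, `effective_categories(["Person", "Email"], "ja")` leaves only "Person", because Japanese is not one of the languages that support "Email".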
Output name | Description |
---|---|
persons | An array of strings where each string represents the name of a person. |
locations | An array of strings where each string represents a location. |
organizations | An array of strings where each string represents an organization. |
quantities | An array of strings where each string represents a quantity. |
dateTimes | An array of strings where each string represents a DateTime value (as it appears in the text). |
urls | An array of strings where each string represents a URL. |
emails | An array of strings where each string represents an email address. |
namedEntities | An array of complex types that contains the following fields: category (the entity category), value (the entity text), offset (the position where the entity was found in the text), and confidence (the score that minimumPrecision is compared against). |
entities | An array of complex types that contains rich information about the entities extracted from text, with the following fields: name, wikipediaId, wikipediaLanguage, wikipediaUrl, bingId, type, subType, and matches (an array of objects with text, offset, and length fields). |
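Downstream code often post-processes the two complex outputs. This minimal sketch, assuming you already have a record's `data` object parsed into Python dicts, shows two common steps: re-applying a stricter confidence threshold to namedEntities (the same comparison the minimumPrecision parameter performs) and grouping entities by their type field. Both function names are hypothetical. Treating a null type as "typeless" is an assumption based on the "Contoso" entry in the sample output.

```python
def filter_by_confidence(named_entities: list[dict],
                         minimum_precision: float = 0.0) -> list[dict]:
    """Keep namedEntities entries scored at or above minimum_precision."""
    return [e for e in named_entities if e["confidence"] >= minimum_precision]


def group_entities_by_type(entities: list[dict]) -> dict:
    """Map each entity type to the entity names; the key None collects
    entities whose type is null (assumed to be typeless entities)."""
    groups: dict = {}
    for entity in entities:
        groups.setdefault(entity.get("type"), []).append(entity["name"])
    return groups
```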
Sample definition
```json
{
  "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
  "categories": ["Person", "Email"],
  "defaultLanguageCode": "en",
  "includeTypelessEntities": true,
  "minimumPrecision": 0.5,
  "inputs": [
    {
      "name": "text",
      "source": "/document/content"
    }
  ],
  "outputs": [
    {
      "name": "persons",
      "targetName": "people"
    },
    {
      "name": "emails",
      "targetName": "contact"
    },
    {
      "name": "entities"
    }
  ]
}
```
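In this definition, targetName renames where an output lands in the enriched document, while "entities" keeps its own name. This minimal sketch, assuming the skill runs in the default "/document" context (the definition above sets no context) and that an output without targetName keeps its name, computes the resulting node paths. `output_paths` is a hypothetical helper.

```python
def output_paths(skill: dict) -> dict:
    """Map each skill output name to the enrichment node it writes to."""
    base = skill.get("context", "/document").rstrip("/")  # assumed default context
    return {
        out["name"]: f"{base}/{out.get('targetName', out['name'])}"
        for out in skill["outputs"]
    }


# The sample definition above, as a Python dict.
SAMPLE_SKILL = {
    "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
    "categories": ["Person", "Email"],
    "defaultLanguageCode": "en",
    "includeTypelessEntities": True,
    "minimumPrecision": 0.5,
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [
        {"name": "persons", "targetName": "people"},
        {"name": "emails", "targetName": "contact"},
        {"name": "entities"},
    ],
}

if __name__ == "__main__":
    for name, path in output_paths(SAMPLE_SKILL).items():
        print(f"{name} -> {path}")
```

Paths like these are what later skills or output field mappings would reference, for example "/document/people" rather than "/document/persons".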
Sample input
```json
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "text": "Contoso corporation was founded by John Smith. They can be reached at contact@contoso.com",
        "languageCode": "en"
      }
    }
  ]
}
```
Sample output
```json
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "persons": ["John Smith"],
        "emails": ["contact@contoso.com"],
        "namedEntities": [
          {
            "category": "Person",
            "value": "John Smith",
            "offset": 35,
            "confidence": 0.98
          }
        ],
        "entities": [
          {
            "name": "John Smith",
            "wikipediaId": null,
            "wikipediaLanguage": null,
            "wikipediaUrl": null,
            "bingId": null,
            "type": "Person",
            "subType": null,
            "matches": [
              {
                "text": "John Smith",
                "offset": 35,
                "length": 10
              }
            ]
          },
          {
            "name": "contact@contoso.com",
            "wikipediaId": null,
            "wikipediaLanguage": null,
            "wikipediaUrl": null,
            "bingId": null,
            "type": "Email",
            "subType": null,
            "matches": [
              {
                "text": "contact@contoso.com",
                "offset": 70,
                "length": 19
              }
            ]
          },
          {
            "name": "Contoso",
            "wikipediaId": "Contoso",
            "wikipediaLanguage": "en",
            "wikipediaUrl": "https://en.wikipedia.org/wiki/Contoso",
            "bingId": "349f014e-7a37-e619-0374-787ebb288113",
            "type": null,
            "subType": null,
            "matches": [
              {
                "text": "Contoso",
                "offset": 0,
                "length": 7
              }
            ]
          }
        ]
      }
    }
  ]
}
```
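Each entry in matches locates its text in the input with an offset and a length. The following check, a sketch using only the values copied from the sample input and output, slices the input text with each offset and length and confirms the result. Plain slicing works here only because every character in this ASCII sample is a single text element; `match_text` is a hypothetical helper.

```python
# Input text and (text, offset, length) values copied from the samples above.
SAMPLE_TEXT = (
    "Contoso corporation was founded by John Smith. "
    "They can be reached at contact@contoso.com"
)
SAMPLE_MATCHES = [
    ("John Smith", 35, 10),
    ("contact@contoso.com", 70, 19),
    ("Contoso", 0, 7),
]


def match_text(text: str, offset: int, length: int) -> str:
    """Slice out one match (valid for this ASCII-only sample)."""
    return text[offset:offset + length]


for expected, offset, length in SAMPLE_MATCHES:
    found = match_text(SAMPLE_TEXT, offset, length)
    print(f"{offset:>3} {length:>3} {found!r}")
    assert found == expected
```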
The offsets returned for entities in the output of this skill come directly from the Text Analytics API. If you use them to index into the original string, use the StringInfo class in .NET to extract the correct content. For more details, see the Text Analytics API documentation.
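The recommendation to use StringInfo suggests that offsets count text elements (what StringInfo enumerates) rather than raw characters. In Python, which indexes strings by code point, a character followed by combining marks then needs translating before slicing. This is a rough sketch under a simplifying assumption: a text element is taken to be one code point plus any following characters with a nonzero `unicodedata.combining` class. That approximation does not implement full Unicode grapheme segmentation (UAX #29), so sequences such as emoji ZWJ sequences are not grouped. `text_element_starts` and `extract` are hypothetical helpers.

```python
import unicodedata


def text_element_starts(text: str) -> list[int]:
    """Code-point index where each (approximate) text element begins."""
    starts = []
    for i, ch in enumerate(text):
        # A combining mark continues the previous element; anything else starts one.
        if i == 0 or unicodedata.combining(ch) == 0:
            starts.append(i)
    return starts


def extract(text: str, offset: int, length: int) -> str:
    """Slice text using an offset and length counted in text elements."""
    starts = text_element_starts(text)
    begin = starts[offset]
    end_element = offset + length
    end = starts[end_element] if end_element < len(starts) else len(text)
    return text[begin:end]


if __name__ == "__main__":
    # "Café" written as "e" + U+0301 COMBINING ACUTE ACCENT: 5 code points, 4 elements.
    sample = "Cafe\u0301 John"
    print(extract(sample, 5, 4))  # element offset 5 is "J"
```

With naive code-point slicing, offset 5 and length 4 would return " Joh" for this string, which is the mismatch the StringInfo recommendation guards against.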
Warning cases
If the language code for the document is unsupported, a warning is returned and no entities are extracted.
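To catch this case before indexing, you can scan a batch for language codes outside the supported list. This is a minimal sketch, assuming records in the Sample input shape, exact string matching on the codes listed under defaultLanguageCode, and the "en" default from the input table for records without a languageCode. `records_with_unsupported_language` is a hypothetical helper.

```python
SUPPORTED_LANGUAGES = {
    "ar", "cs", "da", "de", "en", "es", "fi", "fr", "hu", "it", "ja", "ko",
    "nl", "no", "pl", "pt-BR", "pt-PT", "ru", "sv", "tr", "zh-hans",
}


def records_with_unsupported_language(request: dict, default: str = "en") -> list[str]:
    """Return recordIds whose languageCode would trigger the warning case."""
    flagged = []
    for record in request["values"]:
        language = record["data"].get("languageCode", default)
        if language not in SUPPORTED_LANGUAGES:
            flagged.append(record["recordId"])  # no entities would be extracted
    return flagged
```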