How to use Named Entity Recognition (NER)

The NER feature can evaluate unstructured text, and extract named entities from text in several predefined categories, for example: person, location, event, product, and organization.

Development options

To use named entity recognition, you submit raw unstructured text for analysis and handle the API output in your application. Analysis is performed as-is, with no additional customization to the model used on your data. There are two ways to use named entity recognition:

| Development option | Description |
| --- | --- |
| Language Studio | Language Studio is a web-based platform that lets you try named entity recognition with text examples without an Azure account, and with your own data when you sign up. For more information, see the Language Studio website or the Language Studio quickstart. |
| REST API or client library (Azure SDK) | Integrate named entity recognition into your applications using the REST API, or the client library available in a variety of languages. For more information, see the named entity recognition quickstart. |
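
As a rough illustration of the REST option, the following sketch builds (but does not send) a request using only the Python standard library. The `/language/:analyze-text` path, the `api-version` value, and the `Ocp-Apim-Subscription-Key` header name are assumptions based on common Azure AI Language REST conventions and are not stated in this article. The endpoint and key are placeholders, and `build_ner_request` is a hypothetical helper. Check the quickstart for the exact values.

```python
import json
import urllib.request

# Placeholder values: replace with the endpoint and key of your own resource.
ENDPOINT = "https://example-resource.cognitiveservices.azure.com"
API_VERSION = "2023-04-01"  # assumption; use the version your resource supports


def build_ner_request(endpoint, key, documents, api_version=API_VERSION):
    """Build a POST request for entity recognition without sending it."""
    body = {
        "kind": "EntityRecognition",
        "parameters": {},
        "analysisInput": {"documents": documents},
    }
    url = f"{endpoint}/language/:analyze-text?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # assumed header name
        },
        method="POST",
    )


request = build_ner_request(
    ENDPOINT,
    "<your-key>",
    [{"id": "1", "language": "en", "text": "We went to Contoso foodplace in Seattle."}],
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any call is made.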

Determine how to process the data (optional)

Input languages

When you submit documents to be processed, you can specify which of the supported languages they're written in. If you don't specify a language, named entity recognition defaults to English. The API may return offsets in the response to support different multilingual and emoji encodings.
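
Offsets matter most when the text contains characters outside the Basic Multilingual Plane, such as emoji. The sketch below assumes the returned offset and length are counted in UTF-16 code units. That unit is an assumption for illustration, not something this article states, and `slice_utf16` is a hypothetical helper. Python strings are indexed by code point, so a direct slice can be off when emoji precede the entity.

```python
def slice_utf16(text, offset, length):
    """Return the substring addressed by an offset and length in UTF-16 code units."""
    encoded = text.encode("utf-16-le")  # two bytes per UTF-16 code unit
    return encoded[2 * offset : 2 * (offset + length)].decode("utf-16-le")


text = "😀 Seattle"
# The emoji occupies two UTF-16 code units but only one Python code point.
print(slice_utf16(text, 3, 7))  # Seattle
print(text[3:3 + 7])            # eattle (code-point indexing starts one character late)
```

If the offsets are instead counted in Unicode code points, a plain Python slice is already correct.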

Submitting data

Analysis is performed upon receipt of the request. Using the NER feature synchronously is stateless. No data is stored in your account, and results are returned immediately in the response.

When you use this feature asynchronously, the API results are available for 24 hours from the time the request was ingested. The expiration time is indicated in the response. After this period, the results are purged and can no longer be retrieved.
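
A client that polls for asynchronous results may want to track the retention window locally. The sketch below only does the date arithmetic for the 24-hour period. The function names and the timestamp are hypothetical, and in practice the expiration time reported in the response should take precedence.

```python
from datetime import datetime, timedelta, timezone

RESULT_RETENTION = timedelta(hours=24)  # retention window for asynchronous results


def results_expire_at(ingested_at):
    """Return the time after which asynchronous results are purged."""
    return ingested_at + RESULT_RETENTION


def results_available(ingested_at, now=None):
    """Return True while the results can still be retrieved."""
    now = now or datetime.now(timezone.utc)
    return now < results_expire_at(ingested_at)


ingested = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical timestamp
print(results_expire_at(ingested).isoformat())  # 2024-01-02T12:00:00+00:00
```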

The API attempts to detect the defined entity categories for a given document language.

Getting NER results

When you get results from NER, you can stream the results to an application or save the output to a file on the local system. The API response includes recognized entities, including their categories and subcategories, and confidence scores.
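
One way to save the output is to flatten the recognized entities into rows and write them as CSV. The sketch below assumes the response shape shown in the sample output later in this article (`results.documents[].entities[]`). The helper names, the confidence threshold, and the abbreviated `sample` response are hypothetical.

```python
import csv
import io


def entity_rows(response, min_confidence=0.0):
    """Flatten a response into (document id, text, category, subcategory, score) rows."""
    rows = []
    for document in response["results"]["documents"]:
        for entity in document["entities"]:
            if entity["confidenceScore"] >= min_confidence:
                rows.append((
                    document["id"],
                    entity["text"],
                    entity["category"],
                    entity.get("subcategory", ""),  # not every entity has one
                    entity["confidenceScore"],
                ))
    return rows


def write_csv(rows, stream):
    """Write the rows to any file-like object, such as an open file."""
    writer = csv.writer(stream)
    writer.writerow(["document_id", "text", "category", "subcategory", "confidence"])
    writer.writerows(rows)


# Hypothetical, abbreviated response used only for illustration.
sample = {"results": {"documents": [{"id": "1", "entities": [
    {"text": "Seattle", "category": "Location", "subcategory": "GPE", "confidenceScore": 0.99},
    {"text": "last week", "category": "DateTime", "confidenceScore": 0.8},
]}]}}

buffer = io.StringIO()
write_csv(entity_rows(sample, min_confidence=0.9), buffer)
print(buffer.getvalue())
```

To save to the local system, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.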

Select which entities to be returned

The API attempts to detect the defined entity types and tags for a given document language. The entity types and tags replace the categories and subcategories structure that older models use to define entities, which allows more flexibility. You can also specify which entities are detected and returned by using the optional includeList and excludeList parameters with the appropriate entity types. You can specify one or more entity types to be returned. Given the types and tags hierarchy introduced for this version, you can filter at different levels of granularity. The following example detects only Location entities:

Input:

Note

This example returns only entities of the "Location" entity type.

{
    "kind": "EntityRecognition",
    "parameters": 
    {
        "includeList" :
        [
            "Location"
        ]
    },
    "analysisInput":
    {
        "documents":
        [
            {
                "id":"1",
                "language": "en",
                "text": "We went to Contoso foodplace located at downtown Seattle last week for a dinner party, and we adore the spot! They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) and he is super nice, coming out of the kitchen and greeted us all. We enjoyed very much dining in the place! The pasta I ordered was tender and juicy, and the place was impeccably clean. You can even pre-order from their online menu at www.contosofoodplace.com, call 112-555-0176 or send email to order@contosofoodplace.com! The only complaint I have is the food didn't come fast enough. Overall I highly recommend it!"
            }
        ]
    }
}

The example above returns entities that fall under the Location entity type, such as entities tagged GPE, Structural, or Geological, as outlined in entity types and tags. You can narrow the results further by filtering on one of the entity tags for the Location type, for example the GPE tag only:


    "parameters": 
    {
        "includeList" :
        [
            "GPE"
        ]
    }
    

This request returns only the Location entities that fall under the GPE tag and ignores any entity of the Location type that carries a different tag, such as Structural or Geological. You can drill down further with the excludeList parameter. GPE-tagged entities can carry the following tags: City, State, CountryRegion, Continent. For example, you can exclude the Continent and CountryRegion tags:


    "parameters": 
    {
        "includeList" :
        [
            "GPE"
        ],
        "excludeList" :
        [
            "Continent",
            "CountryRegion"
        ]
    }
    

With these parameters, the results are limited to Location entities, because the GPE entity tag in the includeList parameter falls under the Location type. Of those, only geopolitical entities are kept, and any entity tagged Continent or CountryRegion is excluded.
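
The filtering behavior described above can be modeled locally to check which entities a given combination of lists should keep. The following is a minimal sketch of those semantics as described in this section, not the service's implementation. The `filter_entities` helper and the sample entities with their tags are hypothetical.

```python
def filter_entities(entities, include_list=None, exclude_list=None):
    """Keep entities with at least one included tag and no excluded tag."""
    include = set(include_list or [])
    exclude = set(exclude_list or [])
    kept = []
    for entity in entities:
        tags = set(entity["tags"])
        if include and not (tags & include):
            continue  # none of the entity's tags was requested
        if tags & exclude:
            continue  # at least one tag is explicitly excluded
        kept.append(entity["text"])
    return kept


# Hypothetical entities; each lists its tags from most general to most specific.
entities = [
    {"text": "Seattle", "tags": ["Location", "GPE", "City"]},
    {"text": "Europe", "tags": ["Location", "GPE", "Continent"]},
    {"text": "Space Needle", "tags": ["Location", "Structural"]},
    {"text": "John Doe", "tags": ["Person"]},
]

print(filter_entities(entities, ["Location"]))  # ['Seattle', 'Europe', 'Space Needle']
print(filter_entities(entities, ["GPE"]))       # ['Seattle', 'Europe']
print(filter_entities(entities, ["GPE"], ["Continent", "CountryRegion"]))  # ['Seattle']
```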

Additional output attributes

In order to provide users with more insight into an entity's types and provide increased usability, NER supports these attributes in the output:

| Name of the attribute | Type | Definition |
| --- | --- | --- |
| type | String | The most specific type of the detected entity. For example, "Seattle" is a City, a GPE (geopolitical entity), and a Location. The most granular classification for "Seattle" is City, so the type is City for the text "Seattle". |
| tags | List (tags) | A list of tag objects that express the affinity of the detected entity to a hierarchy or any other grouping. A tag contains two fields: (1) name: a unique name for the tag; (2) confidenceScore: the associated confidence score for the tag, ranging from 0 to 1. The unique tag name is used to filter in the includeList and excludeList parameters. |
| metadata | Object | An object containing more data about the detected entity type. Its contents change based on the metadataKind field. |

Sample output

This sample output includes an example of the additional output attributes.

{ 
    "kind": "EntityRecognitionResults", 
    "results": { 
        "documents": [ 
            { 
                "id": "1", 
                "entities": [ 
                    { 
                        "text": "Microsoft", 
                        "category": "Organization", 
                        "type": "Organization", 
                        "offset": 0, 
                        "length": 9, 
                        "confidenceScore": 0.97, 
                        "tags": [ 
                            { 
                                "name": "Organization", 
                                "confidenceScore": 0.97 
                            } 
                        ] 
                    }, 
                    { 
                        "text": "One", 
                        "category": "Quantity", 
                        "type": "Number", 
                        "subcategory": "Number", 
                        "offset": 21, 
                        "length": 3, 
                        "confidenceScore": 0.9, 
                        "tags": [ 
                            { 
                                "name": "Number", 
                                "confidenceScore": 0.8 
                            }, 
                            { 
                                "name": "Quantity", 
                                "confidenceScore": 0.8 
                            }, 
                            { 
                                "name": "Numeric", 
                                "confidenceScore": 0.8 
                            } 
                        ], 
                        "metadata": { 
                            "metadataKind": "NumberMetadata", 
                            "numberKind": "Integer", 
                            "value": 1.0 
                        } 
                    } 
                ], 
                "warnings": [] 
            } 
        ], 
        "errors": [], 
        "modelVersion": "2023-09-01" 
    } 
} 
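
To show how an application might read the additional attributes, the sketch below summarizes the second entity from the sample output. The `describe_entity` helper and its tag-score threshold are hypothetical. The field names are taken from the sample.

```python
def describe_entity(entity, min_tag_score=0.5):
    """Summarize an entity's most specific type, confident tags, and metadata value."""
    tags = [
        tag["name"]
        for tag in entity.get("tags", [])
        if tag["confidenceScore"] >= min_tag_score
    ]
    metadata = entity.get("metadata", {})  # absent for entities without extra data
    return {
        "text": entity["text"],
        "type": entity["type"],
        "tags": tags,
        "metadataKind": metadata.get("metadataKind"),
        "value": metadata.get("value"),
    }


# The second entity from the sample output, with offsets omitted for brevity.
entity = {
    "text": "One", "category": "Quantity", "type": "Number",
    "tags": [
        {"name": "Number", "confidenceScore": 0.8},
        {"name": "Quantity", "confidenceScore": 0.8},
        {"name": "Numeric", "confidenceScore": 0.8},
    ],
    "metadata": {"metadataKind": "NumberMetadata", "numberKind": "Integer", "value": 1.0},
}
print(describe_entity(entity))
```

Using `.get` for `tags` and `metadata` keeps the helper usable for entities that omit them, such as the first entity in the sample, which has no metadata.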

Specify the NER model

By default, this feature uses the latest available AI model on your text. You can also configure your API requests to use a specific model version.
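
As a sketch of pinning a model version, the helper below adds a version to the `parameters` object of a request body. The `modelVersion` parameter name is an assumption, mirroring the `modelVersion` field in the sample output, and `with_model_version` is a hypothetical helper. Confirm the parameter name in the API reference before relying on it.

```python
def with_model_version(request_body, model_version="latest"):
    """Return a copy of the request body that pins the given model version."""
    pinned = dict(request_body)
    pinned["parameters"] = {
        **request_body.get("parameters", {}),
        "modelVersion": model_version,  # assumed parameter name
    }
    return pinned


base = {"kind": "EntityRecognition", "parameters": {"includeList": ["Location"]}}
pinned = with_model_version(base, "2023-09-01")
print(pinned["parameters"])  # {'includeList': ['Location'], 'modelVersion': '2023-09-01'}
```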

Service and data limits

For information on the size and number of requests you can send per minute and second, see the service limits article.

Next steps

Named Entity Recognition overview