Extract key phrases
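Key phrase extraction is a feature of Azure AI Language that identifies the key phrases or main concepts in a text.
There are several ways to call the key phrase extraction API. This article uses the azure_ai extension to extract key phrases from within SQL queries.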
Prerequisites
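You need an Azure Database for PostgreSQL flexible server with the azure_ai extension enabled and configured. You also need to authorize the extension with Azure Cognitive Services by setting the key and endpoint of a Language resource.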
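As a rough sketch of the configuration step described above, the statements below enable the extension in a database and store the Language resource's endpoint and key. The setting names `azure_cognitive.endpoint` and `azure_cognitive.subscription_key` and the `azure_ai.set_setting` function are not given in this article; they are assumptions based on the extension's documentation and should be verified there. The endpoint URL and key are placeholders.

```sql
-- Enable the extension in the current database
-- (assumes azure_ai is already allowlisted on the server).
CREATE EXTENSION IF NOT EXISTS azure_ai;

-- Assumed setting names; replace the placeholder values with those
-- of your own Language resource.
SELECT azure_ai.set_setting('azure_cognitive.endpoint',
                            'https://<your-language-resource>.cognitiveservices.azure.com');
SELECT azure_ai.set_setting('azure_cognitive.subscription_key',
                            '<your-api-key>');
```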
Scenarios
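Key phrase extraction is useful for a variety of tasks:
- Summarization: use key phrases to condense long documents into their core topics, for example to identify the topics discussed in an audio transcript or in meeting notes.
- Content classification: use key phrases to index documents for search and browsing. Key phrases can also be used to visualize documents as a word cloud.
- Document clustering: use key phrases to cluster and analyze collections of support tickets, product reviews, and other unstructured input.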
Use key phrase extraction in SQL with Azure Cognitive Services
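The azure_ai extension for Azure Database for PostgreSQL flexible server provides user-defined functions (UDFs) that give direct access to AI capabilities from within SQL. The key phrase extraction API is accessed through the azure_cognitive.extract_key_phrases function: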
azure_cognitive.extract_key_phrases(
text TEXT,
language TEXT,
timeout_ms INTEGER DEFAULT 3600000,
throw_on_error BOOLEAN DEFAULT TRUE,
disable_service_logs BOOLEAN DEFAULT FALSE
)
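The required arguments are text, the input to analyze, and language, the language in which text is written. For example, en-us is US English and fr is French. For the full list of available languages, see Language support.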
By default, key phrase extraction stops if it has not completed within 3,600,000 milliseconds (1 hour). You can customize this timeout by changing timeout_ms.
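One way to override the timeout is PostgreSQL's named-argument notation, which lets you set timeout_ms without spelling out the other optional parameters. This is a minimal sketch; the 5000 ms value is an arbitrary illustration, not a recommended setting.

```sql
-- Pass text and language positionally, then set timeout_ms by name.
-- 5000 ms is an illustrative value only.
SELECT azure_cognitive.extract_key_phrases(
    'The food was delicious and the staff were wonderful.',
    'en-us',
    timeout_ms => 5000
);
```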
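If an error occurs, the default behavior is to raise an exception, which rolls back the transaction. You can disable this behavior by setting throw_on_error to false.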
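The call below sketches how the exception might be suppressed, again using named-argument notation. This article does not say what the function returns for a failed call when throw_on_error is false, so check the extension documentation before relying on a particular value in downstream logic.

```sql
-- Suppress the exception so that a failed call does not abort
-- the surrounding transaction.
SELECT azure_cognitive.extract_key_phrases(
    'The food was delicious and the staff were wonderful.',
    'en-us',
    throw_on_error => false
);
```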
For the full parameter documentation, see the Azure Cognitive Services extension documentation.
For example, run the following query:
SELECT azure_cognitive.extract_key_phrases('The food was delicious and the staff were wonderful.', 'en-us');
The result is:
extract_key_phrases
---------------------
{food,staff}
You can use a table column as the input text:
SELECT description, azure_cognitive.extract_key_phrases(description, 'en-us')
FROM listings LIMIT 1;
This returns the following (enable \x for expanded display):
description | Welcome! If you stay here you will be living in a light filled two bedroom upper and ground level apartment (in a two apartment home). During your stay you will be welcome to share in our fresh eggs from the chickens and garden produce in season! Welcome! Come enjoy your time in Seattle at a lovely urban farmstead. There are two bedrooms each with a queen bed, full bath, living room and kitchen with wood floors throughout. During your stay you will be welcome to eat fresh eggs from the chickens and possibly fruit/veggies from the garden if you are in luck! We are family friendly and have a down to earth atmosphere. There is a large covered back porch and grill for hanging out especially in summer and a treehouse for up in the trees hammock time! Walking distance to Othello Light Rail Station for easy access to downtown. Also nearby is the fantastic Seward Park and the Kubota Gardens for outdoorsy loveliness. New last year is out beautiful Rainier Beach indoor swimming pool comp
extract_key_phrases | {"beautiful Rainier Beach indoor swimming pool","large covered back porch","Othello Light Rail Station","ground level apartment","lovely urban farmstead","fantastic Seward Park","two bedroom upper","two apartment home","two bedrooms","fresh eggs","queen bed","full bath","living room","wood floors","earth atmosphere","Walking distance","easy access","Kubota Gardens","outdoorsy loveliness","garden produce","hammock time",stay,chickens,season,Seattle,kitchen,fruit/veggies,luck,grill,summer,treehouse,trees,downtown,last}
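Because the result is displayed as a PostgreSQL array, the phrases can be expanded into rows and aggregated, which is one way to approach the indexing and clustering scenarios described earlier. The sketch below assumes the function returns a text array and reuses the listings table from the previous query. The sample size of 10 is arbitrary and is there only to limit the number of service calls.

```sql
-- Count how often each key phrase occurs across a small sample of listings.
-- Assumes extract_key_phrases returns text[], so unnest() yields one row per phrase.
SELECT phrase, count(*) AS occurrences
FROM (
    SELECT description
    FROM listings
    LIMIT 10          -- arbitrary sample size to limit service calls
) AS sample
CROSS JOIN LATERAL unnest(
    azure_cognitive.extract_key_phrases(sample.description, 'en-us')
) AS phrase
GROUP BY phrase
ORDER BY occurrences DESC, phrase
LIMIT 20;
```

Each row in the sample triggers one call to the service, so for larger tables it may be preferable to store the extracted phrases in a column once rather than recomputing them per query.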
Summary
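Key phrase extraction identifies the main concepts in a text. The Azure Cognitive Services language models reduce natural language to keywords or phrases. The azure_ai extension for Azure Database for PostgreSQL provides the azure_cognitive.extract_key_phrases function for accessing key phrase extraction directly from within SQL queries.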