你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn。
适用于 JavaScript 的 Azure 文本分析客户端库 - 版本 6.0.0-beta.1
适用于语言的 Azure 认知服务 是一种基于云的服务,可针对原始文本提供高级自然语言处理,并包含以下主要功能:
注意: 此 SDK 面向适用于语言的 Azure 认知服务 API 版本 2022-04-01-preview。
- 语言检测
- 情绪分析
- 关键短语提取
- 命名实体识别
- 识别个人身份信息
- 实体链接
- 医疗保健分析
- 提取性摘要
- 自定义实体识别
- 自定义文档分类
- 支持每个文档的多个操作
使用客户端库可以执行以下操作:
- 检测写入的语言输入文本。
- 通过分析原始文本以获取有关正面或负面情绪的线索,确定客户对你的品牌或主题的看法。
- 自动提取关键短语,以快速识别要点。
- 识别文本中的实体并将其分类为人员、地点、组织、日期/时间、数量、百分比、货币、医疗保健特定等。
- 一次执行多个上述任务。
关键链接:
入门
目前支持的环境
- LTS 版本的 Node.js
- 最新版本的 Safari、Chrome、Edge 和 Firefox。
有关更多详细信息,请参阅我们的支持政策。
先决条件
如果使用 Azure CLI,请将 和 <your-resource-name>
替换为<your-resource-group-name>
自己的唯一名称:
az cognitiveservices account create --kind TextAnalytics --resource-group <your-resource-group-name> --name <your-resource-name> --sku <your-sku-name> --location <your-location>
安装 @azure/ai-text-analytics
包
使用 安装适用于 JavaScript 的 npm
Azure 文本分析客户端库:
npm install @azure/ai-text-analytics
创建 TextAnalysisClient
并对其进行身份验证
若要创建客户端对象来访问语言 API,需要 endpoint
语言资源的 和 credential
。 文本分析客户端可以使用 Azure Active Directory 凭据或 API 密钥凭据进行身份验证。
可以在 Azure 门户 或使用以下 Azure CLI 代码片段中找到语言资源的终结点:
az cognitiveservices account show --name <your-resource-name> --resource-group <your-resource-group-name> --query "properties.endpoint"
使用 API 密钥
使用 Azure 门户 浏览到语言资源并检索 API 密钥,或使用下面的 Azure CLI 代码片段:
注意: 有时,API 密钥称为“订阅密钥”或“订阅 API 密钥”。
az cognitiveservices account keys list --resource-group <your-resource-group-name> --name <your-resource-name>
拥有 API 密钥和终结点后,可以使用 AzureKeyCredential
类对客户端进行身份验证,如下所示:
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
使用 Azure Active Directory 凭据
大多数示例都使用客户端 API 密钥身份验证,但也可以使用 Azure 标识库通过 Azure Active Directory 进行身份验证。 若要使用如下所示的 DefaultAzureCredential 提供程序或 Azure SDK 提供的其他凭据提供程序,请安装包 @azure/identity
:
npm install @azure/identity
还需要通过向服务主体分配"Cognitive Services User"
角色来注册新的 AAD 应用程序并授予对语言的访问权限 (注意:其他角色(如)"Owner"
不会授予必要的权限,仅"Cognitive Services User"
足以) 运行示例和示例代码。
将 AAD 应用程序的客户端 ID、租户 ID 和客户端机密的值设置为环境变量: AZURE_CLIENT_ID
、 AZURE_TENANT_ID
、 AZURE_CLIENT_SECRET
。
const { TextAnalysisClient } = require("@azure/ai-text-analytics");
const { DefaultAzureCredential } = require("@azure/identity");
const client = new TextAnalysisClient("<endpoint>", new DefaultAzureCredential());
关键概念
TextAnalysisClient
TextAnalysisClient
是开发人员使用文本分析客户端库的主要接口。 浏览此客户端对象上的方法,了解可以访问的语言服务的不同功能。
输入
文档表示要由语言服务中的预测模型分析的单个输入单位。 上的 TextAnalysisClient
操作将输入集合作为批处理进行分析。 操作方法具有重载,允许输入表示为字符串或具有附加元数据的对象。
例如,每个文档都可以作为数组中的字符串传递,例如
const documents = [
"I hated the movie. It was so slow!",
"The movie made it into my top ten favorites.",
"What a great movie!",
];
或者,如果要传入每个项的文档id
或 language
countryHint
/,可以将其作为 列表TextDocumentInput
提供,或DetectLanguageInput
取决于操作;
const textDocumentInputs = [
{ id: "1", language: "en", text: "I hated the movie. It was so slow!" },
{ id: "2", language: "en", text: "The movie made it into my top ten favorites." },
{ id: "3", language: "en", text: "What a great movie!" },
];
请参阅输入 的服务限制 ,包括文档长度限制、最大批大小和支持的文本编码。
返回值
对应于单个文档的返回值是成功结果或错误对象。 每个方法返回 TextAnalysisClient
一个异类数组,其中包含与按索引的输入对应的结果和错误。 文本输入及其结果在输入和结果集合中具有相同的索引。
结果(如 SentimentAnalysisResult
)是语言操作的结果,其中包含有关单个文本输入的预测。 操作的结果类型还可以选择性地包含有关输入文档及其处理方式的信息。
error 对象 TextAnalysisErrorResult
指示服务在处理文档时遇到错误,并包含有关错误的信息。
文档错误处理
在操作返回的集合中,如果遇到错误,则通过属性的存在 error
将错误与成功响应区分开来,该属性包含内部 TextAnalysisError
对象。 对于成功的结果对象,此属性 始终undefined
为 。
例如,若要筛选出所有错误,可以使用以下 filter
代码:
const results = await client.analyze("SentimentAnalysis", documents);
const onlySuccessful = results.filter((result) => result.error === undefined);
注意:如果在配置中tsconfig.json
将 true
设置为 ,TypeScript compilerOptions.strictNullChecks
用户可以受益于结果和错误对象的更好类型检查。 例如:
const [result] = await client.analyze("SentimentAnalysis", ["Hello world!"]);
if (result.error !== undefined) {
// In this if block, TypeScript will be sure that the type of `result` is
// `TextAnalysisError` if compilerOptions.strictNullChecks is enabled in
// the tsconfig.json
console.log(result.error);
}
示例
情绪分析
分析文本情绪,以确定文本是正面、消极、中性还是混合的,包括每句情绪分析和置信度分数。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"I did not like the restaurant. The food was too spicy.",
"The restaurant was decorated beautifully. The atmosphere was unlike any other restaurant I've been to.",
"The food was yummy. :)",
];
async function main() {
const results = await client.analyze("SentimentAnalysis", documents);
for (const result of results) {
if (result.error === undefined) {
console.log("Overall sentiment:", result.sentiment);
console.log("Scores:", result.confidenceScores);
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
若要获取与产品/服务的各个方面(也称为自然语言处理 (NLP) 中基于方面的情绪分析)相关的更精细的信息,请参阅 此处通过观点挖掘进行情绪分析的示例。
实体识别
识别文本中的实体并将其分类为人员、地点、组织、日期/时间、数量、货币等。
language
参数是可选的。 如果未指定,将使用默认的英语模型。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Microsoft was founded by Bill Gates and Paul Allen.",
"Redmond is a city in King County, Washington, United States, located 15 miles east of Seattle.",
"Jeff bought three dozen eggs because there was a 50% discount.",
];
async function main() {
const results = await client.analyze("EntityRecognition", documents, "en");
for (const result of results) {
if (result.error === undefined) {
console.log(" -- Recognized entities for input", result.id, "--");
for (const entity of result.entities) {
console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
}
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
PII 实体识别
有一个单独的操作用于识别个人身份信息 (PII) 文本,如社会保险号、银行帐户信息、信用卡号等。其用法与上述标准实体识别非常相似:
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"The employee's SSN is 555-55-5555.",
"The employee's phone number is (555) 555-5555.",
];
async function main() {
const results = await client.analyze("PiiEntityRecognition", documents, "en");
for (const result of results) {
if (result.error === undefined) {
console.log(" -- Recognized PII entities for input", result.id, "--");
for (const entity of result.entities) {
console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
}
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
实体链接
“链接”实体是存在于维基百科) 等知识库 (中的实体。 该EntityLinking
操作可以通过确定知识库中可能引用 (的条目来消除实体的歧义,例如,在一段文本中,“火星”一词是否指的是行星,或罗马战神) 。 链接实体包含指向提供实体定义的知识库的关联 URL。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Microsoft was founded by Bill Gates and Paul Allen.",
"Easter Island, a Chilean territory, is a remote volcanic island in Polynesia.",
"I use Azure Functions to develop my product.",
];
async function main() {
const results = await client.analyze("EntityLinking", documents, "en");
for (const result of results) {
if (result.error === undefined) {
console.log(" -- Recognized linked entities for input", result.id, "--");
for (const entity of result.entities) {
console.log(entity.name, "(URL:", entity.url, ", Source:", entity.dataSource, ")");
for (const match of entity.matches) {
console.log(
" Occurrence:",
'"' + match.text + '"',
"(Score:",
match.confidenceScore,
")"
);
}
}
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
关键短语提取
关键短语提取标识文档中的主要谈话点。 例如,给定输入文本“The food was delicious and there were wonderful staff”,服务会返回“food”和“wonderful staff”。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Redmond is a city in King County, Washington, United States, located 15 miles east of Seattle.",
"I need to take my cat to the veterinarian.",
"I will travel to South America in the summer.",
];
async function main() {
const results = await client.analyze("KeyPhraseExtraction", documents, "en");
for (const result of results) {
if (result.error === undefined) {
console.log(" -- Extracted key phrases for input", result.id, "--");
console.log(result.keyPhrases);
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
语言检测
确定文本段的语言。
参数 countryHint
是可选的,但如果知道来源国家/地区,则可以帮助服务提供正确的输出。 如果提供,应将其设置为 ISO-3166 Alpha-2 双字母国家/地区代码 (例如美国的“us”或“jp”(日本) 或值 "none"
)。 如果未提供 参数,则将使用默认"us"
(美国) 模型。 如果不知道文档的原产国,则应使用 参数 "none"
,语言服务将应用针对未知来源国家/地区优化的模型。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"This is written in English.",
"Il documento scritto in italiano.",
"Dies ist in deutscher Sprache verfasst.",
];
async function main() {
const results = await client.analyze("LanguageDetection", documents, "none");
for (const result of results) {
if (result.error === undefined) {
const { primaryLanguage } = result;
console.log(
"Input #",
result.id,
"identified as",
primaryLanguage.name,
"( ISO6391:",
primaryLanguage.iso6391Name,
", Score:",
primaryLanguage.confidenceScore,
")"
);
} else {
console.error("Encountered an error:", result.error);
}
}
}
main();
医疗保健分析
医疗保健分析可识别医疗保健实体。 例如,给定输入文本“处方 100mg 布洛芬,每天服用两次”,服务返回“100mg”分类为剂量,“布洛芬”作为药物名称,“每日两次”作为频率。
const {
AnalyzeBatchAction,
AzureKeyCredential,
TextAnalysisClient,
} = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Prescribed 100mg ibuprofen, taken twice daily.",
"Patient does not suffer from high blood pressure.",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "Healthcare",
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
const results = await poller.pollUntilDone();
for await (const actionResult of results) {
if (actionResult.kind !== "Healthcare") {
throw new Error(`Expected a healthcare results but got: ${actionResult.kind}`);
}
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
for (const result of actionResult.results) {
console.log(`- Document ${result.id}`);
if (result.error) {
const { code, message } = result.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
console.log("\tRecognized Entities:");
for (const entity of result.entities) {
console.log(`\t- Entity "${entity.text}" of type ${entity.category}`);
if (entity.dataSources.length > 0) {
console.log("\t and it can be referenced in the following data sources:");
for (const ds of entity.dataSources) {
console.log(`\t\t- ${ds.name} with Entity ID: ${ds.entityId}`);
}
}
}
}
}
}
main();
提取性摘要
提取式摘要标识汇总其所属文章的句子。
const {
AnalyzeBatchAction,
AzureKeyCredential,
TextAnalysisClient,
} = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Prescribed 100mg ibuprofen, taken twice daily.",
"Patient does not suffer from high blood pressure.",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "ExtractiveSummarization",
maxSentenceCount: 2,
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
const results = await poller.pollUntilDone();
for await (const actionResult of results) {
if (actionResult.kind !== "ExtractiveSummarization") {
throw new Error(`Expected extractive summarization results but got: ${actionResult.kind}`);
}
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
for (const result of actionResult.results) {
console.log(`- Document ${result.id}`);
if (result.error) {
const { code, message } = result.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
console.log("Summary:");
console.log(result.sentences.map((sentence) => sentence.text).join("\n"));
}
}
}
main();
自定义实体识别
使用使用 Azure Language Studio 生成的自定义实体检测模型,识别文本中的实体并将其分类为实体。
const {
AnalyzeBatchAction,
AzureKeyCredential,
TextAnalysisClient,
} = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"We love this trail and make the trip every year. The views are breathtaking and well worth the hike! Yesterday was foggy though, so we missed the spectacular views. We tried again today and it was amazing. Everyone in my family liked the trail although it was too challenging for the less athletic among us.",
"Last week we stayed at Hotel Foo to celebrate our anniversary. The staff knew about our anniversary so they helped me organize a little surprise for my partner. The room was clean and with the decoration I requested. It was perfect!",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "CustomEntityRecognition",
deploymentName,
projectName,
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
for await (const actionResult of results) {
if (actionResult.kind !== "CustomEntityRecognition") {
throw new Error(`Expected a CustomEntityRecognition results but got: ${actionResult.kind}`);
}
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
for (const result of actionResult.results) {
console.log(`- Document ${result.id}`);
if (result.error) {
const { code, message } = result.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
console.log("\tRecognized Entities:");
for (const entity of result.entities) {
console.log(`\t- Entity "${entity.text}" of type ${entity.category}`);
}
}
}
}
main();
自定义单标签分类
使用使用 Azure Language Studio 生成的自定义单标签模型对文档进行分类。
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"The plot begins with a large group of characters where everyone thinks that the two main ones should be together but foolish things keep them apart. Misunderstandings, miscommunication, and confusion cause a series of humorous situations.",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "CustomSingleLabelClassification",
deploymentName,
projectName,
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
const results = await poller.pollUntilDone();
for await (const actionResult of results) {
if (actionResult.kind !== "CustomSingleLabelClassification") {
throw new Error(
`Expected a CustomSingleLabelClassification results but got: ${actionResult.kind}`
);
}
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
for (const result of actionResult.results) {
console.log(`- Document ${result.id}`);
if (result.error) {
const { code, message } = result.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
console.log(`\tClassification: ${result.classification.category}`);
}
}
}
main();
自定义多标签分类
使用使用 Azure Language Studio 生成的自定义多标签模型对文档进行分类。
const {
AnalyzeBatchAction,
AzureKeyCredential,
TextAnalysisClient,
} = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"The plot begins with a large group of characters where everyone thinks that the two main ones should be together but foolish things keep them apart. Misunderstandings, miscommunication, and confusion cause a series of humorous situations.",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "CustomMultiLabelClassification",
deploymentName,
projectName,
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
const results = await poller.pollUntilDone();
for await (const actionResult of results) {
if (actionResult.kind !== "CustomMultiLabelClassification") {
throw new Error(
`Expected a CustomMultiLabelClassification results but got: ${actionResult.kind}`
);
}
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
for (const result of actionResult.results) {
console.log(`- Document ${result.id}`);
if (result.error) {
const { code, message } = result.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
console.log(`\tClassification:`);
for (const classification of result.classifications) {
console.log(`\t\t-category: ${classification.category}`);
}
}
}
}
main();
操作批处理
对一个服务请求中的每个输入文档应用多个操作。
const {
AnalyzeBatchAction,
AzureKeyCredential,
TextAnalysisClient,
} = require("@azure/ai-text-analytics");
const client = new TextAnalysisClient("<endpoint>", new AzureKeyCredential("<API key>"));
const documents = [
"Microsoft was founded by Bill Gates and Paul Allen.",
"The employee's SSN is 555-55-5555.",
"Easter Island, a Chilean territory, is a remote volcanic island in Polynesia.",
"I use Azure Functions to develop my product.",
];
async function main() {
const actions: AnalyzeBatchAction[] = [
{
kind: "EntityRecognition",
modelVersion: "latest",
},
{
kind: "PiiEntityRecognition",
modelVersion: "latest",
},
{
kind: "KeyPhraseExtraction",
modelVersion: "latest",
},
];
const poller = await client.beginAnalyzeBatch(actions, documents, "en");
const actionResults = await poller.pollUntilDone();
for await (const actionResult of actionResults) {
if (actionResult.error) {
const { code, message } = actionResult.error;
throw new Error(`Unexpected error (${code}): ${message}`);
}
switch (actionResult.kind) {
case "KeyPhraseExtraction": {
for (const doc of actionResult.results) {
console.log(`- Document ${doc.id}`);
if (!doc.error) {
console.log("\tKey phrases:");
for (const phrase of doc.keyPhrases) {
console.log(`\t- ${phrase}`);
}
} else {
console.error("\tError:", doc.error);
}
}
break;
}
case "EntityRecognition": {
for (const doc of actionResult.results) {
console.log(`- Document ${doc.id}`);
if (!doc.error) {
console.log("\tEntities:");
for (const entity of doc.entities) {
console.log(`\t- Entity ${entity.text} of type ${entity.category}`);
}
} else {
console.error("\tError:", doc.error);
}
}
break;
}
case "PiiEntityRecognition": {
for (const doc of actionResult.results) {
console.log(`- Document ${doc.id}`);
if (!doc.error) {
console.log("\tPii Entities:");
for (const entity of doc.entities) {
console.log(`\t- Entity ${entity.text} of type ${entity.category}`);
}
} else {
console.error("\tError:", doc.error);
}
}
break;
}
default: {
throw new Error(`Unexpected action results: ${actionResult.kind}`);
}
}
}
}
main();
疑难解答
日志记录
启用日志记录可能有助于发现有关故障的有用信息。 若要查看 HTTP 请求和响应的日志,请将 AZURE_LOG_LEVEL
环境变量设置为 info
。 或者,可以在运行时通过调用 @azure/logger
中的 setLogLevel
来启用日志记录:
import { setLogLevel } from "@azure/logger";
setLogLevel("info");
有关如何启用日志的更详细说明,请查看 @azure/logger 包文档。
后续步骤
有关如何使用此库的详细示例,请查看 示例 目录。
贡献
若要为此库做出贡献,请阅读贡献指南,详细了解如何生成和测试代码。