Use prebuilt Text Analytics in Fabric with REST API and SynapseML (preview)

Article
01/27/2025

Important

This feature is in preview.

Text Analytics is an Azure AI service that lets you perform text mining and text analysis with natural language processing (NLP) features.

This tutorial demonstrates how to use text analytics in Fabric, through the RESTful API and SynapseML, to:

Detect sentiment labels at the sentence or document level.
Identify the language of a given text input.
Extract key phrases from a text.
Identify different entities in text and categorize them into predefined classes or types.

# Get workload endpoints and access token

from synapse.ml.mlflow import get_mlflow_env_config
import json

mlflow_env_configs = get_mlflow_env_config()
access_token = mlflow_env_configs.driver_aad_token
prebuilt_AI_base_host = mlflow_env_configs.workload_endpoint + "cognitive/textanalytics/"
print("Workload endpoint for AI service: \n" + prebuilt_AI_base_host)

service_url = prebuilt_AI_base_host + "language/:analyze-text?api-version=2022-05-01"

# Make a RESTful request to AI service

post_headers = {
    "Content-Type" : "application/json",
    "Authorization" : "Bearer {}".format(access_token)
}

def printresponse(response):
    print(f"HTTP {response.status_code}")
    if response.status_code == 200:
        try:
            result = response.json()
            print(json.dumps(result, indent=2, ensure_ascii=False))
        except ValueError:
            # response.json() raises a ValueError subclass when the body is not valid JSON
            print(f"parse error {response.content}")
    else:
        print(response.headers)
        print(f"error message: {response.content}")

import synapse.ml.core
from synapse.ml.cognitive.language import AnalyzeText
from pyspark.sql.functions import col
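
The REST requests in the sections that follow all share one body shape: a `kind`, a `parameters` object, and an `analysisInput` with a list of documents. A minimal sketch of a builder for that shape is shown here. The helper name `build_analyze_body` and its arguments are hypothetical and are not part of Fabric or SynapseML; the field names are taken from the request bodies in this tutorial.

```python
import json

def build_analyze_body(kind, texts, language=None, **parameters):
    """Build a request body shaped like the ones used in this tutorial.

    kind:       task name, for example "SentimentAnalysis" or "LanguageDetection".
    texts:      list of input strings; ids are assigned as "1", "2", ...
    language:   optional language code; left out when None (as for language detection).
    parameters: extra task parameters, for example opinionMining=True.
    """
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        if language is not None:
            doc["language"] = language
        documents.append(doc)
    return {
        "kind": kind,
        "parameters": {"modelVersion": "latest", **parameters},
        "analysisInput": {"documents": documents},
    }

body = build_analyze_body(
    "KeyPhraseExtraction",
    ["Dr. Smith has a very modern medical office, and she has great staff."],
    language="en",
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the hand-written `post_body` dictionaries.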

Sentiment analysis

REST API
SynapseML

The Sentiment Analysis feature detects sentiment labels (such as "negative", "neutral", and "positive") at the sentence and document level. It also returns confidence scores between 0 and 1 for positive, neutral, and negative sentiment, for each document and for each sentence within it. See Sentiment Analysis and Opinion Mining language support for the list of enabled languages.

import requests
import uuid

post_body = {
    "kind": "SentimentAnalysis",
    "parameters": {
        "modelVersion": "latest",
        "opinionMining": True  # serialized as a JSON boolean; requests targets and assessments
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language":"en",
                "text": "The food and service were unacceptable. The concierge was nice, however."
            }
        ]
    }
} 

post_headers["x-ms-workload-resource-moniker"] = str(uuid.uuid1())
response = requests.post(service_url, json=post_body, headers=post_headers)

# Output all information of the request process
printresponse(response)

Output

    HTTP 200
    {
      "kind": "SentimentAnalysisResults",
      "results": {
        "documents": [
          {
            "id": "1",
            "sentiment": "mixed",
            "confidenceScores": {
              "positive": 0.43,
              "neutral": 0.04,
              "negative": 0.53
            },
            "sentences": [
              {
                "sentiment": "negative",
                "confidenceScores": {
                  "positive": 0.0,
                  "neutral": 0.01,
                  "negative": 0.99
                },
                "offset": 0,
                "length": 40,
                "text": "The food and service were unacceptable. ",
                "targets": [
                  {
                    "sentiment": "negative",
                    "confidenceScores": {
                      "positive": 0.01,
                      "negative": 0.99
                    },
                    "offset": 4,
                    "length": 4,
                    "text": "food",
                    "relations": [
                      {
                        "relationType": "assessment",
                        "ref": "#/documents/0/sentences/0/assessments/0"
                      }
                    ]
                  },
                  {
                    "sentiment": "negative",
                    "confidenceScores": {
                      "positive": 0.01,
                      "negative": 0.99
                    },
                    "offset": 13,
                    "length": 7,
                    "text": "service",
                    "relations": [
                      {
                        "relationType": "assessment",
                        "ref": "#/documents/0/sentences/0/assessments/0"
                      }
                    ]
                  }
                ],
                "assessments": [
                  {
                    "sentiment": "negative",
                    "confidenceScores": {
                      "positive": 0.01,
                      "negative": 0.99
                    },
                    "offset": 26,
                    "length": 12,
                    "text": "unacceptable",
                    "isNegated": false
                  }
                ]
              },
              {
                "sentiment": "positive",
                "confidenceScores": {
                  "positive": 0.86,
                  "neutral": 0.08,
                  "negative": 0.07
                },
                "offset": 40,
                "length": 32,
                "text": "The concierge was nice, however.",
                "targets": [
                  {
                    "sentiment": "positive",
                    "confidenceScores": {
                      "positive": 1.0,
                      "negative": 0.0
                    },
                    "offset": 44,
                    "length": 9,
                    "text": "concierge",
                    "relations": [
                      {
                        "relationType": "assessment",
                        "ref": "#/documents/0/sentences/1/assessments/0"
                      }
                    ]
                  }
                ],
                "assessments": [
                  {
                    "sentiment": "positive",
                    "confidenceScores": {
                      "positive": 1.0,
                      "negative": 0.0
                    },
                    "offset": 58,
                    "length": 4,
                    "text": "nice",
                    "isNegated": false
                  }
                ]
              }
            ],
            "warnings": []
          }
        ],
        "errors": [],
        "modelVersion": "2022-11-01"
      }
    }
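
With opinion mining enabled, each target in the output above points to its assessment through a `ref` path such as `#/documents/0/sentences/0/assessments/0`. The following sketch pairs targets with their assessments by following that path. It assumes the path is resolved relative to the `results` object, which is what the shape of the output suggests; the helper names `resolve_ref` and `opinion_pairs` are hypothetical, and the sample is a trimmed copy of the response shown above.

```python
def resolve_ref(results, ref):
    """Follow a path such as '#/documents/0/sentences/0/assessments/0'."""
    node = results
    for part in ref.lstrip("#/").split("/"):
        # List levels are addressed by index, object levels by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node

def opinion_pairs(response_json):
    """Return (target text, assessment text, target sentiment) tuples."""
    results = response_json["results"]
    pairs = []
    for doc in results["documents"]:
        for sentence in doc["sentences"]:
            for target in sentence.get("targets", []):
                for relation in target.get("relations", []):
                    if relation["relationType"] == "assessment":
                        assessment = resolve_ref(results, relation["ref"])
                        pairs.append((target["text"], assessment["text"], target["sentiment"]))
    return pairs

def _rel(path):
    # Small helper to keep the sample compact.
    return [{"relationType": "assessment", "ref": path}]

# Trimmed copy of the response above (scores, offsets, and lengths omitted).
sample = {
    "kind": "SentimentAnalysisResults",
    "results": {
        "documents": [{
            "id": "1",
            "sentiment": "mixed",
            "sentences": [
                {
                    "sentiment": "negative",
                    "targets": [
                        {"sentiment": "negative", "text": "food",
                         "relations": _rel("#/documents/0/sentences/0/assessments/0")},
                        {"sentiment": "negative", "text": "service",
                         "relations": _rel("#/documents/0/sentences/0/assessments/0")},
                    ],
                    "assessments": [{"sentiment": "negative", "text": "unacceptable"}],
                },
                {
                    "sentiment": "positive",
                    "targets": [
                        {"sentiment": "positive", "text": "concierge",
                         "relations": _rel("#/documents/0/sentences/1/assessments/0")},
                    ],
                    "assessments": [{"sentiment": "positive", "text": "nice"}],
                },
            ],
        }],
    },
}

pairs = opinion_pairs(sample)
for target, assessment, sentiment in pairs:
    print(f"{target} -> {assessment} ({sentiment})")
```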


df = spark.createDataFrame([
    ("Great atmosphere. Close to plenty of restaurants, hotels, and transit! Staff are friendly and helpful.",),
    ("What a sad story!",)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("SentimentAnalysis")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("sentiment", col("documents.sentiment"))

display(result.select("text", "sentiment"))

Language detector

REST API
SynapseML

The Language Detector evaluates the text input of each document and returns language identifiers with a score that indicates the strength of the analysis. This capability is useful for content stores that collect arbitrary text where the language is unknown. See Supported languages for language detection for the list of enabled languages.

post_body = {
    "kind": "LanguageDetection",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "text": "This is a document written in English."
            }
        ]
    }
}

post_headers["x-ms-workload-resource-moniker"] = str(uuid.uuid1())
response = requests.post(service_url, json=post_body, headers=post_headers)

# Output all information of the request process
printresponse(response)

Output

    HTTP 200
    {
      "kind": "LanguageDetectionResults",
      "results": {
        "documents": [
          {
            "id": "1",
            "detectedLanguage": {
              "name": "English",
              "iso6391Name": "en",
              "confidenceScore": 0.99
            },
            "warnings": []
          }
        ],
        "errors": [],
        "modelVersion": "2022-10-01"
      }
    }
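
Because each detected language comes with a confidence score, a caller may want to accept a result only above some threshold. The sketch below does that for a response shaped like the one above. The helper name `detected_languages` and the default threshold of 0.5 are illustrative assumptions, not values recommended by the service.

```python
def detected_languages(response_json, min_confidence=0.5):
    """Map document id to its ISO 639-1 code, or None when the score is too low."""
    languages = {}
    for doc in response_json["results"]["documents"]:
        detected = doc["detectedLanguage"]
        confident = detected["confidenceScore"] >= min_confidence
        languages[doc["id"]] = detected["iso6391Name"] if confident else None
    return languages

# Copy of the response shown above.
sample = {
    "kind": "LanguageDetectionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "detectedLanguage": {
                    "name": "English",
                    "iso6391Name": "en",
                    "confidenceScore": 0.99,
                },
                "warnings": [],
            }
        ],
        "errors": [],
        "modelVersion": "2022-10-01",
    },
}

print(detected_languages(sample))
```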

df = spark.createDataFrame([
    (["Hello world"],),
    (["Bonjour tout le monde", "Hola mundo", "Tumhara naam kya hai?"],),
    (["你好"],),
    (["日本国（にほんこく、にっぽんこく、英"],)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("LanguageDetection")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("detectedLanguage", col("documents.detectedLanguage.name"))

display(result.select("text", "detectedLanguage"))

Key phrase extractor

REST API
SynapseML

Key Phrase Extraction evaluates unstructured text and returns a list of key phrases. This capability is useful if you need to quickly identify the main points in a collection of documents. See Supported languages for key phrase extraction for the list of enabled languages.

post_body = {
    "kind": "KeyPhraseExtraction",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language":"en",
                "text": "Dr. Smith has a very modern medical office, and she has great staff."
            }
        ]
    }
}

post_headers["x-ms-workload-resource-moniker"] = str(uuid.uuid1())
response = requests.post(service_url, json=post_body, headers=post_headers)

# Output all information of the request process
printresponse(response)

Output

    HTTP 200
    {
      "kind": "KeyPhraseExtractionResults",
      "results": {
        "documents": [
          {
            "id": "1",
            "keyPhrases": [
              "modern medical office",
              "Dr. Smith",
              "great staff"
            ],
            "warnings": []
          }
        ],
        "errors": [],
        "modelVersion": "2022-10-01"
      }
    }
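
To find the main points across a collection of documents, the key phrases of every document in a response can be tallied. This is a minimal sketch that assumes the response shape shown above; the helper name `key_phrase_counts` is hypothetical, and lowercasing phrases before counting is a design choice made here, not something the service does.

```python
from collections import Counter

def key_phrase_counts(response_json):
    """Count how often each key phrase occurs across all documents in a response."""
    counts = Counter()
    for doc in response_json["results"]["documents"]:
        # Lowercase so that "Great staff" and "great staff" are counted together.
        counts.update(phrase.lower() for phrase in doc["keyPhrases"])
    return counts

# Copy of the response shown above.
sample = {
    "kind": "KeyPhraseExtractionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "keyPhrases": ["modern medical office", "Dr. Smith", "great staff"],
                "warnings": [],
            }
        ],
        "errors": [],
        "modelVersion": "2022-10-01",
    },
}

counts = key_phrase_counts(sample)
for phrase, count in counts.most_common():
    print(count, phrase)
```

With a single document every phrase has a count of 1; the tally becomes informative once a request carries many documents.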

df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Text Analytics is one of the Azure Cognitive Services."),
    ("en", "My cat might need to see a veterinarian.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("KeyPhraseExtraction")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("keyPhrases", col("documents.keyPhrases"))

display(result.select("text", "keyPhrases"))

Named Entity Recognition (NER)

REST API
SynapseML

Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into predefined classes or types, such as person, location, event, product, and organization. See NER language support for the list of enabled languages.

post_body = {
    "kind": "EntityRecognition",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language": "en",
                "text": "I had a wonderful trip to Seattle last week."
            }
        ]
    }
}

post_headers["x-ms-workload-resource-moniker"] = str(uuid.uuid1())
response = requests.post(service_url, json=post_body, headers=post_headers)

# Output all information of the request process
printresponse(response)

Output

    HTTP 200
    {
      "kind": "EntityRecognitionResults",
      "results": {
        "documents": [
          {
            "id": "1",
            "entities": [
              {
                "text": "trip",
                "category": "Event",
                "offset": 18,
                "length": 4,
                "confidenceScore": 0.74
              },
              {
                "text": "Seattle",
                "category": "Location",
                "subcategory": "GPE",
                "offset": 26,
                "length": 7,
                "confidenceScore": 1.0
              },
              {
                "text": "last week",
                "category": "DateTime",
                "subcategory": "DateRange",
                "offset": 34,
                "length": 9,
                "confidenceScore": 0.8
              }
            ],
            "warnings": []
          }
        ],
        "errors": [],
        "modelVersion": "2021-06-01"
      }
    }
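
Each entity in the output above carries a category, a confidence score, and an `offset` and `length` into the input text. The sketch below groups entity texts by category, drops low-confidence entities, and checks that each span matches the input. The helper names and the 0.75 threshold are illustrative assumptions. The span check treats `offset` as a Python string index, which holds for this plain ASCII sentence but may not hold for text containing characters outside the Basic Multilingual Plane or combined characters.

```python
from collections import defaultdict

def entities_by_category(response_json, min_confidence=0.0):
    """Group entity texts by category, keeping entities at or above min_confidence."""
    grouped = defaultdict(list)
    for doc in response_json["results"]["documents"]:
        for entity in doc["entities"]:
            if entity["confidenceScore"] >= min_confidence:
                grouped[entity["category"]].append(entity["text"])
    return dict(grouped)

def span_matches(text, entity):
    """Check that offset and length select the entity text from the input."""
    start = entity["offset"]
    return text[start:start + entity["length"]] == entity["text"]

text = "I had a wonderful trip to Seattle last week."

# Copy of the entities from the response shown above.
sample = {
    "kind": "EntityRecognitionResults",
    "results": {
        "documents": [{
            "id": "1",
            "entities": [
                {"text": "trip", "category": "Event",
                 "offset": 18, "length": 4, "confidenceScore": 0.74},
                {"text": "Seattle", "category": "Location", "subcategory": "GPE",
                 "offset": 26, "length": 7, "confidenceScore": 1.0},
                {"text": "last week", "category": "DateTime", "subcategory": "DateRange",
                 "offset": 34, "length": 9, "confidenceScore": 0.8},
            ],
            "warnings": [],
        }],
        "errors": [],
        "modelVersion": "2021-06-01",
    },
}

grouped = entities_by_category(sample, min_confidence=0.75)
print(grouped)

entities = sample["results"]["documents"][0]["entities"]
print(all(span_matches(text, entity) for entity in entities))
```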

df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Pike place market is my favorite Seattle attraction.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("EntityRecognition")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("entityNames", col("documents.entities.text"))

display(result.select("text", "entityNames"))

Entity linking

REST API
SynapseML

There are no steps for the REST API in this section.

Entity linking identifies and disambiguates the identity of entities found in text. For example, in the sentence "We went to Seattle last week," the word "Seattle" is identified, with a link to more information on Wikipedia. See Supported languages for entity linking for the list of enabled languages.

df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Pike place market is my favorite Seattle attraction.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("EntityLinking")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("entityNames", col("documents.entities.name"))

display(result)
