Use Azure AI services with SynapseML in Microsoft Fabric

12/12/2023

Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with prebuilt APIs and models. In this article, you use the various services available in Azure AI services to perform tasks that include text analytics, translation, document intelligence, vision, image search, speech-to-text and text-to-speech conversion, anomaly detection, and data extraction from web APIs.

The goal of Azure AI services is to help developers create applications that can see, hear, speak, understand, and even begin to reason. The catalog of services within Azure AI services falls into five main pillars: Vision, Speech, Language, Web search, and Decision.

Prerequisites

Get a Microsoft Fabric subscription, or sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
Use the experience switcher at the bottom left of your home page to switch to Fabric.

Create a new notebook.
Attach your notebook to a lakehouse. On the left side of your notebook, select Add to add an existing lakehouse or create a new one.
Get an Azure AI services key by following Quickstart: Create a multi-service resource for Azure AI services. Copy the value of the key for use in the code samples below.

Prepare your system

To begin, import the required libraries and initialize your Spark session.

from pyspark.sql.functions import udf, col, lit
from synapse.ml.io.http import HTTPTransformer, http_udf
from requests import Request
from pyspark.ml import PipelineModel
import os

from pyspark.sql import SparkSession
from synapse.ml.core.platform import *

# Bootstrap Spark Session
spark = SparkSession.builder.getOrCreate()

Import the Azure AI services libraries, and replace the keys and locations in the following code snippet with your own Azure AI services keys and locations.

from synapse.ml.cognitive import *

# A general Azure AI services key for Text Analytics, Vision and Document Intelligence (or use separate keys that belong to each service)
service_key = "<YOUR-KEY-VALUE>" # Replace <YOUR-KEY-VALUE> with your Azure AI service key, check prerequisites for more details
service_loc = "eastus"

# A Bing Search v7 subscription key
bing_search_key = "<YOUR-KEY-VALUE>" # Replace <YOUR-KEY-VALUE> with your Bing v7 subscription key, check prerequisites for more details

# An Anomaly Detector subscription key
anomaly_key = "<YOUR-KEY-VALUE>" # Replace <YOUR-KEY-VALUE> with your anomaly service key, check prerequisites for more details
anomaly_loc = "westus2"

# A Translator subscription key
translator_key = "<YOUR-KEY-VALUE>" # Replace <YOUR-KEY-VALUE> with your translator service key, check prerequisites for more details
translator_loc = "eastus"

# An Azure search key
search_key = "<YOUR-KEY-VALUE>" # Replace <YOUR-KEY-VALUE> with your search key, check prerequisites for more details
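Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming you expose each key as an environment variable (the variable names in the usage comments are hypothetical, not names the services define), you can read the keys at runtime and fail fast when one is missing:

```python
import os


def read_key(env_var: str) -> str:
    """Return the value of env_var, or raise KeyError if it is unset or empty.

    The environment variable names you pass in are your own choice;
    the ones shown below are hypothetical examples.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise KeyError(f"Environment variable {env_var!r} is not set")
    return value


# Example usage (hypothetical variable names):
# service_key = read_key("AZURE_AI_SERVICES_KEY")
# translator_key = read_key("AZURE_TRANSLATOR_KEY")
```

Raising early keeps a missing key from surfacing later as a less obvious authentication error inside a Spark job.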

Perform sentiment analysis on text

The Text Analytics service provides several algorithms for extracting intelligent insights from text. For example, you can use the service to find the sentiment of some input text. For each document, the service returns a sentiment label (positive, neutral, negative, or mixed) together with per-class confidence scores between 0.0 and 1.0; a high positive score indicates positive sentiment, and a high negative score indicates negative sentiment.

The following code sample returns the sentiment for three simple sentences.

# Create a dataframe of sentences and their language codes
df = spark.createDataFrame(
    [
        ("I am so happy today, its sunny!", "en-US"),
        ("I am frustrated by this rush hour traffic", "en-US"),
        ("The cognitive services on spark aint bad", "en-US"),
    ],
    ["text", "language"],
)

# Run the Text Analytics service with options
sentiment = (
    TextSentiment()
    .setTextCol("text")
    .setLocation(service_loc)
    .setSubscriptionKey(service_key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
)

# Show the results of your text query in a table format
display(
    sentiment.transform(df).select(
        "text", col("sentiment.document.sentiment").alias("sentiment")
    )
)
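The v3 sentiment response typically carries per-class confidence scores alongside the label. To make the relationship between those scores and a label concrete, here is a simplified, hypothetical helper that picks the class with the highest confidence from a dict shaped like `{"positive": ..., "neutral": ..., "negative": ...}`. It is an illustration only: the service computes its own label and can also return `mixed` at the document level, which this sketch does not reproduce.

```python
from typing import Dict


def dominant_sentiment(confidence_scores: Dict[str, float]) -> str:
    """Return the sentiment class with the highest confidence score.

    Simplified illustration: assumes a dict such as
    {"positive": 0.9, "neutral": 0.07, "negative": 0.03} with scores in [0.0, 1.0].
    The service's own label can differ (for example, "mixed").
    """
    if not confidence_scores:
        raise ValueError("confidence_scores must not be empty")
    for label, score in confidence_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {label!r} is outside [0.0, 1.0]: {score}")
    return max(confidence_scores, key=confidence_scores.get)


print(dominant_sentiment({"positive": 0.02, "neutral": 0.08, "negative": 0.90}))  # negative
```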

Perform text analytics for health data

Text Analytics for health extracts and labels relevant medical information from unstructured text such as doctor's notes, discharge summaries, clinical documents, and electronic health records.

The following code sample analyzes text from doctor's notes and transforms it into structured data.

df = spark.createDataFrame(
    [
        ("20mg of ibuprofen twice a day",),
        ("1tsp of Tylenol every 4 hours",),
        ("6-drops of Vitamin B-12 every evening",),
    ],
    ["text"],
)

healthcare = (
    AnalyzeHealthText()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setLanguage("en")
    .setOutputCol("response")
)

display(healthcare.transform(df))
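The `response` column holds nested entity data. As a sketch of reducing entities to simple (text, category) pairs in plain Python, assuming each entity is a dict with `text`, `category`, and `confidenceScore` keys (a simplified, hypothetical shape; inspect your actual output schema before relying on it), with made-up sample entities for the first sentence:

```python
from typing import Dict, List, Tuple


def entity_pairs(entities: List[Dict], min_confidence: float = 0.0) -> List[Tuple[str, str]]:
    """Return (text, category) pairs for entities at or above min_confidence.

    Assumes each entity dict has "text", "category", and "confidenceScore" keys
    (hypothetical, simplified shape for illustration).
    """
    return [
        (e["text"], e["category"])
        for e in entities
        if e.get("confidenceScore", 0.0) >= min_confidence
    ]


# Hypothetical entities for the sentence "20mg of ibuprofen twice a day"
sample = [
    {"text": "20mg", "category": "Dosage", "confidenceScore": 0.98},
    {"text": "ibuprofen", "category": "MedicationName", "confidenceScore": 1.0},
    {"text": "twice a day", "category": "Frequency", "confidenceScore": 0.95},
]
print(entity_pairs(sample, min_confidence=0.9))
```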

Translate text into a different language

Translator is a cloud-based machine translation service and is part of the Azure AI services family of cognitive APIs used to build intelligent apps. Translator is easy to integrate into your applications, websites, tools, and solutions. It lets you add multilingual user experiences in 90 languages and dialects, and you can use it to translate text on any operating system.

The following code sample performs a simple text translation: you provide the sentences you want to translate and the target languages you want to translate them into.

from pyspark.sql.functions import col, flatten

# Create a dataframe including sentences you want to translate
df = spark.createDataFrame(
    [(["Hello, what is your name?", "Bye"],)],
    [
        "text",
    ],
)

# Run the Translator service with options
translate = (
    Translate()
    .setSubscriptionKey(translator_key)
    .setLocation(translator_loc)
    .setTextCol("text")
    .setToLanguage(["zh-Hans"])
    .setOutputCol("translation")
)

# Show the results of the translation.
display(
    translate.transform(df)
    .withColumn("translation", flatten(col("translation.translations")))
    .withColumn("translation", col("translation.text"))
    .select("translation")
)
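The `flatten` and `select` calls above reshape the nested Translator output. The following plain-Python sketch performs the same reshaping on a list shaped like a Translator v3 response, where each input string yields an object with a `translations` list of `{"text", "to"}` entries. The sample values are placeholders, not actual service output.

```python
from typing import Dict, List


def flatten_translations(response: List[Dict]) -> List[str]:
    """Collect every translated string from a Translator-v3-shaped response.

    Mirrors flatten(col("translation.translations")) followed by
    col("translation.text"): one flat list of strings, in input order.
    """
    return [t["text"] for item in response for t in item.get("translations", [])]


# Placeholder response for two inputs translated into one target language
response = [
    {"translations": [{"text": "<translation of input 1>", "to": "zh-Hans"}]},
    {"translations": [{"text": "<translation of input 2>", "to": "zh-Hans"}]},
]
print(flatten_translations(response))
```

With several target languages, each input contributes one entry per language, so the flat list grows accordingly.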

Extract information from a document into structured data

Azure AI Document Intelligence is a part of Azure AI services that lets you build automated data processing software using machine learning technology. With Azure AI Document Intelligence, you can identify and extract text, key/value pairs, selection marks, tables, and structure from your documents. The service outputs structured data that includes the relationships in the original file, bounding boxes, confidence scores, and more.

The following code sample analyzes a business card image and extracts its information into structured data.

from pyspark.sql.functions import col, explode

# Create a dataframe containing the source files
imageDf = spark.createDataFrame(
    [
        (
            "https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/business_card.jpg",
        )
    ],
    [
        "source",
    ],
)

# Run the Form Recognizer service
analyzeBusinessCards = (
    AnalyzeBusinessCards()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setImageUrlCol("source")
    .setOutputCol("businessCards")
)

# Show the results of recognition.
display(
    analyzeBusinessCards.transform(imageDf)
    .withColumn(
        "documents", explode(col("businessCards.analyzeResult.documentResults.fields"))
    )
    .select("source", "documents")
)
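Because the service reports a confidence value for what it extracts, a common follow-up is to keep only the fields you trust. A minimal sketch, assuming the fields arrive as a dict that maps a field name to a dict with `text` and `confidence` keys (a simplified, hypothetical shape; the actual `documentResults.fields` schema is richer), using made-up sample fields:

```python
from typing import Dict


def confident_fields(fields: Dict[str, Dict], threshold: float = 0.8) -> Dict[str, str]:
    """Return {field_name: text} for fields whose confidence meets threshold.

    Assumes each value is a dict with "text" and "confidence" keys
    (hypothetical shape for illustration).
    """
    return {
        name: field["text"]
        for name, field in fields.items()
        if field.get("confidence", 0.0) >= threshold
    }


# Hypothetical extracted fields
fields = {
    "ContactNames": {"text": "Jane Doe", "confidence": 0.95},
    "CompanyNames": {"text": "Contoso", "confidence": 0.60},
}
print(confident_fields(fields))  # {'ContactNames': 'Jane Doe'}
```

The 0.8 default threshold is arbitrary; pick one based on how costly a wrong value is in your workflow.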

Analyze and tag images

Computer Vision analyzes images to identify structure such as faces, objects, and natural-language descriptions.

The following code sample analyzes images and labels them with tags. Tags are one-word descriptions of things in the image, such as recognizable objects, people, scenery, and actions.

# Create a dataframe with the image URLs
base_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/"
df = spark.createDataFrame(
    [
        (base_url + "objects.jpg",),
        (base_url + "dog.jpg",),
        (base_url + "house.jpg",),
    ],
    [
        "image",
    ],
)

# Run the Computer Vision service. Analyze Image extracts information from/about the images.
analysis = (
    AnalyzeImage()
    .setLocation(service_loc)
    .setSubscriptionKey(service_key)
    .setVisualFeatures(
        ["Categories", "Color", "Description", "Faces", "Objects", "Tags"]
    )
    .setOutputCol("analysis_results")
    .setImageUrlCol("image")
    .setErrorCol("error")
)

# Show the results of what you wanted to pull out of the images.
display(analysis.transform(df).select("image", "analysis_results.description.tags"))
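`description.tags` holds a list of strings for each image. To see which tags recur across a batch of images, you could bring those lists to the driver (for example, with `collect()`) and count them. A plain-Python sketch with hypothetical tag lists:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_common_tags(tag_lists: Iterable[List[str]], n: int = 3) -> List[Tuple[str, int]]:
    """Count how many images each tag appears in and return the top n.

    Each tag is counted at most once per image.
    """
    counts = Counter()
    for tags in tag_lists:
        counts.update(set(tags))
    return counts.most_common(n)


# Hypothetical tags for three images
tag_lists = [
    ["outdoor", "grass", "dog"],
    ["dog", "grass"],
    ["dog", "house"],
]
print(most_common_tags(tag_lists, n=2))  # [('dog', 3), ('grass', 2)]
```

For large result sets, the same count is better done in Spark (for example, by exploding the tags column and grouping) instead of collecting to the driver.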

Search for images that are related to a natural language query

Bing Image Search searches the web to retrieve images related to a user's natural language query.

The following code sample uses a text query that looks for images of quotes. The output of the code is a list of image URLs that contain photos related to the query.

# Number of images Bing will return per query
imgsPerBatch = 10
# A list of offsets, used to page into the search results
offsets = [(i * imgsPerBatch,) for i in range(100)]
# Since web content is our data, we create a dataframe with options on that data: offsets
bingParameters = spark.createDataFrame(offsets, ["offset"])

# Run the Bing Image Search service with our text query
bingSearch = (
    BingImageSearch()
    .setSubscriptionKey(bing_search_key)
    .setOffsetCol("offset")
    .setQuery("Martin Luther King Jr. quotes")
    .setCount(imgsPerBatch)
    .setOutputCol("images")
)

# Transformer that extracts and flattens the richly structured output of Bing Image Search into a simple URL column
getUrls = BingImageSearch.getUrlTransformer("images", "url")

# This displays the full results returned, uncomment to use
# display(bingSearch.transform(bingParameters))

# Chain the search stage and the URL-extraction stage into a single pipeline
pipeline = PipelineModel(stages=[bingSearch, getUrls])

# Show the results of your search: image URLs
display(pipeline.transform(bingParameters))
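The offsets DataFrame is what drives paging: each row becomes one request whose offset skips the results already fetched. This plain-Python sketch reproduces the offset arithmetic for the values used above (10 images per request, 100 requests), so the pipeline asks for up to 1,000 image results in total; the service may return fewer.

```python
from typing import List


def page_offsets(per_page: int, num_pages: int) -> List[int]:
    """Return the offset for each page: 0, per_page, 2 * per_page, ..."""
    if per_page <= 0 or num_pages < 0:
        raise ValueError("per_page must be positive and num_pages non-negative")
    return [page * per_page for page in range(num_pages)]


offsets = page_offsets(per_page=10, num_pages=100)
print(offsets[:3], offsets[-1], len(offsets) * 10)  # [0, 10, 20] 990 1000
```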

Transform speech to text

The Speech-to-text service converts streams or files of spoken audio to text. The following code sample transcribes one audio file to text.

# Create a dataframe with our audio URLs, tied to the column called "url"
df = spark.createDataFrame(
    [("https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav",)], ["url"]
)

# Run the Speech-to-text service to transcribe the audio into text
speech_to_text = (
    SpeechToTextSDK()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setOutputCol("text")
    .setAudioDataCol("url")
    .setLanguage("en-US")
    .setProfanity("Masked")
)

# Show the results of the transcription
display(speech_to_text.transform(df).select("url", "text.DisplayText"))

Transform text to speech

Text to speech is a service that lets you build apps and services that speak naturally, choosing from more than 270 neural voices across many languages and variants.

The following code sample transforms text into an audio file that contains the content of the text.

from synapse.ml.cognitive import TextToSpeech

fs = ""
if running_on_databricks():
    fs = "dbfs:"
elif running_on_synapse_internal():
    fs = "Files"

# Create a dataframe with text and an output file location
df = spark.createDataFrame(
    [
        (
            "Reading out loud is fun! Check out aka.ms/spark for more information",
            fs + "/output.mp3",
        )
    ],
    ["text", "output_file"],
)

tts = (
    TextToSpeech()
    .setSubscriptionKey(service_key)
    .setTextCol("text")
    .setLocation(service_loc)
    .setVoiceName("en-US-JennyNeural")
    .setOutputFileCol("output_file")
)

# Check to make sure there were no errors during audio creation
display(tts.transform(df))

Detect anomalies in time series data

Anomaly Detector is well suited to detecting irregularities in your time series data. The following code sample uses the Anomaly Detector service to find anomalies in an entire time series.

# Create a dataframe with the point data that Anomaly Detector requires
df = spark.createDataFrame(
    [
        ("1972-01-01T00:00:00Z", 826.0),
        ("1972-02-01T00:00:00Z", 799.0),
        ("1972-03-01T00:00:00Z", 890.0),
        ("1972-04-01T00:00:00Z", 900.0),
        ("1972-05-01T00:00:00Z", 766.0),
        ("1972-06-01T00:00:00Z", 805.0),
        ("1972-07-01T00:00:00Z", 821.0),
        ("1972-08-01T00:00:00Z", 20000.0),
        ("1972-09-01T00:00:00Z", 883.0),
        ("1972-10-01T00:00:00Z", 898.0),
        ("1972-11-01T00:00:00Z", 957.0),
        ("1972-12-01T00:00:00Z", 924.0),
        ("1973-01-01T00:00:00Z", 881.0),
        ("1973-02-01T00:00:00Z", 837.0),
        ("1973-03-01T00:00:00Z", 9000.0),
    ],
    ["timestamp", "value"],
).withColumn("group", lit("series1"))

# Run the Anomaly Detector service to look for irregular data
anomaly_detector = (
    SimpleDetectAnomalies()
    .setSubscriptionKey(anomaly_key)
    .setLocation(anomaly_loc)
    .setTimestampCol("timestamp")
    .setValueCol("value")
    .setOutputCol("anomalies")
    .setGroupbyCol("group")
    .setGranularity("monthly")
)

# Show the full results of the analysis with the anomalies marked as "True"
display(
    anomaly_detector.transform(df).select("timestamp", "value", "anomalies.isAnomaly")
)
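Anomaly Detector uses its own models, which this article doesn't describe. For intuition only, the following sketch applies a different and much simpler technique, a robust z-score based on the median absolute deviation (MAD), to the same 15 points. With the commonly used 3.5 cutoff, it flags only the two spikes (20000.0 and 9000.0); it is not a substitute for the service.

```python
from statistics import median
from typing import List, Tuple


def mad_outliers(points: List[Tuple[str, float]], cutoff: float = 3.5) -> List[str]:
    """Return timestamps whose modified z-score exceeds cutoff.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = [v for _, v in points]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # MAD is zero; this simple sketch flags nothing in that case
    return [ts for ts, v in points if abs(0.6745 * (v - med) / mad) > cutoff]


# The same monthly series as the Anomaly Detector example
points = [
    ("1972-01-01T00:00:00Z", 826.0), ("1972-02-01T00:00:00Z", 799.0),
    ("1972-03-01T00:00:00Z", 890.0), ("1972-04-01T00:00:00Z", 900.0),
    ("1972-05-01T00:00:00Z", 766.0), ("1972-06-01T00:00:00Z", 805.0),
    ("1972-07-01T00:00:00Z", 821.0), ("1972-08-01T00:00:00Z", 20000.0),
    ("1972-09-01T00:00:00Z", 883.0), ("1972-10-01T00:00:00Z", 898.0),
    ("1972-11-01T00:00:00Z", 957.0), ("1972-12-01T00:00:00Z", 924.0),
    ("1973-01-01T00:00:00Z", 881.0), ("1973-02-01T00:00:00Z", 837.0),
    ("1973-03-01T00:00:00Z", 9000.0),
]
print(mad_outliers(points))  # ['1972-08-01T00:00:00Z', '1973-03-01T00:00:00Z']
```

Unlike this static rule, a time series service can also account for trend and seasonality, which is why the two approaches can disagree on real data.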

Get data from arbitrary web APIs

With HTTP on Spark, you can use any web service in your big data pipeline. The following code sample uses the World Bank API to get information about various countries around the world.

# Use any requests from the python requests library


def world_bank_request(country):
    return Request(
        "GET", "http://api.worldbank.org/v2/country/{}?format=json".format(country)
    )


# Create a dataframe that specifies which countries we want data on
df = spark.createDataFrame([("br",), ("usa",)], ["country"]).withColumn(
    "request", http_udf(world_bank_request)(col("country"))
)

# Send up to 3 requests concurrently, which speeds things up for big data
client = (
    HTTPTransformer().setConcurrency(3).setInputCol("request").setOutputCol("response")
)

# Get the body of the response


def get_response_body(resp):
    return resp.entity.content.decode()


# Show the details of the country data returned
display(
    client.transform(df).select(
        "country", udf(get_response_body)(col("response")).alias("response")
    )
)
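`get_response_body` returns the raw JSON text. The World Bank API typically returns a two-element array: paging metadata first, then a list of records. A minimal sketch of pulling the country name out of such a body, using an abbreviated, hand-written sample rather than a live response:

```python
import json
from typing import Optional


def country_name(body: str) -> Optional[str]:
    """Return the "name" of the first record in a World Bank-style JSON body.

    Assumes the shape [metadata, [record, ...]]; returns None if the body
    doesn't match (for example, a single-element error message array).
    """
    data = json.loads(body)
    if isinstance(data, list) and len(data) == 2 and isinstance(data[1], list) and data[1]:
        record = data[1][0]
        if isinstance(record, dict):
            return record.get("name")
    return None


# Abbreviated, hand-written sample body (not a live API response)
sample_body = (
    '[{"page": 1, "pages": 1, "per_page": "50", "total": 1}, '
    '[{"id": "BRA", "iso2Code": "BR", "name": "Brazil"}]]'
)
print(country_name(sample_body))  # Brazil
```

In the notebook, you could wrap the helper with `udf(country_name)` and apply it to the decoded response column to get a plain country-name column.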

Related content

How to perform the same classification task with and without SynapseML
How to use a KNN model with SynapseML
How to use ONNX with SynapseML - Deep learning
