Quickstart: Semantic ranking with .NET or Python

10/28/2024

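Semantic ranking in Azure AI Search is a query-side capability that uses machine reading comprehension from Microsoft to rescore search results, promoting the most semantically relevant matches to the top of the list. Depending on the content and the query, semantic ranking can significantly improve search relevance with minimal work from the developer.

In this quickstart, you learn about the index and query modifications that invoke the semantic ranker.

Note

For an example of an Azure AI Search solution with ChatGPT interaction, see this demo or this accelerator.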

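Prerequisites

An Azure account with an active subscription. You can create an account for free.
An Azure AI Search resource, at Basic tier or higher, with the semantic ranker enabled.
An API key and search service endpoint. Sign in to the Azure portal and find your search service.

In Overview, copy the URL and save it for a later step. An example endpoint might look like https://mydemo.search.windows.net.

In Keys, copy and save an admin key, which grants full rights to create and delete objects. There are two interchangeable keys, primary and secondary; choose either one.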

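Add semantic ranking

To use the semantic ranker, add a semantic configuration to a search index, and add parameters to a query. If you have an existing index, you can make these changes without reindexing your content, because the structure of your searchable content is unaffected.

A semantic configuration sets a priority order for the fields that supply the title, keywords, and content used in semantic reranking. Prioritizing fields allows for faster processing.
Queries that invoke the semantic ranker include parameters for the query type and for whether captions and answers are returned. You can add these parameters to your existing query logic; they don't conflict with other parameters.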
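The following Python sketch illustrates why no reindexing is needed. It treats an index definition as a plain dictionary shaped like the Search REST API index JSON and adds a semantic section next to the existing fields. The `semantic` / `prioritizedFields` property names follow recent REST API versions as we understand them; verify them against the REST reference for your api-version. The helper `add_semantic_configuration` is hypothetical and is not an SDK function.

```python
import copy


def add_semantic_configuration(index_definition, name, title_field,
                               content_fields, keyword_fields):
    """Return a copy of an index definition with one semantic configuration added.

    The "fields" list is not modified, so the shape of indexed documents stays the same.
    """
    updated = copy.deepcopy(index_definition)
    config = {
        "name": name,
        "prioritizedFields": {
            "titleField": {"fieldName": title_field},
            "prioritizedContentFields": [{"fieldName": f} for f in content_fields],
            "prioritizedKeywordsFields": [{"fieldName": f} for f in keyword_fields],
        },
    }
    semantic = updated.setdefault("semantic", {})
    semantic.setdefault("configurations", []).append(config)
    return updated


# A trimmed-down, illustrative version of the hotels-quickstart index
existing = {
    "name": "hotels-quickstart",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True},
        {"name": "HotelName", "type": "Edm.String", "searchable": True},
        {"name": "Description", "type": "Edm.String", "searchable": True},
        {"name": "Category", "type": "Edm.String", "searchable": True},
    ],
}

updated = add_semantic_configuration(
    existing, "my-semantic-config", "HotelName", ["Description"], ["Category"])
```

The field list comes back unchanged. Only the index definition gains a new section.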

.NET

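Build a console application by using the Azure.Search.Documents client library to add semantic ranking to an existing search index.

Alternatively, you can download the source code to start with a finished project.

Set up your environment

Start Visual Studio and create a new project for a console app.
In Tools > NuGet Package Manager, select Manage NuGet Packages for Solution.
Select Browse.
Search for the Azure.Search.Documents package and select the latest stable version.
Select Install to add the assembly to your project and solution.

Create a search client

In Program.cs, add the following using directives.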

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using Azure.Search.Documents.Models;

Create two clients: SearchIndexClient creates the index, and SearchClient loads and queries an existing index.

Both clients need the service endpoint and an admin API key, which authenticates with create and delete rights. Because the code builds the URI for you, set the serviceName property to the search service name only. Don't include https:// or .search.windows.net.

 static void Main(string[] args)
 {
     string serviceName = "<YOUR-SEARCH-SERVICE-NAME>";
     string apiKey = "<YOUR-SEARCH-ADMIN-API-KEY>";
     string indexName = "hotels-quickstart";


     // Create a SearchIndexClient to send create/delete index commands
     Uri serviceEndpoint = new Uri($"https://{serviceName}.search.windows.net/");
     AzureKeyCredential credential = new AzureKeyCredential(apiKey);
     SearchIndexClient adminClient = new SearchIndexClient(serviceEndpoint, credential);

     // Create a SearchClient to load and query documents
     SearchClient srchclient = new SearchClient(serviceEndpoint, indexName, credential);
     . . . 
 }
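A common mistake in `Main` is pasting the full endpoint where only the service name belongs. The following Python sketch applies the same rule as the C# string `$"https://{serviceName}.search.windows.net/"`. The function `build_endpoint` is a hypothetical helper and is not part of either SDK.

```python
def build_endpoint(service_name: str) -> str:
    """Build the service endpoint from a bare service name, as the C# sample does."""
    name = service_name.strip()
    # The code adds the scheme and domain itself, so reject values that already contain them
    if name.startswith(("http://", "https://")) or name.endswith(".search.windows.net"):
        raise ValueError("Pass only the service name, for example 'mydemo'.")
    if not name:
        raise ValueError("The service name must not be empty.")
    return f"https://{name}.search.windows.net/"


endpoint = build_endpoint("mydemo")
```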

Create an index

Create or update an index schema to include a SemanticConfiguration. If you're updating an existing index, this change doesn't require reindexing, because the structure of your documents is unchanged.

// Create hotels-quickstart index
private static void CreateIndex(string indexName, SearchIndexClient adminClient)
{

    FieldBuilder fieldBuilder = new FieldBuilder();
    var searchFields = fieldBuilder.Build(typeof(Hotel));

    var definition = new SearchIndex(indexName, searchFields);
    var suggester = new SearchSuggester("sg", new[] { "HotelName", "Category", "Address/City", "Address/StateProvince" });
    definition.Suggesters.Add(suggester);
    definition.SemanticSearch = new SemanticSearch
    {
        Configurations =
        {
            new SemanticConfiguration("my-semantic-config", new()
            {
                TitleField = new SemanticField("HotelName"),
                ContentFields =
                {
                    new SemanticField("Description"),
                    new SemanticField("Description_fr")
                },
                KeywordsFields =
                {
                    new SemanticField("Tags"),
                    new SemanticField("Category")
                }
            })
        }
    };

    adminClient.CreateOrUpdateIndex(definition);
}

The following code creates the index on your search service.

// Create index
Console.WriteLine("{0}", "Creating index...\n");
CreateIndex(indexName, adminClient);

SearchClient ingesterClient = adminClient.GetSearchClient(indexName);
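A misspelled field name in a semantic configuration is typically reported only when the index request reaches the service. The following Python sketch checks the configuration locally first. It assumes the Hotel class exposes the same field names as the Python schema later in this article, and `missing_semantic_fields` is a hypothetical helper.

```python
# Field names assumed to match the Hotel class (they mirror the Python schema in this article)
HOTEL_FIELDS = {
    "HotelId", "HotelName", "Description", "Description_fr", "Category",
    "Tags", "ParkingIncluded", "LastRenovationDate", "Rating", "Address",
}


def missing_semantic_fields(schema_fields, title_field, content_fields, keyword_fields):
    """Return every field named by a semantic configuration that the schema lacks."""
    referenced = [title_field, *content_fields, *keyword_fields]
    return [name for name in referenced if name not in schema_fields]


# Same field choices as "my-semantic-config" in CreateIndex
problems = missing_semantic_fields(
    HOTEL_FIELDS,
    title_field="HotelName",
    content_fields=["Description", "Description_fr"],
    keyword_fields=["Tags", "Category"],
)
if problems:
    raise ValueError(f"Semantic configuration references unknown fields: {problems}")
```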

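Load documents

Azure AI Search searches over content stored in the service. The code for uploading documents is identical to the C# quickstart for full-text search, so it isn't duplicated here. You need four hotels with names, addresses, and descriptions, and your solution needs Hotel and Address types.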
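For orientation only, the following Python sketch shows the shape of an indexing batch as the REST API expects it: a `value` array in which each document carries an `@search.action`. The actual upload code is in the full-text quickstart. The helper name is hypothetical, and the two sample hotels are taken from the documents used later in this article.

```python
import json


def build_upload_batch(documents, action="upload"):
    """Wrap documents in the {"value": [...]} envelope used for indexing requests."""
    batch = []
    for doc in documents:
        item = dict(doc)  # copy so the caller's documents aren't modified
        item.setdefault("@search.action", action)
        batch.append(item)
    return {"value": batch}


hotels = [
    {"HotelId": "1", "HotelName": "Stay-Kay City Hotel"},
    {"HotelId": "2", "HotelName": "Old Century Hotel"},
]
payload = json.dumps(build_upload_batch(hotels))
```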

Search an index

Here's a query that invokes the semantic ranker, with search options that specify the parameters.

Console.WriteLine("Example of a semantic query.");

// Semantic query options: query type, semantic configuration, and extractive captions
SearchOptions options = new SearchOptions()
{
    QueryType = Azure.Search.Documents.Models.SearchQueryType.Semantic,
    SemanticSearch = new()
    {
        SemanticConfigurationName = "my-semantic-config",
        QueryCaption = new(QueryCaptionType.Extractive)
    }
};
options.Select.Add("HotelName");
options.Select.Add("Category");
options.Select.Add("Description");

// To return all documents instead, search for "*":
// SearchResults<Hotel> response = srchclient.Search<Hotel>("*", options);
SearchResults<Hotel> response = srchclient.Search<Hotel>("what hotel has a good restaurant on site", options);
// WriteDocuments is the result-printing helper from the full sample
WriteDocuments(response);
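The same options can be expressed as a REST request body, which makes the query parameters easy to compare with the C# code. The following Python sketch builds that body for a POST to the index's `docs/search` endpoint. The property names (`queryType`, `semanticConfiguration`, `captions`) reflect recent REST API versions as we understand them; confirm them for your api-version. `semantic_query_body` is a hypothetical helper.

```python
import json


def semantic_query_body(search_text, configuration_name, select_fields, captions="extractive"):
    """Build a JSON body for a semantic search request."""
    return {
        "search": search_text,
        "queryType": "semantic",
        "semanticConfiguration": configuration_name,
        "captions": captions,
        "select": ",".join(select_fields),
    }


body = semantic_query_body(
    "what hotel has a good restaurant on site",
    "my-semantic-config",
    ["HotelName", "Category", "Description"],
)
print(json.dumps(body, indent=2))
```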

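For comparison, here are results from a query that uses the default BM25 ranking, based on term frequency and proximity. Given the query "what hotel has a good restaurant on site", the BM25 ranking algorithm returns matches in the order shown in the following screenshot.

Screenshot showing matches ranked by BM25.

In contrast, when semantic ranking is applied to the same query ("what hotel has a good restaurant on site"), the results are reranked based on their semantic relevance to the query. This time, the top result is the hotel with the restaurant, which better matches user expectations.

Screenshot showing matches ranked by the semantic ranker.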
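The service performs the reordering itself, so client code never needs to sort results. The following Python sketch only illustrates the effect of promoting results by `@search.reranker_score`. The scores are made up and are not the values from the screenshots, and `rerank` is a hypothetical helper.

```python
def _score(result, score_key):
    """Treat a missing or None score as lowest."""
    value = result.get(score_key)
    return float("-inf") if value is None else value


def rerank(results, score_key="@search.reranker_score"):
    """Order results by reranker score, highest first."""
    return sorted(results, key=lambda r: _score(r, score_key), reverse=True)


# Results in BM25 order, with illustrative (made-up) reranker scores
bm25_order = [
    {"HotelName": "Sublime Palace Hotel", "@search.reranker_score": 1.1},
    {"HotelName": "Gastronomic Landscape Hotel", "@search.reranker_score": 2.6},
    {"HotelName": "Old Century Hotel", "@search.reranker_score": None},
]
for result in rerank(bm25_order):
    print(result["HotelName"])
```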

Run the program

Press F5 to rebuild the app and run the program in its entirety.

The output includes messages from Console.WriteLine, together with query information and results.

Python

Use a Jupyter notebook and the azure-search-documents library in the Azure SDK for Python to learn about semantic ranking.

Alternatively, you can download and run a finished notebook.

Set up your environment

Use Visual Studio Code with the Python extension (or an equivalent IDE) and Python 3.10 or later.

We recommend a virtual environment for this quickstart:

Start Visual Studio Code.
Create a new ipynb file.
Open the Command Palette by pressing Ctrl+Shift+P.
Search for "Python: Create Environment".
Select Venv.
Select a Python interpreter. Choose 3.10 or later.

Setup takes a minute. If you run into problems, see "Python environments in VS Code".
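The VS Code steps above roughly correspond to creating an environment with Python's standard `venv` module. The following sketch checks the 3.10 minimum and creates an environment. The function names and the folder you pass in are assumptions, and the VS Code command may apply extra configuration of its own.

```python
import sys
import venv
from pathlib import Path

MIN_PYTHON = (3, 10)  # minimum version stated in this quickstart


def check_python_version(version_info=sys.version_info):
    """Raise if the interpreter is older than the minimum version."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise RuntimeError("This quickstart needs Python %d.%d or later." % MIN_PYTHON)
    return True


def create_environment(path, with_pip=True):
    """Create a virtual environment (like `python -m venv <path>`) and return its config file."""
    env_dir = Path(path)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"
```

For example, `check_python_version()` followed by `create_environment(".venv")` creates a `.venv` folder with pip installed, ready to select as the notebook kernel.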

Install packages and set variables

Install the required packages, including azure-search-documents.

! pip install azure-search-documents==11.6.0b1 --quiet
! pip install azure-identity --quiet
! pip install python-dotenv --quiet

Specify your endpoint and API key.

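from azure.core.credentials import AzureKeyCredential

search_endpoint: str = "PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE"
search_api_key: str = "PUT-YOUR-SEARCH-SERVICE-ADMIN-API-KEY-HERE"
index_name: str = "hotels-quickstart"

# Admin key credential shared by the index client and the search client below
credential = AzureKeyCredential(search_api_key)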
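Hard-coded keys are easy to commit by accident. The following sketch reads the endpoint and admin key from environment variables instead. The variable names `AZURE_SEARCH_SERVICE_ENDPOINT` and `AZURE_SEARCH_ADMIN_KEY` are hypothetical choices, and so is the helper `load_search_settings`.

```python
import os

# Hypothetical variable names; use whatever names your team standardizes on
ENDPOINT_VAR = "AZURE_SEARCH_SERVICE_ENDPOINT"
KEY_VAR = "AZURE_SEARCH_ADMIN_KEY"


def load_search_settings(environ=None):
    """Return (endpoint, api_key) from environment variables, failing loudly if either is absent."""
    environ = os.environ if environ is None else environ
    endpoint = environ.get(ENDPOINT_VAR, "")
    api_key = environ.get(KEY_VAR, "")
    missing = [name for name, value in ((ENDPOINT_VAR, endpoint), (KEY_VAR, api_key)) if not value]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Drop a trailing slash so the value matches the form shown in the portal
    return endpoint.rstrip("/"), api_key
```

python-dotenv is installed above. Calling `load_dotenv()` from the `dotenv` package before `load_search_settings()` would populate `os.environ` from a local `.env` file, and the returned values can then replace the placeholder strings.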

Create an index

from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient
from azure.search.documents.indexes.models import (
    ComplexField,
    SimpleField,
    SearchFieldDataType,
    SearchableField,
    SearchIndex,
    SearchSuggester,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch
)

# Create a search schema
index_client = SearchIndexClient(
    endpoint=search_endpoint, credential=credential)
fields = [
    SimpleField(name="HotelId", type=SearchFieldDataType.String, key=True),
    SearchableField(name="HotelName", type=SearchFieldDataType.String, sortable=True),
    SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
    SearchableField(name="Description_fr", type=SearchFieldDataType.String, analyzer_name="fr.lucene"),
    SearchableField(name="Category", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),

    SearchableField(name="Tags", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),

    SimpleField(name="ParkingIncluded", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),
    SimpleField(name="LastRenovationDate", type=SearchFieldDataType.DateTimeOffset, facetable=True, filterable=True, sortable=True),
    SimpleField(name="Rating", type=SearchFieldDataType.Double, facetable=True, filterable=True, sortable=True),

    ComplexField(name="Address", fields=[
        SearchableField(name="StreetAddress", type=SearchFieldDataType.String),
        SearchableField(name="City", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="StateProvince", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="PostalCode", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="Country", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
    ])
]

# Semantic configuration: the title, keyword, and content fields used for reranking
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="HotelName"),
        keywords_fields=[SemanticField(field_name="Category")],
        content_fields=[SemanticField(field_name="Description")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])

scoring_profiles = []
suggesters = [SearchSuggester(name="sg", source_fields=["Tags", "Address/City", "Address/Country"])]

# Create or update the search index with the semantic settings
index = SearchIndex(name=index_name, fields=fields, suggesters=suggesters,
                    scoring_profiles=scoring_profiles, semantic_search=semantic_search)
result = index_client.create_or_update_index(index)
print(f"{result.name} created")

Create a documents payload

You can push JSON documents to a search index. Documents must match the index schema.

documents = [
    {
    "@search.action": "upload",
    "HotelId": "1",
    "HotelName": "Stay-Kay City Hotel",
    "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Time's Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
    "Description_fr": "L'hôtel est idéalement situé sur la principale artère commerciale de la ville en plein cœur de New York. A quelques minutes se trouve la place du temps et le centre historique de la ville, ainsi que d'autres lieux d'intérêt qui font de New York l'une des villes les plus attractives et cosmopolites de l'Amérique.",
    "Category": "Boutique",
    "Tags": [ "pool", "air conditioning", "concierge" ],
    "ParkingIncluded": "false",
    "LastRenovationDate": "1970-01-18T00:00:00Z",
    "Rating": 3.60,
    "Address": {
        "StreetAddress": "677 5th Ave",
        "City": "New York",
        "StateProvince": "NY",
        "PostalCode": "10022",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "2",
    "HotelName": "Old Century Hotel",
    "Description": "The hotel is situated in a  nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.",
    "Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
    "Category": "Boutique",
    "Tags": [ "pool", "free wifi", "concierge" ],
    "ParkingIncluded": "false",
    "LastRenovationDate": "1979-02-18T00:00:00Z",
    "Rating": 3.60,
    "Address": {
        "StreetAddress": "140 University Town Center Dr",
        "City": "Sarasota",
        "StateProvince": "FL",
        "PostalCode": "34243",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "3",
    "HotelName": "Gastronomic Landscape Hotel",
    "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services.",
    "Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
    "Category": "Resort and Spa",
    "Tags": [ "air conditioning", "bar", "continental breakfast" ],
    "ParkingIncluded": "true",
    "LastRenovationDate": "2015-09-20T00:00:00Z",
    "Rating": 4.80,
    "Address": {
        "StreetAddress": "3393 Peachtree Rd",
        "City": "Atlanta",
        "StateProvince": "GA",
        "PostalCode": "30326",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "4",
    "HotelName": "Sublime Palace Hotel",
    "Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Palace is part of a lovingly restored 1800 palace.",
    "Description_fr": "Le Sublime Palace Hotel est situé au coeur du centre historique de sublime dans un quartier extrêmement animé et vivant, à courte distance de marche des sites et monuments de la ville et est entouré par l'extraordinaire beauté des églises, des bâtiments, des commerces et Monuments. Sublime Palace fait partie d'un Palace 1800 restauré avec amour.",
    "Category": "Boutique",
    "Tags": [ "concierge", "view", "24-hour front desk service" ],
    "ParkingIncluded": "true",
    "LastRenovationDate": "1960-02-06T00:00:00Z",
    "Rating": 4.60,
    "Address": {
        "StreetAddress": "7400 San Pedro Ave",
        "City": "San Antonio",
        "StateProvince": "TX",
        "PostalCode": "78216",
        "Country": "USA"
        }
    }
]
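Because documents must match the index schema, a quick local check can catch a misspelled key before the upload. The following Python sketch uses a hypothetical helper, `unknown_fields`, and describes the hotels-quickstart schema as a dictionary. The set under `Address` lists that complex field's subfields.

```python
# Top-level fields of the hotels-quickstart schema; the set lists the Address subfields
HOTEL_SCHEMA = {
    "HotelId": None, "HotelName": None, "Description": None, "Description_fr": None,
    "Category": None, "Tags": None, "ParkingIncluded": None,
    "LastRenovationDate": None, "Rating": None,
    "Address": {"StreetAddress", "City", "StateProvince", "PostalCode", "Country"},
}


def unknown_fields(document, schema=HOTEL_SCHEMA):
    """List keys in a document that the index schema doesn't define."""
    problems = []
    for name, value in document.items():
        if name.startswith("@search."):
            continue  # annotations such as @search.action aren't index fields
        if name not in schema:
            problems.append(name)
        elif isinstance(schema[name], set) and isinstance(value, dict):
            problems.extend(f"{name}/{sub}" for sub in value if sub not in schema[name])
    return problems
```

Running `[unknown_fields(d) for d in documents]` on the payload above returns four empty lists.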

Upload documents to the index

search_client = SearchClient(endpoint=search_endpoint,
                             index_name=index_name,
                             credential=credential)
try:
    result = search_client.upload_documents(documents=documents)
    # upload_documents returns one IndexingResult per document; report whether all succeeded
    print("Upload of new documents succeeded: {}".format(all(r.succeeded for r in result)))
except Exception as ex:
    print(ex)
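`upload_documents` returns one `IndexingResult` per document, each with `key`, `succeeded`, and `error_message` attributes. A single Boolean doesn't show which document failed. The following hypothetical helper reports per-key failures. It is demonstrated with stand-in objects that have the same attribute names, and the error text is made up.

```python
from types import SimpleNamespace


def summarize_indexing(results):
    """Return (succeeded_count, [(key, error_message), ...]) for IndexingResult-like objects."""
    failures = [(r.key, r.error_message) for r in results if not r.succeeded]
    return len(results) - len(failures), failures


# Stand-ins with the same attribute names as IndexingResult, for illustration only
example = [
    SimpleNamespace(key="1", succeeded=True, error_message=None),
    SimpleNamespace(key="2", succeeded=False, error_message="Document rejected"),
]
ok_count, failures = summarize_indexing(example)
print(f"{ok_count} succeeded, {len(failures)} failed: {failures}")
```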

Run your first query

As a verification step, start with an empty query to confirm that the index is operational. You should get an unordered list of hotel names and descriptions, with a count of 4, indicating that the index contains four documents.

# Run an empty query (returns selected fields, all documents)
results = search_client.search(query_type='simple',
    search_text="*",
    select='HotelName,Description',
    include_total_count=True)

print('Total Documents Matching Query:', results.get_count())
for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")

Run a text query

For comparison, run a text query with BM25 relevance scoring. Full-text search is invoked when you provide a query string. The response consists of ranked results, where higher scores go to documents with more instances of matching terms or with more important terms.

In this query, "what hotel has a good restaurant on site", Sublime Palace Hotel comes out on top because its description includes "site". Infrequently occurring terms raise the search score of the documents that contain them.

# Run a text query (returns a BM25-scored result set)
results = search_client.search(query_type='simple',
    search_text="what hotel has a good restaurant on site",
    select='HotelName,HotelId,Description',
    include_total_count=True)

for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")
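Rare terms weigh more because of the inverse document frequency (IDF) factor in BM25. The following toy Python sketch computes one term's BM25 contribution. It assumes the service follows Lucene's BM25 formula with the defaults k1 = 1.2 and b = 0.75. It ignores analyzers, stemming, and multi-field weighting, so it won't reproduce the service's actual scores.

```python
import math


def bm25_idf(doc_freq, total_docs):
    """Lucene-style BM25 inverse document frequency: rarer terms get larger weights."""
    return math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, total_docs, k1=1.2, b=0.75):
    """Contribution of one query term to one document's BM25 score."""
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_freq, total_docs) * tf * (k1 + 1) / (tf + length_norm)


# Four documents, as in this index: a term found in 1 document vs. one found in all 4
rare = bm25_term_score(tf=1, doc_len=40, avg_doc_len=40, doc_freq=1, total_docs=4)
common = bm25_term_score(tf=1, doc_len=40, avg_doc_len=40, doc_freq=4, total_docs=4)
print(f"rare term: {rare:.3f}, common term: {common:.3f}")
```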

Run a semantic query

Now add semantic ranking: set query_type to 'semantic' and add semantic_configuration_name. The query_caption parameter also requests extractive captions.

This is the same query, but notice that the semantic ranker correctly identifies Gastronomic Landscape Hotel as the more relevant result given the initial query. This query also returns captions generated by the model. The inputs in this sample are too minimal to produce interesting captions, but the example does demonstrate the syntax.

# Run a semantic query (runs a BM25-ranked query and promotes the most relevant matches to the top)
results = search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',
    search_text="what hotel has a good restaurant on site",
    select='HotelName,Description,Category', query_caption='extractive')

for result in results:
    print(result["@search.reranker_score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
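The semantic queries in this section repeat the same caption-printing logic. The following sketch moves that logic into one hypothetical helper, `caption_text`. Like the loop above, it prefers highlighted text and falls back to plain text. The stand-in caption and its text are made up for illustration.

```python
from types import SimpleNamespace


def caption_text(captions):
    """Return the first caption's highlights if present, else its plain text, else None."""
    if not captions:
        return None
    first = captions[0]
    return first.highlights or first.text


# Stand-in with the attribute names used in the loop above (highlights, text)
plain = [SimpleNamespace(highlights=None, text="Gastronomic excellence under William Dough.")]
print(f"Caption: {caption_text(plain)}")
```

Inside the result loop, `text = caption_text(result["@search.captions"])` would replace the nested `if` blocks.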

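Return semantic answers

In this final query, return semantic answers.

The semantic ranker can produce answers to a query string that has the characteristics of a question. The generated answer is extracted verbatim from your content. To get a semantic answer, the question and the answer must be closely aligned, and the model must find content that clearly answers the question. If potential answers don't meet a confidence threshold, the model doesn't return an answer. For demonstration purposes, the question in this example is designed to get a response, so that you can see the syntax.

# Run a semantic query that returns semantic answers
results = search_client.search(query_type='semantic', semantic_configuration_name='my-semantic-config',
    search_text="what hotel is in a historic building",
    select='HotelName,Description,Category', query_caption='extractive', query_answer='extractive')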

semantic_answers = results.get_answers()
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(result["@search.reranker_score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
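Because the model may return no answer when no candidate meets its confidence threshold, code that reads `get_answers()` should handle an empty result (or `None`) gracefully. The following sketch does that. Its `min_score` cutoff is a hypothetical client-side filter, separate from the service's own threshold. The stand-in answers use the attribute names from the loop above, with made-up text and scores.

```python
from types import SimpleNamespace


def format_answers(answers, min_score=0.0):
    """Format semantic answers, skipping any below a client-side score cutoff.

    `answers` may be None or empty when no candidate met the service's threshold.
    """
    lines = []
    for answer in answers or []:
        if answer.score is not None and answer.score < min_score:
            continue
        score = f"{answer.score:.3f}" if answer.score is not None else "n/a"
        lines.append(f"{answer.highlights or answer.text} (score {score})")
    return lines


# Stand-ins with the attributes used above: highlights, text, score (made-up values)
sample = [
    SimpleNamespace(highlights=None, text="Sublime Palace is part of a restored 1800 palace.", score=0.9),
    SimpleNamespace(highlights=None, text="A weak candidate.", score=0.2),
]
for line in format_answers(sample, min_score=0.5):
    print(f"Semantic Answer: {line}")
```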

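Clean up resources

When you're working in your own subscription, it's a good idea at the end of a project to decide whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually, or delete the resource group to delete the entire set of resources.

You can find and manage resources in the Azure portal by using the All resources or Resource groups link in the left navigation pane.

Next steps

In this quickstart, you learned how to invoke semantic ranking on an existing index. As a next step, we recommend trying semantic ranking on your own indexes. If you want to continue with demos, see the following link.

Tutorial: Add search to web apps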
