Tutorial: Maximize relevance (RAG in Azure AI Search)

03/11/2025

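In this tutorial, you learn how to improve the relevance of search results used in RAG solutions. Relevance tuning can be an important factor in delivering a RAG solution that meets user expectations. In Azure AI Search, relevance tuning includes L2 semantic ranking and scoring profiles.

To implement these capabilities, you revisit the index schema and add configurations for semantic ranking and scoring profiles. You then rerun the queries using the new constructs.

In this tutorial, you modify the existing search index and queries to use:

L2 semantic ranking
Scoring profile for document boosting

This tutorial updates the search index created by the indexing pipeline. The updates don't affect existing content, so no rebuild is necessary, and you don't need to rerun the indexer.

Note

More relevance features are available in preview, including vector query weighting and minimum thresholds. This tutorial omits them because they're in preview.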

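Prerequisites

Visual Studio Code with the Python extension and the Jupyter package.
Azure AI Search, Basic tier or higher for managed identity and semantic ranking, in the same region as Azure OpenAI and Azure AI Services.
Azure OpenAI, with deployments of text-embedding-002 and gpt-35-turbo, in the same region as Azure AI Search.

Download the sample

The sample notebook includes an updated index and query request.

Run a baseline query for comparison

Let's start with a new query: "Are there any cloud formations specific to oceans and large bodies of water?"

To compare outcomes after adding relevance features, run the query against the existing index schema before you add semantic ranking or a scoring profile.

For the Azure Government cloud, change the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".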
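If the same notebook runs against more than one cloud, the scope can be looked up instead of edited by hand. The following is a minimal sketch: the `TOKEN_SCOPES` mapping, its keys, and the `token_scope` helper are hypothetical names chosen for illustration, and only the two scope URLs come from this tutorial.

```python
# Hypothetical helper: pick the token scope for the target cloud.
# The keys "public" and "usgov" are illustrative labels, not Azure identifiers.
TOKEN_SCOPES = {
    "public": "https://cognitiveservices.azure.com/.default",
    "usgov": "https://cognitiveservices.azure.us/.default",
}


def token_scope(cloud: str = "public") -> str:
    """Return the token scope string for the given cloud label."""
    try:
        return TOKEN_SCOPES[cloud]
    except KeyError:
        known = ", ".join(sorted(TOKEN_SCOPES))
        raise ValueError(f"Unknown cloud {cloud!r}; expected one of: {known}") from None


print(token_scope("usgov"))
```

The returned string could then be passed as the second argument to `get_bearer_token_provider`.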

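from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI

# AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name are assumed to be
# set in earlier steps of the notebook.
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

# Name of your chat model deployment; change it to match your Azure OpenAI resource
deployment_name = "gpt-4o"

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)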
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Focused query on cloud formations and bodies of water
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

search_results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)

sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Output from this request might look like the following example.

Yes, there are cloud formations specific to oceans and large bodies of water. 
A notable example is "cloud streets," which are parallel rows of clouds that form over 
the Bering Strait in the Arctic Ocean. These cloud streets occur when wind blows from 
a cold surface like sea ice over warmer, moister air near the open ocean, leading to 
the formation of spinning air cylinders. Clouds form along the upward cycle of these cylinders, 
while skies remain clear along the downward cycle (Source: page-21.pdf).
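The step that flattens search hits into the Sources block of the prompt can be tried offline, without calling either service. The sketch below assumes each hit behaves like a dictionary with `title`, `chunk`, and `locations` keys, as in the query code. The `format_sources` helper, the shortened prompt, and the sample hits are placeholders invented for illustration, not real index content.

```python
# Shortened stand-in for the tutorial's GROUNDED_PROMPT (illustration only).
GROUNDED_PROMPT = """
Answer the query using only the sources provided below.
Query: {query}
Sources:\n{sources}
"""


def format_sources(documents):
    """Join search hits into one string, using the same layout as the query code."""
    return "=================\n".join(
        f'TITLE: {doc["title"]}, CONTENT: {doc["chunk"]}, LOCATIONS: {doc["locations"]}'
        for doc in documents
    )


# Made-up hits standing in for the iterator returned by search_client.search().
sample_hits = [
    {"title": "page-a.pdf", "chunk": "First placeholder chunk.", "locations": ["Ocean A"]},
    {"title": "page-b.pdf", "chunk": "Second placeholder chunk.", "locations": []},
]

sources = format_sources(sample_hits)
prompt = GROUNDED_PROMPT.format(query="placeholder question", sources=sources)
print(prompt)
```

Printing the prompt this way makes it easier to see exactly what the model receives, which helps when comparing the baseline against later runs.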

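Update the index for semantic ranking and scoring profiles

In a previous tutorial, you designed an index schema for RAG workloads. We purposely omitted relevance enhancements from that schema so that you could focus on the fundamentals. Deferring relevance to a separate exercise gives you a before-and-after comparison of search result quality once the updates are made.

Update the import statements to include the classes for semantic ranking and scoring profiles.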

from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

Add the following semantic configuration to the search index. This example is in the update schema step of the notebook.
```
# New semantic configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])
```
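A semantic configuration has a name and a prioritized list of fields that helps optimize the inputs to the semantic ranker. For more information, see Configure semantic ranking.
Next, add a scoring profile definition. As with a semantic configuration, you can add a scoring profile to an index schema at any time. This example is also in the update schema step of the notebook, following the semantic configuration.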
```
# New scoring profile
scoring_profiles = [  
    ScoringProfile(  
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(  
                field_name="locations",  
                boost=5.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            ) 
        ]
    )
]
```
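This profile uses the tag function, which boosts the scores of documents where a match is found in the locations field. Recall that the search index has a vector field and multiple nonvector fields for title, chunks, and locations. The locations field is a string collection, and string collections can be boosted using the tag function in a scoring profile. For more information, see Add a scoring profile and Enhancing Search Relevance with Document Boosting (blog post).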
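To build intuition for what tag boosting does, the idea can be imitated in a few lines of plain Python. This is a deliberately simplified toy model, not the scoring formula that Azure AI Search applies: it assumes a document's score is simply multiplied by the boost when any of its locations exactly matches a supplied tag (ignoring case). The `Doc` class, `apply_tag_boost`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Doc:
    title: str
    locations: List[str]
    score: float


def apply_tag_boost(docs, tags, boost=5.0):
    """Toy model: multiply the score of any doc whose locations share a tag, then re-sort."""
    wanted = {tag.lower() for tag in tags}
    boosted = []
    for doc in docs:
        matched = any(loc.lower() in wanted for loc in doc.locations)
        new_score = doc.score * boost if matched else doc.score
        boosted.append(replace(doc, score=new_score))  # copy; input docs stay unchanged
    return sorted(boosted, key=lambda d: d.score, reverse=True)


# Made-up documents and scores for illustration.
docs = [
    Doc("doc-a", ["Sahara Desert"], 2.0),
    Doc("doc-b", ["Pacific Ocean", "ocean"], 0.5),
]

for doc in apply_tag_boost(docs, ["ocean", "seas"]):
    print(doc.title, doc.score)
```

In this toy run, doc-b starts with the lower score but moves ahead of doc-a because one of its locations matches a tag. That is the kind of reordering the scoring profile is meant to encourage.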

Update the index definition on the search service.

# Update the search index with the semantic configuration and scoring profile
# (fields, vector_search, and index_client are assumed to come from the earlier index-creation step)
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)
result = index_client.create_or_update_index(index)
print(f"{result.name} updated")

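Update queries for semantic ranking and scoring profiles

In a previous tutorial, you ran queries that execute on the search engine, passing the response and other information to an LLM for chat completion.

This example modifies the query request to include the semantic configuration and scoring profile.

For the Azure Government cloud, change the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".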

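# Import libraries
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI

# AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name are assumed to be
# set in earlier steps of the notebook.
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

# Name of your chat model deployment; change it to match your Azure OpenAI resource
deployment_name = "gpt-4o"

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)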

# Prompt is unchanged in this update
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Queries are unchanged in this update
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Add query_type semantic and semantic_configuration_name
# Add scoring_profile and scoring_parameters
search_results = search_client.search(
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    scoring_profile="my-scoring-profile",
    scoring_parameters=["tags-ocean, 'sea surface', seas, surface"],
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Output from a semantically ranked and boosted query might look like the following example.

Yes, there are specific cloud formations influenced by oceans and large bodies of water:

- **Stratus Clouds Over Icebergs**: Low stratus clouds can frame holes over icebergs, 
such as Iceberg A-56 in the South Atlantic Ocean, likely due to thermal instability caused 
by the iceberg (source: page-39.pdf).

- **Undular Bores**: These are wave structures in the atmosphere created by the collision 
of cool, dry air from a continent with warm, moist air over the ocean, as seen off the 
coast of Mauritania (source: page-23.pdf).

- **Ship Tracks**: These are narrow clouds formed by water vapor condensing around tiny 
particles from ship exhaust. They are observed over the oceans, such as in the Pacific Ocean 
off the coast of California (source: page-31.pdf).

These specific formations are influenced by unique interactions between atmospheric conditions 
and the presence of large water bodies or objects within them.

Adding semantic ranking and scoring profiles positively affects the response from the LLM by promoting results that meet the scoring criteria and are semantically relevant.
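The `scoring_parameters` argument in the query above is a list of strings, each made of the parameter name, a hyphen, and a comma-separated list of values. When the tags come from a Python list, a small helper can assemble that string. This is a sketch under assumptions: `build_scoring_parameter` is a hypothetical name, and the quoting rule used here (wrap a value in single quotes when it contains a comma, space, or single quote, and double any embedded single quote) is an assumption about the format, not something this tutorial states.

```python
def build_scoring_parameter(name, values):
    """Assemble a 'name-value1,value2' string for the scoring_parameters list."""
    parts = []
    for value in values:
        # Assumed rule: quote values with commas, spaces, or single quotes.
        if any(ch in value for ch in (",", " ", "'")):
            value = "'" + value.replace("'", "''") + "'"
        parts.append(value)
    return f"{name}-{','.join(parts)}"


tags = ["ocean", "sea surface", "seas", "surface"]
print(build_scoring_parameter("tags", tags))
```

The result could be passed as `scoring_parameters=[build_scoring_parameter("tags", tags)]`. Unlike the literal string in the query above, this helper does not put spaces after the commas.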

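Now that you have a better understanding of index and query design, let's move on to optimizing for speed and concision. We revisit the schema definition to implement quantization and storage reduction, but the rest of the pipeline and the models remain intact.

Next step

Minimize vector storage and costs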
