Tutorial: Minimize storage and costs (RAG in Azure AI Search)

02/27/2025

Azure AI Search offers several approaches for reducing the size of vector indexes. These approaches range from vector compression to being more selective about what you store on your search service.

In this tutorial, you modify the existing search index to use:

Narrow data types
Scalar quantization
Reduced storage by opting out of vectors in search results

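This tutorial revisits the search index created by the indexing pipeline. All of the updates here affect the existing content, so you must rerun the indexer. However, instead of deleting the search index, you create a second one so that you can compare the reduction in vector index size after adding the new capabilities.

Altogether, the techniques illustrated in this tutorial can reduce vector storage by about half.

The following screenshot compares the first index from the previous tutorial with the index built in this one.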

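Prerequisites

This tutorial is essentially a rerun of the indexing pipeline. You need all of the Azure resources and permissions described in that tutorial.

For comparison, you should have an existing py-rag-tutorial-idx index on your Azure AI Search service. It should be almost 2 MB in size, and the vector index portion should be 348 KB.

You should also have the following objects:

py-rag-tutorial-ds (data source)
py-rag-tutorial-ss (skillset)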

Download the sample

Download a Jupyter notebook from GitHub to send the requests to Azure AI Search. For more information, see "Downloading files from GitHub".

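Update the index for reduced storage

Azure AI Search has multiple approaches for reducing vector size, which lowers the cost of vector workloads. In this step, you create a new index that uses the following capabilities:

Vector compression. Scalar quantization provides this capability.
Eliminate optional storage. If you need vectors only for queries and not in the response payload, you can drop the vector copy used for search results.
Narrow data types. You can specify Collection(Edm.Half) on the text_vector field to store incoming float32 dimensions as float16, which takes up less space in the index.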

All of these capabilities are specified in the search index. After you load the index, compare the original index with the new one.
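
To get a rough feel for why narrower data types and quantization matter, the following sketch estimates the raw payload of a set of vectors under each storage choice. It is back-of-the-envelope arithmetic only: the function name and the corpus size of 1,000 vectors are hypothetical, and the estimate ignores the HNSW graph and any other index overhead, so it will not match the sizes reported by the service.

```python
# Bytes used by one vector component under each storage choice.
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2, "int8": 1}


def raw_vector_bytes(num_vectors: int, dimensions: int, data_type: str) -> int:
    """Return the raw vector payload in bytes, ignoring all index overhead."""
    return num_vectors * dimensions * BYTES_PER_COMPONENT[data_type]


if __name__ == "__main__":
    # Hypothetical corpus size; 1024 matches the text_vector field in this tutorial.
    num_vectors, dimensions = 1_000, 1024
    for data_type in ("float32", "float16", "int8"):
        size_kb = raw_vector_bytes(num_vectors, dimensions, data_type) / 1024
        print(f"{data_type}: {size_kb:,.0f} KB")
```

Each step down in component width halves the raw payload, which is why the field data type and the compression setting are the first things to look at when vector storage is a concern.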

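Name the new index py-rag-tutorial-small-vectors-idx.

Use the following definition for the new index. This schema differs from the previous schema updates in "Maximize relevance" in three ways: new classes for scalar quantization together with a new compressions section, a new data type (Collection(Edm.Half)) for the text_vector field, and a new property, stored, set to false.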

from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    ScalarQuantizationCompression,
    ScalarQuantizationParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

credential = DefaultAzureCredential()

index_name = "py-rag-tutorial-small-vectors-idx"
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
fields = [
    SearchField(name="parent_id", type=SearchFieldDataType.String),  
    SearchField(name="title", type=SearchFieldDataType.String),
    SearchField(name="locations", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
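    SearchField(
        name="text_vector",
        type="Collection(Edm.Half)",  # store incoming float32 embeddings as float16
        vector_search_dimensions=1024,
        vector_search_profile_name="myHnswProfile",
        stored=False,  # drop the vector copy that is only used in search results
    )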
    ]  

# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",
            compression_name="myScalarQuantization",
            vectorizer_name="myOpenAI",  
        )
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            kind="azureOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=AZURE_OPENAI_ACCOUNT,  
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large"
            ),
        ),  
    ],
    compressions=[
        ScalarQuantizationCompression(
            compression_name="myScalarQuantization",
            rerank_with_original_vectors=True,
            default_oversampling=10,
            parameters=ScalarQuantizationParameters(quantized_data_type="int8"),
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

scoring_profiles = [  
    ScoringProfile(  
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(  
                field_name="locations",  
                boost=5.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            ) 
        ]
    )
]

index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")
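
The compression entry in the definition above asks for int8 scalar quantization. As an illustration of the general idea only, the sketch below maps each float in a vector onto one of 256 integer levels using a simple min-max scheme and then maps it back. This is an assumption-laden teaching example: the function names are hypothetical, and it does not claim to reproduce how Azure AI Search quantizes vectors internally.

```python
def quantize_int8(vector: list[float]) -> tuple[list[int], float, float]:
    """Map each float onto an integer level in [-128, 127] (min-max scaling)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes: list[int], lo: float, scale: float) -> list[float]:
    """Recover approximate floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


if __name__ == "__main__":
    original = [-0.5, 0.0, 0.25, 0.5]
    codes, lo, scale = quantize_int8(original)
    restored = dequantize_int8(codes, lo, scale)
    worst = max(abs(a - b) for a, b in zip(original, restored))
    print(f"codes={codes}, worst round-trip error={worst:.5f}")
```

The round trip is lossy, which is the trade-off that settings such as rerank_with_original_vectors and default_oversampling in the index definition are meant to offset.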

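Create or reuse the data source

Here's the definition of the data source from the previous tutorial. If you already have this data source on your search service, you can skip creating a new one.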

from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)

# Create a data source 
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
container = SearchIndexerDataContainer(name="nasa-ebooks-pdfs-all")
data_source_connection = SearchIndexerDataSourceConnection(
    name="py-rag-tutorial-ds",
    type="azureblob",
    connection_string=AZURE_STORAGE_CONNECTION,
    container=container
)
data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

Create or reuse the skillset

The skillset is also unchanged from the previous tutorial. It's shown here again so that you can review it.

from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    EntityRecognitionSkill,
    SearchIndexerIndexProjection,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset,
    CognitiveServicesAccountKey
)

# Create a skillset  
skillset_name = "py-rag-tutorial-ss"

split_skill = SplitSkill(  
    description="Split skill to chunk documents",  
    text_split_mode="pages",  
    context="/document",  
    maximum_page_length=2000,  
    page_overlap_length=500,  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/content"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="textItems", target_name="pages")  
    ],  
)  
  
embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document/pages/*",  
    resource_url=AZURE_OPENAI_ACCOUNT,  
    deployment_name="text-embedding-3-large",  
    model_name="text-embedding-3-large",
    dimensions=1024,  # must match vector_search_dimensions on the text_vector field
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/pages/*"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="text_vector")  
    ],  
)

entity_skill = EntityRecognitionSkill(
    description="Skill to recognize entities in text",
    context="/document/pages/*",
    categories=["Location"],
    default_language_code="en",
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/pages/*")
    ],
    outputs=[
        OutputFieldMappingEntry(name="locations", target_name="locations")
    ]
)
  
index_projections = SearchIndexerIndexProjection(  
    selectors=[  
        SearchIndexerIndexProjectionSelector(  
            target_index_name=index_name,  
            parent_key_field_name="parent_id",  
            source_context="/document/pages/*",  
            mappings=[  
                InputFieldMappingEntry(name="chunk", source="/document/pages/*"),  
                InputFieldMappingEntry(name="text_vector", source="/document/pages/*/text_vector"),
                InputFieldMappingEntry(name="locations", source="/document/pages/*/locations"),  
                InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),  
            ],  
        ),  
    ],  
    parameters=SearchIndexerIndexProjectionsParameters(  
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
    ),  
) 

cognitive_services_account = CognitiveServicesAccountKey(key=AZURE_AI_MULTISERVICE_KEY)

skills = [split_skill, embedding_skill, entity_skill]

skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to chunk documents and generate embeddings",  
    skills=skills,  
    index_projection=index_projections,
    cognitive_services_account=cognitive_services_account
)
  
client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
client.create_or_update_skillset(skillset)  
print(f"{skillset.name} created")
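
Because the embedding skill writes into the text_vector field, the number of dimensions the skill emits has to agree with the number declared on the field. The following sketch is one way to catch a mismatch before running the indexer. The helper name check_dimensions is hypothetical; it only relies on the name and vector_search_dimensions attributes of the field object and the dimensions attribute of the skill object, and the demo uses stand-in objects so that it runs without a search service.

```python
from types import SimpleNamespace


def check_dimensions(field, skill) -> None:
    """Raise ValueError if the field and the embedding skill disagree on vector length."""
    field_dims = field.vector_search_dimensions
    skill_dims = skill.dimensions
    if field_dims != skill_dims:
        raise ValueError(
            f"Field '{field.name}' expects {field_dims} dimensions, "
            f"but the embedding skill produces {skill_dims}."
        )


if __name__ == "__main__":
    # Stand-ins that mimic the attribute names of SearchField and AzureOpenAIEmbeddingSkill.
    field = SimpleNamespace(name="text_vector", vector_search_dimensions=1024)
    skill = SimpleNamespace(dimensions=1024)
    check_dimensions(field, skill)
    print("Dimensions match")
```

In the notebook, the same check could be called with the text_vector entry from fields and the embedding_skill object defined earlier.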

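Create a new indexer and load the index

Although you could reset and rerun the existing indexer against the new index, it's just as easy to create a new indexer. Having two indexes and two indexers preserves the execution history and allows for closer comparisons.

This indexer is the same as the previous indexer, except that it specifies the new index from this tutorial.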

from azure.search.documents.indexes.models import (
    SearchIndexer
)

# Create an indexer  
indexer_name = "py-rag-tutorial-small-vectors-idxr" 

indexer_parameters = None

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate embeddings",
    target_index_name="py-rag-tutorial-small-vectors-idx",
    skillset_name="py-rag-tutorial-ss", 
    data_source_name="py-rag-tutorial-ds",
    parameters=indexer_parameters
)  

# Create and run the indexer  
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)  
indexer_result = indexer_client.create_or_update_indexer(indexer)  

print(f' {indexer_name} is created and running. Give the indexer a few minutes before running a query.')
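
Rather than waiting a fixed number of minutes, you could poll the indexer until its latest run finishes. The sketch below is a minimal version of that idea, assuming the client passed in behaves like SearchIndexerClient: it calls get_indexer_status and reads last_result.status, which is expected to be "inProgress" while a run is active. The helper name wait_for_indexer, the polling interval, and the timeout are all hypothetical choices.

```python
import time


def wait_for_indexer(client, indexer_name: str, poll_seconds: float = 10.0, timeout: float = 600.0):
    """Poll until the indexer's last run is no longer 'inProgress', then return its status."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_indexer_status(indexer_name)
        last_result = status.last_result
        # last_result is None until the first execution has been recorded.
        if last_result is not None and last_result.status != "inProgress":
            return last_result.status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{indexer_name} did not finish within {timeout} seconds")
        time.sleep(poll_seconds)
```

In the notebook this could be called as wait_for_indexer(indexer_client, indexer_name). A returned status other than "success" is a cue to inspect the indexer's execution history before comparing index sizes.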

As a final step, switch to the Azure portal and compare the vector storage requirements of the two indexes. You should see results similar to the following screenshot.
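
The same comparison can be sketched in code instead of the portal. The helper below assumes the client exposes get_index_statistics(index_name) and that the returned mapping includes a "vector_index_size" entry in bytes, as recent versions of the azure-search-documents package are expected to do; verify this against the version you have installed. The function name vector_size_reduction is hypothetical.

```python
def vector_size_reduction(client, original_index: str, new_index: str) -> float:
    """Return the percentage by which the new index's vector storage is smaller."""
    before = client.get_index_statistics(original_index)["vector_index_size"]
    after = client.get_index_statistics(new_index)["vector_index_size"]
    if before == 0:
        raise ValueError(f"{original_index} reports no vector storage")
    return (before - after) / before * 100
```

In the notebook, calling vector_size_reduction(index_client, "py-rag-tutorial-idx", "py-rag-tutorial-small-vectors-idx") after the indexer finishes would give the reduction as a percentage. Statistics can lag behind indexing, so a value read immediately after the run may not be final.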

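The index created in this tutorial uses half-precision floating-point numbers (float16) for the text vectors. This halves the storage requirement for the vectors compared with the previous index, which used single-precision floating-point numbers (float32). Scalar compression and the omission of one set of the vectors account for the remaining storage savings. For more information about reducing vector size, see "Choose an approach for optimizing vector storage and processing".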

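Consider revisiting the queries from the previous tutorial so that you can compare query speed and utility. You should expect some variation in LLM output whenever you repeat a query, but in general the storage-saving techniques you implemented shouldn't degrade the quality of your search results.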
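
For the speed half of that comparison, a small timing harness is enough. The sketch below measures the median wall-clock time of any zero-argument callable, so the same function can wrap a query against either index. The name median_latency_ms and the default of five runs are hypothetical, and the numbers include network latency, so treat them as a rough indication rather than a benchmark.

```python
import statistics
import time


def median_latency_ms(run_query, runs: int = 5) -> float:
    """Call run_query() several times and return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

In practice, run_query would be a small function that issues the same query from the previous tutorial against one index and consumes the results, called once for each index so that the two medians can be compared.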

Next steps

All of the Azure SDKs include code samples that demonstrate Azure AI Search programmability. You can also review vector sample code for specific use cases and technology combinations.

azure-search-vector-samples
