在搜尋索引中設定向量化工具

發行項
01/16/2025

在 Azure AI 搜尋中， 向量化工具 是使用 Azure OpenAI 或 Azure AI Vision 上部署的內嵌模型來執行向量化的元件。它會在查詢執行期間將文字（或影像）轉換為向量。

向量化工具定義於搜尋索引、套用至可搜尋向量欄位，並在查詢時用來產生文字或影像查詢輸入的內嵌。如果您需要將內容向量化做為索引程式的一部分，請參閱整合向量化。若要在編製索引期間進行內建向量化，您可以設定索引器和技能集，以呼叫原始文字或影像內容的內嵌模型。

若要將向量化工具新增至搜尋索引，可以在 Azure 入口網站使用索引設計工具，或是呼叫 [ 建立或更新索引] REST API，或使用任何更新的 Azure Beta SDK 套件來提供這項功能。

只要您使用正式推出的技能向量化工具組，向量化工具現在即可正式推出。 AzureOpenAIEmbedding 向量化工具和 AzureOpenAIEmbedding 技能已正式推出。自訂 Web API 向量化工具也已正式推出。

Azure AI 視覺向量化工具、 Azure AI Foundry 模型目錄向量化工具及其對等技能仍處於預覽狀態。您的技能集必須指定 2024-05-01-preview REST API 才能使用預覽技能和向量化工具。

必要條件

Azure AI 搜尋服務上有可搜尋向量欄位的索引。
已部署的內嵌模型（請參閱下一節）。
使用內嵌模型的權限。在 Azure OpenAI 上，呼叫端必須具有認知服務 OpenAI 用戶許可權。或者，您可以提供 API 金鑰。
Visual Studio Code 搭配 REST 用戶端可傳送查詢及接受回應。

建議您在搜尋服務上啟用診斷記錄，以確認向量查詢執行。

支援的內嵌模型

下表列出可與向量化工具搭配使用的內嵌模型。因為您必須使用相同的內嵌模型來編製索引和查詢，因此向量化工具會與在編製索引期間產生內嵌的技能配對。下表列出與特定向量化工具相關聯的技能。

Vectorizer 種類	模型名稱	模型提供者	相關聯的技能
`azureOpenAI`	text-embedding-ada-002， text-embedding-3	Azure OpenAI	AzureOpenAIEmbedding 技能
`aml`	Facebook-DinoV2-Image-Embeddings， Cohere-embed-v3	Azure AI Foundry 模型目錄	AML 技能
`aiServicesVision`	多模式內嵌 4.0 API	Azure AI 視覺（透過 Azure AI 多服務帳戶）	Azure AI Vision 多模式內嵌技能
`customWebApi`	任何內嵌模型	裝載於外部	自訂 Web API 技能

以範例資料試用向量化工具

匯入和向量化資料精靈從 Azure Blob 記憶體讀取檔案、建立具有區塊化和向量化欄位的索引，以及新增向量化工具。根據設計，精靈所建立的向量化工具，設定使用編製 Blob 內容索引的同一個內嵌模型。

將範例資料檔案上傳至 Microsoft Azure 儲存體上的容器。我們使用NASA 衛星影像集的部分小型文字檔，在免費的搜尋服務測試這些指令。
執行匯入和向量化資料精靈，選擇資料來源的 Blob 容器。
選擇 text-embedding-ada-002 的現有部署。此模型在編製索引期間產生內嵌，也可用來設定查詢期間所使用的向量化工具。

精靈完成且所有索引子處理完成之後，您應該會有具有可搜尋向量欄位的索引。欄位的 JSON 定義如下所示：

 {
     "name": "vector",
     "type": "Collection(Edm.Single)",
     "searchable": true,
     "retrievable": true,
     "dimensions": 1536,
     "vectorSearchProfile": "vector-nasa-ebook-text-profile"
 }

您應該也會有向量設定檔和向量化工具，類似下列範例：

"profiles": [
   {
     "name": "vector-nasa-ebook-text-profile",
     "algorithm": "vector-nasa-ebook-text-algorithm",
     "vectorizer": "vector-nasa-ebook-text-vectorizer"
   }
 ],
 "vectorizers": [
   {
     "name": "vector-nasa-ebook-text-vectorizer",
     "kind": "azureOpenAI",
     "azureOpenAIParameters": {
       "resourceUri": "https://my-fake-azure-openai-resource.openai.azure.com",
       "deploymentId": "text-embedding-ada-002",
       "modelName": "text-embedding-ada-002",
       "apiKey": "0000000000000000000000000000000000000",
       "authIdentity": null
     },
     "customWebApiParameters": null
   }
 ]

在查詢執行期間跳過，直接為文字到向量轉換測試向量化工具。

定義向量化工具和向量設定檔

本節說明手動定義向量化工具之索引結構描述的修改內容。

使用建立或更新索引將 vectorizers 新增至搜尋索引。

將下列 JSON 新增至您的索引定義。向量化工具區段提供已部署內嵌模型的連線資訊。此步驟顯示兩個向量化工具範例，讓您可以並排比較 Azure OpenAI 內嵌模型和自訂 Web API。

  "vectorizers": [
    {
      "name": "my_azure_open_ai_vectorizer",
      "kind": "azureOpenAI",
      "azureOpenAIParameters": {
        "resourceUri": "https://url.openai.azure.com",
        "deploymentId": "text-embedding-ada-002",
        "modelName": "text-embedding-ada-002",
        "apiKey": "mytopsecretkey"
      }
    },
    {
      "name": "my_custom_vectorizer",
      "kind": "customWebApi",
      "customVectorizerParameters": {
        "uri": "https://my-endpoint",
        "authResourceId": " ",
        "authIdentity": " "
      }
    }
  ]

在相同的索引中，新增指定其中一個向量化工具的向量設定檔區段。向量設定檔還需要建立導覽結構用的向量搜尋演算法。

"profiles": [ 
    { 
        "name": "my_vector_profile", 
        "algorithm": "my_hnsw_algorithm", 
        "vectorizer":"my_azure_open_ai_vectorizer" 
    }
]

將向量設定檔指派給向量欄位。下列範例顯示欄位集合，其中包含必要的索引鍵欄位、標題字串欄位，以及具有向量設定檔指派的兩個向量欄位。

"fields": [ 
        { 
            "name": "ID", 
            "type": "Edm.String", 
            "key": true, 
            "sortable": true, 
            "analyzer": "keyword" 
        }, 
        { 
            "name": "title", 
            "type": "Edm.String"
        }, 
        { 
            "name": "vector", 
            "type": "Collection(Edm.Single)", 
            "dimensions": 1536, 
            "vectorSearchProfile": "my_vector_profile", 
            "searchable": true, 
            "retrievable": true
        }, 
        { 
            "name": "my-second-vector", 
            "type": "Collection(Edm.Single)", 
            "dimensions": 1024, 
            "vectorSearchProfile": "my_vector_profile", 
            "searchable": true, 
            "retrievable": true
        }
]

測試向量化工具

使用搜尋用戶端，透過向量化工具傳送查詢。此範例假設 Visual Studio Code 搭配 REST 用戶端和範例索引。

在 Visual Studio Code 中，提供搜尋端點和搜尋查詢 API 金鑰：
```
 @baseUrl: 
 @queryApiKey: 00000000000000000000000
```

在向量查詢要求貼上。

 ### Run a query
 POST {{baseUrl}}/indexes/vector-nasa-ebook-txt/docs/search?api-version=2024-07-01 HTTP/1.1
     Content-Type: application/json
     api-key: {{queryApiKey}}

     {
         "count": true,
         "select": "title,chunk",
         "vectorQueries": [
             {
                 "kind": "text",
                 "text": "what cloud formations exists in the troposphere",
                 "fields": "vector",
                 "k": 3,
                 "exhaustive": true
             }
         ]
     }

關於查詢的要點包括：

"kind": "text" 告知搜尋引擎輸入是文字字串，並指示使用與搜尋欄位相關聯的向量化工具。
"text": "what cloud formations exists in the troposphere" 是要向量化的文字字串。
"fields": "vector" 是要查詢的欄位名稱。如果您使用精靈所產生的範例索引，產生的向量欄位會命名為 vector。

傳送要求。您應該會得到三個 k 結果，以第一個結果相關性最高。

請注意，查詢時沒有要設定的向量化工具屬性。查詢根據索引中的向量設定檔欄位指派，讀取向量化工具屬性。

查看記錄

如果您已啟用搜尋服務的診斷記錄，請執行 Kusto 查詢，確認已在向量欄位執行查詢：

OperationEvent
| where TIMESTAMP > ago(30m)
| where Name == "Query.Search" and AdditionalInfo["QueryMetadata"]["Vectors"] has "TextLength"

最佳作法

如果您要設定 Azure OpenAI 向量化工具，請考慮建議用於 Azure OpenAI 內嵌技能的相同最佳做法。

另請參閱

整合向量化 (預覽)

共用方式為