Index and query vector data in Azure Cosmos DB for NoSQL by using Java

Article
12/03/2024
Applies to:

✅ NoSQL

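Before you use vector indexing and search, you must first enable the feature. This article covers the following steps:

Enable the vector search feature in Azure Cosmos DB for NoSQL
Set up the Azure Cosmos DB container for vector search
Author the vector embedding policy
Add vector indexes to the container indexing policy
Create a container with vector indexes and a vector embedding policy
Perform a vector search on the stored data

This guide walks through the process of creating vector data, indexing the data, and then querying the data in a container.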

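Prerequisites

An existing Azure Cosmos DB for NoSQL account.
- If you don't have an Azure subscription, you can try Azure Cosmos DB for NoSQL for free.
- If you have an existing Azure subscription, create a new Azure Cosmos DB for NoSQL account.
The latest version of the Azure Cosmos DB Java SDK.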

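Enable the feature

Vector search in Azure Cosmos DB for NoSQL requires you to enable the feature. Follow these steps to register:

Navigate to your Azure Cosmos DB for NoSQL resource page.
Select the [Features] pane under the [Settings] menu item.
Select [Vector Search in Azure Cosmos DB for NoSQL].
Read the description of the feature to confirm that you want to enable it.
Select [Enable] to turn on vector search in Azure Cosmos DB for NoSQL.

Tip

Alternatively, use the Azure CLI to update your account's capabilities to support NoSQL vector search.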

az cosmosdb update \
     --resource-group <resource-group-name> \
     --name <account-name> \
     --capabilities EnableNoSQLVectorSearch

Note

The registration request is approved automatically. However, it can take up to 15 minutes to take effect.

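Understand the steps involved in vector search

The following steps assume that you know how to set up an Azure Cosmos DB for NoSQL account and create a database. The vector search feature is currently not supported on existing containers. You must therefore create a new container and specify both the container-level vector embedding policy and the vector indexing policy when you create the container.

As an example, consider a database for an internet-based bookstore that stores the title, author, ISBN, and description of each book. We also define two properties to hold vector embeddings. The first is the "contentVector" property, which holds text embeddings generated from the book's text content (for example, by concatenating the "title", "author", "isbn", and "description" properties before creating the embedding). The second is "coverImageVector", which is generated from images of the book's cover.

Create and store vector embeddings for the fields on which you want to perform vector search.
Specify the vector embedding paths in the vector embedding policy.
Include any desired vector indexes in the container's indexing policy.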
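The bookstore example derives "contentVector" from the book's concatenated text fields. Here is a minimal plain-Java sketch of that concatenation step. The `ContentText` class, the `buildContentText` helper, and the `" | "` separator are hypothetical choices for illustration. The call to an embedding model that turns this text into a vector is deliberately left out.

```java
import java.util.StringJoiner;

class ContentText {
    // Hypothetical helper: joins the book's text fields into the single string
    // that would be sent to an embedding model to produce "contentVector".
    static String buildContentText(String title, String author, String isbn, String description) {
        StringJoiner joiner = new StringJoiner(" | ");
        joiner.add(title).add(author).add(isbn).add(description);
        return joiner.toString();
    }

    public static void main(String[] args) {
        String text = buildContentText("book-title", "book-author", "book-isbn", "book-description");
        // Prints: book-title | book-author | book-isbn | book-description
        System.out.println(text);
    }
}
```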

For the rest of this article, we assume the following structure for the items stored in the container:

{
  "title": "book-title", 
  "author": "book-author", 
  "isbn": "book-isbn", 
  "description": "book-description", 
  "contentVector": [2, -1, 4, 3, 5, -2, 5, -7, 3, 1], 
  "coverImageVector": [0.33, -0.52, 0.45, -0.67, 0.89, -0.34, 0.86, -0.78] 
}

First, create the CosmosContainerProperties object.

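// The partition key path must start with "/"; "/id" is a placeholder, so replace it with your own partition key path
CosmosContainerProperties collectionDefinition = new CosmosContainerProperties(UUID.randomUUID().toString(), "/id");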

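Create a vector embedding policy for your container

Next, you define the container vector policy. This policy tells the Azure Cosmos DB query engine how to handle vector properties in the VectorDistance system function. If you choose to specify a vector indexing policy, this policy also supplies it with the information it needs. The container vector policy includes the following information:

"path": the path of the property that contains vectors
"datatype": the type of the vector elements (default Float32)
"dimensions": the length of each vector in the path (default 1536)
"distanceFunction": the metric used to compute distance/similarity (default Cosine)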
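To make the four fields and their defaults concrete, the following sketch models one policy entry in plain Java. `VectorPolicyEntry` is a hypothetical class and not part of the azure-cosmos SDK. Its enums list only float32, int8, and uint8 data types and the three distance functions, as an illustration. The `accepts` check simply compares vector length with the declared dimensions. It does not model how the service treats items whose vectors don't match.

```java
import java.util.Objects;

// Hypothetical, SDK-independent model of one entry in a container vector policy.
// It illustrates the four fields and the defaults described above; it is NOT
// the azure-cosmos CosmosVectorEmbedding class.
class VectorPolicyEntry {
    enum DataType { FLOAT32, INT8, UINT8 }
    enum DistanceFunction { COSINE, DOT_PRODUCT, EUCLIDEAN }

    final String path;
    final DataType dataType;
    final int dimensions;
    final DistanceFunction distanceFunction;

    VectorPolicyEntry(String path, DataType dataType, Integer dimensions, DistanceFunction distanceFunction) {
        this.path = Objects.requireNonNull(path, "path is required");
        // Fall back to the documented defaults when a field is omitted (null).
        this.dataType = dataType != null ? dataType : DataType.FLOAT32;
        this.dimensions = dimensions != null ? dimensions : 1536;
        this.distanceFunction = distanceFunction != null ? distanceFunction : DistanceFunction.COSINE;
    }

    // In this sketch, a vector matches the entry only if its length equals the declared dimensions.
    boolean accepts(float[] vector) {
        return vector != null && vector.length == dimensions;
    }

    public static void main(String[] args) {
        VectorPolicyEntry cover = new VectorPolicyEntry("/coverImageVector", null, 8, null);
        VectorPolicyEntry defaults = new VectorPolicyEntry("/someVector", null, null, null);
        System.out.println(cover.dataType + " " + cover.dimensions + " " + cover.distanceFunction); // FLOAT32 8 COSINE
        System.out.println(defaults.dimensions); // 1536
        float[] coverImageVector = {0.33f, -0.52f, 0.45f, -0.67f, 0.89f, -0.34f, 0.86f, -0.78f};
        System.out.println(cover.accepts(coverImageVector)); // true
    }
}
```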

For the book example, the vector policy might look like the following Java code:

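import com.azure.cosmos.models.CosmosVectorDataType;
import com.azure.cosmos.models.CosmosVectorDistanceFunction;
import com.azure.cosmos.models.CosmosVectorEmbedding;
import com.azure.cosmos.models.CosmosVectorEmbeddingPolicy;
import java.util.Arrays;

// Creating vector embedding policy
CosmosVectorEmbeddingPolicy cosmosVectorEmbeddingPolicy = new CosmosVectorEmbeddingPolicy();

// Cover image embeddings: 8 dimensions, compared with cosine similarity
CosmosVectorEmbedding embedding1 = new CosmosVectorEmbedding();
embedding1.setPath("/coverImageVector");
embedding1.setDataType(CosmosVectorDataType.FLOAT32);
embedding1.setDimensions(8L);
embedding1.setDistanceFunction(CosmosVectorDistanceFunction.COSINE);

// Text content embeddings: 10 dimensions, compared with dot product
CosmosVectorEmbedding embedding2 = new CosmosVectorEmbedding();
embedding2.setPath("/contentVector");
embedding2.setDataType(CosmosVectorDataType.FLOAT32);
embedding2.setDimensions(10L);
embedding2.setDistanceFunction(CosmosVectorDistanceFunction.DOT_PRODUCT);

// Register the two embeddings defined above (only embedding1 and embedding2 exist)
cosmosVectorEmbeddingPolicy.setCosmosVectorEmbeddings(Arrays.asList(embedding1, embedding2));

// Attach the policy to the container definition created earlier
collectionDefinition.setVectorEmbeddingPolicy(cosmosVectorEmbeddingPolicy);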
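The two `distanceFunction` values used above, plus Euclidean distance, correspond to standard formulas. The following self-contained sketch computes them locally so you can see what each metric measures. The `DistanceFunctions` class is hypothetical. It illustrates the math only and is not how the Azure Cosmos DB query engine evaluates VectorDistance internally.

```java
class DistanceFunctions {
    // Dot product: sum of element-wise products. Larger values mean more similar.
    static double dotProduct(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product divided by the product of the norms, in [-1, 1].
    // 1 means the vectors point in the same direction (NaN if either vector is all zeros).
    static double cosine(float[] a, float[] b) {
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / (normA * normB);
    }

    // Euclidean distance: length of the difference vector. Smaller values mean more similar.
    static double euclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensions");
        }
    }

    public static void main(String[] args) {
        float[] x = {1, 0};
        float[] y = {0, 1};
        System.out.println(dotProduct(x, y)); // 0.0
        System.out.println(cosine(x, x));     // 1.0
        System.out.println(euclidean(x, y));  // 1.4142135623730951
    }
}
```

Cosine ignores vector length and compares only direction. Dot product also rewards larger magnitudes, which is why the two can rank the same items differently.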

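Create a vector index in the indexing policy

After you decide on the vector embedding paths, you must add vector indexes to the indexing policy. Currently, the vector search feature of Azure Cosmos DB for NoSQL is supported only on new containers. You must therefore apply the vector policy when you create the container, and you can't modify it later. For this example, the indexing policy looks similar to the following: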

import com.azure.cosmos.models.CosmosVectorIndexSpec;
import com.azure.cosmos.models.CosmosVectorIndexType;
import com.azure.cosmos.models.ExcludedPath;
import com.azure.cosmos.models.IncludedPath;
import com.azure.cosmos.models.IndexingMode;
import com.azure.cosmos.models.IndexingPolicy;
import java.util.Arrays;
import java.util.Collections;

IndexingPolicy indexingPolicy = new IndexingPolicy();
indexingPolicy.setIndexingMode(IndexingMode.CONSISTENT);

// Exclude the vector paths from regular indexing to keep insert RU charges and latency low
ExcludedPath excludedPath1 = new ExcludedPath("/coverImageVector/*");
ExcludedPath excludedPath2 = new ExcludedPath("/contentVector/*");
indexingPolicy.setExcludedPaths(Arrays.asList(excludedPath1, excludedPath2));

// Index all other properties
IncludedPath includedPath1 = new IncludedPath("/*");
indexingPolicy.setIncludedPaths(Collections.singletonList(includedPath1));

// Creating vector indexes
CosmosVectorIndexSpec cosmosVectorIndexSpec1 = new CosmosVectorIndexSpec();
cosmosVectorIndexSpec1.setPath("/coverImageVector");
cosmosVectorIndexSpec1.setType(CosmosVectorIndexType.QUANTIZED_FLAT.toString());

CosmosVectorIndexSpec cosmosVectorIndexSpec2 = new CosmosVectorIndexSpec();
cosmosVectorIndexSpec2.setPath("/contentVector");
cosmosVectorIndexSpec2.setType(CosmosVectorIndexType.DISK_ANN.toString());

// Register the two vector index specs defined above
indexingPolicy.setVectorIndexes(Arrays.asList(cosmosVectorIndexSpec1, cosmosVectorIndexSpec2));

collectionDefinition.setIndexingPolicy(indexingPolicy);

Finally, create the container with the container indexing policy and the vector embedding policy.

// database is a CosmosAsyncDatabase; block() waits until the container has been created
database.createContainer(collectionDefinition).block();

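Important

The vector paths are added to the "excludedPaths" section of the indexing policy to optimize insertion performance. If you don't add the vector paths to "excludedPaths", vector insertions incur higher RU charges and latency.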
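A quick way to catch a forgotten exclusion is to compare the vector embedding paths with the excluded paths before you create the container. The following sketch is hypothetical and uses plain strings rather than SDK types. It checks only for the exact "<path>/*" form used in this article, so it doesn't recognize broader exclusions such as "/*".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ExcludedPathCheck {
    // Hypothetical helper: returns the vector embedding paths that have no
    // matching "<path>/*" entry among the indexing policy's excluded paths.
    static List<String> missingExclusions(List<String> vectorPaths, Set<String> excludedPaths) {
        List<String> missing = new ArrayList<>();
        for (String path : vectorPaths) {
            if (!excludedPaths.contains(path + "/*")) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> vectorPaths = List.of("/coverImageVector", "/contentVector");
        Set<String> excluded = Set.of("/coverImageVector/*");
        // Prints [/contentVector]: that path is still covered by the "/*" included path.
        System.out.println(missingExclusions(vectorPaths, excluded));
    }
}
```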

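Run a vector similarity search query

After you create a container with the desired vector policy and insert vector data into it, you can run a vector search by using the VectorDistance system function in a query. Suppose you want to search for books about food recipes by looking at their descriptions. First, get the embedding for your query text. In this case, you generate an embedding for the query text "food recipe". After you have the embedding for your search query, use it in the VectorDistance function of a vector search query to get the items that are most similar to your query, as shown here: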

SELECT TOP 10 c.title, VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10]) AS SimilarityScore   
FROM c  
ORDER BY VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10])
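To build intuition for what this query returns, the following sketch performs an exact, brute-force version of the same ranking in memory. It scores every item with dot product, the distance function configured for "/contentVector" in this article's policy, and keeps the top k. It assumes that a larger dot product means more similar, so higher scores come first. The class, method, and item names are hypothetical. This is only a mental model: the service answers the query through the vector index configured for the path (DiskANN here), which is approximate, so its results can differ from an exact scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BruteForceTopK {
    record Scored(String title, double score) {}

    // Dot product, the distance function configured for /contentVector in this article.
    static double dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensions");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Scores every item against the query and keeps the k highest scores, most similar first.
    static List<Scored> topK(List<String> titles, List<float[]> vectors, float[] query, int k) {
        List<Scored> scored = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            scored.add(new Scored(titles.get(i), dotProduct(vectors.get(i), query)));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.subList(0, Math.min(k, scored.size()));
    }

    public static void main(String[] args) {
        float[] query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        List<String> titles = List.of("book-a", "book-b");
        List<float[]> vectors = List.of(
                new float[] {2, -1, 4, 3, 5, -2, 5, -7, 3, 1},
                new float[] {1, 1, 1, 1, 1, 1, 1, 1, 1, 1});
        // Prints "book-b 55.0" and then "book-a 53.0"
        for (Scored s : topK(titles, vectors, query, 10)) {
            System.out.println(s.title() + " " + s.score());
        }
    }
}
```

An exact scan like this touches every item, so its cost grows with the number of items. Avoiding that cost is one reason to define vector indexes on the embedding paths.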

This query retrieves the book titles along with their similarity scores relative to your query. Here's the equivalent in Java:

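import com.azure.cosmos.models.CosmosQueryRequestOptions;
import com.azure.cosmos.models.SqlParameter;
import com.azure.cosmos.models.SqlQuerySpec;
import com.azure.cosmos.util.CosmosPagedIterable;
import com.fasterxml.jackson.databind.JsonNode;
import java.util.ArrayList;

// Query embedding; in a real application, generate it with the same model used for contentVector
float[] embedding = new float[10];
for (int i = 0; i < 10; i++) {
    embedding[i] = i + 1;
}
ArrayList<SqlParameter> paramList = new ArrayList<SqlParameter>();
paramList.add(new SqlParameter("@embedding", embedding));
SqlQuerySpec querySpec = new SqlQuerySpec("SELECT TOP 10 c.title, VectorDistance(c.contentVector, @embedding) AS SimilarityScore FROM c ORDER BY VectorDistance(c.contentVector, @embedding)", paramList);

// container is a synchronous CosmosContainer; each result holds only "title" and "SimilarityScore"
CosmosPagedIterable<JsonNode> results = container.queryItems(querySpec, new CosmosQueryRequestOptions(), JsonNode.class);
for (JsonNode item : results) {
    System.out.println(String.format("Title: %s, SimilarityScore: %s", item.get("title").asText(), item.get("SimilarityScore").asDouble()));
}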
