Using the Azure CosmosDB NoSQL Vector Store connector (Preview)
Warning
The Semantic Kernel Vector Store functionality is in preview, and improvements that require breaking changes may still occur in limited circumstances before release.
Overview
The Azure CosmosDB NoSQL Vector Store connector can be used to access and manage data in Azure CosmosDB NoSQL. The connector has the following characteristics.
Feature Area | Support |
---|---|
Collection maps to | Azure Cosmos DB NoSQL Container |
Supported key property types | string, AzureCosmosDBNoSQLCompositeKey |
Supported data property types | |
Supported vector property types | |
Supported index types | |
Supported distance functions | |
Supported filter clauses | |
Supports multiple vectors in a record | Yes |
IsFilterable supported? | Yes |
IsFullTextSearchable supported? | Yes |
StoragePropertyName supported? | No, use JsonSerializerOptions and JsonPropertyNameAttribute instead. See the Data mapping section for more info. |
HybridSearch supported? | Yes |
Limitations
When initializing CosmosClient manually, it is necessary to specify CosmosClientOptions.UseSystemTextJsonSerializerWithOptions due to limitations in the default serializer. This option can be set to JsonSerializerOptions.Default or customized with other serializer options to meet specific configuration needs.
using System.Text.Json;
using Microsoft.Azure.Cosmos;
var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
{
    UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
});
Getting started
Add the Azure CosmosDB NoSQL Vector Store connector NuGet package to your project.
dotnet add package Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL --prerelease
You can add the vector store to the dependency injection container available on the KernelBuilder or to the IServiceCollection dependency injection container using extension methods provided by Semantic Kernel.
using Microsoft.SemanticKernel;
// Using Kernel Builder.
var kernelBuilder = Kernel
    .CreateBuilder()
    .AddAzureCosmosDBNoSQLVectorStore(connectionString, databaseName);
using Microsoft.SemanticKernel;
// Using IServiceCollection with ASP.NET Core.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddAzureCosmosDBNoSQLVectorStore(connectionString, databaseName);
Extension methods that take no parameters are also provided. These require an instance of Microsoft.Azure.Cosmos.Database to be separately registered with the dependency injection container.
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
// Using Kernel Builder.
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.Services.AddSingleton<Database>(
    sp =>
    {
        var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
        {
            // When initializing CosmosClient manually, setting this property is required
            // due to limitations in the default serializer.
            UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
        });
        return cosmosClient.GetDatabase(databaseName);
    });
kernelBuilder.AddAzureCosmosDBNoSQLVectorStore();
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
// Using IServiceCollection with ASP.NET Core.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<Database>(
    sp =>
    {
        var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
        {
            // When initializing CosmosClient manually, setting this property is required
            // due to limitations in the default serializer.
            UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
        });
        return cosmosClient.GetDatabase(databaseName);
    });
builder.Services.AddAzureCosmosDBNoSQLVectorStore();
You can construct an Azure CosmosDB NoSQL Vector Store instance directly.
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;
var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
{
    // When initializing CosmosClient manually, setting this property is required
    // due to limitations in the default serializer.
    UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
});
var database = cosmosClient.GetDatabase(databaseName);
var vectorStore = new AzureCosmosDBNoSQLVectorStore(database);
It is possible to construct a direct reference to a named collection.
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;
var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
{
    // When initializing CosmosClient manually, setting this property is required
    // due to limitations in the default serializer.
    UseSystemTextJsonSerializerWithOptions = JsonSerializerOptions.Default,
});
var database = cosmosClient.GetDatabase(databaseName);
var collection = new AzureCosmosDBNoSQLVectorStoreRecordCollection<Hotel>(
    database,
    "skhotels");
Data mapping
The Azure CosmosDB NoSQL Vector Store connector provides a default mapper when mapping from the data model to storage.
This mapper does a direct conversion of the list of properties on the data model to the fields in Azure CosmosDB NoSQL and uses System.Text.Json.JsonSerializer to convert to the storage schema. This means that JsonPropertyNameAttribute is supported when the storage name needs to differ from the data model property name. The only exception is the key of the record, which is always mapped to a database field named id, since all CosmosDB NoSQL records must use this name for ids.
It is also possible to use a custom JsonSerializerOptions instance with a customized property naming policy. To enable this, the JsonSerializerOptions must be passed to the AzureCosmosDBNoSQLVectorStoreRecordCollection on construction.
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;
var jsonSerializerOptions = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseUpper };
var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions()
{
    // When initializing CosmosClient manually, setting this property is required
    // due to limitations in the default serializer.
    UseSystemTextJsonSerializerWithOptions = jsonSerializerOptions
});
var database = cosmosClient.GetDatabase(databaseName);
var collection = new AzureCosmosDBNoSQLVectorStoreRecordCollection<Hotel>(
    database,
    "skhotels",
    new() { JsonSerializerOptions = jsonSerializerOptions });
With the custom JsonSerializerOptions above, which uses the SnakeCaseUpper naming policy, the following data model is mapped to the JSON shown after it.
using System.Text.Json.Serialization;
using Microsoft.Extensions.VectorData;
public class Hotel
{
    // The connector supports string keys; the key is always stored in the "id" field.
    [VectorStoreRecordKey]
    public string HotelId { get; set; }
    [VectorStoreRecordData(IsFilterable = true)]
    public string HotelName { get; set; }
    [VectorStoreRecordData(IsFullTextSearchable = true)]
    public string Description { get; set; }
    [JsonPropertyName("HOTEL_DESCRIPTION_EMBEDDING")]
    [VectorStoreRecordVector(4, DistanceFunction.EuclideanDistance, IndexKind.QuantizedFlat)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}
{
    "id": "1",
    "HOTEL_NAME": "Hotel Happy",
    "DESCRIPTION": "A place where everyone can be happy.",
    "HOTEL_DESCRIPTION_EMBEDDING": [0.9, 0.1, 0.1, 0.1]
}
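The naming rules described above can be sketched in a language-neutral way. The Python snippet below is only an approximation for simple PascalCase names, not the actual System.Text.Json implementation; the helper names snake_case_upper and storage_names are hypothetical and exist purely for illustration.

```python
import re


def snake_case_upper(name: str) -> str:
    """Approximate an upper snake case naming policy for simple PascalCase names."""
    # Insert an underscore wherever a lowercase letter or digit is followed by a capital.
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_underscores.upper()


def storage_names(properties, key_property, overrides=None):
    """Map data model property names to storage field names (illustrative only)."""
    overrides = overrides or {}
    mapping = {}
    for prop in properties:
        if prop == key_property:
            # The record key is always stored in the field named "id".
            mapping[prop] = "id"
        elif prop in overrides:
            # An explicit name (like JsonPropertyName) wins over the naming policy.
            mapping[prop] = overrides[prop]
        else:
            mapping[prop] = snake_case_upper(prop)
    return mapping


hotel_mapping = storage_names(
    ["HotelId", "HotelName", "Description", "DescriptionEmbedding"],
    key_property="HotelId",
    overrides={"DescriptionEmbedding": "HOTEL_DESCRIPTION_EMBEDDING"},
)
print(hotel_mapping)
```

The resulting field names match the keys of the JSON document shown above: the key becomes id, the explicitly named vector property keeps its override, and the remaining properties follow the naming policy.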
Using partition key
In the Azure Cosmos DB for NoSQL connector, the partition key property defaults to the key property, id. The PartitionKeyPropertyName property of the AzureCosmosDBNoSQLVectorStoreRecordCollectionOptions<TRecord> class allows a different property to be specified as the partition key.
The AzureCosmosDBNoSQLVectorStoreRecordCollection class supports two key types: string and AzureCosmosDBNoSQLCompositeKey. The AzureCosmosDBNoSQLCompositeKey consists of RecordKey and PartitionKey.
If the partition key property is not set (and the default key property is used), string keys can be used for operations with database records. However, if a partition key property is specified, it is recommended to use AzureCosmosDBNoSQLCompositeKey to provide both the key and partition key values.
Specify partition key property name:
var options = new AzureCosmosDBNoSQLVectorStoreRecordCollectionOptions<Hotel>
{
    PartitionKeyPropertyName = nameof(Hotel.HotelName)
};
var collection = new AzureCosmosDBNoSQLVectorStoreRecordCollection<Hotel>(database, "collection-name", options)
    as IVectorStoreRecordCollection<AzureCosmosDBNoSQLCompositeKey, Hotel>;
Get with partition key:
var record = await collection.GetAsync(new AzureCosmosDBNoSQLCompositeKey("hotel-id", "hotel-name"));
Overview
The Azure CosmosDB NoSQL Vector Store connector can be used to access and manage data in Azure CosmosDB NoSQL. The connector has the following characteristics.
Feature Area | Support |
---|---|
Collection maps to | Azure Cosmos DB NoSQL Container |
Supported key property types | string, AzureCosmosDBNoSQLCompositeKey |
Supported data property types | |
Supported vector property types | |
Supported index types | |
Supported distance functions | |
Supported filter clauses | |
Supports multiple vectors in a record | Yes |
is_filterable supported? | Yes |
is_full_text_searchable supported? | Yes |
HybridSearch supported? | No |
Getting started
Add the Azure extra package to your project.
pip install semantic-kernel[azure]
Next, you can create an Azure CosmosDB NoSQL Vector Store instance directly. This reads the following environment variables to configure the connection to Azure CosmosDB NoSQL:
- AZURE_COSMOS_DB_NO_SQL_URL
- AZURE_COSMOS_DB_NO_SQL_DATABASE_NAME
And optionally:
- AZURE_COSMOS_DB_NO_SQL_KEY
When the key is not set, an AsyncDefaultAzureCredential is used to authenticate.
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLStore
vector_store = AzureCosmosDBNoSQLStore()
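Before constructing the store from the environment, it can help to check which settings are present. The following is a minimal sketch using only the standard library; the helper describe_cosmos_settings is hypothetical and not part of Semantic Kernel, and it only mirrors the required and optional variables listed above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "AZURE_COSMOS_DB_NO_SQL_URL",
    "AZURE_COSMOS_DB_NO_SQL_DATABASE_NAME",
)
OPTIONAL_KEY_VAR = "AZURE_COSMOS_DB_NO_SQL_KEY"


def describe_cosmos_settings(env=None):
    """Report missing required variables and which authentication path applies."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Without an account key, the connector falls back to a default Azure credential.
    auth = "account key" if env.get(OPTIONAL_KEY_VAR) else "default Azure credential"
    return {"missing": missing, "auth": auth}


example_env = {
    "AZURE_COSMOS_DB_NO_SQL_URL": "https://<your-account-name>.documents.azure.com:443/",
    "AZURE_COSMOS_DB_NO_SQL_DATABASE_NAME": "<your-database-name>",
}
print(describe_cosmos_settings(example_env))
```

Calling describe_cosmos_settings() with no argument inspects the real process environment instead of the example dictionary.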
You can also supply these values in the constructor:
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLStore
vector_store = AzureCosmosDBNoSQLStore(
    url="https://<your-account-name>.documents.azure.com:443/",
    key="<your-account-key>",
    database_name="<your-database-name>"
)
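You can also pass in a CosmosClient instance; just make sure it is an async client.
from azure.cosmos.aio import CosmosClient
# Assumption: AsyncDefaultAzureCredential refers to the async DefaultAzureCredential from azure-identity.
from azure.identity.aio import DefaultAzureCredential as AsyncDefaultAzureCredential
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLStore
client = CosmosClient(
    url="https://<your-account-name>.documents.azure.com:443/",
    # Use either the account key or a token credential.
    credential="<your-account-key>" or AsyncDefaultAzureCredential()
)
vector_store = AzureCosmosDBNoSQLStore(
    client=client,
    database_name="<your-database-name>"
)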
The next step needs a data model; a class called Hotel is used in the examples below.
With a store, you can get a collection:
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLStore
vector_store = AzureCosmosDBNoSQLStore()
collection = vector_store.get_collection(collection_name="skhotels", data_model=Hotel)
It is also possible to construct a direct reference to a named collection. This uses the same environment variables as above.
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLCollection
collection = AzureCosmosDBNoSQLCollection(
    collection_name="skhotels",
    data_model_type=Hotel,
)
Using partition key
In the Azure Cosmos DB for NoSQL connector, the partition key property defaults to the key property, id. You can also supply a value for the partition key in the constructor.
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLCollection
collection = AzureCosmosDBNoSQLCollection(
    collection_name="skhotels",
    data_model_type=Hotel,
    partition_key="hotel_name"
)
This can be a more complex key when using the PartitionKey object:
from semantic_kernel.connectors.memory.azure_cosmos_db import AzureCosmosDBNoSQLCollection
from azure.cosmos import PartitionKey
partition_key = PartitionKey(path="/hotel_name")
collection = AzureCosmosDBNoSQLCollection(
    collection_name="skhotels",
    data_model_type=Hotel,
    partition_key=partition_key
)
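A partition key path such as /hotel_name names the property on the stored document whose value decides the partition. The sketch below shows how such a slash-separated path selects a value from a document; the function partition_key_value and the sample hotel document (including its nested address field) are hypothetical and only illustrate the path convention.

```python
def partition_key_value(document: dict, path: str):
    """Return the value found at a slash-separated partition key path."""
    value = document
    # "/address/city" becomes ["address", "city"]; each segment descends one level.
    for segment in path.strip("/").split("/"):
        value = value[segment]
    return value


hotel = {
    "id": "1",
    "hotel_name": "Hotel Happy",
    "address": {"city": "Seattle"},
}

print(partition_key_value(hotel, "/hotel_name"))
print(partition_key_value(hotel, "/address/city"))
```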
The AzureCosmosDBNoSQLCollection class supports two key types: string and AzureCosmosDBNoSQLCompositeKey. The AzureCosmosDBNoSQLCompositeKey consists of key and partition_key.
If the partition key property is not set (and the default key property is used), string keys can be used for operations with database records. However, if a partition key property is specified, it is recommended to use AzureCosmosDBNoSQLCompositeKey to provide both the key and partition key values to the get and delete methods.
from typing import Annotated
from azure.cosmos import PartitionKey
from semantic_kernel.connectors.memory.azure_cosmos_db import (
    AzureCosmosDBNoSQLCompositeKey,
    AzureCosmosDBNoSQLStore,
)
from semantic_kernel.data import (
    VectorStoreRecordDataField,
    VectorStoreRecordKeyField,
    vectorstoremodel,
)
@vectorstoremodel
class data_model_type:
    id: Annotated[str, VectorStoreRecordKeyField]
    product_type: Annotated[str, VectorStoreRecordDataField()]
    ...
# Assumes the environment variables described earlier are set.
store = AzureCosmosDBNoSQLStore()
collection = store.get_collection(
    collection_name="<your-collection-name>",
    data_model=data_model_type,
    partition_key=PartitionKey(path="/product_type"),
)
# when there is data in the collection
composite_key = AzureCosmosDBNoSQLCompositeKey(
    key='key value', partition_key='partition key value'
)
# get a record, with the partition key
record = await collection.get(composite_key)
# or delete
await collection.delete(composite_key)
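To see why both values are needed once a separate partition key is configured, consider the toy model below. It is not the connector's implementation: CompositeKey and InMemoryPartitionedStore are hypothetical stand-ins that mimic the rule that a record id only has to be unique within its partition.

```python
from typing import NamedTuple


class CompositeKey(NamedTuple):
    """Hypothetical stand-in for a key plus partition key pair."""
    key: str
    partition_key: str


class InMemoryPartitionedStore:
    """Toy store in which record ids are unique only within a partition."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, composite_key, record):
        partition = self._partitions.setdefault(composite_key.partition_key, {})
        partition[composite_key.key] = record

    def get(self, composite_key):
        return self._partitions.get(composite_key.partition_key, {}).get(composite_key.key)

    def delete(self, composite_key):
        self._partitions.get(composite_key.partition_key, {}).pop(composite_key.key, None)


toy_store = InMemoryPartitionedStore()
chair = CompositeKey(key="1", partition_key="furniture")
lamp = CompositeKey(key="1", partition_key="lighting")

# The same key value can exist in two partitions, so the key alone is ambiguous.
toy_store.upsert(chair, {"name": "chair"})
toy_store.upsert(lamp, {"name": "lamp"})

print(toy_store.get(chair))
toy_store.delete(chair)
print(toy_store.get(chair), toy_store.get(lamp))
```

Deleting with the composite key removes only the record in the matching partition; the record with the same key in the other partition is left untouched.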
Coming soon
More info coming soon.