Serialization of your data model to and from different stores (Preview)
In order for your data model to be stored in a database, it needs to be converted to a format that the database can understand. Different databases require different storage schemas and formats. Some have a strict schema that needs to be adhered to, while others allow the schema to be defined by the user.
Mapping options
The vector store connectors provided by Semantic Kernel offer multiple ways to achieve this mapping.
Built-in mappers
The vector store connectors provided by Semantic Kernel have built-in mappers that will map your data model to and from the database schemas. See the page for each connector for more information on how the built-in mappers map data for each database.
Custom mappers
The vector store connectors provided by Semantic Kernel support the ability to provide custom mappers in combination with a VectorStoreRecordDefinition. In this case, the VectorStoreRecordDefinition can differ from the supplied data model. The VectorStoreRecordDefinition is used to define the database schema, while the data model is used by the developer to interact with the vector store. A custom mapper is required in this case to map from the data model to the custom database schema defined by the VectorStoreRecordDefinition.
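For illustration, such a definition might look like the following sketch in Python. The classes come from semantic_kernel.data; the field names and the exact constructor shape are assumptions.

from semantic_kernel.data import (
    VectorStoreRecordDataField,
    VectorStoreRecordDefinition,
    VectorStoreRecordKeyField,
    VectorStoreRecordVectorField,
)

# The schema the store should use; these field names need not match the
# data model's attributes, which is why a custom mapper is required.
definition = VectorStoreRecordDefinition(
    fields={
        "hotel_id": VectorStoreRecordKeyField(),
        "hotel_description": VectorStoreRecordDataField(),
        "description_vector": VectorStoreRecordVectorField(dimensions=1536),
    },
)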
Tip
See How to build a custom mapper for a Vector Store connector for an example on how to create your own custom mapper.
In order for your data model, defined either as a class or via a definition, to be stored in a database, it needs to be serialized to a format that the database can understand. There are two ways to do this: use the built-in serialization provided by Semantic Kernel, or provide your own serialization logic.
Serialization options
Built-in serialization
The built-in serialization is done by first converting the data model to a dictionary and then serializing it to the model that the store understands. This differs for each store and is defined as part of the built-in connector. Deserialization is done in the reverse order.
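For example, take a simple model defined with the field annotations from semantic_kernel.data (the Hotel class below is purely illustrative). The built-in serialization first flattens each record into a dict keyed by field name, and then converts that dict into whatever format the chosen store expects.

from dataclasses import dataclass
from typing import Annotated

from semantic_kernel.data import (
    VectorStoreRecordDataField,
    VectorStoreRecordKeyField,
    VectorStoreRecordVectorField,
    vectorstoremodel,
)


@vectorstoremodel
@dataclass
class Hotel:
    # Flattened by the built-in serialization into
    # {"id": ..., "description": ..., "description_embedding": [...]}.
    id: Annotated[str, VectorStoreRecordKeyField()]
    description: Annotated[str, VectorStoreRecordDataField()]
    description_embedding: Annotated[
        list[float] | None, VectorStoreRecordVectorField(dimensions=1536)
    ] = None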
Custom to and from dict methods
The built-in serialization can also use custom methods to go from the data model to a dictionary and from a dictionary back to the data model. This can be done by implementing the methods of the VectorStoreModelToDictFromDictProtocol on a class, or by supplying functions that follow the ToDictProtocol and FromDictProtocol protocols in your record definition; all of these can be found in semantic_kernel/data/vector_store_model_protocols.py.
This is especially useful when you want to use an optimized container format in your code, but still want to be able to move between stores easily.
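As a sketch of the class-based option: the record carries its own to_dict and from_dict, which the built-in serialization then uses instead of the default conversion. The record class, its fields, and the exact signatures (for example, whether container mode expects a sequence of dicts) are assumptions for illustration.

from typing import Any


class Paragraph:
    # Hypothetical record that controls its own dict conversion.
    def __init__(self, id: str, text: str, vector: list[float] | None = None):
        self.id = id
        self.text = text
        self.vector = vector

    def to_dict(self, **kwargs: Any) -> dict[str, Any]:
        # Called by the built-in serialization in place of the default conversion.
        return {"id": self.id, "text": self.text, "vector": self.vector}

    @classmethod
    def from_dict(cls, obj: dict[str, Any], **kwargs: Any) -> "Paragraph":
        return cls(id=obj["id"], text=obj["text"], vector=obj.get("vector"))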
Pydantic models
When you define your model using a Pydantic BaseModel, the model_dump and model_validate methods are used to serialize and deserialize the data model to and from a dict.
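For example, the same illustrative model as a Pydantic BaseModel needs no extra serialization code, since model_dump and model_validate are standard BaseModel methods.

from typing import Annotated

from pydantic import BaseModel

from semantic_kernel.data import (
    VectorStoreRecordDataField,
    VectorStoreRecordKeyField,
    VectorStoreRecordVectorField,
    vectorstoremodel,
)


@vectorstoremodel
class HotelModel(BaseModel):
    id: Annotated[str, VectorStoreRecordKeyField()]
    description: Annotated[str, VectorStoreRecordDataField()]
    description_embedding: Annotated[
        list[float] | None, VectorStoreRecordVectorField(dimensions=1536)
    ] = None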
Custom serialization
You can also define the serialization to be done directly from your model into the model of the data store. This can be done by implementing the VectorStoreModelFunctionSerdeProtocol protocol, or by adding functions that follow the SerializeProtocol and DeserializeProtocol in your record definition; both can be found in semantic_kernel/data/vector_store_model_protocols.py.
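As a sketch of the class-based variant, the same hypothetical Paragraph record now serializes straight into the store's format, here assumed to be JSON documents: serialize emits the store's format directly and deserialize rebuilds the model from it. The method names follow the protocol; the exact signatures are assumptions.

import json
from typing import Any


class Paragraph:
    # Hypothetical record that serializes straight into the store's format.
    def __init__(self, id: str, text: str, vector: list[float] | None = None):
        self.id = id
        self.text = text
        self.vector = vector

    def serialize(self, **kwargs: Any) -> str:
        # Emit a JSON document directly, skipping the intermediate dict step.
        return json.dumps({"id": self.id, "text": self.text, "vector": self.vector})

    @classmethod
    def deserialize(cls, obj: str, **kwargs: Any) -> "Paragraph":
        data = json.loads(obj)
        return cls(id=data["id"], text=data["text"], vector=data.get("vector"))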
Serialization of vectors
When you have a vector in your data model, it needs to be either a list of floats or a list of ints, since that is what most stores require. If you want your class to store the vector in a different format, you can use the serialize_function and deserialize_function defined in the VectorStoreRecordVectorField annotation. For instance, for a numpy array you can use the following annotation:
from typing import Annotated

import numpy as np

from semantic_kernel.data import VectorStoreRecordVectorField

vector: Annotated[
    np.ndarray | None,
    VectorStoreRecordVectorField(
        dimensions=1536,
        serialize_function=np.ndarray.tolist,
        deserialize_function=np.array,
    ),
] = None
If you use a vector store that can handle native numpy arrays and you don't want them converted back and forth, you should set up direct serialization and deserialization for the model and that store.
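For example, a sketch assuming a hypothetical store whose client accepts dicts holding ndarrays as-is:

from typing import Any

import numpy as np


class NumpyRecord:
    # Hypothetical record for a store that accepts numpy arrays natively.
    def __init__(self, id: str, vector: np.ndarray):
        self.id = id
        self.vector = vector

    def serialize(self, **kwargs: Any) -> dict[str, Any]:
        # Hand the ndarray over as-is: no tolist()/np.array round trip.
        return {"id": self.id, "vector": self.vector}

    @classmethod
    def deserialize(cls, obj: dict[str, Any], **kwargs: Any) -> "NumpyRecord":
        return cls(id=obj["id"], vector=obj["vector"])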
Coming soon
More info coming soon.