Vector data type (preview)
Applies to: Azure SQL Database SQL database in Microsoft Fabric
The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications. Vectors are stored in an optimized binary format but are exposed as JSON arrays for convenience. Each element of the vector is stored as a single-precision (4-byte) floating-point value.
Note
This data type is in preview and is subject to change. Make sure to read preview usage terms in the Service Level Agreements (SLA) for Online Services document. For limitations of the current preview, see Limitations and Known issues.
For more information on working with vector data, see:
Sample syntax
The usage syntax for the vector type is similar to all other SQL Server data types in a table.
column_name VECTOR( {<dimensions>} ) [NOT NULL | NULL]
Dimensions
A vector must have at least one dimension. The maximum number of dimensions supported is 1998.
Examples
A. Column definition
The vector type can be used in column definition contained in a CREATE TABLE
statement, for example:
The following example creates a table with a vector column and inserts data into it.
CREATE TABLE dbo.vectors
(
id INT PRIMARY KEY,
v VECTOR(3) NOT NULL
);
INSERT INTO dbo.vectors (id, v) VALUES
(1, '[0.1, 2, 30]'),
(2, '[-100.2, 0.123, 9.876]');
SELECT * FROM dbo.vectors;
B. Usage in variables
The following example declares vectors using the new vector data type and calculates distances using the VECTOR_DISTANCE
function.
The vector type can be used with variables:
DECLARE @v VECTOR(3) = '[0.1, 2, 30]';
SELECT @v;
C. Usage in stored procedures or functions
The vector data type can be used as parameter in stored procedure or functions. For example:
CREATE PROCEDURE dbo.SampleStoredProcedure
@V VECTOR(3),
@V2 VECTOR(3) OUTPUT
AS
BEGIN
SELECT @V;
SET @V2 = @V;
END
Feature availability
The new vector type is available under all database compatibility levels.
Compatibility
To allow all clients to be able to operate on vector data, vectors are exposed as varchar(max) types. Client applications can work with vector data as if it was a JSON Array. The engine will automatically convert vectors to and from a JSON array, making the new type transparent for the client. Thanks to this approach all drivers and all languages are automatically compatible with the new type.
You can start to use the new vector type right away. Here's some examples:
With C#, vectors can be serialized and deserialized to and from string using the JsonSerializer
class.
using Microsoft.Data.SqlClient;
using Dapper;
using DotNetEnv;
using System.Text.Json;
namespace DotNetSqlClient;
class Program
{
static void Main(string[] args)
{
Env.Load();
var v1 = new float[] { 1.0f, 2.0f, 3.0f };
using var conn = new SqlConnection(Env.GetString("MSSQL"));
conn.Execute("INSERT INTO dbo.vectors VALUES(100, @v)", param: new {@v = JsonSerializer.Serialize(v1)});
var r = conn.ExecuteScalar<string>("SELECT v FROM dbo.vectors") ?? "[]";
var v2 = JsonSerializer.Deserialize<float[]>(r);
Console.WriteLine(JsonSerializer.Serialize(v2));
}
}
Limitations
The ongoing preview has the following limitations:
Tables
- Column-level constraints are not supported, except for
NULL
/NOT NULL
constraints.DEFAULT
andCHECK
constraints are not supported for vector columns.- Key constraints, such as
PRIMARY KEY
orFOREIGN KEY
, are not supported for vector columns. Equality, uniqueness, joins using vector columns as keys, and sort orders do not apply to vector data types. - There is no notion of uniqueness for vectors, so unique constraints are not applicable.
- Checking the range of values within a vector is also not applicable.
- Vectors do not support comparison, addition, subtraction, multiplication, division, concatenation, or any other mathematical, logical, and compound assignment operators.
- vector columns cannot be used in memory-optimized tables.
- Altering vector columns using
ALTER TABLE ... ALTER COLUMN
to other data types is not permitted.
Table schema metadata
sp_describe_first_result_set system stored procedure doesn't correctly return the vector data type. Therefore, many data access clients and driver see a varchar or nvarchar data type.
INFORMATION_SCHEMA.COLUMNS
reports columns using vector type as varbinary. A workaround to get the correct data type is to usesys.columns
system view.sys.columns
returns length of vector in bytes. To get the number of dimensions, use the following formula:dimensions = (length - 8) / 4
where
length
is the value returned bymax_length
. For example, if you see amax_length
of 20 bytes, the number of dimensions is (20 - 8) / 4 = 3.
Conversions
Implicit and explicit conversion using
CAST
orCONVERT
from the vector type can be done to varchar, and nvarchar types Similarly, only varchar, and nvarchar can be implicitly or explicitly converted to the vector type.The vector type can't be used with the sql_variant type or assigned to a sql_variant variable or column. This restriction similar to varchar(max), varbinary(max), nvarchar(max), xml, json, and CLR-based data types.
Casting to and from JSON data type is not supported yet. The workaround is to first convert from/to nvarchar(max) and then to/from JSON. For example, to convert a vector to a JSON type:
DECLARE @v VECTOR(3) = '[1.0, -0.2, 30]'; SELECT CAST(CAST(@v AS NVARCHAR(MAX)) AS JSON) AS j;
and to convert from a JSON type to vector:
DECLARE @j JSON = JSON_ARRAY(1.0, -0.2, 30) SELECT CAST(CAST(@j AS NVARCHAR(MAX)) AS VECTOR(3)) AS v;
Indexes
- B-tree indexes or columnstore indexes are not allowed on vector columns. However, a vector column can be specified as an included column in an index definition.
User-defined types
- Creation of alias type using
CREATE TYPE
for the vector type isn't allowed, similar to the behavior of the xml and json data types.
Ledger tables
- Stored procedure
sp_verify_database_ledger
will generate an error if the database contains a table with a vector column.
Known issues
In the ongoing preview there are the following known issues:
- Tools like SQL Server Management Studio, Azure Data Studio, or the mssql extension for VS Code currently might not be able to generate the script of a table that has a column using the vector data type.
- Tools like SQL Server Management Studio, Azure Data Studio, or the mssql extension for VS Code currently might report a data type of varbinary instead of vector for a column using the vector type.
- BCP and
BULK INSERT
don't currently work if tables contain the vector type. - Import and Export via DacFx currently doesn't work if there is a table using vector type.
- Column encryption doesn't currently support the vector type.
- Always Encrypted doesn't currently support the vector type.
- Data Masking currently shows vector data as varbinary data type in the portal.
- When passing a vector type to
LEN
andDATALENGTH
error 8116 (Argument data type vector is invalid for argument 1 of datalength function) is returned. - In some cases when you pass a vector to a stored procedure or a function you, may get error 42211 (Truncation of vector is not allowed during the conversion). A workaround is to use nvarchar(max) instead vector type.
These issues will be fixed in future updates and documentation will be updated accordingly.