vector_search
function
Applies to: Databricks SQL
Important
This feature is in Public Preview.
The vector_search()
function allows you to query a Mosaic AI Vector Search index using SQL.
Requirements
- This function is not available on classic SQL warehouses.
- For more information, see Databricks SQL pricing page.
- This function is available in regions where Mosaic AI Vector Search is supported.
Syntax
In Databricks Runtime 15.3 and above you can use query_text
or query_vector
to specify what to search for in the index.
SELECT * FROM vector_search(
index,
{ query_text | query_vector },
[ num_results ]
)
In Databricks Runtime 15.2 and below, you can use query
to specify what to search for in the index.
SELECT * FROM vector_search(
index, query, num_results
)
Arguments
All arguments must be passed by name, like vector_search(index => indexName, query_text => queryText)
.
index
: ASTRING
constant, the fully qualified name of an existing vector search index in the same workspace for invocations. The definer must have “Select” permission on the index.- Use one of the following to specify the expression to search for in the index:
- For Databricks Runtime 15.3 or above, use
query_text
to search for a specific string of text in the embedding source column of your Delta Sync Index. The query must be aSTRING
expression of the string to search for in the index. - For Databricks Runtime 15.3 or above, use
query_vector
to search for a specific vector in the embedding vector column of your Delta Sync Index. This argument is required for searching a Delta Sync Index using self-managed vectors. The query must be anARRAY<FLOAT>
, orARRAY<DOUBLE>
, orARRAY<DECIMAL(_, _)>
expression of the embedding vector to search for in the index. - For Databricks Runtime 15.2 or below, use
query
to specify the string to search for in your index.
- For Databricks Runtime 15.3 or above, use
num_results
(optional): An integer constant, the max number of records to return. Defaults to 10.
Returns
A table of the top matching records from the index. All the columns of the index are included.
Examples
The following sections show example SQL queries for different index searches.
Text queries on indexes with embedding source columns
Search over an index of product SKUs to find similar products by name. The following example uses query_text
which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query
instead of query_text
.
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => "iphone", num_results => 2)
ID | Product name |
---|---|
10 | iPhone |
20 | iPhone SE |
The following example searches for multiple terms at the same time by using a LATERAL subquery.
SELECT
query_txt,
query_id,
search.*
FROM
query_table,
LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => query_txt, num_results => 2)
) as search
query_txt |
query_id | search.id | search.product_name |
---|---|---|---|
iphone | 1 | 10 | iPhone 10 |
iphone | 1 | 20 | iPhone SE |
pixel 8 | 2 | 30 | Pixel 8 |
pixel 8 | 2 | 40 | Pixel 8a |
Text queries on indexes with embedding source columns
Search over an index of images with pre-computed embeddings to find similar images by embedding. The following example uses query_vector
which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query
instead of query_vector
.
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45, -0.35, 0.78, 0.22), num_results => 3)
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45F, -0.35F, 0.78F, 0.22F), num_results => 3)
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45D, -0.35D, 0.78D, 0.22D), num_results => 3)
id | image_name | image_embedding |
---|---|---|
28 | horse.jpg | [0.46, -0.34, 0.77, 0.21] |
27 | donkey.jpg | [0.44, -0.36, 0.79, 0.23] |
5 | elk.jpg | [0.23, -0.44, 0.77, 0.80] |
The following example searches for multiple terms at the same time by using a LATERAL subquery.
SELECT
query_embedding,
search.*
FROM
query_table,
LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => image_embedding, num_results => 1)
) as search
query_embedding | search.id | search.image_name | search_image_embedding |
---|---|---|---|
[0.45, -0.35, 0.78, 0.22] | 27 | donkey.jpg | [0.46, -0.34, 0.77, 0.21] |
[0.14, 0.29, 0.30, -0.90] | 3 | landscape.jpg | [0.15, 0.30, 0.31, -0.91] |
[0.23, -0.44, 0.77, 0.80] | 10 | golden_gate_bridge.jpg | [0.28, -0.40, 0.23, 0.81] |
[0.88, 0.88, 0.88, 0.88] | 44 | blank.jpg | [0.88, 0.88, 0.88, 0.88] |
Limitations
The following limitations apply during the preview:
- Querying
DIRECT_ACCESS
index types are not supported. - Input parameters
filters_json
orcolumns
are not supported. - Vector Search with
num_results
greater than 100 are not supported. - Users who do not have READ access to the source table cannot use
vector_search()
. - Hybrid keyword-similarity search is not supported using
vector_search()
. vector_search
cannot be used with model serving endpoints using Foundation Model APIs provisioned throughput.