How to optimize performance when using pgvector on Azure Cosmos DB for PostgreSQL
APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)
The pgvector extension adds open-source vector similarity search to PostgreSQL.
This article explores the limitations and tradeoffs of pgvector and shows how to use partitioning, indexing, and search settings to improve performance.
For more on the extension itself, see the basics of pgvector. You may also want to refer to the project's official README.
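If the extension isn't enabled on your cluster yet, the standard PostgreSQL statement below enables it (a sketch; on a multi-node Azure Cosmos DB for PostgreSQL cluster, check the basics of pgvector article for the service-specific way to enable extensions on all nodes):
CREATE EXTENSION IF NOT EXISTS vector;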
Performance
You should always start by investigating the query plan. If your query terminates reasonably fast, run EXPLAIN (ANALYZE, VERBOSE, BUFFERS).
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM t_test ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
For queries that take too long to execute, consider dropping the ANALYZE keyword. The result contains fewer details but is returned instantly.
EXPLAIN (VERBOSE, BUFFERS) SELECT * FROM t_test ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
Third-party sites, like explain.depesz.com, can be helpful in understanding query plans. Some questions that you should try to answer are:
- Was the query parallelized?
- Was an index used?
- Did I use the same condition in the WHERE clause as in a partial index definition?
- If I use partitioning, were unneeded partitions pruned?
If your vectors are normalized to length 1, as OpenAI embeddings are, consider using inner product (<#>) for best performance.
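For example, a nearest-neighbor query ordered by inner product might look like the following sketch. Note that pgvector's <#> operator returns the negative inner product, so ascending order returns the most similar vectors first:
SELECT * FROM t_test ORDER BY embedding <#> '[1,2,3]' LIMIT 5; -- <#> is negative inner product; smallest value = most similar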
Parallel execution
In the output of your explain plan, look for Workers Planned and Workers Launched (the latter only when the ANALYZE keyword was used). The max_parallel_workers_per_gather PostgreSQL parameter defines how many background workers the database can launch for every Gather and Gather Merge plan node. Increasing this value might speed up your exact search queries without having to create indexes. Note, however, that the database might decide not to run the plan in parallel even when this value is high.
EXPLAIN SELECT * FROM t_test ORDER BY embedding <-> '[1,2,3]' LIMIT 3;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=4214.82..4215.16 rows=3 width=33)
-> Gather Merge (cost=4214.82..13961.30 rows=84752 width=33)
Workers Planned: 1
-> Sort (cost=3214.81..3426.69 rows=84752 width=33)
Sort Key: ((embedding <-> '[1,2,3]'::vector))
-> Parallel Seq Scan on t_test (cost=0.00..2119.40 rows=84752 width=33)
(6 rows)
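To experiment, you can raise the limit for your session and re-check whether more workers are planned; a sketch (the value 4 is just an illustration):
SET max_parallel_workers_per_gather = 4; -- session-level setting
EXPLAIN SELECT * FROM t_test ORDER BY embedding <-> '[1,2,3]' LIMIT 3;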
Indexing
Without indexes present, the extension performs an exact search, which provides perfect recall at the expense of performance.
To perform approximate nearest neighbor search, you can create indexes on your data, which trades recall for execution performance.
When possible, always load your data before indexing it: building the index is faster this way, and the resulting layout is more optimal.
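A minimal sketch of that order, assuming a hypothetical three-dimensional t_test table filled with random vectors (the array-to-vector cast is provided by pgvector):
-- create and load the table first...
CREATE TABLE t_test (id bigint GENERATED ALWAYS AS IDENTITY, embedding vector(3));
INSERT INTO t_test (embedding)
SELECT ARRAY[random(), random(), random()]::vector
FROM generate_series(1, 100000);
-- ...then build the index on the populated table
CREATE INDEX ON t_test USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);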
There are three supported index types:
- Inverted File with Flat Compression (IVFFlat)
- Hierarchical Navigable Small Worlds (HNSW)
- Disk Approximate Nearest Neighbor (DiskANN)
The IVFFlat index has faster build times and uses less memory than HNSW, but has lower query performance (in terms of the speed-recall tradeoff). DiskANN offers a great balance of both, with highly accurate query performance and fast build times.
Limits
- In order to index a column, it has to have dimensions defined. Attempting to index a column defined as col vector results in the error: ERROR: column does not have dimensions.
- You can only index a column that has up to 2000 dimensions. Attempting to index a column with more dimensions results in the error: ERROR: column cannot have more than 2000 dimensions for INDEX_TYPE index, where INDEX_TYPE is either ivfflat or hnsw.
While you can store vectors with more than 2000 dimensions, you can't index them. You can use dimensionality reduction to fit within the limits. Alternatively, rely on partitioning and/or sharding with Azure Cosmos DB for PostgreSQL to achieve acceptable performance without indexing.
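For example, declaring the dimensions up front keeps the column indexable; a sketch with a hypothetical table:
-- vector(1536) can be indexed; a bare "vector" column can't
CREATE TABLE t_documents (id bigint, embedding vector(1536));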
Inverted File with Flat Compression (IVFFlat)
The ivfflat index performs approximate nearest neighbor (ANN) search. This method uses an inverted file index to partition the dataset into multiple lists. The probes parameter controls how many lists are searched, which can improve the accuracy of the search results at the cost of slower search speed.
If the probes parameter is set to the number of lists in the index, then all lists are searched and the search becomes an exact nearest neighbor search. In this case, the planner doesn't use the index, because searching all lists is equivalent to performing a brute-force search on the entire dataset.
The indexing method partitions the dataset into multiple lists using the k-means clustering algorithm. Each list contains vectors that are closest to a particular cluster center. During a search, the query vector is compared to the cluster centers to determine which lists are most likely to contain the nearest neighbors. If the probes parameter is set to 1, then only the list corresponding to the closest cluster center would be searched.
Index options
Selecting the correct values for the number of probes to perform and the number of lists affects search performance. Good places to start are (a sketch for computing them follows this list):
- Use lists equal to rows / 1000 for tables with up to 1 million rows and sqrt(rows) for larger datasets.
- For probes, start with lists / 10 for tables with up to 1 million rows and sqrt(lists) for larger datasets.
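As a sketch, the query below derives these starting values from the current row count, mirroring the heuristics above:
WITH t AS (
    SELECT count(*) AS total_rows FROM t_test
), l AS (
    SELECT total_rows,
           CASE WHEN total_rows <= 1000000
                THEN greatest(total_rows / 1000, 1)  -- rows / 1000 up to 1M rows
                ELSE sqrt(total_rows)::bigint        -- sqrt(rows) beyond that
           END AS suggested_lists
    FROM t
)
SELECT total_rows, suggested_lists,
       CASE WHEN total_rows <= 1000000
            THEN greatest(suggested_lists / 10, 1)   -- lists / 10 up to 1M rows
            ELSE sqrt(suggested_lists)::bigint       -- sqrt(lists) beyond that
       END AS suggested_probes
FROM l;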
The number of lists is defined upon index creation with the lists option:
CREATE INDEX t_test_embedding_cosine_idx ON t_test USING ivfflat (embedding vector_cosine_ops) WITH (lists = 5000);
The number of probes can be set for the whole connection or per transaction (using SET LOCAL within a transaction block):
SET ivfflat.probes = 10;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 10 probes
BEGIN;
SET LOCAL ivfflat.probes = 10;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 10 probes
COMMIT;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses default, one probe
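As noted earlier, setting probes equal to the number of lists degrades the query to an exact scan; a sketch using the index created above (lists = 5000):
SET ivfflat.probes = 5000; -- equal to the lists value: every list is searched
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- exact results, but slower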
Indexing progress
With PostgreSQL 12 and newer, you can use pg_stat_progress_create_index to check indexing progress.
SELECT phase, round(100.0 * tuples_done / nullif(tuples_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
Phases for building IVFFlat indexes are:
- initializing
- performing k-means
- assigning tuples
- loading tuples
Note
The progress percentage (%) is only populated during the loading tuples phase.
Hierarchical Navigable Small Worlds (HNSW)
The hnsw index performs approximate nearest neighbor (ANN) search using the Hierarchical Navigable Small Worlds algorithm. It works by creating a graph around randomly selected entry points and finding their nearest neighbors; the graph is then extended with multiple layers, each lower layer containing more points. When searched, this multilayered graph starts at the top and narrows down until it hits the lowest layer, which contains the nearest neighbors of the query.
Building this index takes more time and memory than IVFFlat, but it has a better speed-recall tradeoff. Additionally, there's no training step like with IVFFlat, so the index can be created on an empty table.
Index options
When creating the index, you can tune two parameters:
- m - maximum number of connections per layer (defaults to 16)
- ef_construction - size of the dynamic candidate list used for graph construction (defaults to 64)
CREATE INDEX t_test_hnsw_l2_idx ON t_test USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);
During queries, you can specify the size of the dynamic candidate list for the search (defaults to 40). It can be set for the whole connection or per transaction (using SET LOCAL within a transaction block):
SET hnsw.ef_search = 100;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 100 candidates
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 100 candidates
COMMIT;
SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses default, 40 candidates
Indexing progress
With PostgreSQL 12 and newer, you can use pg_stat_progress_create_index to check indexing progress.
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
Phases for building HNSW indexes are:
- initializing
- loading tuples
Disk Approximate Nearest Neighbor (DiskANN)
DiskANN is a scalable approximate nearest neighbor search algorithm for efficient vector search at any scale. It offers high recall, high queries per second (QPS), and low query latency, even for billion-point datasets. This makes it a powerful tool for handling large volumes of data. Learn more about DiskANN from Microsoft.
Building this index takes more time and memory than IVFFlat, but it has a better speed-recall tradeoff. Additionally, there's no training step like with IVFFlat, so the index can be created on an empty table.
Index options
When creating an index with diskann, you can specify various parameters to control its behavior. The currently available options are:
- max_neighbors: Maximum number of edges per node in the graph (defaults to 32).
- l_value_ib: The size of the search list during index build (defaults to 50).
CREATE INDEX my_table_embedding_diskann_custom_idx ON my_table USING diskann (embedding vector_cosine_ops)
WITH (
max_neighbors = 48,
l_value_ib = 100
);
The L value for index scanning (l_value_is) can be set for the whole connection or per transaction (using SET LOCAL within a transaction block):
SET diskann.l_value_is = 100;
SELECT * FROM my_table ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 100 candidates
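As with the other index types, the setting can also be scoped to a single transaction; a sketch:
BEGIN;
SET LOCAL diskann.l_value_is = 100; -- reverts automatically at COMMIT
SELECT * FROM my_table ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- uses 100 candidates
COMMIT;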
Postgres automatically decides when to use the DiskANN index. If there are scenarios where you always want to use the index, use the following commands (SET LOCAL only takes effect inside a transaction block):
BEGIN;
SET LOCAL enable_seqscan TO OFF;
SELECT * FROM my_table ORDER BY embedding <=> '[1,2,3]' LIMIT 5; -- forces the use of the index
COMMIT;
Indexing progress
With PostgreSQL 12 and newer, you can use pg_stat_progress_create_index to check indexing progress.
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
Phases for building DiskANN indexes are:
- initializing
- loading tuples
Selecting the index access function
The vector type allows you to perform three types of searches on the stored vectors. You need to select the correct access function for your index in order to have the database consider your index when executing your queries. The examples demonstrate ivfflat index types, but the same can be done for hnsw and diskann indexes. The lists option only applies to ivfflat indexes.
Cosine distance
For cosine similarity search, use the vector_cosine_ops access method.
CREATE INDEX t_test_embedding_cosine_idx ON t_test USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
To use the above index, the query needs to perform a cosine similarity search, which is done with the <=> operator.
EXPLAIN SELECT * FROM t_test ORDER BY embedding <=> '[1,2,3]' LIMIT 5;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Limit (cost=5.02..5.23 rows=5 width=33)
-> Index Scan using t_test_embedding_cosine_idx on t_test (cost=5.02..175.06 rows=4003 width=33)
Order By: (embedding <=> '[1,2,3]'::vector)
(3 rows)
L2 distance
For L2 distance (also known as Euclidean distance), use the vector_l2_ops access method.
CREATE INDEX t_test_embedding_l2_idx ON t_test USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
To use the above index, the query needs to perform an L2 distance search, which is done with the <-> operator.
EXPLAIN SELECT * FROM t_test ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit (cost=5.02..5.23 rows=5 width=33)
-> Index Scan using t_test_embedding_l2_idx on t_test (cost=5.02..175.06 rows=4003 width=33)
Order By: (embedding <-> '[1,2,3]'::vector)
(3 rows)
Inner product
For inner product similarity, use the vector_ip_ops access method.
CREATE INDEX t_test_embedding_ip_idx ON t_test USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
To use the above index, the query needs to perform an inner product similarity search, which is done with the <#> operator.
EXPLAIN SELECT * FROM t_test ORDER BY embedding <#> '[1,2,3]' LIMIT 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit (cost=5.02..5.23 rows=5 width=33)
-> Index Scan using t_test_embedding_ip_idx on t_test (cost=5.02..175.06 rows=4003 width=33)
Order By: (embedding <#> '[1,2,3]'::vector)
(3 rows)
Partial indexes
In some scenarios, it's beneficial to have an index that covers only a subset of the data. We can, for example, build an index just for our premium users:
CREATE INDEX t_premium ON t_test USING ivfflat (vec vector_ip_ops) WITH (lists = 100) WHERE tier = 'premium';
We can now see that the premium tier uses the index:
explain select * from t_test where tier = 'premium' order by vec <#> '[2,2,2]';
QUERY PLAN
------------------------------------------------------------------------------------
Index Scan using t_premium on t_test (cost=65.57..25638.05 rows=245478 width=39)
Order By: (vec <#> '[2,2,2]'::vector)
(2 rows)
While the free tier users lack the benefit:
explain select * from t_test where tier = 'free' order by vec <#> '[2,2,2]';
QUERY PLAN
-----------------------------------------------------------------------
Sort (cost=44019.01..44631.37 rows=244941 width=39)
Sort Key: ((vec <#> '[2,2,2]'::vector))
-> Seq Scan on t_test (cost=0.00..15395.25 rows=244941 width=39)
Filter: (tier = 'free'::text)
(4 rows)
Having only a subset of the data indexed means the index takes less space on disk and is faster to search through.
PostgreSQL might fail to recognize that the index is safe to use if the form used in the WHERE clause of the partial index definition doesn't match the one used in your queries.
In our example dataset, the only distinct values of the tier column are 'free', 'test', and 'premium'. Even with a query using tier LIKE 'premium', PostgreSQL doesn't use the index:
explain select * from t_test where tier like 'premium' order by vec <#> '[2,2,2]';
QUERY PLAN
-----------------------------------------------------------------------
Sort (cost=44086.30..44700.00 rows=245478 width=39)
Sort Key: ((vec <#> '[2,2,2]'::vector))
-> Seq Scan on t_test (cost=0.00..15396.59 rows=245478 width=39)
Filter: (tier ~~ 'premium'::text)
(4 rows)
Partitioning
One way to improve performance is to split the dataset over multiple partitions. Imagine a system where it's natural to refer to data just from the current year or maybe the past two years. In such a system, you can partition your data by a date range and then capitalize on improved performance, because the system only needs to read the partitions relevant to the queried years.
Let us define a partitioned table:
CREATE TABLE t_test_partitioned(vec vector(3), vec_date date default now()) partition by range (vec_date);
We can manually create partitions for every year or use the Citus utility function (available on Azure Cosmos DB for PostgreSQL):
select create_time_partitions(
table_name := 't_test_partitioned',
partition_interval := '1 year',
start_from := '2020-01-01'::timestamptz,
end_at := '2024-01-01'::timestamptz
);
Check the created partitions:
\d+ t_test_partitioned
Partitioned table "public.t_test_partitioned"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------+-----------+-----------+----------+---------+----------+-------------+--------------+-------------
vec | vector(3) | | | | extended | | |
vec_date | date | | | now() | plain | | |
Partition key: RANGE (vec_date)
Partitions: t_test_partitioned_p2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
t_test_partitioned_p2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
t_test_partitioned_p2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'),
t_test_partitioned_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
To manually create a partition:
CREATE TABLE t_test_partitioned_p2019 PARTITION OF t_test_partitioned FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
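Optionally, a DEFAULT partition can catch rows whose dates fall outside every defined range; a sketch:
CREATE TABLE t_test_partitioned_default PARTITION OF t_test_partitioned DEFAULT;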
Then make sure that your queries actually filter down to a subset of the available partitions. For example, the query below filters down to two partitions:
explain analyze select * from t_test_partitioned where vec_date between '2022-01-01' and '2024-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..58.16 rows=12 width=36) (actual time=0.014..0.018 rows=3 loops=1)
-> Seq Scan on t_test_partitioned_p2022 t_test_partitioned_1 (cost=0.00..29.05 rows=6 width=36) (actual time=0.013..0.014 rows=1 loops=1)
Filter: ((vec_date >= '2022-01-01'::date) AND (vec_date <= '2024-01-01'::date))
-> Seq Scan on t_test_partitioned_p2023 t_test_partitioned_2 (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.003 rows=2 loops=1)
Filter: ((vec_date >= '2022-01-01'::date) AND (vec_date <= '2024-01-01'::date))
Planning Time: 0.125 ms
Execution Time: 0.036 ms
You can index a partitioned table:
CREATE INDEX ON t_test_partitioned USING ivfflat (vec vector_cosine_ops) WITH (lists = 100);
explain analyze select * from t_test_partitioned where vec_date between '2022-01-01' and '2024-01-01' order by vec <=> '[1,2,3]' limit 5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.13..12.20 rows=2 width=44) (actual time=0.040..0.042 rows=1 loops=1)
-> Merge Append (cost=4.13..12.20 rows=2 width=44) (actual time=0.039..0.040 rows=1 loops=1)
Sort Key: ((t_test_partitioned.vec <=> '[1,2,3]'::vector))
-> Index Scan using t_test_partitioned_p2022_vec_idx on t_test_partitioned_p2022 t_test_partitioned_1 (cost=0.04..4.06 rows=1 width=44) (actual time=0.022..0.023 rows=0 loops=1)
Order By: (vec <=> '[1,2,3]'::vector)
Filter: ((vec_date >= '2022-01-01'::date) AND (vec_date <= '2024-01-01'::date))
-> Index Scan using t_test_partitioned_p2023_vec_idx on t_test_partitioned_p2023 t_test_partitioned_2 (cost=4.08..8.11 rows=1 width=44) (actual time=0.015..0.016 rows=1 loops=1)
Order By: (vec <=> '[1,2,3]'::vector)
Filter: ((vec_date >= '2022-01-01'::date) AND (vec_date <= '2024-01-01'::date))
Planning Time: 0.167 ms
Execution Time: 0.139 ms
(11 rows)
Conclusion
Congratulations, you just learned the tradeoffs, limitations, and best practices for achieving the best performance with pgvector.