Hello Arik Levy,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you would like to know how to work with search across multiple vector indexes in PostgreSQL with PGVector.
Regarding your questions, PostgreSQL does not natively support using multiple vector indexes (e.g., IVFFLAT
and HNSW
) in the same query when calculating a combined similarity score across different embedding columns. The query planner can only use one index per query for ordering purposes, which means the combined query will fall back to a sequential scan after computing the similarity for both embeddings.
- PostgreSQL’s query planner typically selects a single index for sorting the results when using
ORDER BY
. Even if bothname_embedding
andgeo_embedding
have separate indexes, PostgreSQL cannot combine the results of both indexes in a single query for the final ordering. - The
pgvector
extension, though powerful, relies on PostgreSQL's planner, which does not yet optimize for multiple vector operations in the same query.
PostgreSQL’s query planner doesn’t support combining two vector indexes in a single query efficiently. Using CTEs or similar approaches can help achieve the desired result while still leveraging the indexes. Due to query planner limitations, you can break down the query into two parts, use the indexes separately, and then combine the results.
WITH name_results AS (
SELECT
id,
source,
name,
latitude,
longitude,
(name_embedding <=> '{name_embedding_str}'::vector) AS name_similarity
FROM
entities
ORDER BY
name_similarity ASC
LIMIT 100 -- Adjust this as necessary
),
geo_results AS (
SELECT
id,
(geo_embedding <-> '{geo_embedding_str}'::vector) AS geo_similarity
FROM
entities
ORDER BY
geo_similarity ASC
LIMIT 100 -- Adjust this as necessary
)
SELECT
e.id,
e.source,
e.name,
e.latitude,
e.longitude,
nr.name_similarity,
gr.geo_similarity,
(0.3 * nr.name_similarity) + (0.8 * gr.geo_similarity) AS combined_similarity
FROM
name_results nr
JOIN
geo_results gr
ON
nr.id = gr.id
JOIN
entities e
ON
e.id = nr.id
ORDER BY
combined_similarity ASC
LIMIT 10;
How it will work:
- Perform two separate queries to leverage the
ivfflat
andhnsw
indexes effectively. - Use a
JOIN
onid
to merge the results and calculatecombined_similarity
. - Use
LIMIT
in each CTE to reduce the number of rows considered.
PostgreSQL will apply the indexes to each CTE query, using ivfflat
for name_embedding
and hnsw
for geo_embedding
. The final combination ensures a balance between the two similarity scores.
If performance becomes an issue, you can alternative approaches:
- Consider using tools like
pg_bouncer
or partitioned queries to improve parallelism. - Precompute similarity scores for frequent queries and store them in a materialized view.
- Fine-tune the
ivfflat
andhnsw
parameters (e.g.,lists
,m
,ef_construction
) for better performance on larger datasets. - Partition your dataset to reduce the number of rows per query.
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.