How to avoid unnecessarily re-embedding documents in Azure AI Search?

Noah Pursell 0 Reputation points
2025-01-10T04:06:45.4266667+00:00

Hello all!

I am using Azure AI Search to store some vectorized documents. In my use case, I will receive a new set of documents periodically. I want to add these to my Azure AI Search index. However, there is a high probability that some of these documents are already in the index. I am wondering if it is possible to only add the documents that are not already in the index (primarily to save time).

I do not see any built-in function to do this (I am mainly using Python/LangChain). I also do not see an easy way to get a list of all document IDs from an index, which would let me do the filtering locally and push only the documents whose ID is not among the retrieved IDs.
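For illustration, the local filtering described above could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a confirmed solution: the key field is assumed to be named `id` and to be retrievable, IDs are assumed to be derived from a SHA-256 hash of the document text, and `search_client` is assumed to behave like `azure.search.documents.SearchClient`. The helper names `content_id`, `fetch_existing_ids`, and `filter_new_documents` are hypothetical.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Set


def content_id(text: str) -> str:
    """Derive a deterministic key from the document text (hex SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fetch_existing_ids(search_client: Any, key_field: str = "id") -> Set[str]:
    """Collect the keys currently stored in the index.

    Assumption: `search_client` exposes a `search` method like
    azure.search.documents.SearchClient, and `key_field` is retrievable.
    Only the key field is selected, so no vectors are transferred.
    """
    results = search_client.search(search_text="*", select=[key_field])
    return {doc[key_field] for doc in results}


def filter_new_documents(
    texts: Iterable[str], existing_ids: Iterable[str]
) -> List[Dict[str, str]]:
    """Keep only texts whose content-derived ID is not already known.

    Duplicates within the incoming batch are also dropped.
    """
    seen = set(existing_ids)
    new_docs: List[Dict[str, str]] = []
    for text in texts:
        doc_id = content_id(text)
        if doc_id not in seen:
            seen.add(doc_id)
            new_docs.append({"id": doc_id, "content": text})
    return new_docs
```

Only the documents returned by `filter_new_documents` would then need to be embedded and uploaded. For very large indexes, retrieving every key through a single search query may run into paging limits, so this approach is probably best suited to small or medium-sized indexes.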

Does anyone have any suggestions? It would be much appreciated!

Azure AI Search