How to avoid unnecessarily re-embedding documents in Azure AI Search?
Hello all!
I am using Azure AI Search to store some vectorized documents. In my use case, I will receive a new set of documents periodically. I want to add these to my Azure AI Search index. However, there is a high probability that some of these documents are already in the index. I am wondering if it is possible to only add the documents that are not already in the index (primarily to save time).
I do not see any built-in function to do this (I am mainly using Python/langchain). I also do not see any easy way to get a list of all document IDs from an index (this would allow me to do the filtering locally, and only push documents whose ID is not in the retrieved IDs).
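To make the idea concrete, here is a rough sketch of the local filtering I have in mind. It assumes each document's ID is a hash of its text (my own hypothetical scheme, not something Azure AI Search or langchain requires), and `doc_id` and `filter_new` are names I made up for illustration. The part I am missing is how to populate `existing_ids` from the index; I believe something like `SearchClient.search(search_text="*", select=["id"])` could page through the keys, but I have not verified that, and the key field name `id` is an assumption.

```python
import hashlib
from typing import Iterable


def doc_id(text: str) -> str:
    """Derive a deterministic ID from the document text (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_new(texts: Iterable[str], existing_ids: set[str]) -> list[tuple[str, str]]:
    """Return (id, text) pairs whose ID is not already known.

    Duplicates inside the incoming batch are also dropped, so each
    distinct document would be embedded at most once.
    """
    seen = set(existing_ids)  # copy so the caller's set is not mutated
    new_docs = []
    for text in texts:
        key = doc_id(text)
        if key not in seen:
            seen.add(key)
            new_docs.append((key, text))
    return new_docs


# Placeholder: in practice these IDs would have to be fetched from the index.
existing_ids = {doc_id("already indexed")}

batch = ["already indexed", "brand new", "brand new"]
new_docs = filter_new(batch, existing_ids)

# Only the documents in new_docs would then be embedded and uploaded.
for key, text in new_docs:
    print(key[:12], text)
```

With this, only the entries in `new_docs` would go through the embedding step, which is where I expect most of the time to be spent.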
Does anyone have any suggestions? It would be much appreciated!