Delete documents from Azure AI Search Index

Hemachandra Siddani 0 Reputation points
2024-08-30T08:44:05.6433333+00:00

Hi, we used the "Import and vectorize data" wizard in the Azure AI Search service to import data from Azure Blob Storage. This created an index and an indexer. The key field in the index shows as Chunk Id, and search works as expected. The problem is that when a document is deleted from Azure Blob Storage, we would like the search service to stop returning data from the deleted document. We would like it to be removed from the vector index.

We tried using the REST API for the index (@search.action=delete), but it expected an ID field, which the "Import and vectorize data" wizard does not create as part of the index schema. Any suggestions on how to go about this? Any help is highly appreciated.

Azure AI Search

2 answers

  1. Vinodh247 27,281 Reputation points MVP
    2024-09-12T13:54:10.0266667+00:00

    Hi Hemachandra Siddani,

    Thanks for reaching out to Microsoft Q&A.

    To delete documents from the Azure AI Search index when the associated blobs are deleted from Azure Blob Storage, you can explore a few options:

    1. Configure the indexer for soft delete detection:

    Azure AI Search (formerly Azure Cognitive Search) indexers can detect deleted source documents and remove them from the index, but only if the indexer's data source has a deletion detection policy. An indexer cannot see a blob that has simply been removed, so for Blob Storage there are two options: enable native blob soft delete on the storage account and set the native blob soft delete policy on the data source, or use a soft-delete column policy, in which you first mark a blob as deleted with a custom metadata property (ex: IsDeleted set to "true") and remove it later. On its next run, the indexer removes the index entries for blobs that are soft-deleted or carry the marker.

    2. Use a custom index schema with ID mapping:

    If the wizard-generated index does not have a suitable identifier, you can modify the index schema to include one. This way, you can map the document chunks to a specific ID (ex: the blob file name or any unique attribute from the source). Once you have that field, you can use the REST API with '@search.action=delete' to remove documents from the index. Note that a delete action is always keyed on the index's key field, whatever it is named; in the wizard-generated index that key is chunk_id, so the extra field is mainly useful for finding all the chunks that belong to one blob.

    3. Manually track deletions in Blob Storage:

    As a more manual approach, you could track deletions in Blob Storage (ex: with Event Grid). When a blob is deleted, trigger a process that calls the Azure AI Search REST API to remove the corresponding index entries.

    4. Rebuild the index regularly:

    If deletions are infrequent or can be handled in batches, you might consider rebuilding the index from scratch periodically. This syncs the index with the current state of Blob Storage, effectively removing entries for deleted documents.
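For the soft delete detection option above, the policy lives on the indexer's data source, not on the index. Below is a minimal sketch using only the Python standard library that adds a deletion detection policy to a data source definition and sends it back with a PUT to /datasources/{name}. The service name, API key, the API version 2024-07-01, and the IsDeleted/"true" marker are assumptions for illustration. It also assumes you already hold the complete data source definition including credentials, since a GET on the data source may not return the connection string.

```python
import copy
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: a GA API version available on your service


def with_deletion_policy(datasource, column_name=None, marker_value=None):
    """Return a copy of a data source definition with a deletion detection policy.

    With no column name, use native blob soft delete (blob soft delete must be
    enabled on the storage account). Otherwise, treat a custom blob metadata
    property (hypothetical example: IsDeleted = "true") as the deletion marker.
    """
    updated = copy.deepcopy(datasource)  # leave the caller's dict untouched
    if column_name is None:
        updated["dataDeletionDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        }
    else:
        updated["dataDeletionDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": column_name,
            "softDeleteMarkerValue": marker_value,
        }
    return updated


def put_datasource(service, api_key, datasource):
    """Create or update a data source via PUT /datasources/{name}."""
    url = (f"https://{service}.search.windows.net/datasources/"
           f"{datasource['name']}?api-version={API_VERSION}")
    request = urllib.request.Request(
        url,
        data=json.dumps(datasource).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the custom-metadata variant, with_deletion_policy(ds, "IsDeleted", "true") is applied once; afterwards you set that metadata on a blob, let the indexer run, and only then remove the blob.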

    I suggest starting with the first option, configuring deletion detection, as it automates the process. If that does not fit your case, adding a unique ID field to your schema for API-based deletion would be a more precise solution.
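For the API-based deletion route (the ID-field and manual-tracking options above), a delete action only needs the value of the index's key field. The sketch below, standard library only, first looks up the keys of all chunks that came from one source blob and then deletes them in a single batch. It assumes the key field is chunk_id and that the index has a filterable field holding the source identifier, called source_path here purely as a hypothetical name; substitute whatever field you add or map. The API version is also an assumption.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: a GA API version available on your service


def _post(service, api_key, path, body):
    """POST a JSON body to the search service and return the parsed JSON reply."""
    url = f"https://{service}.search.windows.net{path}?api-version={API_VERSION}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def odata_string(value):
    """Quote a value as an OData string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_delete_batch(key_field, keys):
    """Build the body for POST /indexes/{index}/docs/index that deletes each key."""
    return {"value": [{"@search.action": "delete", key_field: key} for key in keys]}


def delete_chunks_for_source(service, api_key, index, source_value,
                             source_field="source_path",  # hypothetical field name
                             key_field="chunk_id"):
    """Find every chunk whose source field matches, then delete those chunks by key."""
    query = {
        "search": "*",
        "filter": f"{source_field} eq {odata_string(source_value)}",
        "select": key_field,
        "top": 1000,  # assumption: one blob yields at most 1000 chunks
    }
    hits = _post(service, api_key, f"/indexes/{index}/docs/search", query)["value"]
    keys = [hit[key_field] for hit in hits]
    if not keys:
        return []
    batch = build_delete_batch(key_field, keys)
    # Each result item reports "key", "status" and "statusCode" per document
    return _post(service, api_key, f"/indexes/{index}/docs/index", batch)["value"]
```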

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

    2 people found this answer helpful.

  2. Sina Salam 16,526 Reputation points
    2025-01-22T21:52:25.5966667+00:00

    Hello Hemachandra Siddani,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having trouble deleting documents from an Azure AI Search index.

    The suggestions below include changes to the index and configurations that go beyond the wizard-generated setup, and they also address the absence of a separate ID field in the index the wizard creates.

    First, you can configure an Event Grid subscription on the storage account that listens for Microsoft.Storage.BlobDeleted events and triggers an Azure Function or Logic App, which calls the Azure AI Search REST API to delete the matching documents:

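         import requests
         
         def delete_document(chunk_id, index_name, search_service, api_key):
             # In the index created by the "Import and vectorize data" wizard, the key field is chunk_id
             url = f"https://{search_service}.search.windows.net/indexes/{index_name}/docs/index?api-version=2020-06-30"
             headers = {"Content-Type": "application/json", "api-key": api_key}
             payload = [{"@search.action": "delete", "chunk_id": chunk_id}]
             response = requests.post(url, headers=headers, json={"value": payload})
             # raise_for_status() does not flag 207 (partial success), so return the per-document results
             response.raise_for_status()
             return response.json()["value"]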
    
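For the Event Grid route described above, the function first has to pick the deleted blobs out of the incoming events before it can delete anything. A minimal sketch follows; it assumes events arrive as parsed JSON in either the Event Grid schema (eventType) or the CloudEvents schema (type), both of which carry the blob URL in data.url. Mapping that URL to chunk keys depends on your index: the blob indexer exposes the blob URL as metadata_storage_path, so one option is to map it into a filterable field and then delete the matching chunks by key as in delete_document.

```python
BLOB_DELETED = "Microsoft.Storage.BlobDeleted"


def deleted_blob_urls(events):
    """Return the URLs of deleted blobs from a batch of parsed storage events.

    Event Grid schema events name the type in "eventType", CloudEvents in "type";
    both carry the blob URL in data["url"]. Other event types are ignored.
    """
    urls = []
    for event in events:
        event_type = event.get("eventType") or event.get("type")
        if event_type == BLOB_DELETED:
            urls.append(event["data"]["url"])
    return urls
```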

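    Secondly, if real-time updates are not critical, schedule a periodic rebuild to remove stale entries: delete and recreate the index, then reset and rerun the indexer. Resetting and rerunning the indexer alone does not remove entries for blobs that no longer exist.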
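The periodic rebuild can be sketched as four REST calls in order: DELETE /indexes/{name}, PUT /indexes/{name} with the saved definition, POST /indexers/{name}/reset, and POST /indexers/{name}/run. The sketch below assumes you keep the full index definition JSON (for example, exported after the wizard created it), and the service name, key, and API version are placeholders. It is standard library only.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: a GA API version available on your service


def rebuild_plan(index_name, indexer_name):
    """Ordered REST calls for a full rebuild: drop the index, recreate it,
    clear the indexer's change-tracking state, and reprocess every blob."""
    return [
        ("DELETE", f"/indexes/{index_name}"),
        ("PUT", f"/indexes/{index_name}"),  # body: the saved index definition
        ("POST", f"/indexers/{indexer_name}/reset"),
        ("POST", f"/indexers/{indexer_name}/run"),
    ]


def rebuild(service, api_key, index_definition, indexer_name):
    """Run the rebuild plan; index_definition is the full index JSON as a dict."""
    for method, path in rebuild_plan(index_definition["name"], indexer_name):
        # Only the PUT carries a body; an empty body still sends Content-Length: 0
        body = json.dumps(index_definition).encode("utf-8") if method == "PUT" else b""
        request = urllib.request.Request(
            f"https://{service}.search.windows.net{path}?api-version={API_VERSION}",
            data=body,
            headers={"Content-Type": "application/json", "api-key": api_key},
            method=method,
        )
        with urllib.request.urlopen(request):
            pass  # urlopen raises HTTPError on 4xx/5xx, which stops the rebuild
```

Queries fail or return partial results until the indexer run completes, so schedule the rebuild in a quiet window.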

    Thirdly, modify the index schema to include a unique identifier (e.g., blob file name or a unique attribute from the source). This will allow you to use the REST API with @search.action=delete.

    If modifying the index schema is not feasible, configure deletion detection on the indexer's data source instead: either native blob soft delete (enabled on the storage account) or a soft-delete column backed by custom blob metadata (e.g., IsDeleted set to "true").

    Finally, implement a mechanism to manually track deletions in blob storage and invoke the REST API to remove the corresponding index entries.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting it as an answer if it is helpful.

