How do we delete specific documents from an Azure AI Search Index?

Aravind Vijayaraghavan 40 Reputation points
2025-01-22T10:39:26.15+00:00

I tried to delete a few documents beyond a certain date using my date field, but it ended up deleting a lot more. I realised it's because my date field holds plain string values. How do I delete specific documents using greater-than or less-than comparisons on certain fields, or on string fields specifically? All my fields are strings and vectors. This is my current code:

def chunk_data(data, chunk_size):
    """Helper function to chunk data into smaller pieces."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

def delete_documents_from_search_client(date_ymd, chunk_size=32000):
    """Deletes documents from the search index with 'date_ymd' equal or below the specified date."""
    query = f"date_ymd le '{date_ymd}'"
    results = search_client.search(query)  
    
    data_to_delete = []

    for result in results:
        document_id = result["hardware_id"]
        data_to_delete.append({
            "@search.action": "delete",
            "hardware_id": document_id
        })

    for chunk in chunk_data(data_to_delete, chunk_size):
        try:
            result = search_client.upload_documents(documents=chunk)
            print(f"Deleted {len(chunk)} documents successfully.")
        except HttpResponseError as e:
            print(f"An error occurred during document delete: {e}")
            return None

delete_documents_from_search_client("2024-12-02")
Azure AI Search
Azure AI services

Accepted answer
  1. Sina Salam 17,016 Reputation points
    2025-01-22T22:19:37.4266667+00:00

    Hello Aravind Vijayaraghavan,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Although this question was initially marked as a duplicate of https://learn.microsoft.com/en-us/answers/questions/2029279/delete-documents-from-azure-ai-search-index?page=1&orderby=Helpful#answers, after carefully comparing the two I found that they are not the same.

    So, I understand that your question involves filtering and deleting documents based on string-based date fields, with excessive deletions caused by improper comparison logic.

    The steps below cover a short-term workaround for string dates and the better long-term approach: converting the field to a proper date type by reindexing.

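    1. Preprocess dates in code: normalize the string-based dates so they compare correctly. String comparison is lexicographic, so it matches chronological order only when every value uses the same zero-padded format. Normalize the input to the format stored in the index (the example assumes YYYYMMDD).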
            def normalize_date_string(date_str):
                return date_str.replace("-", "")  # Convert YYYY-MM-DD to YYYYMMDD for accurate comparison
      
    2. Modify the query logic so that string dates are compared consistently, and pass the expression as a filter rather than as search text.
              def delete_documents_with_string_date(date_ymd):
                  query = f"date_ymd le '{normalize_date_string(date_ymd)}'"
                  # Continue with existing deletion logic
      
    3. Reindex if feasible: reimport the data with proper field types so that the date field uses Edm.DateTimeOffset. I recommend this for long-term accuracy.
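
To illustrate step 3, here is a minimal sketch of how a `YYYY-MM-DD` string could be converted to the ISO 8601 form that an `Edm.DateTimeOffset` field expects, and what the filter would look like afterwards. The field name `date_dto` and both helper functions are hypothetical, and the sketch assumes all dates are treated as midnight UTC. Note that DateTimeOffset literals in an OData filter are written without quotes.

```python
from datetime import datetime, timezone

def to_datetimeoffset(date_ymd: str) -> str:
    """Convert 'YYYY-MM-DD' to an ISO 8601 UTC timestamp string (midnight UTC)."""
    # strptime raises ValueError on malformed input, so bad data surfaces early.
    parsed = datetime.strptime(date_ymd, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")

def build_date_filter(date_ymd: str, field: str = "date_dto") -> str:
    """Build an OData filter matching values at or before midnight UTC of the date."""
    # 'date_dto' is a hypothetical Edm.DateTimeOffset field name (assumption).
    return f"{field} le {to_datetimeoffset(date_ymd)}"

print(build_date_filter("2024-12-02"))  # prints: date_dto le 2024-12-02T00:00:00Z
```

The resulting string would be passed as the `filter` keyword argument of `search_client.search`, and the same conversion could be applied to each document's date value when reimporting the data.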

    To delete documents accurately based on string-based date fields, preprocess the date strings so they compare correctly and adjust the query logic accordingly. Here is the full refined code:

    from azure.core.exceptions import HttpResponseError

    # Assumes `search_client` is an existing azure.search.documents.SearchClient
    # and that 'date_ymd' is marked filterable in the index.

    def chunk_data(data, chunk_size):
        """Helper function to chunk data into smaller pieces."""
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]

    def normalize_date_string(date_str):
        """Convert YYYY-MM-DD to YYYYMMDD. Use only if the index stores dates as YYYYMMDD."""
        return date_str.replace("-", "")

    def delete_documents_from_search_client(date_ymd, chunk_size=1000):
        """Deletes documents from the search index with 'date_ymd' equal or below the specified date."""
        normalized_date = normalize_date_string(date_ymd)
        # Pass the expression as an OData filter. The first positional argument
        # of search() is full-text search text, not a filter.
        results = search_client.search(
            search_text="*",
            filter=f"date_ymd le '{normalized_date}'",
            select=["hardware_id"],
        )

        # A delete action only needs the key field of each document.
        data_to_delete = [{"hardware_id": r["hardware_id"]} for r in results]

        # An indexing request accepts at most 1000 documents per batch.
        for chunk in chunk_data(data_to_delete, chunk_size):
            try:
                search_client.delete_documents(documents=chunk)
                print(f"Deleted {len(chunk)} documents successfully.")
            except HttpResponseError as e:
                print(f"An error occurred during document delete: {e}")
                return None

    delete_documents_from_search_client("2024-12-02")
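
If reindexing is not feasible and the stored strings are not all in one format, one possible alternative is to compare the dates on the client side: retrieve the key and date fields, parse each date in Python, and delete only the documents whose parsed date is on or before the cutoff. The sketch below is an illustration under assumptions, not part of the SDK: the helpers `parse_date` and `select_keys_to_delete`, the list of accepted formats, and the sample documents are all hypothetical.

```python
from datetime import date, datetime

def parse_date(value, formats=("%Y-%m-%d", "%Y%m%d")):
    """Try each known format in turn; return None if nothing matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

def select_keys_to_delete(docs, cutoff, key_field="hardware_id", date_field="date_ymd"):
    """Return key-only dicts for documents dated on or before the cutoff."""
    to_delete = []
    for doc in docs:
        parsed = parse_date(doc.get(date_field) or "")
        # Skip unparseable dates rather than risk deleting the wrong document.
        if parsed is not None and parsed <= cutoff:
            to_delete.append({key_field: doc[key_field]})
    return to_delete

# Stand-in for search results; in practice these would come from
# search_client.search(search_text="*", select=["hardware_id", "date_ymd"]).
sample = [
    {"hardware_id": "a", "date_ymd": "2024-11-30"},
    {"hardware_id": "b", "date_ymd": "20241202"},
    {"hardware_id": "c", "date_ymd": "2025-01-05"},
    {"hardware_id": "d", "date_ymd": "not a date"},
]
print(select_keys_to_delete(sample, date(2024, 12, 2)))
# prints: [{'hardware_id': 'a'}, {'hardware_id': 'b'}]
```

The returned list contains only key fields, so it could be passed to `search_client.delete_documents` in batches of up to 1000 documents. Scanning the whole index this way is slower than a server-side filter, so it is best suited to a one-off cleanup.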
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.

    1 person found this answer helpful.
