Hello Aravind Vijayaraghavan,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
Although your question was initially marked as a duplicate of https://learn.microsoft.com/en-us/answers/questions/2029279/delete-documents-from-azure-ai-search-index?page=1&orderby=Helpful#answers, after carefully comparing the two I found that they are not the same.
I understand that you are filtering and deleting documents based on a string-based date field, and that improper comparison logic is causing too many documents to be deleted.
The steps below should help, either by making the string comparison consistent or, as a better long-term approach, by converting the field to a proper date type through reindexing.
- Preprocess dates in code: normalize the string-based dates to a single fixed-width format so that they compare correctly. This assumes the index stores the values as YYYYMMDD; the value in the query must use the same format as the stored values.

      def normalize_date_string(date_str):
          # Convert YYYY-MM-DD to YYYYMMDD for accurate comparison
          return date_str.replace("-", "")

- Modify the query logic: use one consistent comparison for string dates.

      def delete_documents_with_string_date(date_ymd):
          query = f"date_ymd le '{normalize_date_string(date_ymd)}'"
          # Continue with existing deletion logic

- Reindex if feasible: reimport the data with proper field types, ensuring the date field uses Edm.DateTimeOffset. I advise reindexing your data with proper field types for long-term accuracy.
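For the reindexing option, the existing date strings would need to be converted to ISO 8601 timestamps before being loaded into an Edm.DateTimeOffset field. The following is only a minimal sketch: it assumes the stored values are either YYYY-MM-DD or YYYYMMDD and should be treated as midnight UTC, and the helper names `to_datetimeoffset` and `build_date_filter` as well as the field name `date_utc` are hypothetical.

```python
from datetime import datetime, timezone


def to_datetimeoffset(date_str: str) -> str:
    """Convert 'YYYY-MM-DD' or 'YYYYMMDD' to an ISO 8601 UTC timestamp string."""
    digits = date_str.replace("-", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"Unexpected date format: {date_str!r}")
    # Assumption: the date is interpreted as midnight UTC.
    parsed = datetime.strptime(digits, "%Y%m%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_date_filter(field: str, cutoff: str) -> str:
    """Build an OData filter; DateTimeOffset literals are written without quotes."""
    return f"{field} le {to_datetimeoffset(cutoff)}"


# Hypothetical field name, for illustration only.
print(build_date_filter("date_utc", "2024-12-02"))
```

With a real date type, the service compares dates chronologically, so the filter no longer depends on how the strings happen to be formatted.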
To delete documents accurately based on a string-based date field, preprocess the date strings so that they compare correctly and adjust the query logic accordingly. The full refined code is below:

    from azure.core.exceptions import HttpResponseError

    # search_client is assumed to be an existing azure.search.documents.SearchClient.

    def chunk_data(data, chunk_size):
        """Helper function to chunk data into smaller pieces."""
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]

    def normalize_date_string(date_str):
        """Convert YYYY-MM-DD to YYYYMMDD for accurate comparison."""
        return date_str.replace("-", "")

    # chunk_size defaults to 1000, the maximum number of documents per indexing batch.
    def delete_documents_from_search_client(date_ymd, chunk_size=1000):
        """Deletes documents from the search index with 'date_ymd' equal or below the specified date."""
        normalized_date = normalize_date_string(date_ymd)
        query = f"date_ymd le '{normalized_date}'"
        # Pass the expression as an OData filter; a bare positional argument
        # would be treated as full-text search text instead.
        results = search_client.search(search_text="*", filter=query, select=["hardware_id"])
        # Only the key field is needed to delete a document.
        data_to_delete = [{"hardware_id": result["hardware_id"]} for result in results]
        for chunk in chunk_data(data_to_delete, chunk_size):
            try:
                # delete_documents applies the delete action to every document in the batch.
                search_client.delete_documents(documents=chunk)
                print(f"Deleted {len(chunk)} documents successfully.")
            except HttpResponseError as e:
                print(f"An error occurred during document delete: {e}")
                return None

    delete_documents_from_search_client("2024-12-02")

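Because the original problem was that too many documents were deleted, it may be worth re-checking each candidate on the client side before sending the delete batch. The sketch below is one possible safeguard, not part of the Azure SDK: `parse_ymd` and `select_ids_to_delete` are hypothetical helpers, and it assumes each search result exposes `hardware_id` and `date_ymd` as shown earlier.

```python
from datetime import date, datetime


def parse_ymd(value: str) -> date:
    """Parse 'YYYY-MM-DD' or 'YYYYMMDD' into a date object."""
    return datetime.strptime(value.replace("-", ""), "%Y%m%d").date()


def select_ids_to_delete(documents, cutoff, key_field="hardware_id", date_field="date_ymd"):
    """Return keys of documents whose date is on or before the cutoff.

    Documents with a missing or malformed date are skipped rather than deleted.
    """
    cutoff_date = parse_ymd(cutoff)
    ids = []
    for doc in documents:
        try:
            doc_date = parse_ymd(doc[date_field])
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # never delete a document whose date cannot be verified
        if doc_date <= cutoff_date:
            ids.append(doc[key_field])
    return ids


# Example with in-memory documents standing in for search results.
sample = [
    {"hardware_id": "a", "date_ymd": "20241201"},
    {"hardware_id": "b", "date_ymd": "2024-12-03"},
    {"hardware_id": "c", "date_ymd": "n/a"},
]
print(select_ids_to_delete(sample, "2024-12-02"))
```

Comparing parsed dates instead of raw strings means a mixed or unexpected format leads to a skipped document rather than an accidental deletion.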
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please remember to close the thread by upvoting and accepting this as the answer if it was helpful.