Facing an Issue in Azure AI Search.

Question

Dear Microsoft Team,

I am facing an issue with AI Search and Azure OpenAI integration in my project. I have a 120-page PDF document that I split into 50 groups using Python and uploaded the grouped data to Azure AI Search. The index and indexer count are accurate at 50, and I built a chatbot using the Ada-embedding model via Azure OpenAI Service. While the AI Search Explorer retrieves accurate results, the chatbot provides correct answers for approximately 90% of queries. However, for certain cases, such as querying the "flip policy" on page 73 (grouped within pages 70–73), the chatbot fails to provide the correct output. It seems the chatbot might not be scanning the document completely for some queries. Could you help identify if the issue is originating from AI Search or the chatbot and suggest solutions to resolve it? I'm attaching the error details below.

We have created an Azure vector index with ID ‘4358cd94-9515-422b-b642-3abdefd1ce10’. Along with the text, we store the embeddings. We chunk the text so that one page of the PDF maps to one Azure page. The source document contains around 120 pages, and the text is spread across multiple pages. So, before creating the Azure index, we group the pages, create the embeddings, and then store them in the index. After grouping, the total number of pages is 50.

When we run a hybrid search with the question ‘What are the requirements for appraisal on a flip transaction?’, vector search does not return the document containing the text below, which is on page 74:

13. Flip Policy

Flip transactions must comply with the HPML appraisal rules in Regulation-Z (Reg-Z). The full Reg-Z revisions can be found at http://www.consumerfinance.gov/regulations/appraisals-for-higher-priced-mortgage-loans. A second appraisal is required in the following circumstances:

· Greater than 10% increase in sales price if the seller acquired the property in the past 90-days
· Greater than 20% increase in sales price if the seller acquired the property in the past 91-180 days
· These requirements do not apply if the seller is FNMA, FHLMC, HUD or any other government entity.

The hybrid search returns the following page contents:

'Title': 'Content from Pages 101-104', 'Score': 0.032786883413791656, 'Content': 'FOR INTERNAL USE ONLY ANGEL OAK INVESTOR CASH …

'Title': 'Content from Pages 14-26', 'Score': 0.0320020467042923, 'Content': '2 Appraisal and Property Requirements 2.1 Appraisal Transfers Appraisal transfers …

'Title': 'Content from Pages 70-73', 'Score': 0.03021353855729103, 'Content': '12 Transaction Types 12.1 Purchase The lesser of the purchase price or appraised value of the subject property is used to calculate …

'Title': 'Content from Pages 37-49', 'Score': 0.01587301678955555, 'Content': "5 Credit and Liabilities 5.1 General Information A U.S. credit report is required for each borrower on the loan …

'Title': 'Content from Pages 33-36', 'Score': 0.015384615398943424, 'Content': '4 Borrowers 4.1 Borrowers – General The USA Patriot Act requires banks and financial institutions to verify the name …
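A note on the scores above: they look like Reciprocal Rank Fusion (RRF) scores, which is how Azure AI Search merges the keyword and vector rankings in a hybrid query. The sketch below is only an aid for reading them. It assumes each ranking contributes 1/(k + rank) with k = 60 and 1-based ranks; the helper names `rrf_score` and `explain_score` are made up for this illustration and are not part of any SDK.

```python
def rrf_score(ranks, k=60):
    """Fused score for a document found at the given 1-based ranks (assumed k=60)."""
    return sum(1.0 / (k + rank) for rank in ranks)


def explain_score(score, k=60, max_rank=50, tol=1e-6):
    """Return the rank combinations (one or two rankings) whose RRF score matches."""
    matches = []
    for r1 in range(1, max_rank + 1):
        # Document present in only one of the two rankings
        if abs(rrf_score([r1], k) - score) < tol:
            matches.append((r1,))
        # Document present in both rankings
        for r2 in range(r1, max_rank + 1):
            if abs(rrf_score([r1, r2], k) - score) < tol:
                matches.append((r1, r2))
    return matches


# Scores copied from the results listed in the question
reported = {
    "Pages 101-104": 0.032786883413791656,
    "Pages 14-26": 0.0320020467042923,
    "Pages 70-73": 0.03021353855729103,
    "Pages 37-49": 0.01587301678955555,
    "Pages 33-36": 0.015384615398943424,
}

for title, score in reported.items():
    print(title, explain_score(score))
```

Under these assumptions, "Pages 70-73" decodes to ranks 2 and 11, meaning it sits near the top of one result list and much lower in the other, while the last two groups appear in only one list. If that reading is right, it may be worth running the keyword query and the vector query separately to see which one is ranking the target group poorly.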

Below is the code snippet used for the hybrid search:

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# headers, query, config and index_name are defined elsewhere in the application.

# Embed the user query
response = requests.post(
    "https://api.openai.com/v1/embeddings",
    headers=headers,
    json={
        "input": query,
        "model": "text-embedding-ada-002"
    }
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]

# Adjust vector query parameters
vector_query = VectorizedQuery(
    vector=embedding,
    k_nearest_neighbors=5,  # original comment said "Increased from 5 to 10", but the value is 5
    fields="contentVector"
)

# Add hybrid search
search_client = SearchClient(endpoint=config["endpoint"],
                             index_name=index_name,
                             credential=AzureKeyCredential(config["admin_key"]))

# Combine vector search with keyword search
results = search_client.search(
    search_text=query,  # Add keyword search
    vector_queries=[vector_query],
    select=["title", "content", "category"],
    top=10,  # Increase number of results
    # Semantic configuration; as far as I know it is only applied when
    # query_type="semantic" is also passed, which this call does not do
    semantic_configuration_name="my-semantic-config"
)

response = []
for result in results:
    item = {
        "Title": result['title'],
        "Score": result['@search.score'],
        "Content": result['content'],
        "Category": result['category']
    }
    response.append(item)

# Sort results by relevance score
response.sort(key=lambda x: x['Score'], reverse=True)

Best regards, Sudhakar.P

Answer

Hi there Sudhakar P

Thanks for using the Q&A platform.

It seems your current chunking strategy may cause the chatbot to miss context when related content spans multiple groups. Try revising the chunking process to include overlapping content between groups. Also, while text-embedding-ada-002 is suitable for general-purpose embeddings, it might not capture domain-specific nuances, so consider fine-tuning the embeddings or trying a different embedding model. Preprocessing user queries to include related keywords can also sometimes improve semantic matching.
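To make the overlap suggestion concrete, here is a minimal sketch of grouping pages so that neighbouring groups share a page. It assumes fixed-size groups, which is not how the groups in the question were built (those vary in size), and the function name `group_pages_with_overlap` and its parameters are hypothetical. The title format simply mirrors the "Content from Pages X-Y" titles in the question.

```python
def group_pages_with_overlap(pages, group_size=4, overlap=1):
    """Group page texts so consecutive groups share `overlap` pages.

    pages: list of page texts, where pages[0] is page 1 of the PDF.
    Returns a list of dicts with a title and the joined content.
    """
    if group_size < 1 or not 0 <= overlap < group_size:
        raise ValueError("need group_size >= 1 and 0 <= overlap < group_size")

    step = group_size - overlap  # how far the window advances each time
    groups = []
    start = 0
    while start < len(pages):
        end = min(start + group_size, len(pages))
        groups.append({
            "title": f"Content from Pages {start + 1}-{end}",
            "content": "\n".join(pages[start:end]),
        })
        if end == len(pages):
            break  # last page reached; avoid emitting a redundant tail group
        start += step
    return groups


# Example with placeholder page texts
demo_pages = [f"text of page {n}" for n in range(1, 11)]
for group in group_pages_with_overlap(demo_pages, group_size=4, overlap=1):
    print(group["title"])
```

With an overlap of one page, a section that starts at the bottom of one page and continues onto the next ends up complete in at least one group. Each group would then be embedded and uploaded the same way as before.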

If this helps, kindly accept the answer. Thanks much.
