Why is my RAG model giving non-deterministic answers to the same question?

Mason 20 Reputation points
2024-11-19T19:18:26.98+00:00

I am using Azure Virtual Machines to host an LLM in a RAG pipeline. When I ask it the same question multiple times, it gives me different results: sometimes it says "I don't know the answer," and sometimes it gives the actual (correct) answer. In the prompt, I specified that the model should say "I don't know the answer" when the answer is not in the context. Why would it miscategorize the response in different iterations?

Note that I have temperature = 0, top_p = 0, and a random seed set on the model. Azure AI Search appears to return the same results each time the question is asked (the model uses the context from the search).

One thought I had is that the Azure virtual machine performs non-deterministic calculations. I am currently using a Standard D2 v3 machine, but this is not the only machine on which I have seen this problem.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Sina Salam 12,816 Reputation points
    2024-11-20T17:46:46.7366667+00:00

    Hello Mason,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your RAG model is giving non-deterministic answers to the same question.

    The issue might be deeper than the sampling parameters, possibly related to how the model processes the context provided by Azure AI Search.

    To address the inconsistency, I advise you to do the following:

    1. Refine your prompt engineering by making sure the prompt explicitly and unambiguously defines the conditions under which the model should respond with "I don't know the answer." For example:

    Context: {retrieved_context}

    Question: {user_question}

    If the context contains information to answer the question, provide the answer. Otherwise, respond with "I don't know the answer." Do not guess.

    2. Log the context retrieved from Azure AI Search for every query and compare these logs across iterations to confirm consistency.
    3. Validate determinism by enabling and logging the seed in API responses. This ensures the same configuration is applied across all iterations; if available, also compare the system_fingerprint field to rule out backend discrepancies.
    4. Also, though it is unlikely to be the cause, make sure your VM environment introduces no variability in computational precision. Switching to a higher-performance VM or a managed service such as Azure OpenAI could eliminate potential VM-related variability.
    5. Use an intermediary logging mechanism to observe how the model processes the retrieved context and decides whether the answer is present. This might reveal issues with the classification step.
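    Steps 2 and 3 above can be sketched as a small logging helper. This is a minimal, hypothetical example (the helper names `context_fingerprint` and `log_run` and the log file name are my own, not part of any SDK): it hashes the retrieved chunks so you can cheaply compare retrieval across iterations, and records the answer alongside any seed/system_fingerprint values your API client returns.

    ```python
    import hashlib
    import json

    def context_fingerprint(chunks):
        """Hash the ordered retrieved chunks so runs can be compared cheaply.
        Chunk order matters to the model, so the hash is order-sensitive."""
        joined = "\n---\n".join(chunks)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def log_run(question, chunks, answer, seed=None,
                system_fingerprint=None, log_file="rag_runs.jsonl"):
        """Append one JSON line per query; diff the lines across iterations.
        If two runs share context_sha256 and seed but differ in answer,
        the variability is in generation, not retrieval."""
        record = {
            "question": question,
            "context_sha256": context_fingerprint(chunks),
            "answer": answer,
            "seed": seed,
            "system_fingerprint": system_fingerprint,
        }
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
    ```

    After collecting a few runs you can group the JSONL records by `context_sha256`: identical fingerprints with divergent answers point at the model call rather than Azure AI Search.
    
    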

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

