Is it possible to find out what documents Azure OpenAI indexed when using "Use Your Own Data"?

Vedant Bahel 0 Reputation points
2025-01-02T22:38:00.0933333+00:00

I am using the "Use your own data" feature of Azure OpenAI Service to restrict results to my own data files (PDFs) stored in Blob Storage. Is it possible to find out which documents Azure OpenAI indexed when using "Use Your Own Data"?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

2 answers

  1. Pavankumar Purilla 2,130 Reputation points Microsoft Vendor
    2025-01-03T01:03:34.6+00:00

    Hi Vedant Bahel,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    Yes, it is possible to find out what documents Azure OpenAI indexed when using the "Use Your Own Data" feature. When you set up the feature, you specify the location of your data files in Azure Blob Storage. The service then chunks those files and indexes their contents into an Azure AI Search index, which is used for retrieval at query time; the model itself is not retrained on your data.
    You can use the Azure portal to view the contents of your Blob Storage container. Navigate to the container that holds your data files, and you will see the list of source files that were available for indexing. Note that the container shows the source files only; it does not by itself confirm that each file was indexed successfully.
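
To check which files actually reached the search index, rather than which files sit in the container, the Azure AI Search index can be queried directly. The following is a minimal sketch using the Search Documents REST API. It assumes the index has a `filepath` field (field names depend on how the data was ingested), and the helper names `build_search_request`, `distinct_filepaths`, and `list_indexed_files` are hypothetical, chosen for illustration only.

```python
import json
import urllib.request


def build_search_request(search_endpoint, index_name, api_key, skip=0, top=1000):
    """Build a POST request that returns one page of documents from the index."""
    url = (
        f"{search_endpoint.rstrip('/')}/indexes/{index_name}"
        "/docs/search?api-version=2023-11-01"
    )
    # "filepath" is an assumed field name; adjust it to match your index schema.
    body = {"search": "*", "select": "filepath", "top": top, "skip": skip}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def distinct_filepaths(documents):
    """Return the sorted, de-duplicated, non-empty filepath values."""
    return sorted({doc["filepath"] for doc in documents if doc.get("filepath")})


def list_indexed_files(search_endpoint, index_name, api_key, page_size=1000):
    """Page through the index and collect the distinct source file paths.

    Paging with skip is limited by the service, so very large indexes
    may need a different strategy (for example, filtering by a key range).
    """
    documents, skip = [], 0
    while True:
        request = build_search_request(
            search_endpoint, index_name, api_key, skip, page_size
        )
        with urllib.request.urlopen(request) as response:
            page = json.load(response)["value"]
        documents.extend(page)
        if len(page) < page_size:
            break
        skip += page_size
    return distinct_filepaths(documents)
```

Calling `list_indexed_files("https://<your-search>.search.windows.net", "<your-index>", "<your-key>")` would then return one entry per source file, even though each file is usually split into many chunks in the index.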

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".


  2. Daniel FANG 960 Reputation points MVP
    2025-01-06T11:27:24.99+00:00

    When you use chat completions with a reference to Azure AI Search, the API response contains citations to the original documents in AI Search. For example, you can test this in the playground and click on a citation, as shown below.

    User's image

    This information is also available in the API call; the request and response look like the example below. In the response screenshot, the JSON blocks under citations are your AI Search documents. The filepath in my example is null because the sample data did not set the file path correctly, but it will be populated if a file name or path field is added to the JSON document when you ingest it. You can then use this value to find the source document. If you use AI Foundry to ingest the documents (i.e. set up through the Azure portal), a file name field should be added to each document automatically.

    import os
    from openai import AzureOpenAI

    # Placeholder defaults; set these environment variables to your own values.
    endpoint = os.getenv("ENDPOINT_URL", "https://xxx-prd-openai.openai.azure.com/")
    deployment = os.getenv("DEPLOYMENT_NAME", "gpt-4o")
    search_endpoint = os.getenv("SEARCH_ENDPOINT", "https://xxx-prd-search.search.windows.net")
    search_key = os.getenv("SEARCH_KEY", "xxxx")
    search_index = os.getenv("SEARCH_INDEX_NAME", "xxx-sp3code")
    subscription_key = os.getenv("AZURE_OPENAI_API_KEY", "xxxx")

    # Initialize Azure OpenAI client with key-based authentication
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=subscription_key,
        api_version="2024-05-01-preview",
    )

    # Generate the completion, grounded on the Azure AI Search index
    completion = client.chat.completions.create(
        model=deployment,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "move robot"},
        ],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
        extra_body={
            "data_sources": [{
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    # Use the configured index name rather than a hard-coded value
                    "index_name": search_index,
                    "semantic_configuration": "semantic-search-config",
                    "query_type": "semantic",
                    "fields_mapping": {},
                    "in_scope": True,
                    "role_information": "You are an AI assistant that helps people find information.",
                    "filter": None,
                    "strictness": 3,
                    "top_n_documents": 5,
                    "authentication": {
                        "type": "api_key",
                        "key": search_key,
                    },
                },
            }]
        },
    )

    # The citations appear under choices[0].message.context in the JSON output
    print(completion.to_json())


    User's image
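
To turn the citations in such a response into a list of source documents, the JSON can be walked directly. The following is a minimal sketch, assuming the response has the shape described above, with citations under `choices[].message.context.citations`, each carrying `title`, `filepath`, and `url` keys. The function name `cited_documents` and the sample data are hypothetical.

```python
def cited_documents(response: dict) -> list[dict]:
    """Collect title, filepath and url for every citation in the response."""
    cited = []
    for choice in response.get("choices", []):
        # "context" is absent when no data source was attached to the request
        context = choice.get("message", {}).get("context") or {}
        for citation in context.get("citations", []):
            cited.append({
                "title": citation.get("title"),
                "filepath": citation.get("filepath"),
                "url": citation.get("url"),
            })
    return cited


# Hypothetical response fragment, trimmed to the fields used above
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Use the move command [doc1].",
            "context": {
                "citations": [
                    {"title": "Robot manual", "filepath": "manual.pdf",
                     "url": None, "content": "..."},
                    {"title": None, "filepath": None,
                     "url": None, "content": "..."},
                ]
            },
        }
    }]
}

for doc in cited_documents(sample):
    print(doc)
```

With the request shown earlier, this could be called as `cited_documents(json.loads(completion.to_json()))` after `import json`. A citation whose `filepath` is `None`, like the second one in the sample, indicates that the indexed document had no file path field set at ingestion time.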
