Tutorial: Search your data using a chat model (RAG in Azure AI Search)

Article
01/09/2025

The defining characteristic of a RAG solution on Azure AI Search is sending queries to a Large Language Model (LLM) for a conversational search experience over your indexed content. It can be surprisingly easy if you implement just the basics.

In this tutorial, you:

Set up clients
Write instructions for the LLM
Provide a query designed for LLM inputs
Review results and explore next steps

This tutorial builds on the previous tutorials. It assumes you have a search index created by the indexing pipeline.

Prerequisites

Visual Studio Code with the Python extension and the Jupyter package. For more information, see Python in Visual Studio Code.
Azure AI Search, in a region shared with Azure OpenAI.
Azure OpenAI, with a deployment of gpt-4o. For more information, see Choose models for RAG in Azure AI Search.

Download the sample

You use the same notebook from the previous indexing pipeline tutorial. The scripts for querying the LLM follow the pipeline creation steps. If you don't already have the notebook, download it from GitHub.

Configure clients for sending queries

The RAG pattern in Azure AI Search is a synchronized series of connections: first to a search index to obtain the grounding data, then to an LLM to formulate a response to the user's question. Both clients use the same query string.

You're setting up two clients, so you need endpoints and permissions on both resources. This tutorial assumes you set up role assignments for authorized connections, but you still need to provide the endpoints in the sample notebook:

# Set endpoints and API keys for Azure services
AZURE_SEARCH_SERVICE: str = "PUT YOUR SEARCH SERVICE ENDPOINT HERE"
# AZURE_SEARCH_KEY: str = "DELETE IF USING ROLES, OTHERWISE PUT YOUR SEARCH SERVICE ADMIN KEY HERE"
AZURE_OPENAI_ACCOUNT: str = "PUT YOUR AZURE OPENAI ENDPOINT HERE"
# AZURE_OPENAI_KEY: str = "DELETE IF USING ROLES, OTHERWISE PUT YOUR AZURE OPENAI KEY HERE"
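
If you prefer not to paste endpoints into the notebook, one option is to read them from environment variables. The following is a minimal sketch and is not part of the sample notebook; the helper name read_endpoint and the environment variable names are assumptions chosen for illustration.

```python
import os


def read_endpoint(var_name: str, placeholder: str) -> str:
    """Return the endpoint stored in an environment variable.

    Falls back to the placeholder when the variable is unset, and strips a
    trailing slash so both services get a consistently formatted URL.
    """
    value = os.environ.get(var_name, placeholder)
    return value.rstrip("/")


# Hypothetical variable names; use whatever names your environment defines.
AZURE_SEARCH_SERVICE = read_endpoint("AZURE_SEARCH_SERVICE", "PUT YOUR SEARCH SERVICE ENDPOINT HERE")
AZURE_OPENAI_ACCOUNT = read_endpoint("AZURE_OPENAI_ACCOUNT", "PUT YOUR AZURE OPENAI ENDPOINT HERE")
```

If a variable is unset, the placeholder text is kept, so a missing setting is easy to spot when the first connection fails.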

Example script for prompt and query

Here's the Python script that instantiates the clients, defines the prompt, and sets up the query. You can run this script in the notebook to generate a response from your chat model deployment.

For the Azure Government cloud, change the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".

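# Import libraries
from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI

# credential and index_name are assumed to be defined in earlier notebook cells
# (the indexing pipeline steps), for example credential = DefaultAzureCredential().
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

deployment_name = "gpt-4o"

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)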

# Provide instructions to the model
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Provide the search query. 
# It's hybrid: a keyword search on "query", with text-to-vector conversion for "vector_query".
# The vector query finds 50 nearest neighbor matches in the search index
query="What's the NASA earth book about?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Set up the search results and the chat thread.
# Retrieve the selected fields from the search index related to the question.
# Search results are limited to the top 5 matches. Limiting top can help you stay under LLM quotas.
search_results = search_client.search(
    search_text=query,
    vector_queries= [vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)

# Newlines could be in the OCR'd content or in PDFs, as is the case for the sample PDFs used for this tutorial.
# Use a unique separator to make the sources distinct. 
# We chose repeated equal signs (=) followed by a newline because it's unlikely the source documents contain this sequence.
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)
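
To see what the model actually receives, it can help to run the source-formatting step on its own. The following sketch uses the same separator and f-string as the script, but the two documents are made-up placeholders standing in for real search results, and format_sources is a hypothetical helper name.

```python
# Placeholder documents standing in for results from search_client.search().
sample_results = [
    {"title": "doc-a.pdf", "chunk": "first chunk text", "locations": ["Place A"]},
    {"title": "doc-b.pdf", "chunk": "second chunk text", "locations": ["Place B"]},
]

SEPARATOR = "=================\n"


def format_sources(documents) -> str:
    """Join search results into one string, with the separator between sources."""
    return SEPARATOR.join(
        f'TITLE: {doc["title"]}, CONTENT: {doc["chunk"]}, LOCATIONS: {doc["locations"]}'
        for doc in documents
    )


sources_formatted = format_sources(sample_results)
print(sources_formatted)
```

Because join places the separator only between items, the string contains one separator fewer than the number of sources.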

Review results

In this response, the answer is based on five inputs (top=5) consisting of the chunks that the search engine determined to be the most relevant. Instructions in the prompt tell the LLM to use only the information in sources, that is, the formatted search results.

Results from the first query "What's the NASA earth book about?" should look similar to the following example.

The NASA Earth book is about the intricate and captivating science of our planet, studied 
through NASA's unique perspective and tools. It presents Earth as a dynamic and complex 
system, observed through various cycles and processes such as the water cycle and ocean 
circulation. The book combines stunning satellite images with detailed scientific insights, 
portraying Earth’s beauty and the continuous interaction of land, wind, water, ice, and 
air seen from above. It aims to inspire and demonstrate that the truth of our planet is 
as compelling as any fiction.

Source: page-8.pdf

LLMs are expected to return different answers, even if the prompt and queries are unchanged. Your result might look very different from the example. For more information, see Learn how to use reproducible output.
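
One way to reduce run-to-run variation is to send a fixed seed and a low temperature with the chat request. The sketch below only builds the request arguments. build_chat_request is a hypothetical helper, the seed value 42 and temperature 0 are illustrative, and the sketch assumes your model and API version accept the seed parameter. Even then, identical output is best effort rather than guaranteed.

```python
def build_chat_request(prompt: str, deployment: str, seed: int = 42) -> dict:
    """Build keyword arguments for a chat completion with a fixed seed."""
    return {
        "model": deployment,
        "messages": [{"role": "user", "content": prompt}],
        "seed": seed,        # same seed on every call
        "temperature": 0,    # reduce sampling randomness
    }


request = build_chat_request("Query: ...\nSources: ...", "gpt-4o")

# In the notebook, pass the arguments to the existing client:
# response = openai_client.chat.completions.create(**request)
```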

Note

In testing this tutorial, we saw a variety of responses, some more relevant than others. A few times, repeating the same request caused the response to deteriorate, most likely because of confusion in the chat history, possibly with the model registering the repeated requests as dissatisfaction with the generated answer. Managing chat history is out of scope for this tutorial, but including it in your application code should mitigate or even eliminate this behavior.

Add a filter

Recall that you created a locations field using applied AI, populated with places recognized by the Entity Recognition skill. The field definition for locations includes the filterable attribute. Repeat the previous request, but this time add a filter that selects on the term ice in the locations field.

A filter introduces inclusion or exclusion criteria. The search engine is still doing a vector search on "What's the NASA earth book about?", but it now excludes matches that don't include ice. For more information about filtering on string collections and on vector queries, see Text filter fundamentals, Understand collection filters, and Add filters to a vector query.

Replace the search_results definition with the following example that includes a filter:

query="what is the NASA earth book about?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Add a filter that selects documents based on whether locations includes the term "ice".
search_results = search_client.search(
    search_text=query,
    vector_queries= [vector_query],
    filter="search.ismatch('ice*', 'locations', 'full', 'any')",
    select=["title", "chunk", "locations"],
    top=5
)

sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])
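
If you want to try other terms, the filter string can be generated instead of typed by hand. This is a minimal sketch; build_locations_filter is a hypothetical helper. It doubles single quotes, which is how OData escapes them inside string literals, but it doesn't escape characters that are special in full Lucene syntax, so treat it as a starting point.

```python
def build_locations_filter(term: str, prefix: bool = True) -> str:
    """Build a search.ismatch filter expression on the locations field."""
    escaped = term.replace("'", "''")            # OData single-quote escaping
    pattern = f"{escaped}*" if prefix else escaped
    return f"search.ismatch('{pattern}', 'locations', 'full', 'any')"


print(build_locations_filter("ice"))
# search.ismatch('ice*', 'locations', 'full', 'any')
```

The returned string can be passed as the filter argument of search_client.search in place of the literal used above.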

Results from the filtered query should now look similar to the following response. Notice the emphasis on ice cover.

The NASA Earth book showcases various geographic and environmental features of Earth through 
satellite imagery, highlighting remarkable landscapes and natural phenomena. 

- It features extraordinary views like the Holuhraun Lava Field in Iceland, captured by 
Landsat 8 during an eruption in 2014, with false-color images illustrating different elements 
such as ice, steam, sulfur dioxide, and fresh lava ([source](page-43.pdf)).
- Other examples include the North Patagonian Icefield in South America, depicted through 
clear satellite images showing glaciers and their changes over time ([source](page-147.pdf)).
- It documents melt ponds in the Arctic, exploring their effects on ice melting and 
heat absorption ([source](page-153.pdf)).
  
Overall, the book uses satellite imagery to give insights into Earth's dynamic systems 
and natural changes.

Change the inputs

Increasing or decreasing the number of inputs to the LLM can have a large effect on the response. Try running the same query again after setting top=8. When you increase the inputs, the model returns different results each time, even if the query doesn't change.

Here's one example of what the model returns after increasing the inputs to 8.

The NASA Earth book features a range of satellite images capturing various natural phenomena 
across the globe. These include:

- The Holuhraun Lava Field in Iceland documented by Landsat 8 during a 2014 volcanic 
eruption (Source: page-43.pdf).
- The North Patagonian Icefield in South America, highlighting glacial landscapes 
captured in a rare cloud-free view in 2017 (Source: page-147.pdf).
- The impact of melt ponds on ice sheets and sea ice in the Arctic, with images from 
an airborne research campaign in Alaska during July 2014 (Source: page-153.pdf).
- Sea ice formations at Shikotan, Japan, and other notable geographic features in various 
locations recorded by different Landsat missions (Source: page-168.pdf).

Summary: The book showcases satellite images of diverse Earth phenomena, such as volcanic 
eruptions, icefields, and sea ice, to provide insights into natural processes and landscapes.

Because the model is bound to the grounding data, the answer becomes more expansive as you increase the size of the input. You can use relevance tuning to potentially generate more focused answers.
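
To make it easier to rerun the query with different values of top, the search call can be wrapped in a small function. This is a sketch under assumptions: fetch_sources is a hypothetical helper, and it expects the search_client, query, and vector_query objects that already exist in the notebook.

```python
def fetch_sources(search_client, query, vector_query, top=5, filter_expression=None):
    """Run the hybrid query and return the matching documents as a list."""
    results = search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        filter=filter_expression,   # None means no filter is applied
        select=["title", "chunk", "locations"],
        top=top,
    )
    return list(results)


# Usage in the notebook:
# documents = fetch_sources(
#     search_client, query, vector_query, top=8,
#     filter_expression="search.ismatch('ice*', 'locations', 'full', 'any')",
# )
# print(len(documents), "inputs sent to the model")
```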

Change the prompt

You can also change the prompt to control the format of the output, the tone, and whether you want the model to supplement the answer with its own training data.

Here's another example of LLM output if we refocus the prompt on identifying locations for scientific study.

# Provide instructions to the model
GROUNDED_PROMPT="""
You are an AI assistant that helps scientists identify locations for future study.
Answer the query concisely, using bulleted points.
Answer ONLY with the facts listed in the list of sources below.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Do not exceed 5 bullets.
Query: {query}
Sources:\n{sources}
"""

Output from changing just the prompt, while keeping every other aspect of the previous query, might look like this example.

The NASA Earth book appears to showcase various locations on Earth captured through satellite imagery, 
highlighting natural phenomena and geographic features. For instance, the book includes:

- The Holuhraun Lava Field in Iceland, detailing volcanic activity and its observation via Landsat 8.
- The North Patagonian Icefield in South America, covering its glaciers and changes over time as seen by Landsat 8.
- Melt ponds in the Arctic and their impacts on the heat balance and ice melting.
- Iceberg A-56 in the South Atlantic Ocean and its interaction with cloud formations.

(Source: page-43.pdf, page-147.pdf, page-153.pdf, page-39.pdf)

Tip

If you're continuing with the tutorial, remember to restore the prompt to its previous value (You are an AI assistant that helps users learn from the information found in the source material).
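
One way to avoid overwriting the original prompt is to keep both versions side by side and pick one by name. This is only a sketch: the PROMPTS dictionary, its keys, and build_messages are hypothetical names, and the prompt text is copied from the two prompts in this tutorial.

```python
PROMPTS = {
    "learn": """
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
""",
    "locations": """
You are an AI assistant that helps scientists identify locations for future study.
Answer the query concisely, using bulleted points.
Answer ONLY with the facts listed in the list of sources below.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Do not exceed 5 bullets.
Query: {query}
Sources:\n{sources}
""",
}


def build_messages(prompt_name: str, query: str, sources: str) -> list:
    """Fill in the chosen prompt and wrap it as a single user message."""
    content = PROMPTS[prompt_name].format(query=query, sources=sources)
    return [{"role": "user", "content": content}]


messages = build_messages("learn", "What's the NASA earth book about?", "TITLE: ...")
```

Switching back to the original behavior is then a matter of passing "learn" instead of "locations".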

Changing parameters and prompts affects the response from the LLM. As you explore on your own, keep the following tips in mind:

Increasing the top value can exhaust available quota on the model. If there's no quota, an error message is returned or the model might return "I don't know".
Increasing the top value doesn't necessarily improve the outcome. In testing with top, we sometimes noticed that the answers weren't dramatically better.
So what might help? Typically, the answer is relevance tuning. Improving the relevance of the search results from Azure AI Search is usually the most effective approach for maximizing the utility of your LLM.
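
If you do experiment with larger top values, one way to stay within quota is to cap the amount of source text sent to the model. The sketch below uses a character count as a crude stand-in for tokens; limit_sources is a hypothetical helper and the default budget of 12,000 characters is an arbitrary assumption, not a documented limit.

```python
from typing import List


def limit_sources(formatted_sources: List[str], max_chars: int = 12000) -> List[str]:
    """Keep sources in ranked order until the character budget is spent."""
    kept: List[str] = []
    used = 0
    for source in formatted_sources:
        if used + len(source) > max_chars:
            break  # the next source would exceed the budget
        kept.append(source)
        used += len(source)
    return kept
```

Because search results arrive ranked by relevance, dropping from the end removes the least relevant sources first.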

In the next series of tutorials, the focus shifts to maximizing relevance and optimizing query performance for speed and concision. We revisit the schema definition and query logic to implement relevance features, but the rest of the pipeline and models remain intact.

Next step

Maximize relevance
