Tutorial: Choose embedding and chat models for RAG in Azure AI Search

A RAG solution built on Azure AI Search takes a dependency on embedding models for vectorization, and on chat models for conversational search over your data.

In this tutorial, you:

  • Learn which models in the Azure cloud work with built-in integration
  • Learn about the Azure models used for chat
  • Deploy models and collect model information for your code
  • Configure search engine access to Azure models
  • Learn about custom skills and vectorizers for attaching non-Azure models

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

  • The Azure portal, used to deploy models and configure role assignments in the Azure cloud.

  • An Owner or User Access Administrator role on your Azure subscription, necessary for creating role assignments. You use at least three Azure resources in this tutorial. The connections are authenticated using Microsoft Entra ID, which requires the ability to create roles. Role assignments for connecting to models are documented in this article. If you can't create roles, you can use API keys instead.

  • A model provider, such as Azure OpenAI, Azure AI Vision via an Azure AI services multi-service resource, or Azure AI Foundry.

    We use Azure OpenAI in this tutorial. Other providers are listed so that you know your options for integrated vectorization.

  • Azure AI Search, Basic tier or higher. This tier provides a managed identity used in role assignments.

  • A shared region. To complete all of the tutorials in this series, the region must support both Azure AI Search and the model provider. See the supported regions for each service.

    Azure AI Search is currently subject to limited availability in some regions. To confirm region status, check the Azure AI Search region list.

Tip

Check this article for a list of overlapping regions.

Review models supporting built-in vectorization

Vectorized content improves the query results in a RAG solution. Azure AI Search supports a built-in vectorization action in an indexing pipeline. It also supports vectorization at query time, converting text or image inputs into embeddings for a vector search. In this step, identify an embedding model that works for your content and queries. If you're providing raw vector data and raw vector queries, or if your RAG solution doesn't include vector data, skip this step.

Vector queries that include a text-to-vector conversion step must use the same embedding model that was used during indexing. The search engine doesn't throw an error if you use different models, but you get poor results.

To meet the same-model requirement, choose embedding models that can be referenced through skills during indexing and through vectorizers during query execution. The following table lists the skill and vectorizer pairs. To see how the embedding models are used, skip ahead to Create an indexing pipeline for code that calls an embedding skill and a matching vectorizer.

Azure AI Search provides skill and vectorizer support for the following embedding models in the Azure cloud.

| Client | Embedding models | Skill | Vectorizer |
| --- | --- | --- | --- |
| Azure OpenAI | text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small | AzureOpenAIEmbedding | AzureOpenAIEmbedding |
| Azure AI Vision | multimodal 4.0 1 | AzureAIVision | AzureAIVision |
| Azure AI Foundry model catalog | Facebook-DinoV2-Image-Embeddings-ViT-Base, Facebook-DinoV2-Image-Embeddings-ViT-Giant, Cohere-embed-v3-english, Cohere-embed-v3-multilingual | AML 2 | Azure AI Foundry model catalog |

1 Supports image and text vectorization.

2 Deployed models in the model catalog are accessed over an AML endpoint. We use the existing AML skill for this connection.
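To make the skill and vectorizer pairing concrete, the following sketch shows the two REST payload fragments side by side, expressed as Python dicts. The resource URI, deployment name, field paths, and vectorizer name are hypothetical placeholders; the point is that both definitions reference the same embedding model.

```python
# Sketch of the matched skill/vectorizer pair for Azure OpenAI embeddings,
# expressed as REST API payload fragments. The resource URI, deployment
# name, and field paths are hypothetical placeholders.
RESOURCE_URI = "https://MY-FAKE-ACCOUNT.openai.azure.com"
DEPLOYMENT = "text-embedding-3-large"

# Skillset side: converts chunked text into embeddings during indexing.
embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "resourceUri": RESOURCE_URI,
    "deploymentId": DEPLOYMENT,
    "modelName": "text-embedding-3-large",
    "dimensions": 3072,
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "embedding", "targetName": "text_vector"}],
}

# Index side: the vectorizer converts text queries into vectors at query
# time. It must point at the same model so query vectors are comparable
# with the vectors produced during indexing.
vectorizer = {
    "name": "my-vectorizer",
    "kind": "azureOpenAI",
    "azureOpenAIParameters": {
        "resourceUri": RESOURCE_URI,
        "deploymentId": DEPLOYMENT,
        "modelName": "text-embedding-3-large",
    },
}

# Same model on both sides satisfies the same-model requirement.
assert embedding_skill["modelName"] == vectorizer["azureOpenAIParameters"]["modelName"]
```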

You can use other models besides the ones listed here. For more information, see Use non-Azure models for embeddings in this article.

Note

Inputs to an embedding model are typically chunked data. In an Azure AI Search RAG pattern, chunking is handled in the indexer pipeline, covered in another tutorial in this series.

Review models used for generative AI at query time

Azure AI Search doesn't have integration code for chat models, so you should choose an LLM that you're familiar with and that meets your requirements. You can modify query code to try different models without having to rebuild an index or rerun any part of the indexing pipeline. Review Search and generate answers for code that calls the chat model.

The following models are commonly used for a chat search experience:

| Client | Chat models |
| --- | --- |
| Azure OpenAI | GPT-35-Turbo, GPT-4, GPT-4o, GPT-4 Turbo |

GPT-35-Turbo and GPT-4 models are optimized to work with inputs formatted as a conversation.

We use GPT-4o in this tutorial. During testing, we found that it's less likely to supplement with its own training data. For example, given the query "how much of the earth is covered by water?", GPT-35-Turbo answered using its built-in knowledge of earth to state that 71% of the earth is covered by water, even though the sample data doesn't provide that fact. In contrast, GPT-4o responded (correctly) with "I don't know".
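One way to encourage that grounded behavior is to instruct the model, in the prompt, to answer only from the supplied search results. The sketch below assembles such a chat completions request body; the prompt wording, deployment name, and placeholder sources text are illustrative assumptions, not the tutorial's exact code.

```python
# Sketch of a grounded chat request body. The system instruction tells the
# model to answer only from the supplied search results. The prompt text,
# deployment name, and sources placeholder are illustrative assumptions.
GROUNDED_PROMPT = (
    "Answer the query using only the sources provided below. "
    "If the sources don't contain the answer, say you don't know.\n"
    "Query: {query}\nSources: {sources}"
)

def build_chat_request(query: str, sources: str, deployment: str = "gpt-4o") -> dict:
    """Assemble a chat completions payload with grounding context."""
    return {
        "model": deployment,  # deployment name in Azure OpenAI
        "messages": [
            {
                "role": "user",
                "content": GROUNDED_PROMPT.format(query=query, sources=sources),
            },
        ],
        "temperature": 0.7,
    }

request = build_chat_request(
    "How much of the earth is covered by water?",
    sources="(concatenated search results would go here)",
)
```

Because the chat model is only referenced in query code like this, swapping GPT-4o for another deployment is a one-line change that doesn't touch the index.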

Deploy models and collect information

Models must be deployed and accessible through an endpoint. Both embedding-related skills and vectorizers need the number of dimensions and the model name.

This tutorial series uses the following models and model providers:

  • Text-embedding-3-large on Azure OpenAI for embeddings
  • GPT-4o on Azure OpenAI for chat completion

You must have Cognitive Services OpenAI Contributor or higher to deploy models in Azure OpenAI.

  1. Go to Azure AI Foundry.

  2. Select Deployments on the left menu.

  3. Select Deploy model > Deploy base model.

  4. Select text-embedding-3-large from the dropdown list and confirm the selection.

  5. Specify a deployment name. We recommend "text-embedding-3-large".

  6. Accept the defaults.

  7. Select Deploy.

  8. Repeat the previous steps for gpt-4o.

  9. Make a note of the model names and endpoint. Embedding skills and vectorizers assemble the full endpoint internally, so you only need the resource URI. For example, given https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2024-06-01, the endpoint you should provide in skill and vectorizer definitions is https://MY-FAKE-ACCOUNT.openai.azure.com.
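Trimming the full request URL down to the resource URI is a simple parsing step. A minimal sketch, using the fake account name from the example above:

```python
# The full Azure OpenAI request URL embeds the deployment name and API
# version, but skill and vectorizer definitions only need the resource URI
# (scheme + host).
from urllib.parse import urlparse

def resource_uri(full_endpoint: str) -> str:
    """Trim a full Azure OpenAI request URL to the bare resource URI."""
    parts = urlparse(full_endpoint)
    return f"{parts.scheme}://{parts.netloc}"

full = (
    "https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/"
    "text-embedding-3-large/embeddings?api-version=2024-06-01"
)
print(resource_uri(full))  # https://MY-FAKE-ACCOUNT.openai.azure.com
```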

Configure search engine access to Azure models

For pipeline and query execution, this tutorial uses Microsoft Entra ID for authentication and roles for authorization.

Assign yourself and the search service identity permissions on Azure OpenAI. The code for this tutorial runs locally. Requests to Azure OpenAI originate from your system. Also, search results from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI.

  1. Sign in to the Azure portal and find your search service.

  2. Configure Azure AI Search to use a system-managed identity.

  3. Find your Azure OpenAI resource.

  4. Select Access control (IAM) on the left menu.

  5. Select Add role assignment.

  6. Select Cognitive Services OpenAI User.

  7. Select Managed identity and then select Members. Find the system-managed identity for your search service in the dropdown list.

  8. Next, select User, group, or service principal and then select Members. Search for your user account and then select it from the dropdown list.

  9. Make sure you have two security principals assigned to the role.

  10. Select Review + assign to create the role assignments.

For access to models on Azure AI Vision, assign Cognitive Services OpenAI User. For Azure AI Foundry, assign Azure AI Developer.
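If you prefer scripting, the portal steps above roughly correspond to the following Azure CLI commands. The resource names, resource group, and user account are hypothetical placeholders; substitute your own values. This is a sketch, not a tested deployment script.

```shell
# Hypothetical names; substitute your own resources.
# Scope both role assignments to the Azure OpenAI resource.
AOAI_ID=$(az cognitiveservices account show \
  --name MY-FAKE-ACCOUNT --resource-group my-rg --query id -o tsv)

# 1. The search service's system-assigned managed identity.
SEARCH_PRINCIPAL=$(az search service show \
  --name my-search --resource-group my-rg \
  --query identity.principalId -o tsv)
az role assignment create --assignee "$SEARCH_PRINCIPAL" \
  --role "Cognitive Services OpenAI User" --scope "$AOAI_ID"

# 2. Your own user account, for running the tutorial code locally.
az role assignment create --assignee "user@example.com" \
  --role "Cognitive Services OpenAI User" --scope "$AOAI_ID"
```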

Use non-Azure models for embeddings

The pattern for integrating any embedding model is to wrap it in a custom skill and custom vectorizer. This section provides links to reference articles. For a code example that calls a non-Azure model, see custom-embeddings demo.

| Client | Embedding models | Skill | Vectorizer |
| --- | --- | --- | --- |
| Any | Any | custom skill | custom vectorizer |
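The wrapper pattern can be sketched as a custom WebApiSkill for indexing and a customWebApi vectorizer for queries, both pointing at the same endpoint that hosts your model. The endpoint URL and field paths below are hypothetical placeholders.

```python
# Sketch of wrapping a non-Azure embedding model: a custom WebApiSkill for
# indexing and a customWebApi vectorizer for queries, both calling the same
# hypothetical endpoint that hosts the model.
EMBEDDING_ENDPOINT = "https://my-custom-embeddings.example.com/embed"

custom_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": EMBEDDING_ENDPOINT,
    "httpMethod": "POST",
    "timeout": "PT30S",
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "vector", "targetName": "text_vector"}],
}

custom_vectorizer = {
    "name": "my-custom-vectorizer",
    "kind": "customWebApi",
    "customWebApiParameters": {
        "uri": EMBEDDING_ENDPOINT,
        "httpMethod": "POST",
        "timeout": "PT30S",
    },
}

# Pointing both at the same endpoint keeps indexing and query-time
# embeddings consistent, which satisfies the same-model requirement.
assert custom_skill["uri"] == custom_vectorizer["customWebApiParameters"]["uri"]
```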

Next step