Tutorial: Get started with DeepSeek-R1 reasoning model in Azure AI model inference

Article
03/06/2025

In this tutorial, you learn:

How to create and configure the Azure resources to use DeepSeek-R1 model in Azure AI model inference.
How to configure the model deployment.
How to use DeepSeek-R1 using the Azure AI Inference SDK or REST APIs.
How to use DeepSeek-R1 using other SDKs.

Prerequisites

To complete this article, you need:

An Azure subscription. If you're using GitHub Models, you can upgrade your experience and create an Azure subscription in the process. Read Upgrade from GitHub Models to Azure AI model inference if that's your case.

Reasoning models

Reasoning models can reach higher levels of performance in domains like math, coding, science, strategy, and logistics. These models produce outputs by explicitly using chain of thought to explore all possible paths before generating an answer. They verify their answers as they produce them, which helps them arrive at more accurate conclusions. As a result, reasoning models may require less context in the prompt to produce effective results.

This way of scaling a model's performance is referred to as inference compute time, as it trades performance against higher latency and cost. It contrasts with other approaches that scale through training compute time.

Reasoning models then produce two types of outputs:

Reasoning completions
Output completions

Both of these completions count toward the content generated by the model and, hence, toward the token limits and costs associated with the model. Some models, like DeepSeek-R1, may output the reasoning content. Others, like o1, output only the output piece of the completions.

Create the resources

Azure AI model inference is a capability in Azure AI Services resources in Azure. You can create model deployments under the resource to consume their predictions. You can also connect the resource to Azure AI Hubs and Projects in Azure AI Foundry to create intelligent applications if needed. The following picture shows the high level architecture.

To create an Azure AI project that supports model inference for DeepSeek-R1, follow these steps:

Tip

You can also create the resources using Azure CLI or infrastructure as code with Bicep.

Go to Azure AI Foundry portal and log in with your account.
On the landing page, select Create project.
Give the project a name, for example "my-project".
In this tutorial, we create a brand new project under a new AI hub, so select Create new hub. Hubs are containers for multiple projects and allow you to share resources across all the projects.
Give the hub a name, for example "my-hub" and select Next.
The wizard updates with details about the resources that are going to be created. Select Azure resources to be created to see the details.

You can see that the following resources are created:

Property	Description
Resource group	The main container for all the resources in Azure. This helps get resources that work together organized. It also helps to have a scope for the costs associated with the entire project.
Location	The region of the resources that you're creating.
Hub	The main container for AI projects in Azure AI Foundry. Hubs promote collaboration and allow you to store information for your projects.
AI Services	The resource enabling access to the flagship models in Azure AI model catalog. In this tutorial, a new account is created, but Azure AI services resources can be shared across multiple hubs and projects. Hubs use a connection to the resource to have access to the model deployments available there. To learn how to create connections between projects and Azure AI Services to consume Azure AI model inference, read Connect your AI project.

Select Create. The resource creation process starts.
Once completed, your project is ready to be configured.
Azure AI model inference is a Preview feature that needs to be turned on in Azure AI Foundry. In the top navigation bar, in the right corner, select the Preview features icon. A contextual blade shows up at the right of the screen.
Turn the feature Deploy models to Azure AI model inference service on.
Close the panel.

Add DeepSeek-R1 model deployment

Let's now create a new model deployment for DeepSeek-R1:

Go to the Model catalog section in Azure AI Foundry portal and find the DeepSeek-R1 model.
You can review the details of the model in the model card.
Select Deploy.
The wizard shows the model's terms and conditions for DeepSeek-R1, which is offered as a Microsoft first party consumption service. You can review our privacy and security commitments under Data, privacy, and Security.

Tip

Review the pricing details for the model by selecting Pricing and terms.
Accept the terms by selecting Subscribe and deploy.
You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the model parameter of requests to route them to this particular model deployment. This also allows you to configure specific names for your models when you attach specific configurations.
We automatically select an Azure AI Services connection depending on your project. Use the Customize option to change the connection based on your needs. DeepSeek-R1 is currently offered under the Global Standard deployment type which offers higher throughput and performance.
Select Deploy.
Once the deployment completes, the new model is listed in the page and it's ready to be used.

Use the model in playground

You can get started by using the model in the playground to get an idea of the model's capabilities.

On the deployment details page, select Open in playground option in the top bar.
In the Deployment drop down, the deployment you created has been automatically selected.
Configure the system prompt as needed. In general, reasoning models don't use system messages in the same way as other types of models.
Type your prompt and see the outputs.
Additionally, you can use View code to see details about how to access the model deployment programmatically.

When building prompts for reasoning models, take the following into consideration:

Use simple instructions and avoid using chain-of-thought techniques.
Built-in reasoning capabilities make simple zero-shot prompts as effective as more complex methods.
When providing additional context or documents, like in RAG scenarios, including only the most relevant information may help prevent the model from over-complicating its response.
Reasoning models may support the use of system messages. However, they may not follow them as strictly as non-reasoning models.
When creating multi-turn applications, consider appending only the final answer from the model, without its reasoning content, as explained in the Reasoning content section.

Notice that reasoning models can take longer to generate responses. They use long reasoning chains of thought that enable deeper and more structured problem-solving. They also perform self-verification to cross-check their own answers and correct their own mistakes, showcasing emergent self-reflective behaviors.
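The multi-turn guidance above can be sketched in Python as follows. This is a minimal illustration, not part of the Azure AI Inference package: the helper names strip_reasoning and append_turn are hypothetical, and the chat history is kept as plain dictionaries. It assumes the reasoning appears between <think> and </think> tags, as described in the Reasoning content section.

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(content: str) -> str:
    """Return the assistant's final answer with any <think> block removed."""
    return THINK_BLOCK.sub("", content).strip()


def append_turn(history: list, user_prompt: str, assistant_content: str) -> list:
    """Record one exchange, keeping only the final answer from the model."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": strip_reasoning(assistant_content)})
    return history


# Hypothetical model output used only to demonstrate the helpers.
history = append_turn(
    [],
    "How many languages are in the world?",
    "<think>Ethnologue lists about 7,000.</think>\n\nApproximately 7,000 languages.",
)
print(history[-1]["content"])
```

Keeping the reasoning out of the history means later turns spend fewer prompt tokens on text the model produced only as working notes.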

Use the model in code

Use the Azure AI model inference endpoint and credentials to connect to the model:

You can use the Azure AI Inference package to consume the model in code:

Install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

Explore our samples and read the API reference documentation to get yourself started.

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

Explore our samples and read the API reference documentation to get yourself started.

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

Explore our samples and read the API reference documentation to get yourself started.

Add the package to your project:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Explore our samples and read the API reference documentation to get yourself started.

Use the reference section to explore the API design and which parameters are available. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is appended to the root of the URL:

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        UserMessage(content="How many languages are in the world?"),
    ],
    model="DeepSeek-R1"
)

print(response.choices[0].message.content)

var messages = [
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        model: "DeepSeek-R1"
    }
});

if (isUnexpected(response)) {
    throw response.body.error;
}

console.log(response.body.choices[0].message.content);

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    Model = "DeepSeek-R1"
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine($"Response: {response.Value.Content}");

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestUserMessage("How many languages are in the world?"));

ChatCompletions chatCompletions = client.complete(new ChatCompletionsOptions(chatMessages, "DeepSeek-R1"));

for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println("Response:" + message.getContent());
}

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": "How many languages are in the world?"
        }
    ],
    "model": "DeepSeek-R1"
}
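If you prefer to call the REST route without an SDK, the request above could be assembled with the Python standard library along these lines. This is a sketch under stated assumptions: the helper name build_chat_request and the resource name "my-resource" are placeholders, and the function only builds the request object without sending it.

```python
import json
import urllib.request

API_VERSION = "2024-05-01-preview"


def build_chat_request(resource: str, api_key: str, prompt: str,
                       deployment: str = "DeepSeek-R1") -> urllib.request.Request:
    """Assemble (but do not send) the POST request for /chat/completions."""
    url = (
        f"https://{resource}.services.ai.azure.com/models"
        f"/chat/completions?api-version={API_VERSION}"
    )
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "model": deployment,  # the deployment name routes the request
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# "my-resource" and "<api-key>" are placeholders, not real values.
request = build_chat_request("my-resource", "<api-key>", "How many languages are in the world?")
print(request.get_method(), request.full_url)
```

To actually send it, you would pass the object to urllib.request.urlopen and parse the JSON response body.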

Reasoning models may generate longer responses and consume a larger number of tokens. You can see the rate limits that apply to DeepSeek-R1 models. Consider having a retry strategy to handle rate limits being applied. You can also request increases to the default limits.
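One possible retry strategy is exponential backoff with a small random jitter. The sketch below is illustrative only: RateLimitError is a hypothetical stand-in for whatever error your client raises when the service throttles a request (with the Azure SDKs this would likely mean catching azure.core.exceptions.HttpResponseError and checking for status code 429), and call_with_retry is not part of any Azure package.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


# Demonstration with a fake operation that is throttled twice, then succeeds.
calls = {"count": 0}


def fake_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("throttled")
    return "ok"


# The sleep function is replaced so the demonstration doesn't actually wait.
result = call_with_retry(fake_operation, sleep=lambda seconds: None)
print(result, calls["count"])
```

In real use, operation would wrap the call to client.complete, and the default time.sleep would be left in place.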

Reasoning content

Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind them. The reasoning associated with the completion is included in the response's content within the tags <think> and </think>. The model may select in which scenarios to generate reasoning content. The following example shows how to separate the reasoning content from the answer in Python:

import re

match = re.match(r"<think>(.*?)</think>(.*)", response.choices[0].message.content, re.DOTALL)

print("Response:")
if match:
    print("\tThinking:", match.group(1))
    print("\tAnswer:", match.group(2))
else:
    print("\tAnswer:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer. Let's start by recalling the general consensus from linguistic sources. I remember that the number often cited is around 7,000, but maybe I should check some reputable organizations.\n\nEthnologue is a well-known resource for language data, and I think they list about 7,000 languages. But wait, do they update their numbers? It might be around 7,100 or so. Also, the exact count can vary because some sources might categorize dialects differently or have more recent data. \n\nAnother thing to consider is language endangerment. Many languages are endangered, with some having only a few speakers left. Organizations like UNESCO track endangered languages, so mentioning that adds context. Also, the distribution isn't even. Some countries have hundreds of languages, like Papua New Guinea with over 800, while others have just a few. \n\nA user might also wonder why the exact number is hard to pin down. It's because the distinction between a language and a dialect can be political or cultural. For example, Mandarin and Cantonese are considered dialects of Chinese by some, but they're mutually unintelligible, so others classify them as separate languages. Also, some regions are under-researched, making it hard to document all languages. \n\nI should also touch on language families. The 7,000 languages are grouped into families like Indo-European, Sino-Tibetan, Niger-Congo, etc. Maybe mention a few of the largest families. But wait, the question is just about the count, not the families. Still, it's good to provide a bit more context. \n\nI need to make sure the information is up-to-date. Let me think – recent estimates still hover around 7,000. However, languages are dying out rapidly, so the number decreases over time. Including that note about endangerment and language extinction rates could be helpful. 
For instance, it's often stated that a language dies every few weeks. \n\nAnother point is sign languages. Does the count include them? Ethnologue includes some, but not all sources might. If the user is including sign languages, that adds more to the count, but I think the 7,000 figure typically refers to spoken languages. For thoroughness, maybe mention that there are also over 300 sign languages. \n\nSummarizing, the answer should state around 7,000, mention Ethnologue's figure, explain why the exact number varies, touch on endangerment, and possibly note sign languages as a separate category. Also, a brief mention of Papua New Guinea as the most linguistically diverse country. \n\nWait, let me verify Ethnologue's current number. As of their latest edition (25th, 2022), they list 7,168 living languages. But I should check if that's the case. Some sources might round to 7,000. Also, SIL International publishes Ethnologue, so citing them as reference makes sense. \n\nOther sources, like Glottolog, might have a different count because they use different criteria. Glottolog might list around 7,000 as well, but exact numbers vary. It's important to highlight that the count isn't exact because of differing definitions and ongoing research. \n\nIn conclusion, the approximate number is 7,000, with Ethnologue being a key source, considerations of endangerment, and the challenges in counting due to dialect vs. language distinctions. I should make sure the answer is clear, acknowledges the variability, and provides key points succinctly.

Answer: The exact number of languages in the world is challenging to determine due to differences in definitions (e.g., distinguishing languages from dialects) and ongoing documentation efforts. However, widely cited estimates suggest there are approximately **7,000 languages** globally.
Model: DeepSeek-R1
Usage: 
  Prompt tokens: 11
  Total tokens: 897
  Completion tokens: 886

Parameters

In general, reasoning models don't support the following parameters you can find in chat completion models:

Temperature
Presence penalty
Repetition penalty
Parameter top_p
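If the same request options are shared between reasoning and non-reasoning deployments, one way to handle the list above is to filter those parameters out before calling a reasoning model. This is a rough sketch: the helper drop_unsupported is hypothetical, and the key spellings (in particular repetition_penalty) are assumptions that you should adjust to match the SDK or API you use.

```python
# Assumed key spellings for the parameters listed above; adjust to your SDK.
UNSUPPORTED_FOR_REASONING = {
    "temperature",
    "presence_penalty",
    "repetition_penalty",
    "top_p",
}


def drop_unsupported(options: dict) -> dict:
    """Return a copy of options without the parameters listed above."""
    return {k: v for k, v in options.items() if k not in UNSUPPORTED_FOR_REASONING}


# Example options; the values are arbitrary and only for illustration.
options = {"model": "DeepSeek-R1", "temperature": 0.7, "top_p": 0.9, "max_tokens": 2048}
print(drop_unsupported(options))
```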
