How to trace your application with Azure AI Inference SDK

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn how to trace your application with the Azure AI Inference SDK, with your choice of Python, JavaScript, or C#. The Azure AI Inference client library provides support for tracing with OpenTelemetry.

Enable tracing in your application

Prerequisites

Installation

Install the package azure-ai-inference using your package manager, like pip:

  pip install azure-ai-inference[opentelemetry] 

Install the Azure Core OpenTelemetry Tracing plugin, the OpenTelemetry SDK, and the OTLP exporter for sending telemetry to your observability backend. To install the necessary packages for Python, use the following pip commands:

pip install azure-core-tracing-opentelemetry

pip install opentelemetry-sdk

pip install opentelemetry-exporter-otlp

To learn more about Azure AI Inference SDK for Python and observability, see Tracing via Inference SDK for Python.

To learn more, see the Inference SDK reference.

Configuration

Add the following configuration settings as needed for your use case (a snippet that applies both settings follows the list):

  • To capture prompt and completion contents, set the AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED environment variable to true (case insensitive). By default, prompts, completions, function names, parameters, or outputs aren't recorded.

  • To enable Azure SDK tracing, set the AZURE_SDK_TRACING_IMPLEMENTATION environment variable to opentelemetry. Alternatively, you can configure it in the code with the following snippet:

    from azure.core.settings import settings 
    
    settings.tracing_implementation = "opentelemetry" 
    

    To learn more, see Azure Core Tracing OpenTelemetry client library for Python.
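As a convenience, both settings from the preceding list can also be applied from Python by setting the environment variables before any client is created. This minimal sketch uses only the standard library:

import os

# Record prompt and completion contents in traces (off by default).
os.environ["AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"] = "true"

# Route Azure SDK tracing through OpenTelemetry.
os.environ["AZURE_SDK_TRACING_IMPLEMENTATION"] = "opentelemetry"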

Enable Instrumentation

The final step is to enable Azure AI Inference instrumentation with the following code snippet:

from azure.ai.inference.tracing import AIInferenceInstrumentor

# Instrument the AI Inference API
AIInferenceInstrumentor().instrument()
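Putting the pieces together, the following sketch wires an OTLP exporter into the OpenTelemetry tracer provider and then instruments a chat completions call. The AZURE_AI_ENDPOINT and AZURE_AI_KEY environment variable names are placeholders for your own deployment details, not names defined by the SDK:

import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.ai.inference.tracing import AIInferenceInstrumentor
from azure.core.credentials import AzureKeyCredential

# Send spans to the OTLP endpoint of your observability backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# Instrument the AI Inference API before making calls.
AIInferenceInstrumentor().instrument()

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],  # placeholder variable name
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),  # placeholder variable name
)
response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)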

It's also possible to uninstrument the Azure AI Inference API by using the uninstrument call. After this call, traces are no longer emitted by the Azure AI Inference API until instrument is called again:

AIInferenceInstrumentor().uninstrument() 

Tracing your own functions

To trace your own custom functions, use OpenTelemetry: instrument your code with the OpenTelemetry SDK by setting up a tracer provider and creating spans around the code you want to trace. Each span represents a unit of work, and spans can be nested to form a trace tree. You can add attributes to spans to enrich the trace data with additional context. Once instrumented, configure an exporter to send the trace data to a backend for analysis and visualization. For detailed instructions and advanced usage, refer to the OpenTelemetry documentation. This helps you monitor the performance of your custom functions and gain insights into their execution.
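As a minimal sketch of this pattern, the function and attribute names below are hypothetical; the tracer, span, and attribute APIs are standard OpenTelemetry:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id: str) -> None:  # hypothetical function
    # Each "with" block creates a span; nesting them builds the trace tree.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("app.order_id", order_id)  # illustrative attribute
        with tracer.start_as_current_span("validate_order"):
            pass  # the work you want to trace goes here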

Attach User feedback to traces

To attach user feedback to traces and visualize it in the Azure AI Foundry portal using OpenTelemetry's semantic conventions, instrument your application to enable tracing and log user feedback. By correlating feedback traces with their respective chat request traces using the response ID, you can view and manage these traces in the Azure AI Foundry portal. OpenTelemetry's specification allows for standardized and enriched trace data, which can be analyzed in the Azure AI Foundry portal for performance optimization and user experience insights. This approach helps you use the full power of OpenTelemetry for enhanced observability in your applications.
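As a sketch of the correlation idea, the span name, helper function, and rating attribute below are illustrative assumptions; gen_ai.response.id is the OpenTelemetry generative-AI attribute that ties the feedback back to the original chat request:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def record_user_feedback(response_id: str, rating: int) -> None:  # hypothetical helper
    # Correlate the feedback with the chat request that produced response_id.
    with tracer.start_as_current_span("user_feedback") as span:
        span.set_attribute("gen_ai.response.id", response_id)
        span.set_attribute("user_feedback.rating", rating)  # illustrative attribute name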

Related content

  • Python samples containing fully runnable Python code for tracing using synchronous and asynchronous clients.
  • JavaScript samples containing fully runnable JavaScript code for tracing using synchronous and asynchronous clients.
  • C# samples containing fully runnable C# code for doing inference using synchronous and asynchronous methods.