Deploy a model to NVIDIA Triton Inference Server

Level: Intermediate
Roles: AI Engineer, Data Scientist
Products: Azure, Azure Machine Learning

NVIDIA Triton Inference Server is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks such as TensorFlow, Open Neural Network Exchange (ONNX) Runtime, PyTorch, and NVIDIA TensorRT, and you can use it for CPU or GPU workloads. In this module, you deploy your production model to NVIDIA Triton Inference Server to perform inference on a cloud-hosted virtual machine.
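
To make the end state concrete, here is a minimal sketch of sending an inference request to a running Triton server over HTTP with the tritonclient Python package (pip install "tritonclient[http]"). The model name densenet_onnx and the tensor names data_0 and fc6_1 are illustrative assumptions; substitute the names and shapes from your own model repository.

```python
# Minimal sketch: query a running Triton server over HTTP.
# Assumes a model named "densenet_onnx" with input "data_0" (FP32,
# 3x224x224) and output "fc6_1" is loaded; adjust for your model.
import numpy as np
import tritonclient.http as httpclient

# Triton serves HTTP requests on port 8000 by default.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Confirm the server and the target model are ready before querying.
assert client.is_server_ready()
assert client.is_model_ready("densenet_onnx")

# Build one input tensor shaped to the model's expected input.
image = np.random.rand(3, 224, 224).astype(np.float32)  # placeholder data
infer_input = httpclient.InferInput("data_0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Run inference and read back the named output tensor as a NumPy array.
response = client.infer(model_name="densenet_onnx", inputs=[infer_input])
print(response.as_numpy("fc6_1").shape)
```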

Learning objectives

In this module, you learn how to:

  • Create an NVIDIA GPU-accelerated virtual machine.
  • Configure NVIDIA Triton Inference Server and related prerequisites.
  • Execute an inference workload on NVIDIA Triton Inference Server.

Prerequisites