Introduction
NVIDIA Triton Inference Server is open-source inference serving software that supports multiple machine learning frameworks, including TensorFlow, ONNX Runtime, PyTorch, and NVIDIA TensorRT. NVIDIA Triton can run inference workloads on either CPU or GPU hardware. In this module, you deploy your production model to NVIDIA Triton Inference Server to perform inference on a cloud-hosted virtual machine.
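Once the server is running, clients send inference requests to it over HTTP/REST or gRPC. The following is a minimal sketch of an HTTP client using the tritonclient Python package; the server address, model name, tensor names, shapes, and datatypes are placeholder assumptions and will depend on your model's configuration.

```python
# Minimal sketch of sending an inference request to a running Triton server.
# Assumptions: server reachable at localhost:8000, a model named "object_detection"
# with an FP32 input tensor "images" of shape [1, 3, 640, 640] and an output
# tensor "detections". Adjust these names and shapes to match your model config.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy input tensor matching the model's expected shape and datatype.
image_batch = np.zeros((1, 3, 640, 640), dtype=np.float32)
infer_input = httpclient.InferInput("images", list(image_batch.shape), "FP32")
infer_input.set_data_from_numpy(image_batch)

# Request the output tensor and run inference on the server.
infer_output = httpclient.InferRequestedOutput("detections")
response = client.infer(
    model_name="object_detection",
    inputs=[infer_input],
    outputs=[infer_output],
)

detections = response.as_numpy("detections")
print(detections.shape)
```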
Prerequisites
Scenario: Deploy a production model to NVIDIA Triton Server for inference processing
You're a data scientist who is assigned the task of improving automation in a manufacturing facility using computer vision. Your team developed an Open Neural Network Exchange (ONNX) object detection model in Azure Machine Learning studio and is ready to put that model into production. NVIDIA Triton Inference Server is chosen as the inference processor because of its ability to run the ONNX format on CPU-based or GPU-based hardware. Your team plans to target a cloud-hosted virtual machine to run the model, which allows you to perform inference on image frames received from the production environment.
What will you learn?
After you finish this module, you'll be able to:
- Create an NVIDIA GPU-accelerated virtual machine.
- Configure NVIDIA Triton Inference Server and related prerequisites.
- Execute an inference workload on NVIDIA Triton Inference Server.
What is the main goal?
This module shows you how to deploy a production model to NVIDIA Triton Inference Server for inference processing.