Generative AI

Introduction

This section is a guide to launching, operating, and enhancing Generative AI applications in a production environment. It covers essential capabilities, from delivering innovative experiences to ensuring robust security and privacy and managing the lifecycle of AI solutions. Navigating the complexities of distributed systems is a formidable task, further compounded by the unpredictable nature of Generative AI models. This section aims to consolidate crucial information and best practices, making them readily accessible and comprehensible. It also links to further resources, so that teams across different disciplines can find the insights they need.

Generative AI solutions

Large Language Models (LLMs) are the core of enterprise Generative AI applications. They can process and generate natural language, but they require additional components to handle user interactions, security, and the other functionality needed to respond to or act on user inputs. The collection of these components and services that forms a functional solution is called a Generative AI application. A best practice when developing a Gen AI application is to follow a standard AI lifecycle while using Large Language Model Operations (LLMOps) tools and processes to facilitate and simplify the steps.

Figure 1: Generative AI Application Stack

There is no unified tool set for overseeing the development of the individual components and services, so building a comprehensive end-to-end solution requires connective code and custom functions. This glue code is essential for integrating diverse products and services into a high-quality Generative AI application tailored for enterprise use.

The sections below explore each of these components, along with the tools and processes that help our customers develop, deploy, and maintain a Generative AI solution. We focus on the top use cases that ISE sees with our customers, which are predominantly language-to-language or language-to-action solutions. The underlying components are similar for image generation, but we do not explore the differences here.

AI lifecycle

We frame these capabilities in the context of the Gen AI application lifecycle, which has standard stages for both custom ML and Large Language Model solutions. This lifecycle represents the typical iterative approach to preparing, deploying, and improving a Gen AI application over time.

Figure 2: AI solution lifecycle: Data Science and ML Engineering

Enterprise Generative AI application

Managed services

Managed services provide access to Large Language Models and offer built-in services for adapting the models to specific use cases. They may also include capabilities for integrating with existing tooling and infrastructure. Enterprise deployments also require services to manage the ML lifecycle (MLOps). Examples of Microsoft’s Generative AI and ML lifecycle managed services include Azure AI Search, Azure OpenAI Service, and Azure ML Service.

AI solution framework

The backend system orchestrates the data workflow in a large language model application by facilitating access to Large Language Models (LLMs), overseeing data processing, and enabling cognitive skills. It breaks down tasks and makes calls to agents or libraries, enhancing the capabilities of the LLM. Examples of such frameworks include Semantic Kernel and LangChain.
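As a rough illustration of the orchestration idea described above, the following minimal Python sketch breaks a request into named steps and calls a registered function for each one. It does not use the Semantic Kernel or LangChain APIs. The names `Orchestrator`, `fake_llm`, `search_documents`, and `summarize` are hypothetical and exist only for this example, and the model call is a stub rather than a real endpoint.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted LLM endpoint."""
    return f"[model answer based on: {prompt}]"


def search_documents(query: str) -> str:
    """Hypothetical skill: pretend to retrieve documents for a query."""
    return f"top documents for '{query}'"


def summarize(text: str) -> str:
    """Hypothetical skill: ask the (stubbed) model to summarize text."""
    return fake_llm(f"Summarize: {text}")


class Orchestrator:
    """Runs a plan (an ordered list of skill names) over the user input."""

    def __init__(self) -> None:
        self.skills: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, skill: Callable[[str], str]) -> None:
        self.skills[name] = skill

    def run(self, plan: List[str], user_input: str) -> str:
        data = user_input
        for step in plan:
            # Each step receives the output of the previous step.
            data = self.skills[step](data)
        return data


orchestrator = Orchestrator()
orchestrator.register("search", search_documents)
orchestrator.register("summarize", summarize)

result = orchestrator.run(["search", "summarize"], "quarterly sales")
print(result)
```

In this sketch the plan is fixed by the caller. Real frameworks typically let the model itself choose or generate the plan, which is beyond the scope of this illustration.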

Client applications

Applications provide the “canvas” for how the user interacts with the solution, along with all the supporting functions needed to make the solution useful. The most common user interfaces are chat-based. Application examples and frontend tooling include Microsoft Copilot, Teams, and Power Virtual Agents (PVA).

Deployment monitoring and observability

Deployment monitoring and observability involve tracking the performance and health of a Generative AI application. This includes collecting metrics, logs, and traces to gain visibility into the system’s operations and to understand its state at any given moment. In Generative AI, monitoring can mean tracking the model’s performance, data throughput, and response times to ensure the system is functioning as intended.
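One simple way to start collecting response times and failure counts is to wrap every model call in a small timing helper. The sketch below uses only the Python standard library. The names `CallMetrics`, `monitored`, and `fake_model` are hypothetical, and a production system would more likely export these values to a dedicated monitoring service than keep them in memory.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("genai.monitoring")


@dataclass
class CallMetrics:
    """In-memory record of latencies and failures (illustrative only)."""
    latencies_ms: List[float] = field(default_factory=list)
    failures: int = 0

    def average_latency_ms(self) -> float:
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)


def monitored(model_call: Callable[[str], str],
              metrics: CallMetrics) -> Callable[[str], str]:
    """Wrap a model call so each invocation is timed and logged."""
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        try:
            return model_call(prompt)
        except Exception:
            metrics.failures += 1
            logger.exception("model call failed")
            raise
        finally:
            # Runs for both successful and failed calls.
            elapsed_ms = (time.perf_counter() - start) * 1000
            metrics.latencies_ms.append(elapsed_ms)
            logger.info("model call took %.2f ms", elapsed_ms)
    return wrapper


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model endpoint."""
    return prompt.upper()


metrics = CallMetrics()
call_model = monitored(fake_model, metrics)
call_model("hello")
print(f"calls: {len(metrics.latencies_ms)}, failures: {metrics.failures}")
```

Latency is recorded in the `finally` clause so that failed calls are also measured, since slow failures are often the ones worth investigating.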

Security and privacy

Security and privacy are critical aspects that encompass protecting the Generative AI application from unauthorized access and ensuring the confidentiality, integrity, and availability of data. They also involve safeguarding the privacy of the data used by the application, which includes implementing measures to prevent data breaches and leaks and ensuring compliance with data protection regulations. See LLM Security Recommendations and LLM Application Security Plan for more information.
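As one small, illustrative example of a leak-prevention measure, an application might redact obvious personal identifiers before a prompt is logged or forwarded to a model. The sketch below masks email addresses with a regular expression. The function name `redact_emails` and the `[EMAIL]` placeholder are assumptions made for this example, and a simple pattern like this is not a substitute for a full data protection review or a dedicated PII detection service.

```python
import re

# Deliberately simple email pattern for illustration; it will not catch
# every valid address format and is not a complete PII detector.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


prompt = "Contact jane.doe@example.com about the renewal."
print(redact_emails(prompt))
```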

Data platform

The data platform is the underlying infrastructure that supports the storage, processing, and analysis of the large volumes of data used by Generative AI applications. It includes databases, data lakes, and other storage systems, as well as the tools and services for data ingestion, transformation, and querying. A robust data platform is essential for prompt engineering, fine-tuning, and operating Generative AI models effectively.

LLMOps

Large Language Model Operations (LLMOps) acts as the orchestrator that manages all of the above components cohesively. LLMOps refers to the set of practices, techniques, and tools that facilitate the development, integration, testing, release, deployment, and monitoring of LLM-based applications. It establishes the operational framework, ensuring a smooth and efficient interplay among these elements throughout the application lifecycle. LLMOps ensures that LLMs remain reliable, efficient, and up to date as they are integrated into Generative AI applications.

These components work together to create a stable, secure, and efficient environment in which Generative AI applications can be developed, deployed, operated, and evolved.