Design principles of intelligent application workloads

09/11/2024

Guidance around planning, developing, and maintaining intelligent application workloads is built on Power Platform Well-Architected and its five pillars of architectural excellence.

Well-Architected pillar	Summary
Reliability	An intelligent application workload requires resilience at the architecture layer to ensure AI models and workflows are highly available and can recover quickly from failure. Implement robust error-handling mechanisms. A resilient architecture also maintains the integrity of data used by the AI models, ensuring consistent and accurate outputs.
Security	An intelligent application workload often handles sensitive data. Safeguard sensitive data used and generated by AI models. Implement encryption, access controls, and regular security audits. Ensure the workload complies with relevant regulation standards, such as GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act), to protect user privacy and data.
Performance efficiency	An intelligent application workload must be designed to scale seamlessly with increasing data volumes and user demands. Identify key performance metrics and implement monitoring to track progress toward achieving workload performance goals. In the context of intelligent application workloads, performance also takes into account the number of requests and interactions that can be completed through self-service, which would otherwise require human intervention.
Operational excellence	An intelligent application workload requires comprehensive monitoring and logging to track the performance and health of AI models, workflows, and conversations. Monitoring helps to quickly identify and resolve issues. The Operational Excellence pillar recommends using automation to streamline operations, reduce manual intervention, and minimize the risk of human error.
Experience optimization	An intelligent application workload should prioritize conversation design to ensure a user-friendly experience that enables users to achieve their goals with minimal effort. The design should account for topics the generative AI can't handle and incorporate fallback mechanisms. Also implement mechanisms to collect user feedback and continuously refine the AI models and workload based on this feedback.

Reliability

When you design an intelligent application workload with Power Platform, focus on resilience and availability.

Resilience is the ability of a system to recover from failures and continue to function.
Availability ensures uninterrupted uptime. High availability minimizes application downtime and enhances recovery from incidents.

Reliability is important in the development of any workload, and generative AI is no exception. In fact, there are unique factors to consider when engineering generative AI workloads. Emphasizing resilience in generative AI workloads is essential to keep them available to the organization and to maintain business continuity.

Failures can happen in the cloud. Instead of trying to prevent failures altogether, your goal should be to minimize the effects of a single failing component. Use the following information to minimize downtime and ensure that recommended practices for high availability are built into your intelligent application workload:

Ensure the workload can handle failures and continue to operate, even if at reduced functionality. Identify potential faults and make the system resilient, to tolerate and recover from these faults.
Make the workload observable so that development teams learn from failures. Quickly identify and address issues by implementing monitoring, logging, and alerting mechanisms.
Ensure the workload can scale to handle varying loads, which is especially important for AI workloads that might have fluctuating demands.
Implement robust error handling and recovery mechanisms. Set up automated alerts for system failures and have a clear plan for quick recovery.
Validate the target architecture and scale by understanding the target volumes of chat messages or conversations. Target volumes also help validate the licensing aspects of the intelligent application and the potential effect on Dataverse storage for conversation transcripts.
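
The error-handling and reduced-functionality recommendations above can be illustrated with a minimal Python sketch. It assumes a hypothetical `primary` call (for example, a request to an external system) and a hypothetical `fallback` that returns a degraded but still useful response. The function name, parameters, and delay values are illustrative assumptions, not part of Power Platform or Copilot Studio.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` up to `max_attempts` times, then degrade to `fallback`.

    Hypothetical helper for illustration only.
    """
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception as exc:  # real code should catch only transient error types
            logger.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt < max_attempts - 1:
                # Exponential backoff: base, 2x base, 4x base, ...
                sleep(base_delay_s * (2 ** attempt))
    # Every attempt failed: keep operating at reduced functionality.
    return fallback()


if __name__ == "__main__":
    def lookup_order_status() -> str:  # hypothetical dependency that is down
        raise ConnectionError("order system unavailable")

    reply = call_with_fallback(
        lookup_order_status,
        lambda: "Order lookup is temporarily unavailable. Please try again later.",
        sleep=lambda _seconds: None,  # skip real waiting in this demo
    )
    print(reply)
```

The logged warnings are the kind of signal that monitoring and alerting would pick up, and the fallback keeps the conversation going instead of surfacing a raw error to the user.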

For intelligent applications that use generative AI capabilities, consider not only resilience and availability, but also the reliability and accuracy of the responses provided by the intelligent workload. Consider the following recommendations:

Optimize for Retrieval Augmented Generation (RAG): Ensure your data is clean and well-structured, create efficient embeddings and indexes for quick retrieval, and implement robust monitoring and feedback mechanisms to continuously improve the workload's performance.
Effective prompts: Design precise and contextually relevant prompts to guide the AI to produce accurate responses.
Regular evaluation: Implement continuous monitoring and testing of AI outputs to assess accuracy, relevance, and ethical adherence.
Feedback loops: Establish feedback mechanisms where users can report inaccuracies, which can then be used to refine and improve the models. Microsoft Copilot Studio provides customer satisfaction analytics that offer actionable insights into the drivers of satisfaction or dissatisfaction with your agent's responses.
Domain-specific training: Fine-tune models on domain-specific data to enhance accuracy in specific contexts.
Regular updates: Periodically update models with new data to maintain their relevance and accuracy.
Unrecognized intents: Handle unrecognized intents by using Generative answers to find answers from available data sources and by using the Fallback topic to integrate with other systems.
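
As one way to picture the regular evaluation recommendation, the following Python sketch runs a small set of test questions against an answer function and reports a pass rate. Everything here is an assumption for illustration: `EvalCase`, `evaluate`, and the keyword-matching check are hypothetical, and `answer_fn` stands in for whatever call returns the agent's response. Keyword matching is a deliberately crude proxy for accuracy; a real evaluation would likely also cover relevance and ethical adherence.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    required_terms: list[str]  # terms a correct answer is expected to mention


def evaluate(answer_fn: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Return the pass rate and the failing cases with their missing terms."""
    failures = []
    for case in cases:
        answer = answer_fn(case.question).lower()
        missing = [t for t in case.required_terms if t.lower() not in answer]
        if missing:
            failures.append((case.question, missing))
    passed = len(cases) - len(failures)
    pass_rate = passed / len(cases) if cases else 0.0
    return {"pass_rate": pass_rate, "failures": failures}


if __name__ == "__main__":
    # Stub standing in for the real agent call (hypothetical).
    def stub_agent(question: str) -> str:
        if "password" in question:
            return "We send a reset link to your email."
        return "We are open daily."

    report = evaluate(
        stub_agent,
        [
            EvalCase("How do I reset my password?", ["reset link"]),
            EvalCase("What are your opening hours?", ["9", "5"]),
        ],
    )
    print(report["pass_rate"])
    print(report["failures"])
```

Running a check like this on a schedule, and after each prompt or data-source change, gives a simple trend line that complements user-reported feedback.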

Security

In a shared-responsibility model:

Organizations are primarily responsible for managing and operating workloads.
Microsoft manages the security of the underlying infrastructure, including data centers, network security, and physical security measures, and provides built-in security features such as encryption, identity management, and compliance with industry standards. Learn more in Security in Microsoft Power Platform and Copilot Studio security and governance.

We recommend that you regularly assess the services and technologies to ensure that your security posture adapts to the evolving threat landscape. Establishing a clear understanding of the shared responsibility model with vendors is essential when collaborating to implement security measures.

You can employ several methods to secure your intelligent application workloads:

User authentication and access control: Implement robust authentication and access control measures to ensure only authorized users can access the intelligent application workload. Unauthorized access to the intelligent application workload can result in data breaches, misuse of resources, and potential exposure of sensitive information. Weak or ineffective authentication mechanisms might also result in compromised user accounts.
Compliance: Ensure that data is protected and managed in compliance with regulatory requirements. Stay informed about local data protection laws, and ensure that your data residency strategy complies with them.
Integration: Secure all integrations with service principals. Monitor and protect the network integrity of internal and external endpoints through security capabilities and appliances, such as firewalls or web application firewalls.
Ongoing monitoring and auditing: Continuously monitor and audit the workload's activities to detect and respond to threats proactively.
Azure security tools: Use Azure's built-in security tools, such as Microsoft Defender for Cloud and Azure Policy, to monitor and enforce security policies.
Employee training: Train employees on data protection best practices and the importance of adhering to data residency requirements.
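
To make the access control recommendation more concrete, here is a minimal Python sketch of a deny-by-default permission check. The role names, actions, and the `is_allowed` helper are hypothetical and exist only for illustration. In a Power Platform workload, authentication and authorization would normally be handled by the platform's identity and security features rather than by custom code like this.

```python
# Hypothetical role-to-action mapping; names are illustrative only.
ROLE_PERMISSIONS = {
    "agent_user": {"chat"},
    "agent_maker": {"chat", "edit_topics"},
    "agent_admin": {"chat", "edit_topics", "view_transcripts"},
}


def is_allowed(roles: list[str], action: str) -> bool:
    """Deny by default: allow only if some assigned role grants the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed(["agent_user"], "chat"))              # True
    print(is_allowed(["agent_user"], "view_transcripts"))  # False
    print(is_allowed(["unknown_role"], "chat"))            # False
```

Unknown roles and unknown actions both resolve to a denial, which is the property that matters: access has to be granted explicitly.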

Performance efficiency

Performance efficiency is the ability of your workload to efficiently scale to meet the demands placed on it by users.

Increase performance efficiency by:

Understanding target volumes to validate the target architecture and scale. Target volumes also help validate the licensing aspects of the generative AI (agent) and the potential effect on Dataverse storage for conversation transcripts.
Understanding platform limits. When you integrate your intelligent application workload with external systems, for example through Power Automate or HTTP requests, it's important to validate that every component can handle the load.
Continuously monitoring performance and detecting anomalies by using tools such as Azure Monitor, Log Analytics, Application Insights, and alerts.
Understanding the expected response times for:
- First chat load and first message response
- Maximum latency for the agent to answer user queries
- Approach for handling long-running actions (for example, waiting for an external system to return data)
Optimizing the deflection rate, or the rate at which requests are completed in a self-service fashion due to automation (reducing the number of requests that require human assistance). Learn more in Performance optimization for intelligent application workloads.

Considering each of these aspects helps you build an intelligent application workload with a consistent, cohesive user experience.
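
The deflection rate and response-time targets described in this section can be expressed as simple calculations. The following Python sketch assumes you can export the total number of sessions, the number of sessions escalated to a human, and a list of response latencies. The function names and sample figures are illustrative assumptions, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math


def deflection_rate(total_sessions: int, escalated_sessions: int) -> float:
    """Share of sessions completed through self-service (no human assistance)."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return (total_sessions - escalated_sessions) / total_sessions


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of `values` for 0 < pct <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Sample figures, invented for illustration.
    print(deflection_rate(total_sessions=1000, escalated_sessions=250))  # 0.75
    latencies_s = [1.2, 0.8, 3.5, 0.9, 1.1]
    print(percentile(latencies_s, 95))  # 3.5
```

Tracking a high percentile rather than the average makes it easier to compare observed latency against a stated maximum, because a few slow responses don't get hidden by many fast ones.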

Operational excellence

Operational excellence involves developing efficient processes to support your intelligent application workload.

Operational failures can affect other design areas as well as the overall success of the intelligent application workload. It's important to tailor your operational processes to support an intelligent application workload in production. The following recommendations drive operational excellence:

Automate build and release processes. Fully automated build and release processes reduce friction and increase the velocity of deploying updates, bringing repeatability and consistency across environments. Automation shortens the feedback loop, from developers pushing changes, to obtaining insights on code quality, test coverage, resiliency, security, and performance, all of which contribute to developer productivity.
Maintain governance and compliance.
Analyze your environment's performance and health in production.
Maintain documentation that captures:
- Troubleshooting procedures
- Disaster-recovery plans
Provide remediation guidance on how to accelerate the process of resolving problems.
Embrace continuous operational improvement. Prioritize routine improvement of the system and user experience. Use a health model to understand and measure operational efficiency, together with feedback mechanisms to enable application teams to understand and address gaps in an iterative manner.

These recommendations can help your team collaborate in a way that's efficient and transparent.
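
A health model, as mentioned in the continuous improvement recommendation, can be sketched as a mapping from monitored signals to a small set of health states. The Python example below is a simplified illustration under stated assumptions: the metric names and thresholds are invented, higher values are treated as worse, and the overall state is taken to be the worst individual state.

```python
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


# Hypothetical thresholds: (degraded_at, unhealthy_at); higher values are worse.
THRESHOLDS = {
    "error_rate": (0.02, 0.10),
    "p95_latency_s": (3.0, 8.0),
    "escalation_rate": (0.30, 0.60),
}


def classify(metric: str, value: float) -> Health:
    """Map one metric value to a health state using its thresholds."""
    degraded_at, unhealthy_at = THRESHOLDS[metric]
    if value >= unhealthy_at:
        return Health.UNHEALTHY
    if value >= degraded_at:
        return Health.DEGRADED
    return Health.HEALTHY


def overall_health(metrics: dict[str, float]) -> Health:
    """The workload is only as healthy as its worst signal."""
    states = (classify(name, value) for name, value in metrics.items())
    return max(states, default=Health.HEALTHY)


if __name__ == "__main__":
    snapshot = {"error_rate": 0.01, "p95_latency_s": 4.2, "escalation_rate": 0.25}
    print(overall_health(snapshot).name)  # DEGRADED
```

Keeping the thresholds in one place makes them easy to review and adjust as the team learns what normal operation looks like.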

Experience optimization

An intelligent application workload should prioritize conversation design to ensure a user-friendly experience that enables users to achieve their goals with minimal effort. The design should address topics that the generative AI can't handle and include fallback mechanisms. Also implement mechanisms to collect user feedback and continuously refine the AI models and workload based on this feedback.

Optimizing the user experience for an intelligent application workload involves several key considerations:

Conversation design: Design conversations that are intuitive and easy to navigate. Use clear and concise language, and ensure that the AI can handle common user queries effectively. Focus on helping users achieve their goals with minimal effort. Understand user intents and provide relevant responses quickly to ensure a seamless and efficient user experience.
Handling limitations: Implement fallback mechanisms for topics the generative AI can't handle, such as redirecting users to customer service representatives or providing alternative resources. Design robust error-handling processes to manage unexpected inputs gracefully. Inform users when the AI is unable to process their request and offer alternatives.
User feedback: Integrate mechanisms to gather user feedback continuously. Microsoft Copilot Studio provides customer satisfaction analytics that offer actionable insights into the drivers of satisfaction or dissatisfaction with your agent's responses. Use the collected feedback to refine and improve the AI models and overall workload. Regular updates based on user input can significantly enhance the user experience.
Customization and personalization: Customize prompts and instructions to align with your specific use cases and user needs, to ensure more accurate and relevant responses. Use dynamic chaining to automate triggers and manage topic flows efficiently to reduce the need for manually predefined topics and improve the AI's ability to recognize user intent. Learn more in Optimize prompts and topic configuration.
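
The fallback behavior described under handling limitations can be sketched as a small piece of conversation logic. The Python example below is a simplified illustration, not how Copilot Studio implements its Fallback topic. The `Conversation` class, the `respond` function, the exact-match lookup, and the escalation threshold are all hypothetical.

```python
from dataclasses import dataclass

REPHRASE = "Sorry, I didn't understand that. Could you rephrase your question?"
ESCALATE = (
    "I'm not able to help with that. "
    "I'm connecting you with a customer service representative."
)


@dataclass
class Conversation:
    unrecognized_count: int = 0  # consecutive inputs the agent couldn't handle


def respond(
    conv: Conversation,
    user_text: str,
    known_answers: dict[str, str],
    max_unrecognized: int = 2,
) -> str:
    """Answer if possible; otherwise ask to rephrase, then escalate."""
    answer = known_answers.get(user_text.strip().lower())
    if answer is not None:
        conv.unrecognized_count = 0  # a successful turn resets the counter
        return answer
    conv.unrecognized_count += 1
    if conv.unrecognized_count >= max_unrecognized:
        return ESCALATE  # tell the user and offer an alternative
    return REPHRASE


if __name__ == "__main__":
    answers = {"what are your hours?": "We're open 9 AM to 5 PM, Monday to Friday."}
    conv = Conversation()
    for text in ["What are your hours?", "asdf", "qwerty"]:
        print(respond(conv, text, answers))
```

The key design choice is that the user is always told when the agent can't process a request and is offered a next step, rather than being left in a loop of rephrase prompts.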

Next steps

The Well-Architected Framework design principles are incorporated into the intelligent application workload design areas. Each design area provides targeted guidance so that you can quickly find the information you need.

Start by reviewing the design considerations that are needed to support a workload:
