Design methodology for AI workloads on Azure
When you design AI workloads, you integrate code and data to replace deterministic behavior, which enables tasks like prediction, classification, and other functional goals. AI workload architecture is often complex and must align with business constraints. The Azure Well-Architected Framework provides a solid foundation for architectural excellence, but you should also consider AI-specific design principles.
This article presents a design methodology based on AI-specific principles that systematically guide the design and optimization of solutions. The methodology also supports collaboration with product owners and stakeholders by helping you justify technical decisions. If you need help making a decision, refer back to this methodology to align your design direction with the high-level AI principles.
When you design a capability or introduce an improvement, evaluate the change from the perspective of this methodology: Does it affect the user experience? Is it flexible enough to adapt to future innovations? Does it disrupt the experimentation flow?
Design with an experimental mindset
Design with an experimental mindset so that you can achieve relevance through iterative, statistically driven processes based on real-world use cases.
Experimentation in AI involves continuous adjustment, with results that you can measure against quality targets after each iteration. Use experimentation loops during initial model evaluation and ongoing refinement. The inner loop refines a model's predictive power in a development environment. The outer loop monitors production usage and can trigger further refinement or data preparation. Both loops rely on continuous monitoring and evaluation to identify opportunities for improvement.
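To make the loop concrete, the following sketch shows an inner-loop quality gate. The metric names, targets, and the `evaluate` and `refine` callables are illustrative assumptions, not part of any specific Azure API.

```python
# A minimal sketch of an inner-loop quality gate. The metrics, targets, and
# the evaluate/refine callables are illustrative assumptions.

QUALITY_TARGETS = {"groundedness": 0.85, "relevance": 0.80}

def passes_quality_gate(scores: dict[str, float]) -> bool:
    """Promote a candidate only if every metric meets its target."""
    return all(scores.get(m, 0.0) >= t for m, t in QUALITY_TARGETS.items())

def inner_loop(model, evaluate, refine, max_iterations=5):
    """Evaluate, compare against targets, refine, and repeat."""
    for _ in range(max_iterations):
        scores = evaluate(model)       # For example, batch scoring on a held-out set.
        if passes_quality_gate(scores):
            return model, scores       # Ready for production and the outer loop.
        model = refine(model, scores)  # Adjust data, prompts, or parameters.
    raise RuntimeError("Quality targets not met; fall back to a contingency plan.")
```

Defining the targets in one place lets the outer loop reuse the same gate against production telemetry instead of drifting toward separate definitions of quality.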
Not every experiment succeeds. Consider worst-case scenarios, and have contingency plans for failed experiments.
Design responsibly
When users interact with your AI system, they place their trust in its ethical functionality, even if they don't understand the underlying logic and decision making of the AI model. This trust makes you accountable for designing a system that prevents unethical behaviors, such as manipulation, content toxicity, intellectual property infringement, and fabricated responses. Embed responsible AI principles in system operations and in the culture of your team. These practices must extend throughout the lifecycle of user interaction: from a user's initial intent to use the system, through their sessions, and even during disruptions caused by system errors.
Content moderation is a key strategy in the responsible design of generative AI. It evaluates requests and responses in real time to help ensure safety and appropriateness. As part of the experimentation loops, strive to make algorithms fair and inclusive to minimize bias. Bias can enter the system through various channels, including during live sessions or when you gather feedback.
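As a sketch of real-time moderation, the following example screens both the request and the response with the Azure AI Content Safety SDK (`azure-ai-contentsafety`). The endpoint, key, and severity threshold are placeholders that depend on your resource and risk tolerance, and `call_model` is a hypothetical stand-in for your model invocation.

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Content Safety resource.
client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

def is_safe(text: str, max_severity: int = 2) -> bool:
    """Return False if any harm category exceeds the severity threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    return all(
        (analysis.severity or 0) <= max_severity
        for analysis in result.categories_analysis
    )

# Gate both sides of the exchange: the user request and the model response.
prompt = "example user input"
if is_safe(prompt):
    response = call_model(prompt)  # Hypothetical model invocation.
    if not is_safe(response):
        response = "The response was withheld by content moderation."
```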
Ethical data management is central to responsible design. Make informed decisions about when to rely on user data and when to avoid it. Users trust you to ensure that their personal information is either removed from the system or retained only with their consent. If you must retain data, use trusted technology to protect it and help ensure privacy and security.
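When removal is the right choice, PII detection can automate part of it. The following sketch uses the Azure AI Language SDK (`azure-ai-textanalytics`) to redact detected personal information before storage; the endpoint, key, and sample text are placeholders.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for your Azure AI Language resource.
client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["My name is Avery and my phone number is 555-0100."]
results = client.recognize_pii_entities(documents, language="en")

for doc in results:
    if not doc.is_error:
        # Persist the redacted text instead of the raw user input.
        print(doc.redacted_text)
```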
Design for explainability
AI model results must be explainable and justifiable. You should be able to trace the origins of the data, the inference processes, and the journey of data from its source to the serving layer. In discriminative AI, decisions can be justified at each step. In generative models, explainability is more complex. Document the decision-making process, both manually and through technical capabilities.
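One such technical capability is an audit record per inference that captures the model version and the grounding data behind each response. The following is a minimal sketch; the field names and the logging destination are illustrative assumptions.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid

@dataclass
class InferenceAuditRecord:
    """One traceable record per model response. Field names are illustrative."""
    model_name: str
    model_version: str
    input_summary: str              # Redacted or hashed, never raw personal data.
    source_document_ids: list[str]  # Grounding data that the response drew on.
    output_summary: str
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_inference(record: InferenceAuditRecord) -> None:
    # Ship to your observability store (for example, Log Analytics).
    print(json.dumps(asdict(record)))
```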
Explainable results help ensure the system's transparency and accountability to gain users' trust.
Stay ahead of model decay
Model decay is a unique challenge in AI that affects design decisions. The quality of AI model outputs can deteriorate over time without any changes to the code. Sometimes deterioration can even occur suddenly because of changes in data or external factors.
This deterioration affects various aspects of the system, including data-ingestion speed, data quality, monitoring needs, evaluation processes, and the time it takes to correct problems. Implement early detection through a combination of automated processes for continuous monitoring and model evaluation. Take advantage of user feedback to help identify model decay.
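One common automated check is distribution drift on model inputs or outputs. The following sketch computes the population stability index (PSI) between a training-time baseline and recent production data; the 0.2 alert threshold is a widely used heuristic, not an Azure-specific value.

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """Quantify how far a production distribution has drifted from its baseline."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions to avoid division by zero and log(0).
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Illustrative data: a shifted production distribution triggers the alert.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)   # Distribution at training time.
current = rng.normal(0.4, 1.2, 10_000)    # Recent production observations.
if population_stability_index(baseline, current) > 0.2:
    print("Possible model decay detected: escalate for investigation.")
```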
Regardless of the methods that you use to identify model decay, the operations team must engage data scientists to promptly research and address potential decay problems.
Design for adaptability
AI is advancing rapidly in both technology and adoption. What you build today might become obsolete quickly. Keep this in mind when you make design decisions and create processes.
This pace of advancement emphasizes the need for agility and flexibility. Recognize that some components might have a limited lifespan. Adopt a pause-and-think approach that focuses on researching model discovery, programming libraries and frameworks, and processing technologies.
Next step
Learn about design principles to build and operate AI workloads on Azure.