Best practices for cost optimization in Azure Kubernetes Service (AKS)
Cost optimization is about maximizing the value of resources while minimizing unnecessary expenses in your cloud environment. This process involves identifying cost-effective configuration options and implementing best practices to improve operational efficiency. You can optimize an AKS environment to minimize cost while still meeting performance and reliability requirements.
In this article, you learn about:
- Holistic monitoring and FinOps practices.
- Strategic infrastructure selection.
- Dynamic rightsizing and autoscaling.
- Leveraging Azure discounts for substantial savings.
Embrace FinOps to build a cost saving culture
Financial operations (FinOps) is a discipline that combines financial accountability with cloud management and optimization. It focuses on driving alignment between finance, operations, and engineering teams to understand and control cloud costs. The FinOps Foundation maintains several notable projects, such as the FinOps Framework and the FOCUS specification.
For more information, see What is FinOps?
Prepare the application environment
Evaluate SKU family
It's important to evaluate the resource requirements of your application before deployment. Small development workloads have different infrastructure needs than large production-ready workloads. The combination of CPU, memory, and networking capacity heavily influences the cost-effectiveness of a SKU. Consider the following virtual machine (VM) types:
SKU family | Description | Use case |
---|---|---|
Azure Spot Virtual Machines | Spot node pools are backed by Azure Spot Virtual Machine scale sets and are deployed to a single fault domain with no high availability or service-level agreement (SLA) guarantees. Spot VMs let you take advantage of unutilized Azure capacity at significant discounts (up to 90% compared to pay-as-you-go prices). If Azure needs the capacity back, the Azure infrastructure evicts the Spot nodes. | Best for dev/test environments, workloads that can handle interruptions (such as batch processing jobs), and workloads with flexible execution time (see the CLI sketch after the note below). |
Ampere Altra Arm-based processors (Arm64) | Arm64 VMs are power-efficient and cost-effective without compromising performance. With Arm64 node pool support in AKS, you can create Arm64 Ubuntu agent nodes and even mix Intel and Arm architecture nodes within a cluster. These Arm-based VMs are engineered to efficiently run dynamic, scalable workloads and can deliver up to 50% better price-performance than comparable x86-based VMs for scale-out workloads. | Best for web or application servers, open-source databases, cloud-native applications, gaming servers, and more. |
GPU-optimized SKUs | Depending on the nature of your workload, consider using compute-optimized, memory-optimized, storage-optimized, or even graphics processing unit (GPU)-optimized VM SKUs. GPU VM sizes are specialized VMs available with single, multiple, or fractional GPUs. | GPU-enabled Linux node pools on AKS are best for compute-intensive workloads like graphics rendering, large model training, and inferencing. |
Note
The cost of compute varies across regions. When picking a less expensive region to run workloads, be conscious of the potential impact of latency as well as data transfer costs. To learn more about VM SKUs and their characteristics, see Sizes for virtual machines in Azure.
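For example, a minimal Azure CLI sketch along these lines adds a Spot node pool and an Arm64 node pool to an existing cluster. The resource group, cluster, and pool names are placeholders, and the Arm64 VM size shown is one possible choice:

```bash
# Add a Spot node pool. Azure can evict these nodes when it needs capacity back;
# --spot-max-price -1 pays the current Spot price, capped at the pay-as-you-go rate.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name spotpool \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3

# Add an Arm64 node pool by selecting an Arm-based VM size.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name armpool \
    --node-vm-size Standard_D4ps_v5 \
    --node-count 2
```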
Review storage options
For more information on storage options and related cost considerations, see the following articles:
- Best practices for storage and backups in Azure Kubernetes Service (AKS)
- Storage options for applications in Azure Kubernetes Service (AKS)
Use cluster preset configurations
It can be difficult to pick the right VM SKU, regions, number of nodes, and other configuration options. Cluster preset configurations in the Azure portal offload this initial challenge by providing recommended configurations for different application environments that are cost-conscious and performant. The Dev/Test preset is best for developing new workloads or testing existing workloads. The Production Economy preset is best for serving production traffic in a cost-conscious way if your workloads can tolerate interruptions. Noncritical features are off by default, and you can modify the preset values at any time.
Consider multitenancy
AKS offers flexibility in how you run multitenant clusters and isolate resources. For friendly multitenancy, you can share clusters and infrastructure across teams and business units through logical isolation. Kubernetes namespaces form the logical isolation boundary for workloads and resources. Sharing infrastructure reduces cluster management overhead while also improving resource utilization and pod density within the cluster. To learn more about multitenancy on AKS and to determine whether it's right for your organizational needs, see AKS considerations for multitenancy and Design clusters for multitenancy.
Warning
Kubernetes environments aren't entirely safe for hostile multitenancy. If any tenant on the shared infrastructure can't be trusted, more planning is needed to prevent tenants from impacting the security of other services.
Consider physical isolation boundaries. In this model, teams or workloads are assigned to their own cluster. The tradeoff is added management and financial overhead.
Build cloud native applications
Make your container as lean as possible
A lean container refers to optimizing the size and resource footprint of the containerized application. Check that your base image is minimal and only contains the necessary dependencies, and remove any unnecessary libraries and packages. A smaller container image accelerates deployment times and increases the efficiency of scaling operations. Artifact Streaming on AKS allows you to stream container images from Azure Container Registry (ACR). It pulls only the necessary layers for initial pod startup, reducing the pull time for larger images from minutes to seconds.
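As a sketch of the lean-container idea, a multi-stage build compiles the application in a full SDK image and copies only the resulting binary into a minimal runtime image. The Go application, image names, and distroless base here are illustrative assumptions:

```bash
# Build a lean image: compile in the golang SDK image, then copy only the
# static binary into a minimal distroless runtime image.
docker build -t myapp:slim -f - . <<'EOF'
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
EOF
```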
Enforce resource quotas
Resource quotas provide a way to reserve and limit resources across a development team or project. Quotas are defined on a namespace and can be set on compute resources, storage resources, and object counts. Defining resource quotas prevents individual namespaces from consuming more resources than allocated, which makes them especially useful for multitenant clusters where teams share infrastructure.
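A minimal sketch, assuming a team namespace named team-a; the quota values are placeholders to adapt to your workloads:

```bash
# Create a namespace for a team and cap its total compute usage and pod count.
kubectl create namespace team-a

kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
EOF
```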
Use cluster start/stop
When left unattended, small development/test clusters can accrue unnecessary costs. You can turn off clusters that don't need to run at all times using the cluster start and stop feature. This feature shuts down all system and user node pools so you don't pay for extra compute. The state of your cluster and objects is maintained when you start the cluster again.
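For example, assuming a cluster named myAKSCluster in resource group myResourceGroup:

```bash
# Stop the cluster outside working hours so you don't pay for idle compute.
az aks stop --resource-group myResourceGroup --name myAKSCluster

# Start it again when needed; cluster state and objects are restored.
az aks start --resource-group myResourceGroup --name myAKSCluster
```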
Use capacity reservations
Capacity reservations allow you to reserve compute capacity in an Azure region or availability zone for any duration of time. Reserved capacity is available for immediate use until the reservation is deleted. Associating an existing capacity reservation group to a node pool guarantees allocated capacity for your node pool and helps you avoid potential on-demand pricing spikes during periods of high compute demand.
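A sketch of associating a capacity reservation group (CRG) with a new node pool follows; the resource IDs and names are placeholders, and the node pool VM size must match a reservation in the CRG:

```bash
# Add a node pool backed by an existing capacity reservation group.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name reservedpool \
    --node-vm-size Standard_D4s_v5 \
    --crg-id "/subscriptions/<subscription-id>/resourceGroups/myResourceGroup/providers/Microsoft.Compute/capacityReservationGroups/myCRG"
```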
Monitor your environment and spend
Increase visibility with Microsoft Cost Management
Microsoft Cost Management offers a broad set of capabilities to help with cloud budgeting, forecasting, and visibility for costs both inside and outside of the cluster. Proper visibility is essential for deciphering spending trends, identifying optimization opportunities, and increasing accountability among application developers and platform teams. Enable the AKS cost analysis add-on for a granular breakdown of cluster costs by Kubernetes constructs and by the Azure Compute, Network, and Storage categories.
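Enabling the add-on on an existing cluster can be as simple as the following sketch; the cluster and resource group names are placeholders, and the add-on requires the Standard or Premium pricing tier:

```bash
# Enable the AKS cost analysis add-on.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-cost-analysis
```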
Azure Monitor
If you're ingesting metric data via Container insights, we recommend migrating to managed Prometheus, which offers a significant cost reduction. You can disable Container insights metrics using the data collection rule (DCR) and deploy the managed Prometheus add-on, which supports configuration via Azure Resource Manager, Azure CLI, Azure portal, and Terraform.
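A minimal CLI sketch for the managed Prometheus path, assuming placeholder names:

```bash
# Enable managed Prometheus (Azure Monitor managed service for Prometheus).
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-azure-monitor-metrics
```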
For more information, see Azure Monitor best practices and managing costs for Container insights.
Log Analytics
For control plane logs, consider disabling the categories you don't need and/or using the Basic Logs API when applicable to reduce Log Analytics costs. For more information, see Azure Kubernetes Service (AKS) control plane/resource logs. For data plane logs, or application logs, consider adjusting the cost optimization settings.
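As an illustrative sketch, the following sends only one control plane log category to a Log Analytics workspace; the workspace ID, setting name, and chosen category are assumptions to adapt to your needs:

```bash
# Look up the cluster resource ID.
AKS_ID=$(az aks show --resource-group myResourceGroup --name myAKSCluster --query id -o tsv)

# Send only kube-audit-admin logs to Log Analytics instead of every category.
az monitor diagnostic-settings create \
    --name aks-control-plane-logs \
    --resource "$AKS_ID" \
    --workspace "/subscriptions/<subscription-id>/resourceGroups/myResourceGroup/providers/Microsoft.OperationalInsights/workspaces/myWorkspace" \
    --logs '[{"category": "kube-audit-admin", "enabled": true}]'
```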
Optimize workloads through autoscaling
Establish a baseline
Before configuring your autoscaling settings, you can use Azure Load Testing to establish a baseline for your application. Load testing helps you understand how your application behaves under different traffic conditions and identify performance bottlenecks. Once you have a baseline, you can configure autoscaling settings to ensure your application can handle the expected load.
Enable application autoscaling
Vertical pod autoscaling
Requests and limits that are higher than actual usage can result in overprovisioned workloads and wasted resources. In contrast, requests and limits that are too low can result in CPU throttling and workload failures due to lack of memory. The Vertical Pod Autoscaler (VPA) allows you to fine-tune the CPU and memory resources required by your pods. It provides recommended values for CPU and memory requests and limits based on historical container usage, which you can set manually or let VPA update automatically. VPA is best for applications with fluctuating resource demands.
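A sketch of enabling the VPA add-on and targeting a hypothetical deployment named myapp:

```bash
# Enable the VPA add-on on the cluster.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-vpa

# Let VPA apply recommended requests automatically to the myapp deployment.
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
EOF
```

Set updateMode to "Off" if you only want recommendations without automatic updates.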
Horizontal pod autoscaling
The Horizontal Pod Autoscaler (HPA) dynamically scales the number of pod replicas based on observed metrics, such as CPU or memory utilization. During periods of high demand, HPA scales out, adding more pod replicas to distribute the workload. During periods of low demand, HPA scales in, reducing the number of replicas to conserve resources. HPA is best for applications with predictable resource demands.
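For example, a one-line sketch that scales a hypothetical deployment named myapp between 2 and 10 replicas, targeting 70% average CPU utilization:

```bash
# Create an HPA for the myapp deployment.
kubectl autoscale deployment myapp --cpu-percent=70 --min=2 --max=10
```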
Warning
You shouldn't use the VPA with the HPA on the same CPU or memory metrics. This combination can lead to conflicts, as both autoscalers attempt to respond to changes in demand using the same metrics. However, you can use the VPA for CPU or memory with the HPA for custom metrics to prevent overlap and ensure that each autoscaler focuses on distinct aspects of workload scaling.
Kubernetes event-driven autoscaling
The Kubernetes Event-driven Autoscaler (KEDA) add-on provides extra flexibility to scale based on various event-driven metrics that align with your application behavior. For example, for a web application, KEDA can monitor incoming HTTP request traffic and adjust the number of pod replicas to ensure the application remains responsive. For processing jobs, KEDA can scale the application based on message queue length. Managed support is provided for all Azure Scalers.
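A sketch of enabling the managed add-on and scaling a hypothetical worker deployment on Azure Storage queue length; the deployment, queue, and environment variable names are assumptions:

```bash
# Enable the managed KEDA add-on.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-keda

# Scale the queue-worker deployment from 0 to 20 replicas based on queue depth.
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: azure-queue
      metadata:
        queueName: orders
        queueLength: "5"
        connectionFromEnv: STORAGE_CONNECTION_STRING
EOF
```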
Enable infrastructure autoscaling
Cluster autoscaling
To keep up with application demand, the Cluster Autoscaler watches for pods that can't be scheduled due to resource constraints and scales the number of nodes in the node pool accordingly. When nodes don't have running pods, the Cluster Autoscaler scales down the number of nodes. The Cluster Autoscaler profile settings apply to all autoscaler-enabled node pools in a cluster. For more information, see Cluster Autoscaler best practices and considerations.
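For example, to enable the autoscaler on an existing node pool (names and bounds are placeholders):

```bash
# Let the cluster autoscaler size nodepool1 between 1 and 5 nodes.
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name nodepool1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5
```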
Node autoprovisioning
Complex workloads might require several node pools with different VM size configurations to accommodate CPU and memory requirements. Accurately selecting and managing several node pool configurations adds complexity and operational overhead. Node autoprovisioning (NAP) simplifies the SKU selection process and decides the optimal VM configuration based on pending pod resource requirements, so workloads run in the most efficient and cost-effective manner.
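A sketch of creating a cluster with NAP enabled; at the time of writing the feature is in preview, requires the aks-preview CLI extension, and expects Azure CNI Overlay with the Cilium dataplane (all names are placeholders):

```bash
# Create a cluster that autoprovisions node capacity from pending pod requests.
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-provisioning-mode Auto \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --network-dataplane cilium \
    --generate-ssh-keys
```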
Note
For more information on scaling best practices, see Performance and scaling for small to medium workloads in Azure Kubernetes Service (AKS) and Performance and scaling best practices for large workloads in Azure Kubernetes Service (AKS).
Save with Azure discounts
Azure Reservations
If your workload is predictable and exists for an extended period of time, consider purchasing an Azure Reservation to further reduce your resource costs. Azure Reservations operate on a one-year or three-year term, offering up to a 72% discount compared to pay-as-you-go prices for compute. Reservations automatically apply to matching resources. Best for workloads committed to running the same SKUs in the same regions over an extended period of time.
Azure Savings Plan
If you have consistent spend, but your use of disparate resources across SKUs and regions makes Azure Reservations infeasible, consider purchasing an Azure Savings Plan. Like Azure Reservations, Azure Savings Plans operate on a one-year or three-year term and automatically apply to any resources within benefit scope. You commit to spend a fixed hourly amount on compute resources irrespective of SKU or region. Best for workloads that utilize different resources and/or different data center regions.
Azure Hybrid Benefit
Azure Hybrid Benefit for Azure Kubernetes Service (AKS) allows you to maximize the value of your on-premises licenses at no extra cost. Use any qualifying on-premises licenses that also have active Software Assurance (SA) or a qualifying subscription to get Windows VMs on Azure at a reduced cost.
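Assuming a cluster with Windows Server node pools and qualifying licenses, the benefit can be applied with a sketch like the following (names are placeholders):

```bash
# Apply Azure Hybrid Benefit for Windows Server to the cluster.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-ahub
```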
Next steps
Cost optimization is an ongoing and iterative effort. Learn more by reviewing the following recommendations and architecture guidance:
- Azure Kubernetes Service