How to change kube system pod images to pull from private registry

Azman Samad 5 Reputation points
2024-07-20T04:27:18.4833333+00:00

Issue: High NAT gateway data transfer

We run ephemeral workloads that create 2000-3000 pods, and in response the autoscaler spins up around 500 nodes. Even though we pull our application images from ACR via service endpoints, we still see 60-70 GB of data transfer through the NAT gateway.

Our dynamic workloads run continuously: about 500 nodes are created, go down, and come up again.

This has driven up our NAT gateway costs.

We suspect this is because the AKS nodes pull system pod images from Microsoft Container Registry (mcr.microsoft.com). Is there a way to pull these privately?

Alternatively, is there a way to store these images in ACR and modify the kube-system daemonsets to pull them privately from our own ACR?

This is impacting all our customers. Please help.

Below are the Microsoft images that I see on each node:

  • mcr.microsoft.com/aks/aks-node-ca-watcher
  • mcr.microsoft.com/aks/ip-masq-agent-v2
  • mcr.microsoft.com/aks/msi/addon-token-adapter
  • mcr.microsoft.com/azure-policy/policy-kubernetes-addon-prod
  • mcr.microsoft.com/azure-policy/policy-kubernetes-webhook
  • mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images
  • mcr.microsoft.com/azuremonitor/containerinsights/ciprod
  • mcr.microsoft.com/cbl-mariner/busybox
  • mcr.microsoft.com/containernetworking/azure-cni
  • mcr.microsoft.com/containernetworking/azure-cns
  • mcr.microsoft.com/containernetworking/azure-ipam
  • mcr.microsoft.com/containernetworking/azure-npm
  • mcr.microsoft.com/containernetworking/cni-dropgz
  • mcr.microsoft.com/mirror/docker/library/busybox
  • mcr.microsoft.com/oss/azure/secrets-store/provider-azure
  • mcr.microsoft.com/oss/calico/cni
  • mcr.microsoft.com/oss/calico/node
  • mcr.microsoft.com/oss/calico/pod2daemon-flexvol
  • mcr.microsoft.com/oss/cilium/cilium
  • mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi
  • mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi
  • mcr.microsoft.com/oss/kubernetes-csi/blob-csi
  • mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar
  • mcr.microsoft.com/oss/kubernetes-csi/livenessprobe
  • mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver
  • mcr.microsoft.com/oss/kubernetes/apiserver-network-proxy/agent
  • mcr.microsoft.com/oss/kubernetes/autoscaler/addon-resizer
  • mcr.microsoft.com/oss/kubernetes/autoscaler/cluster-proportional-autoscaler
  • mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager
  • mcr.microsoft.com/oss/kubernetes/coredns
  • mcr.microsoft.com/oss/kubernetes/kube-proxy
  • mcr.microsoft.com/oss/kubernetes/kube-state-metrics
  • mcr.microsoft.com/oss/kubernetes/metrics-server
  • mcr.microsoft.com/oss/kubernetes/pause
  • mcr.microsoft.com/oss/kubernetes/windows-gmsa-webhook
  • mcr.microsoft.com/oss/nvidia/k8s-device-plugin
  • mcr.microsoft.com/oss/open-policy-agent/gatekeeper
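
One way to put rough numbers behind the suspicion that mcr.microsoft.com images account for the traffic is to total the image sizes that each node reports, grouped by registry host. The following is a minimal Python sketch, not an official AKS tool. It assumes `kubectl` is configured against the cluster and relies on the `status.images[].names` and `status.images[].sizeBytes` fields of the Kubernetes Node object. The function names are hypothetical. Two caveats apply: `sizeBytes` is the size the kubelet reports for the image on the node, not the number of bytes that crossed the NAT gateway, and the kubelet caps how many images it reports per node, so the totals are only an indication. An image being present on a node also does not prove that it was pulled over the network.

```python
import json
import subprocess
from collections import defaultdict


def image_bytes_by_registry(node_list: dict) -> dict:
    """Sum status.images[].sizeBytes per registry host across all nodes.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    """
    totals = defaultdict(int)
    for node in node_list.get("items", []):
        for image in node.get("status", {}).get("images", []):
            names = image.get("names") or []
            if not names:
                continue
            # Treat the part before the first "/" as the registry host,
            # e.g. "mcr.microsoft.com" or "<name>.azurecr.io".
            # A name without any "/" is counted under its full name.
            registry = names[0].split("/", 1)[0]
            totals[registry] += image.get("sizeBytes", 0)
    return dict(totals)


def load_nodes_from_kubectl() -> dict:
    """Fetch all Node objects as JSON (assumes kubectl is on PATH)."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(raw)


def print_report(totals: dict) -> None:
    """Print registries ordered by total reported image size."""
    for registry, size in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{registry}: {size / 1e9:.2f} GB")
```

Calling `print_report(image_bytes_by_registry(load_nodes_from_kubectl()))` while a scale-out is in progress gives a per-registry summary that can be attached to a support ticket.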

Azure Container Registry
An Azure service that provides a registry of Docker and Open Container Initiative images.
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
Azure NAT Gateway
NAT Gateway is a fully managed service that securely routes internet traffic from a private virtual network with enterprise-grade performance and low latency.

1 answer

  1. Asaf Sofer 85 Reputation points Microsoft Employee
    2024-07-21T03:24:07.53+00:00

    Hello Azman,

    AKS is a service managed by Microsoft. As part of the managed service, we are responsible for updating all kube-system images (security fixes, code changes, patches, and so on) and for making sure you are always up to date.

    For that to work, all of the FQDNs listed below must be allowed:

    https://learn.microsoft.com/en-us/azure/aks/outbound-rules-control-egress#azure-global-required-fqdn--application-rules

    On the other hand, I completely understand your concern about the heavy load you mentioned: "60-70gb data transfer via nat gateway."

    Considering the impact on your solution and its complexity, the best approach is to open a ticket with AKS support to confirm whether a tailor-made workaround is possible for your specific requirements.

