Hi Srinath NS,
Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.
When AKS upgrades a node, it first drains the node, and by default it waits up to 30 minutes for the running pods to be moved off safely. If a pod takes longer than 30 minutes to finish, the upgrade operation stops at that node. In that case, you need to increase the drain timeout so the workloads get more time to complete before the upgrade moves ahead. This has to be set manually with the commands below; the value is given in minutes. Please also refer to the "Set node drain timeout value" section of the AKS upgrade documentation.
# Set drain timeout for a new node pool
az aks nodepool add --name mynodepool --resource-group MyResourceGroup --cluster-name MyManagedCluster --drain-timeout 100
# Update drain timeout for an existing node pool
az aks nodepool update --name mynodepool --resource-group MyResourceGroup --cluster-name MyManagedCluster --drain-timeout 45
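If you are unsure which value to pass to --drain-timeout, one option is to derive it from the longest time your workload needs to finish. The snippet below is only a rough sketch: the helper name drain_timeout_minutes, the 50-minute job and the 5-minute margin are assumptions for illustration, not anything AKS provides. It rounds up to whole minutes and prints the update command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Azure CLI): convert a workload's longest
# expected runtime in seconds, plus an optional safety margin in seconds,
# into whole minutes, rounding up.
drain_timeout_minutes() {
    runtime_s=$1
    margin_s=${2:-0}
    echo $(( (runtime_s + margin_s + 59) / 60 ))
}

# Assumed example: a job that can run for up to 50 minutes, with a 5-minute margin.
timeout_min=$(drain_timeout_minutes 3000 300)

# Print the command rather than executing it, so it can be checked before use.
echo "az aks nodepool update --name mynodepool --resource-group MyResourceGroup --cluster-name MyManagedCluster --drain-timeout $timeout_min"
```

After applying the update, az aks nodepool show for the same node pool should list the configured value under its upgrade settings.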
If a node upgrade fails for any reason, AKS stops the operation and won't continue upgrading the other nodes until the problem is fixed. To help manage nodes automatically when something goes wrong, AKS also has a feature called node auto-repair, which monitors node health and attempts to repair unhealthy nodes without manual involvement. It runs automatically on AKS clusters and does not need to be enabled separately.
For more information, please refer to this document: Azure Kubernetes Service (AKS) node auto-repair
I hope this clarifies the topic.
If you found this information helpful, please click "Accept Answer" and "Upvote" on my post for other community members' reference.