Troubleshoot Node Not Ready failures that are followed by recoveries

Artikkeli
12/11/2024

This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.

Cause

There are several scenarios that could cause a "Not Ready" state to occur:

The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
Virtual machine (VM) host faults. To determine whether VM host faults occurred, check the following information sources:
- AKS diagnostics
- Azure status
- Azure notifications (for any recent outages or maintenance periods)

Resolution

Check the API server availability by running the kubectl get apiservices command. Make sure that the readiness probe is correctly configured in the deployment YAML file.

For further steps, see Basic troubleshooting of Node Not Ready failures.

Prevention

To prevent this issue from occurring in the future, take one or more of the following actions:

Make sure that your service tier is fully paid for.
Reduce the number of watch and get requests to the API server.

Jaa

Troubleshoot Node Not Ready failures that are followed by recoveries

Cause

Resolution

Prevention

Palaute

Lisäresursseja