Hyper-V 2016: Event 5120, CSV has entered a paused state due to STATUS_IO_TIMEOUT

Ryan 1 Reputation point
2021-10-30T02:54:47.27+00:00

Looking for help with an issue that has recurred every 1-2 days for the past 2 months. We have already enabled jumbo frames on the iSCSI ports, switches, and SAN, and enabled VLT on our two Dell S4048T switches. Firmware and patches are up to date. No luck resolving it so far.
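Since jumbo frames have to be enabled on every hop to take effect, one common check is a don't-fragment ping at the largest payload the MTU allows. The sketch below only computes that payload size and builds the Windows `ping` command line; it assumes IPv4 with no IP options (20-byte IP header plus 8-byte ICMP header), and the target address `192.0.2.10` is a placeholder, not an address from this environment.

```python
IPV4_HEADER_BYTES = 20   # assumes no IP options
ICMP_HEADER_BYTES = 8

def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def build_ping_command(target, mtu):
    """Windows ping arguments: -f sets Don't Fragment, -l sets the payload size."""
    return ["ping", "-f", "-l", str(max_icmp_payload(mtu)), target]

if __name__ == "__main__":
    # Placeholder iSCSI target address; replace with a real SAN port IP.
    print(" ".join(build_ping_command("192.0.2.10", 9000)))
```

If that ping fails from any host iSCSI NIC while a standard-MTU payload succeeds, some device in the path is probably not passing jumbo frames end to end.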

Issue: Every few days our Hyper-V hosts become unstable, and our guest VMs sporadically reboot.

Environment:
Cluster – Non-S2D
Nodes – 4
VM – Hyper-V
Storage – iSCSI SAN (Dell Compellent)
OS – Windows Server 2016
Hardware – Dell R730s
AV – Defender
Backups – MABS/DPM (guest VM level only)

We see Event IDs 5120, 5142, 1069, 1146 and 1230. It is mainly 5120, with the CSVs entering a paused state due to STATUS_IO_TIMEOUT.

We also see this in the cluster log: Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats
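To see whether these events cluster on particular days, the event log can be exported and tallied. The following is a minimal sketch, assuming a CSV export with `TimeCreated` and `Id` columns and ISO-formatted timestamps (an export from PowerShell may use a locale-specific date format instead). The sample rows are made up for illustration and are not taken from this cluster.

```python
import csv
import io
from collections import Counter

# Event IDs mentioned in this thread.
WATCHED_IDS = frozenset({5120, 5142, 1069, 1146, 1230})

def count_events_by_day(csv_text, watched=WATCHED_IDS):
    """Count watched events per (day, event ID) from exported CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        event_id = int(row["Id"])
        if event_id in watched:
            day = row["TimeCreated"][:10]  # assumes ISO timestamps: YYYY-MM-DD...
            counts[(day, event_id)] += 1
    return counts

# Hypothetical sample data, for illustration only.
SAMPLE = """TimeCreated,Id
2021-10-28T23:15:02,5120
2021-10-28T23:15:40,5142
2021-10-28T23:16:10,5120
2021-10-29T14:05:33,1069
2021-10-29T14:06:00,7036
"""

if __name__ == "__main__":
    for (day, event_id), n in sorted(count_events_by_day(SAMPLE).items()):
        print(day, event_id, n)
```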

No solution so far. The only thing that remediates the issue is powering down guest VMs to remove load from the hosts (note that our hosts are underutilized from a memory and CPU standpoint).


1 answer

  1. Limitless Technology 39,771 Reputation points
    2021-12-20T17:17:18.303+00:00

    Hello @Ryan

    1. There could be latency between your CSV storage network and the Hyper-V network.
    2. Please check whether any file server, SQL Server, or other application server makes frequent storage accesses, which can lead to an I/O bottleneck.
    3. Please temporarily disable any antivirus program you have.
    4. Please check whether any QoS policy at the firewall, switch, or Dell storage level is restricting traffic flow between the storage, hosts, and VMs.
    5. Please try moving the DPM backup window to non-working hours or the weekend.
    6. Please run the Hyper-V cluster validation wizard to confirm that all cluster configurations are identical. The cluster report should contain no warnings or errors.
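One way to act on the backup and I/O-load suggestions above is to check how many 5120 events fall inside the backup window. This is a rough sketch only: the window of 22:00 to 04:00 and the timestamps are hypothetical, and a high fraction would suggest a correlation rather than prove a cause.

```python
from datetime import datetime, time

def in_window(ts, start, end):
    """True if ts's time of day is in [start, end); handles windows crossing midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end

def fraction_in_window(timestamps, start, end):
    """Fraction of timestamps whose time of day falls inside the window."""
    if not timestamps:
        return 0.0
    hits = sum(1 for ts in timestamps if in_window(ts, start, end))
    return hits / len(timestamps)

# Hypothetical 5120 event times and backup window, for illustration only.
EVENTS = [
    datetime(2021, 10, 28, 23, 15),
    datetime(2021, 10, 29, 2, 40),
    datetime(2021, 10, 29, 14, 5),
    datetime(2021, 10, 30, 3, 10),
]
BACKUP_START = time(22, 0)
BACKUP_END = time(4, 0)

if __name__ == "__main__":
    print(fraction_in_window(EVENTS, BACKUP_START, BACKUP_END))
```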

    Please have a look at the Microsoft article below to troubleshoot the I/O issue in the Hyper-V cluster.

    https://techcommunity.microsoft.com/t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994
