Leveraging Event Log Messages and Performance Counter Alerts To Automate Hyper-V
A few times people have asked me how to automate moving VMs off of a cluster node when some type of failure or performance issue is detected. In general my answer is to use System Center, specifically Operations Manager. Many customers I work with take this advice and leverage the management packs and broad insight that Operations Manager provides to create automatic Diagnostics and Recoveries tied to Monitors. But for other customers this is too heavy-handed, or they don’t have the resources to run Operations Manager. So are they out of luck? Not really. There are two commonly overlooked features in Windows: the Task Scheduler and Performance Monitor.
Within the Windows Task Scheduler you can create a task that is executed automatically every time a given event is logged, and the task can then run a script. For example, you can create a task tied to the event that is logged when a network adapter connected to a virtual switch has its network cable disconnected. When that event is detected, the task can run a script that pauses the cluster node and migrates all of the VMs off of it. Similarly, you can configure the Windows Performance Monitor to trigger a scheduled task when a performance threshold has been exceeded. What actions these tasks take is limited only by your imagination and by how complex you want to make the scripts they execute.
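As a rough illustration of the kind of script such a task could run, here is a minimal sketch that pauses the local cluster node and drains its roles. It assumes Windows Server 2012 or later with the FailoverClusters module installed, and that the task runs under an account with cluster administration rights. Treat it as a starting point rather than a finished recovery script.

```powershell
# Sketch only: pause this cluster node and drain its roles to other nodes.
# ASSUMPTION: FailoverClusters module is available (Windows Server 2012 or later)
# and the caller has cluster administration rights.
Import-Module FailoverClusters

# -Drain moves clustered roles (including VMs) off the node before pausing it;
# -Wait blocks until the drain has completed or failed.
Suspend-ClusterNode -Name $env:COMPUTERNAME -Drain -Wait

# Once the underlying problem is fixed, bring the node back into service:
# Resume-ClusterNode -Name $env:COMPUTERNAME -Failback Immediate
```

Draining the whole node is the blunt option. It moves every VM, including ones that do not use the failed adapter.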
The biggest drawback to this method versus using something like Operations Manager is that magic term ‘centralization’: these tasks have to be configured on each server, and any changes have to be replicated to each server. Additionally, either the scripts have to be flexible enough to work across varying server configurations (for example one NIC or two, teamed or not) or you have to configure the script and task individually for each server. However, going back to the original premise, this is really an answer for environments where Operations Manager is too heavy, which generally means a limited number of servers.
I’m going to provide a few basic examples just to get you thinking and as a proof of concept. Specifically I am going to focus on the disconnected network adapter and some overall performance metrics (CPU utilization, Disk throughput and Network throughput). We’ll start with the disconnected network adapter.
Creating a Task Triggered By Events
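One way to create such a task from the command line is `schtasks.exe` with an `ONEVENT` trigger. The sketch below reacts to the Hyper-V virtual switch event that the script later in this post reads (source `Microsoft-Windows-Hyper-V-VmSwitch`, instance ID 24, assumed here to be the same as the event ID). The task name and script path are hypothetical placeholders.

```powershell
# Sketch only: register a task that fires whenever the Hyper-V virtual switch
# logs event ID 24 in the System log.
# ASSUMPTIONS: the task name and script path below are hypothetical.
$taskName   = 'HyperV-NicDisconnected'            # hypothetical task name
$scriptPath = 'C:\Scripts\NicDisconnected.ps1'    # hypothetical script path

# XPath filter selecting the event provider and event ID to react to.
$query = "*[System[Provider[@Name='Microsoft-Windows-Hyper-V-VmSwitch'] and EventID=24]]"

# Command line the task runs when the event is logged.
$action = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $scriptPath"

# /SC ONEVENT creates an event trigger; /EC names the event log (channel)
# and /MO carries the XPath filter. /F overwrites an existing task of that name.
schtasks.exe /Create /TN $taskName /TR $action /SC ONEVENT /EC System /MO $query /F
```

Without `/RU` the task runs as the current user and only while that user is logged on. For unattended use, add `/RU` and `/RP` for an account that has rights on the cluster.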
Creating a Performance Monitor Alert To Trigger a Task
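A Performance Monitor alert can also be created from the command line with `logman.exe`. The following is a sketch under assumptions: the alert name, the 90 percent threshold, the 30-second sample interval and the task name are all hypothetical, and the scheduled task named by `-tn` must already exist. The counter used is the hypervisor-level CPU counter, on the assumption that overall host CPU load is what you want to watch.

```powershell
# Sketch only: create a Performance Monitor alert that starts a scheduled task
# when hypervisor CPU utilization crosses a threshold.
# ASSUMPTIONS: alert name, threshold, interval and task name are hypothetical;
# the task named in $taskName must already exist in Task Scheduler.
$alertName = 'HyperV-HighCpu'
$threshold = '\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time>90'
$taskName  = 'HyperV-HighCpuResponse'

# -th sets the counter and threshold, -si the sample interval (hh:mm:ss),
# -tn the scheduled task to run when the alert is triggered.
logman.exe create alert $alertName -th $threshold -si 00:00:30 -tn $taskName

# An alert is a data collector set and does nothing until it is started.
logman.exe start $alertName
```

The same pattern should apply to disk and network throughput by swapping in the relevant counter path and threshold.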
More Robust Disconnected NIC Script
This is a more robust script for the disconnected host NIC. The script identifies which NIC is disconnected and any VMs that are impacted by that NIC being disconnected. It then checks all of the other cluster nodes to see whether their NIC is also disconnected before migrating only the affected VMs.
# Running log of what the script finds and does; written to disk at the end
$NicDisconnectedLog = [String]::Empty
$NicDisconnectedLog += "NIC Adapter Disconnection Detected - Attempting to Move VMs`n"

# Most recent disconnect event logged by the Hyper-V virtual switch
# ($NicEvent is used rather than $Event, which is a PowerShell automatic variable)
$NicEvent = Get-EventLog -LogName System -Source Microsoft-Windows-Hyper-V-VmSwitch `
    -InstanceId 24 -Newest 1
$NicDisconnectedLog += ("Disconnected NIC Description: " + `
    $NicEvent.ReplacementStrings[3] + "`n")

# External switch bound to the disconnected adapter
$Switch = Get-VMSwitch -SwitchType External | Where-Object `
    {$_.NetAdapterInterfaceDescription -eq $NicEvent.ReplacementStrings[3]}
$NicDisconnectedLog += ("Associated Switch Name: " + $Switch.Name + "`n")

# Find nodes that are up and whose adapter behind the same-named switch is connected
$NicDisconnectedLog += ("Determining Available Cluster Nodes`n")
$AvailableClusterNodes = @()
foreach ($clusterNode in (Get-ClusterNode | Where-Object {$_.State -eq "Up"}))
{
    $destSwitch = Get-VMSwitch -Name $Switch.Name -ComputerName $clusterNode.Name
    if ((Get-NetAdapter -InterfaceDescription $destSwitch.NetAdapterInterfaceDescription `
        -CimSession $clusterNode.Name).Status -eq "Up")
    {
        $AvailableClusterNodes += $clusterNode
        $NicDisconnectedLog += ("Node: " + $clusterNode.Name + " is available.`n")
    }
    else
    {
        $NicDisconnectedLog += ("Node: " + $clusterNode.Name + `
            " also has a disconnected switch.`n")
    }
}

if ($AvailableClusterNodes.Count -eq 0)
{
    $NicDisconnectedLog += ("No Available Cluster Nodes - Exiting`n")
}
else
{
    # Collect the unique IDs of VMs with a network adapter on the affected switch
    $NicDisconnectedLog += ("Determining Affected VMs`n")
    $AffectedNics = Get-VMNetworkAdapter -VMName * | Where-Object `
        {$_.SwitchName -eq $Switch.Name}
    $VMsToMove = @()
    foreach ($Nic in $AffectedNics)
    {
        if ($VMsToMove -notcontains $Nic.VMId)
        {
            $NicDisconnectedLog += ("VM: " + $Nic.VMName + " ID:(" + `
                $Nic.VMId + ")" + " is affected - preparing to move.`n")
            $VMsToMove += $Nic.VMId
        }
    }

    # Spread the VMs across the available nodes; on failure try the next node
    $NicDisconnectedLog += ("Preparing to Move Affected VMs`n")
    for ($MoveCounter = $VMsToMove.Count; $MoveCounter -gt 0; $MoveCounter--)
    {
        $attemptCount = 0
        do {
            $destinationNode = $AvailableClusterNodes[(($MoveCounter+$attemptCount) `
                % $AvailableClusterNodes.Count)]
            $NicDisconnectedLog += ("Moving VM with ID: " + $VMsToMove[($MoveCounter-1)] `
                + " to node: " + $destinationNode.Name + "`n")
            $result = Move-ClusterVirtualMachineRole -VMId $VMsToMove[($MoveCounter-1)] `
                -Node $destinationNode.Name -MigrationType Live
            $attemptCount++
        }
        # Compare node names explicitly; stop once the VM is on the target or all nodes were tried
        while (($result.OwnerNode.Name -ne $destinationNode.Name) `
            -and ($attemptCount -lt $AvailableClusterNodes.Count))
    }
}

$NicDisconnectedLog | Out-File "C:\Scripts\NicLog.txt"
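The destination selection in the script above does two things at once: it spreads VMs across the available nodes, and it rotates to the next node on each retry. Here is a stripped-down sketch of just that index arithmetic. The function name and node names are hypothetical, and no cluster cmdlets are called.

```powershell
# Sketch only: the round-robin index arithmetic from the move loop, in isolation.
# ASSUMPTION: Get-DestinationNode and the node names are hypothetical.
function Get-DestinationNode {
    param(
        [string[]]$Nodes,       # nodes considered available
        [int]$MoveCounter,      # counts down from the number of VMs to 1
        [int]$AttemptCount      # 0 on the first try, +1 for each retry
    )
    # Same expression as the script: (counter + attempts) modulo node count
    $Nodes[($MoveCounter + $AttemptCount) % $Nodes.Count]
}

$nodes = 'NodeA', 'NodeB', 'NodeC'
foreach ($moveCounter in 4..1) {
    $first = Get-DestinationNode -Nodes $nodes -MoveCounter $moveCounter -AttemptCount 0
    $retry = Get-DestinationNode -Nodes $nodes -MoveCounter $moveCounter -AttemptCount 1
    "VM #$moveCounter -> first try $first, retry $retry"
}
```

With three nodes and four VMs, the first-try targets cycle NodeB, NodeA, NodeC, NodeB, and each retry lands on the next node in the list. The rotation balances by VM count only and takes no account of how loaded each node is.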
-taylorb