Monitoring Nodes
Applies To: Windows HPC Server 2008
A key step in monitoring and maintaining cluster health is to identify any deviation from the normal operational state or performance. HPC Cluster Manager enables you to view cluster and node status at a glance, identify problem nodes, and drill down into node details for further investigation.
Note
HPC Cluster Manager provides several charts and reports to monitor and analyze cluster resource usage and job and node statistics. For more information, see Understanding Charts and Reports.
View cluster status at a glance
Drill down into individual node details
Monitor node operations
Correlate the monitoring information between nodes, jobs, operations, and diagnostics
View cluster status at a glance
In Node Management, you can monitor your cluster at a glance by using the node List view or the node Heat Map view.
Drill down into individual node details
The List and Heat Map views provide a starting point for identifying problem areas. Double-click a compute node to see detailed information such as hardware, operating system properties, and current performance metrics. You can also select one or more nodes and then use the following actions to investigate further:
Run Diagnostic Tests: Run diagnostic tests on one or more compute nodes.
View Performance Charts: View a chart of the performance metrics for a compute node over time.
View Node Events: View events generated by HPC services on a specific compute node.
Open a Remote Desktop Connection: Open a remote desktop session to one or more compute nodes.
Monitor node operations
Tracking recent and ongoing cluster operations is another aspect of monitoring that is critical to administering a cluster. Windows HPC Server 2008 archives recent operations and allows you to view the progress of ongoing operations in real time.
Correlate the monitoring information between nodes, jobs, operations, and diagnostics
In HPC Cluster Manager, you can use the Pivot To actions to correlate the monitoring information between nodes, jobs, operations, and diagnostics. For example, you can select one or more nodes in the views pane, and then pivot to Jobs for the Selected Nodes. This takes you to a job list view that is filtered by the nodes that you selected.
The supported pivot paths are:
Nodes: pivot to jobs, test results, and operations.
Jobs: pivot to nodes.
Test results: pivot to failed nodes and operations.
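The pivot paths listed above can be pictured as a small lookup table. The following Python sketch is illustrative only: it does not call any Windows HPC Server API, and the names PIVOT_PATHS, can_pivot, and jobs_for_selected_nodes, along with the sample job data, are hypothetical. It shows which pivots the list permits and how a job list might be filtered by a node selection, in the spirit of the Jobs for the Selected Nodes action.

```python
# Hypothetical model of the pivot paths listed above.
# This is not an HPC API; it only mirrors the documented list.
PIVOT_PATHS = {
    "nodes": {"jobs", "test results", "operations"},
    "jobs": {"nodes"},
    "test results": {"failed nodes", "operations"},
}


def can_pivot(source, target):
    """Return True if the list above names a pivot from source to target."""
    return target in PIVOT_PATHS.get(source, set())


def jobs_for_selected_nodes(jobs, selected_nodes):
    """Keep only the jobs that used at least one of the selected nodes."""
    selected = set(selected_nodes)
    return [job for job in jobs if selected & set(job["nodes"])]


# Made-up sample data for illustration.
jobs = [
    {"id": 1, "nodes": ["NODE01", "NODE02"]},
    {"id": 2, "nodes": ["NODE03"]},
    {"id": 3, "nodes": ["NODE02", "NODE04"]},
]

filtered = jobs_for_selected_nodes(jobs, ["NODE02"])
print([job["id"] for job in filtered])  # prints [1, 3]
print(can_pivot("nodes", "jobs"))       # prints True
print(can_pivot("jobs", "operations"))  # prints False
```

In this sketch, a pivot is just a filter on one list driven by a selection in another, which is why the result of pivoting from nodes to jobs is a job list restricted to the selected nodes.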