Understanding Node Metrics and Properties in HPC Cluster Manager

This topic describes the node properties and metrics that are available in HPC Cluster Manager to help you monitor your cluster. The node list and heat map view in HPC Cluster Manager can be modified to display various node metrics and properties. The heat map view only displays metrics. For information about creating custom node views, see Understanding Node List, Heat Map, and Custom Tab Views. For information about adding more metrics, see Customize Metrics Collection in Windows HPC Server.

Alphabetical list of node properties and metrics

The following table describes the available values for node properties and metrics in HPC Cluster Manager.

Note

In the “Property or metric” column, the names of metrics and of node properties that reflect node status are denoted by bold font.

Property or metric Description Category
Affinity Displays the affinity setting for this node. Possible values:

- Null – affinity for the node is managed according to the job scheduler affinity policy (see Understanding Affinity)
- True – the HPC Node Manager Service sets affinity for all tasks that run on this node
- False – affinity on the node is not managed by the HPC services, and the operating system or the application manages placement of tasks on physical cores

This value is set by the HPC cluster administrator.
Cores/memory/disk
Application IP The IP address for the network adapter that is bound to the Application network. Network
Application Link Speed The link speed for the network adapter that is bound to the Application network. Network
Application Link State The link state for the network adapter that is bound to the Application network. If your cluster topology does not include an Application network, or if the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected.

This value is periodically updated by the HPC Management Service during the discovery operation.
Network
Application NetworkDirect Whether or not a NetworkDirect provider is installed for the Application network. Possible values are True and False.

This value is periodically updated by the HPC Management Service.
Network
Available Physical Memory (MBytes) The amount of physical memory available to processes running on the computer, in megabytes. AvailableMBytes is calculated by adding the amount of space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use; Zeroed memory is pages of memory filled with zeros to prevent later processes from seeing data used by a previous process; Standby memory is memory removed from a process's working set (its physical memory) en route to disk but still available to be recalled. This counter displays the last observed value only; it is not an average. Cores/memory/disk
Boot Information Information related to booting over the network from an iSCSI server. This specifies how the head node should respond to a PXE request from the node. Deployment
Context Switches / second The combined rate at which all processors on the computer are switched from one thread to another. Context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher priority ready thread, or switches between user-mode and privileged (kernel) mode to use an Executive or subsystem service. Cores/memory/disk
Cores The number of physical cores on the computer.

This value is periodically updated by the HPC Management Service during the discovery operation. Note: If you change the hardware configuration of a compute node, ensure that the configuration change is detected and updated in the job scheduling database by taking the node Offline (preferably before making the hardware change), and then bringing the node Online again.
Cores/memory/disk
Cores In Use The number of physical cores that are currently allocated to jobs. Cores/memory/disk
CPU Usage (%) User and system time for all physical cores on the node, divided by the sampling interval times the total number of physical cores on the node. Cores/memory/disk
Description A description for the node.

This value is set by the HPC cluster administrator.
Deployment
Disk Queue Length An indication of the number of transactions that are waiting to be processed. This counter provides a primary measure of disk congestion. The queue length is representative of not only the number of transactions, but also the length and frequency of each transaction. Cores/memory/disk
Disk Throughput (Bytes/sec) An indication of the rate at which data is being transferred. Describes the throughput of the disk subsystem. Cores/memory/disk
DNS Name The fully qualified DNS name for the node, including the DNS suffix. For example, “myNode.myDomain.com”. Network
Domain Name The domain name specifications for the node. Network
Durable Queues Total Bytes Total number of bytes of Message Queuing messages on the broker node. The broker node stores messages using Microsoft Message Queuing (MSMQ) when SOA clients create sessions on the cluster using the Durable Session APIs. Responses that are stored by the broker can be retrieved by the client at any time, even after intentional or unintentional disconnect. Messages are deleted when SOA clients retrieve their responses and close the session, or when the job history retention period is reached (by default, this is set to three days).

By default, the MSMQ storage limit is 8 GB. When the MSMQ quota is reached, durable sessions stop working.
SOA
Durable Queues Total Messages Total number of Message Queuing messages on the broker node. SOA
Durable Requests Queue Length Total number of requests stored in local Message Queuing. SOA
Durable Responses Queue Length Total number of responses stored in local Message Queuing. SOA
Enterprise IP The IP address for the network adapter that is bound to the Enterprise network. Network
Enterprise Link Speed The link speed for the network adapter that is bound to the Enterprise network. Network
Enterprise Link State The link state for the network adapter that is bound to the Enterprise network. If the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected.

This value is periodically updated by the HPC Management Service during the discovery operation.
Network
Enterprise NetworkDirect Whether or not a NetworkDirect provider is installed for the Enterprise network. Possible values are True and False.

This value is periodically updated by the HPC Management Service.
Network
Free Disk Space (%) The percentage of the total usable space on the local disk that is free. Cores/memory/disk
Groups The node groups to which the node belongs. Membership in the default node groups is determined at deployment or by changing the node role. Membership in custom node groups is determined by the HPC cluster administrator. Status/workload
HPC SOA Calculations/Sec Current calculating calls from the broker node. This is a moving average of the past N seconds. This value can be significantly higher than the number of cores because of caching on the service host.

The HPC SOA metrics, along with the memory and CPU metrics, can help you determine how to scale your broker nodes. For example, when the SOA throughput, memory, and CPU usage are high on your broker nodes, add more brokers. When these metrics are low, convert some brokers to compute nodes. For more information, see Multiple roles and broker scaling.
SOA
HPC SOA Faults/Sec The number of faulted calls on the node per second. SOA
HPC SOA Requests/Sec The number of requests to the broker node per second. SOA
HPC SOA Responses/Sec The number of responses on the broker node. This is a moving average of the past N seconds. SOA
Idle Whether or not the workstation node is idle. Possible values:

- Null – applied to any node that is not a workstation node, and to workstation nodes that do not use the activity detection policy.
- True – the user activity that is detected on this node is below the threshold that is defined in the Workstation Availability Policy. The node can be used to run jobs.
- False – the user activity that is detected on this node is above the threshold that is defined in the Workstation Availability Policy. The node cannot be used to run jobs.
Status/workload
Install Path The path where the HPC Pack software is installed.

This value is not listed for Windows Azure nodes.
Deployment
Installed Service Roles The HPC node roles that are installed on the node. Node roles that are installed can be enabled or disabled by changing the node role (enabled roles are listed in the Node Role property). For more information, see Understanding Node Roles in Microsoft HPC Pack.

Dedicated, on-premises nodes can have the following node roles installed:

- HeadNode (head nodes only)
- BrokerNode
- ComputeNode

Windows Azure nodes can have one of the following node roles installed:

- Windows Azure Worker Node
- Windows Azure Virtual Machine Node

Note: The Windows Azure Worker Node role is available starting with HPC Pack 2008 R2 with Service Pack 1 (SP1). The Windows Azure Virtual Machine Node role is available starting with HPC Pack 2008 R2 with Service Pack 2 (SP2).

Workstation nodes can have the following role installed:

- Workstation Node

Unmanaged server nodes can have the following role installed:

- Unmanaged Server Node

Note: The Unmanaged Server Node role is available starting with HPC Pack 2008 R2 with Service Pack 3 (SP3).
Deployment
Location The primary, secondary, and tertiary location details for the node. For example, data center, server rack, chassis.

This property value can be specified by the HPC cluster administrator.
Deployment
LUN Mapping A GUID that identifies the iSCSI boot node. Deployment
Machine Guid The SMBIOS GUID of the node. Deployment
Management Ip Address The out-of-band management IP address for the node that you can use for scriptable power control tools such as Intelligent Platform Management Interface (IPMI) scripts. For example, this can be set to the IP address for the Baseboard Management Controller (BMC) of the compute node. For more information, see Scriptable Power Control Tools.

This property value can be set by the HPC cluster administrator.
Deployment
Memory The amount of memory installed on the node. Cores/memory/disk
Memory Paging (Hard Faults/second) The number of hard page faults per second. A hard fault occurs when part of a program that is referenced is no longer in main memory because it has been swapped out to the paging file, so the system must retrieve it from the hard disk. Frequent hard faults cause slowdowns and increased hard disk activity. Excessive hard faults can lead to hard disk thrashing (when a program stops responding, but the hard drive continues to run for an extended period). Cores/memory/disk
Name The name of the node, including the domain. For example, DOMAIN\nodename.

For Windows Azure nodes, this name is AZURE\nodename.
Deployment
NetBoot MAC Address The MAC address of the network adapter that is bound to the Private network. This is the network that is used when deploying an operating system image to the node (PXE boot). Deployment
Network Usage (Bytes/second) An indication of the total network throughput for all networks on a node. This does not include NetworkDirect traffic, because NetworkDirect bypasses TCP/IP. Network
Node Health The overall indication of node health. Indicates whether or not there are any warnings or errors that the HPC services are aware of on that node, if the node is performing an operation that was initiated by the HPC cluster administrator, or if the node has not been added to the cluster. For information about node health values, see Understanding Node States, Health, and Operations. Status/workload
Node Name The name of the node.

For nodes that are deployed from bare metal, this name is automatically assigned according to the node naming series that the HPC cluster administrator defines in the node template.

For Windows Azure nodes, the name starts with “AzureCN-” followed by a number. For example, AzureCN-0001.
Deployment
Node Role The node roles that are enabled for the node. Dedicated, on-premises nodes can have more than one role enabled, depending on what roles are installed (installed roles are listed in the Installed Service Roles property). Possible values:

- ComputeNode
- BrokerNode
- Unmanaged Server Node
- Windows Azure Worker Node
- Windows Azure Virtual Machine Node
- Workstation Node

The head node role is not displayed in this property.

Note: The Unmanaged Server Node role is available starting with HPC Pack 2008 R2 with Service Pack 3 (SP3).

Note: The Windows Azure Worker Node role is available starting with HPC Pack 2008 R2 with Service Pack 1 (SP1). The Windows Azure Virtual Machine Node role is available starting with HPC Pack 2008 R2 with Service Pack 2 (SP2).

For more information, see Understanding Node Roles in Microsoft HPC Pack.
Status/workload
Node State The node’s deployment state, or whether or not an administrator wants the node to be available as a resource for cluster jobs (Online or Offline). For information about node state values, see Understanding Node States, Health, and Operations. Status/workload
Node Template The name of the node template that was used to deploy the node or to join the node to the cluster. Deployment
OS Architecture The operating system architecture on the node. Deployment
OS Version The operating system version on the node. Deployment
Primary HeadNode For a head node that is configured for high availability in a failover cluster, the initial head node computer on which HPC Pack is installed has a value set to True for this property. Warning: This property is removed starting with HPC Pack 2012. Status/workload
Private IP The IP address for the network adapter that is bound to the Private network. Network
Private Link Speed The link speed for the network adapter that is bound to the Private network. Network
Private Link State The link state for the network adapter that is bound to the Private network. If your cluster topology does not include a Private network, or if the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected.

This value is periodically updated by the HPC Management Service during the discovery operation.
Network
Private NetworkDirect Whether or not a NetworkDirect provider is installed for the Private network. Possible values are True and False.

This value is periodically updated by the HPC Management Service.
Network
Processors Name and properties of the processors that are installed on the node. Cores/memory/disk
Product Key The Windows product key that will be used to activate the operating system on the node.

This property value can be specified by the HPC cluster administrator.
Deployment
Progress The most recent deployment log entry during deployment or provisioning operations. You can sort by this column to help monitor deployment progress. Deployment
Provisioned Whether or not HPC Pack is installed on the node. Possible values are True and False. Note: If you assign a node template that includes steps to deploy an operating system and this property is True, only the tasks in the Maintenance phase of the node template will run. If you want to reinstall the operating system, you can assign the template, then run the Reimage action. Deployment
Running Jobs The number of jobs that are currently using this node. Status/workload
Running Tasks The number of tasks, subtasks, or task processes (such as an MPI rank) that are currently using this node. The number can be higher than the number of physical cores or sockets if the subscribed cores or sockets properties are set on the node. Status/workload
Service Health The overall indication of the health of the HPC services. Indicates whether or not there are any warnings or errors that the HPC services are aware of on that node. Status/workload
Sockets The number of physical sockets on the node. Cores/memory/disk
Subscribed Cores The number of logical cores that the HPC Job Scheduler Service will use when it is allocating tasks to the node. It can be larger or smaller than the number of physical cores. Note: The “cores in use” metric reflects how many physical cores are in use. The “running tasks” metric can help you monitor how many subscribed cores are in use.

This value is set by the HPC cluster administrator.
Cores/memory/disk
Subscribed Sockets The number of logical sockets that the HPC Job Scheduler Service will use when it is allocating tasks to the node. It can be larger or smaller than the number of physical sockets.

This value is set by the HPC cluster administrator.
Cores/memory/disk
System Calls / second The number of calls made per second to system components (kernel-mode services). This is a measure of how busy the system is managing applications and services. Comparing this value with the Interrupts/sec counter can indicate whether processor issues are hardware or software related. Cores/memory/disk
UnattendSetup Whether or not setup.exe ran with the -unattend flag. Deployment
Version The version number of HPC Pack that is installed on the node. For example:

- HPC Pack 2008 R2 has a value of 3.0.xxxx.x.
- HPC Pack 2008 R2 with SP4 has a value of 3.4.xxxx.x.
- HPC Pack 2012 has a value of 4.0.xxxx.x.
Deployment
Windows Azure Instance Name The computer name of the Windows Azure role instance. This value is assigned by Windows Azure. Azure
Windows Azure Node Address The IP address of the Windows Azure node. This value is assigned by Windows Azure. For a list of the public IP ranges, see the posted IP Ranges. Azure
Windows Azure Node Size The size of the Windows Azure node instance. The size determines the number of CPU cores, the memory capacity, and the disk space as defined by Windows Azure.

This value is specified by the HPC cluster administrator when adding Windows Azure nodes to the cluster.
Azure
Windows Azure Service Name The public name of the hosted service (in the Windows Azure subscription) in which this Windows Azure node is deployed.

This value is defined by the HPC cluster administrator in the node template.
Azure
Windows Azure Storage Service Name The public name of the storage account (in the Windows Azure subscription) that is associated with the Windows Azure node.

This value is defined by the HPC cluster administrator in the node template.
Azure
Windows Azure Subscription ID The unique ID for the Windows Azure subscription account associated with the Windows Azure node.

This value is defined by the HPC cluster administrator in the node template.
Azure
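
As a rough illustration of two calculations that the table describes, the following Python sketch computes CPU Usage (%) and Available Physical Memory (MBytes) from their stated inputs. The function and parameter names are hypothetical and are not part of HPC Pack; HPC Cluster Manager collects and reports these values itself.

```python
def cpu_usage_percent(user_seconds, system_seconds, interval_seconds, physical_cores):
    """CPU Usage (%): user plus system time for all physical cores, divided by
    the sampling interval times the number of physical cores.

    All names here are illustrative assumptions, not HPC Pack identifiers.
    """
    if interval_seconds <= 0 or physical_cores <= 0:
        raise ValueError("interval_seconds and physical_cores must be positive")
    busy_seconds = user_seconds + system_seconds
    return 100.0 * busy_seconds / (interval_seconds * physical_cores)


def available_mbytes(zeroed_mb, free_mb, standby_mb):
    """Available Physical Memory (MBytes): the sum of the Zeroed, Free, and
    Standby memory lists, as described in the table."""
    return zeroed_mb + free_mb + standby_mb
```

For example, on a node with 4 physical cores sampled over 10 seconds, 12 seconds of user time plus 8 seconds of system time corresponds to 20 / (10 × 4), or 50 percent usage.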

Node properties and metrics by conceptual categories

The following lists group the properties and metrics by functional categories so that you can quickly identify what values are available for different aspects of the cluster. These lists can help you select which values to display in custom node views to help monitor different aspects of cluster performance. In the following lists, the names of metrics and of node properties that reflect node status are denoted by bold font.

Cores/memory/disk

  • Processors

  • Cores

  • Sockets

  • Cores In Use

  • CPU Usage (%)

  • Context Switches / second

  • System Calls / second

  • Affinity

  • Subscribed Cores

  • Subscribed Sockets

  • Memory

  • Available Physical Memory (MBytes)

  • Memory Paging (Hard Faults/second)

  • Free Disk Space (%)

  • Disk Queue Length

  • Disk Throughput (Bytes/sec)

Status/workload

  • Node State

  • Node Health

  • Node Role

  • Groups

  • Primary HeadNode

  • Service Health

  • Idle

  • Running Jobs

  • Running Tasks

SOA

  • Durable Queues Total Bytes

  • Durable Queues Total Messages

  • Durable Requests Queue Length

  • Durable Responses Queue Length

  • HPC SOA Calculations/Sec

  • HPC SOA Faults/Sec

  • HPC SOA Requests/Sec

  • HPC SOA Responses/Sec
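
The HPC SOA Calculations/Sec entry in the table suggests using the SOA metrics together with memory and CPU metrics to decide when to add brokers or convert brokers to compute nodes. The following Python sketch shows one way such a rule could be expressed. It is only an illustration: the class, the function, and every threshold value are assumptions chosen for the example, and the source does not specify numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class BrokerSample:
    """One observation of a broker node (hypothetical structure)."""
    requests_per_sec: float      # HPC SOA Requests/Sec
    cpu_usage_percent: float     # CPU Usage (%)
    available_memory_mb: float   # Available Physical Memory (MBytes)
    total_memory_mb: float       # Memory


def scaling_hint(sample,
                 high_requests=1000.0, low_requests=100.0,   # assumed thresholds
                 high_cpu=80.0, low_cpu=20.0,                # assumed thresholds
                 high_mem=0.8, low_mem=0.3):                 # assumed thresholds
    """Return a coarse suggestion based on throughput, CPU, and memory use."""
    if sample.total_memory_mb <= 0:
        raise ValueError("total_memory_mb must be positive")
    memory_used = 1.0 - sample.available_memory_mb / sample.total_memory_mb

    all_high = (sample.requests_per_sec >= high_requests
                and sample.cpu_usage_percent >= high_cpu
                and memory_used >= high_mem)
    all_low = (sample.requests_per_sec <= low_requests
               and sample.cpu_usage_percent <= low_cpu
               and memory_used <= low_mem)

    if all_high:
        return "add brokers"
    if all_low:
        return "convert some brokers to compute nodes"
    return "no change"
```

The rule only fires when all three signals agree, which avoids reacting to a single busy metric. Suitable thresholds depend on the workload and would need to be tuned per cluster.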

Network

  • DNS Name

  • Domain Name

  • Enterprise IP

  • Enterprise Link Speed

  • Enterprise Link State

  • Enterprise NetworkDirect

  • Private IP

  • Private Link Speed

  • Private Link State

  • Private NetworkDirect

  • Application IP

  • Application Link Speed

  • Application Link State

  • Application NetworkDirect

  • Network Usage (Bytes/second)

Deployment

  • Name

  • Node Name

  • Node Template

  • Description

  • Location

  • Machine Guid

  • NetBoot MAC Address

  • Boot Information

  • Install Path

  • Version

  • Installed Service Roles

  • OS Architecture

  • OS Version

  • Product Key

  • Management Ip Address

  • LUN Mapping

  • Provisioned

  • UnattendSetup

  • Progress

Azure

  • Size

  • Windows Azure Instance Name

  • Windows Azure Node Address

  • Windows Azure Node Size

  • Windows Azure Service Name

  • Windows Azure Storage Service Name

  • Windows Azure Subscription ID

Additional considerations

HPC Pack 2008 R2 SP1 additions

The following properties or metrics were added in Service Pack 1 of HPC Pack 2008 R2. These changes are related to the ability to add Windows Azure nodes to the cluster. For more information, see Deploying Azure Nodes with Microsoft HPC Pack [RETIRED].

  • Size

  • Windows Azure Node Address

  • Windows Azure Service Name

  • Windows Azure Storage Service Name

  • Windows Azure Subscription ID

HPC Pack 2008 R2 SP2 additions

The following properties or metrics were added in Service Pack 2 of HPC Pack 2008 R2. These changes are related to the ability to oversubscribe and undersubscribe nodes.

  • Affinity

  • Subscribed Cores

  • Subscribed Sockets
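
Because Subscribed Cores can be larger or smaller than the number of physical cores, it can be useful to classify a node by comparing the two values. The following Python sketch does this for values that have already been read from the node list. The function name and return strings are hypothetical and are not part of HPC Pack.

```python
def describe_subscription(subscribed_cores, physical_cores):
    """Classify a node by comparing Subscribed Cores with Cores.

    Both arguments are plain integers supplied by the caller; this sketch
    does not query the cluster.
    """
    if physical_cores <= 0 or subscribed_cores <= 0:
        raise ValueError("core counts must be positive")
    if subscribed_cores > physical_cores:
        return "oversubscribed"
    if subscribed_cores < physical_cores:
        return "undersubscribed"
    return "matched"
```

On an oversubscribed node, the Running Tasks value can exceed the number of physical cores, so Running Tasks is the better indicator of how many subscribed cores are in use.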

Additional references