Introducing OMS Network Performance Monitor

07/27/2016

Summary: Perform near real-time monitoring of network performance and localize network faults in Microsoft Operations Management Suite.

Hi, everyone. Abhave Sharma here, and today I want to talk about a new solution in OMS, Network Performance Monitor (NPM), that helps you perform near real-time monitoring of network performance parameters (such as packet loss and network latency) and localize network faults. It not only detects network performance issues, but it also localizes the source of the problem to a particular network segment or device to make it easy for you to locate and fix a network performance issue.

How does the solution work?

NPM uses synthetic transactions (TCP ping, which is explained later in this post) as a primary mechanism to detect and locate network performance bottlenecks. The solution detects IPv4 and IPv6 subnets that are directly connected to the machines on which the OMS agent has been installed and uploads this information to OMS.

Each agent knows which other agents it should ping, and it notes the packet retransmissions and round-trip time encountered for each ping. This data is used to determine packet loss and network link latency, which are then uploaded to OMS, aggregated by the service, and presented to you on the solution dashboard.

The following diagram sums up how the solution works.

Diagram that shows how the solution works.

Why TCP ping

You may be wondering why we use TCP ping instead of the usual Internet Control Message Protocol (ICMP) ping. One reason is that routers do not give ICMP traffic the same priority as TCP packets. Consequently, ICMP-based pings might provide incorrect results in certain scenarios. Another reason is that HTTP is based on TCP, so measuring TCP performance gives us a good handle on how application response time is affected by the network.

A point to note here is that these pings use almost negligible bandwidth because only TCP control packets are exchanged and no data packets are transmitted for pings.
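To make the idea of a TCP ping concrete, here is a minimal Python sketch. It is not how the NPM agent is implemented: it simply times a TCP connect (the handshake, with no application data sent) and treats a failed connect as a lost probe, which is a simplification of the retransmission counting described above. The function names, the port, and the timeout are assumptions made for illustration.

```python
import socket
import time


def tcp_ping(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.perf_counter()
    try:
        # Only the handshake is performed; no payload is sent on the connection.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
    except OSError:
        return None
    return elapsed * 1000.0


def probe(host, port, count=5, timeout=2.0):
    """Send `count` TCP pings and return (loss percentage, average latency in ms)."""
    samples = [tcp_ping(host, port, timeout) for _ in range(count)]
    succeeded = [s for s in samples if s is not None]
    loss_pct = 100.0 * (count - len(succeeded)) / count
    avg_ms = sum(succeeded) / len(succeeded) if succeeded else None
    return loss_pct, avg_ms
```

Calling probe with the address and listening port of another machine returns a loss percentage and an average connect time, which is roughly the kind of per-link measurement the solution aggregates.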

Monitoring model

Before we talk about the monitoring model, let me explain some of the terminology.

A node here represents a machine (VM or host) on which the NPM solution has been enabled. Connectivity between two nodes is represented by a node link.

All nodes that are connected to the same subnet are grouped in a subnetwork. The network connectivity between two subnets is represented by subnetwork links that are composed of one or more node links. The performance metrics that are computed for node links are aggregated to deduce the loss and latency for the subnetwork link.

You can group one or more subnetworks that are related to each other in logical containers called networks and give any name to these networks. The network connectivity between two networks is represented by network links that are composed of one or more subnetwork links.

As an illustration, the following diagram shows two networks: Network A and Network B. Network A is composed of subnetworks 10.10.1.0/24 and 10.10.2.0/24. Network B is composed of a single subnetwork 10.10.4.0/24. Subnets are in turn composed of nodes. For example, subnet 10.10.1.0/24 is composed of Nodes 10.10.1.1 and 10.10.1.2.

Diagram that shows the relationship between Network A and Network B.
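The grouping in this example can be sketched in a few lines of Python using the standard ipaddress module. This is only an illustration of the terminology, not the discovery logic that NPM uses. The /24 prefix length is taken from the example, the nodes 10.10.2.1 and 10.10.4.1 are made up to fill the other subnets, and the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(node_ips, prefix_len=24):
    """Group node IP addresses into subnetworks of the given prefix length."""
    groups = defaultdict(list)
    for ip in node_ips:
        # ip_interface("10.10.1.1/24").network is the subnet 10.10.1.0/24
        subnet = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
        groups[str(subnet)].append(ip)
    return dict(groups)


# 10.10.1.1 and 10.10.1.2 come from the example; the other two nodes are made up.
nodes = ["10.10.1.1", "10.10.1.2", "10.10.2.1", "10.10.4.1"]
subnetworks = group_by_subnet(nodes)

# Networks are user-defined containers of subnetworks, named as in the example.
networks = {
    "Network A": ["10.10.1.0/24", "10.10.2.0/24"],
    "Network B": ["10.10.4.0/24"],
}
```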

Network links, subnetwork links, and node links have a hierarchical relationship. A network link is composed of one or more subnetwork links. Similarly, a subnetwork link is composed of one or more node links.

Illustration that shows the hierarchical relationship among network links, subnetwork links, and node links.
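The roll-up through this hierarchy can be sketched as follows. The post does not say which aggregation function the service applies, so this sketch assumes a plain average at each level. The sample measurements and all names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# One tuple per node link: (source subnet, destination subnet, loss %, latency ms).
# The values are made-up sample measurements.
node_links = [
    ("10.10.1.0/24", "10.10.4.0/24", 0.0, 12.0),
    ("10.10.1.0/24", "10.10.4.0/24", 4.0, 20.0),
    ("10.10.2.0/24", "10.10.4.0/24", 0.0, 10.0),
]

subnet_to_network = {
    "10.10.1.0/24": "Network A",
    "10.10.2.0/24": "Network A",
    "10.10.4.0/24": "Network B",
}


def roll_up(links, key_fn):
    """Average loss and latency of all links that map to the same parent link."""
    buckets = defaultdict(list)
    for src, dst, loss, latency in links:
        buckets[key_fn(src, dst)].append((loss, latency))
    return {
        key: (mean(l for l, _ in vals), mean(t for _, t in vals))
        for key, vals in buckets.items()
    }


# Node links roll up into subnetwork links ...
subnet_links = roll_up(node_links, lambda s, d: (s, d))

# ... and subnetwork links roll up into network links.
network_links = roll_up(
    [(s, d, loss, lat) for (s, d), (loss, lat) in subnet_links.items()],
    lambda s, d: (subnet_to_network[s], subnet_to_network[d]),
)
```

With averaging, a single lossy node link is diluted as it moves up the hierarchy, which is one reason the drill-down from network link to node link matters when you localize a problem.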

The solutions view

Solution Overview tile

After NPM is deployed and configured, you can see a quick snapshot of the network health on the OMS homepage. The solution tile shows a doughnut chart that depicts the number of healthy and unhealthy subnetwork links.

Network performance monitor tile.

Click this tile to go to the solution dashboard.

Solution dashboard

The solution dashboard gives you an at-a-glance view of what’s happening in your network. The first blade shows a summary of your network: the number of subnetworks discovered (and their network-wise distribution), network links, subnetwork links, and paths in the system. A path consists of the IP addresses of two agents and all the hops between them.

The Top Network Health Events blade lists the most recent health events in the system and how long each event has been active. A health event is generated whenever the packet loss or latency of a network or subnetwork link crosses a threshold.

The Top Unhealthy Network Links blade shows a list of unhealthy network links. These are the network links that currently have one or more adverse health events.

Screenshot of the top unhealthy network links tiles.
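As a rough illustration of the rule behind these blades, the following Python sketch raises a health event when a link's loss or latency crosses a threshold and lists the links that have at least one event. The threshold values, class, and function names are assumptions chosen for the example. They are not NPM defaults.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    name: str
    loss_pct: float
    latency_ms: float


# Hypothetical thresholds, chosen only for this example.
LOSS_THRESHOLD_PCT = 2.0
LATENCY_THRESHOLD_MS = 100.0


def health_events(sample):
    """Return a description of every threshold the sample crosses."""
    events = []
    if sample.loss_pct > LOSS_THRESHOLD_PCT:
        events.append(f"{sample.name}: packet loss {sample.loss_pct}% is above threshold")
    if sample.latency_ms > LATENCY_THRESHOLD_MS:
        events.append(f"{sample.name}: latency {sample.latency_ms} ms is above threshold")
    return events


def unhealthy_links(samples):
    """A link is unhealthy if it has at least one health event."""
    return [s.name for s in samples if health_events(s)]
```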

The next two blades show top subnetwork links by packet loss and top subnetwork links by latency.

The Common Queries blade contains a set of OMS search queries that you can use to fetch the raw network monitoring data directly. You can use these queries as a starting point to create your own queries for customized reporting.

Screenshot that shows blades for top subnetwork links by packet loss and top subnetwork links by latency.

Drill-down pages

You can click the various links on the solution dashboard to drill down into the area of interest. For example, when you see an alert or an unhealthy network link pop up on the dashboard, you can click the unhealthy network link to investigate further. The next page lists all the subnetwork links for that network link. You can see the loss, latency, and health status of each subnetwork link and quickly determine which ones are causing the problem. You can click View node links on the right side to see all the node links that make up the unhealthy subnetwork link.

Screenshot that shows performance of subnetwork links.

You can now see individual node-to-node links and find the unhealthy node links.

Screenshot that shows unhealthy node links.

If you click the View topology map link, you will see the hop-by-hop topology of the routes between the source and destination nodes. The unhealthy routes or hops will be colored in red, which will help you to quickly localize the problem to a particular section of the network.

Screenshot of hop-by-hop topology of the routes between the source and destination nodes.

Trend charts

You can easily investigate the usually difficult-to-detect transient issues, which are manifested as sudden spikes, by analyzing the trend of loss and latency for a link. You can change the time windows for which the graph is plotted by using the time control at the top of the chart.

Screenshot that shows how you can change the time window for which a graph is plotted by using the time control.

Hop-by-hop topology map

With NPM, you can visualize the hop-by-hop topology of routes between two nodes on an interactive topology map. It gives you a clear picture of how many routes exist between the two nodes and the paths that the data packets are taking. Network performance bottlenecks are marked in red on the topology map. You can locate a faulty network connection or a faulty network device by looking at red elements on the topology map.

Screenshot of the color-coded hop-by-hop topology map.

OMS search

All the data that is exposed graphically through the NPM dashboard and drill-down pages is also available natively in OMS search. You can directly query this data by using OMS query language and create custom reports by exporting the data to Excel or Power BI. The last blade in the NPM dashboard has some useful queries that can be used as the starting point to create your own queries and reports.

Screenshot of common queries.

That is all I have for you today. Join me next time when I talk about what’s coming next with the Network Performance Monitor in OMS.

For more information on this new solution, please visit the Operations Management Suite documentation webpage or sign up for a free trial. Follow us on Twitter @MSCloudMgmt.

Abhave Sharma
Microsoft Operations Management Team

Comments

Anonymous
July 26, 2016
Awesome. Please translate OMS, NPS to Japanese. I want to share it with our customers who do not read English well.
- Anonymous
  July 29, 2016
  @Yoshihiro, the OMS framework supports localization. You can change the language to Japanese from the localization button in the header of the OMS portal.
Anonymous
July 26, 2016
I like this post. OMS Network Performance is good. We will visit again for more updates. Thanks for sharing this article. Microsoft Office365
Anonymous
July 27, 2016
Sounds useful. Interested to test. Any idea of the date when the solution pack will be available?
- Anonymous
  July 28, 2016
  The solution is in public preview now. Please see the announcement: https://blogs.technet.microsoft.com/systemcenter/2016/07/27/new-monitoring-features-for-network-performance-backup/ Thanks.
Anonymous
July 27, 2016
Like it a lot. Wonder how this maps to O365 troubleshooting where network performance issues exist. Can you report on TCP features enabled on network devices? Thanks for showing. Everyone loves an automated topology map too!
- Anonymous
  July 29, 2016
  @JRPritchard, thanks for showing interest. Can you elaborate a little more on the TCP features you are interested in?
Anonymous
July 27, 2016
I'd like to see something similar to this in SCOM 2016.
- Anonymous
  July 28, 2016
  Regarding the existing network device monitoring in SCOM and the new OMS NPM capability, it would be good to know your opinion. Would you prefer to see a capability similar to OMS NPM in SCOM, or to have the network device monitoring in SCOM integrate with OMS NPM, so that the device health learned by SCOM is used to localize the fault to a network device in OMS NPM?
  - Anonymous
    November 18, 2016
    The comment has been removed
Anonymous
July 28, 2016
Hi, I would like to know how much time NPM takes to show data on the dashboard. Thanks a lot!
- Anonymous
  July 29, 2016
  @João, once the agents are configured, NPM usually takes less than 30 minutes to show the data on the solution dashboard. The data on the dashboard is refreshed every 3 minutes.
Anonymous
August 02, 2016
My OMS is connected to SCOM 2012 R2. When I add the NPM solution pack, I can't see NPMDAgent.exe running. But in another environment, where SCOM 2016 is connected to OMS, I can see NPMDAgent.exe running. Why?
- Anonymous
  August 02, 2016
  @Allen, please make sure you have run the PowerShell script on the machines where you want to enable NPM. Also, please let us know if there is any difference in the operating systems of the machines.
  - Anonymous
    August 02, 2016
    I already ran the PowerShell script, and the operating system on all machines is Windows Server 2012 R2.
Anonymous
August 05, 2016
I would love to see this provide a full network map topology in addition to the two node pathway topology. Really enjoying the NPM solution. Great work!
- Anonymous
  August 08, 2016
  Thank you for the kind words, Dave! We will definitely keep your suggestion in mind while deciding our future investments.
Anonymous
October 03, 2016
Do we need to have SCOM installed for NPM to work?
- Anonymous
  October 17, 2016
  No. SCOM is not a prerequisite for NPM. It can work with SCOM as well as direct agents.
Anonymous
October 19, 2016
Are OMS and NPM available for on-prem only?
- Anonymous
  November 08, 2016
  @James, NPM can be used to monitor performance across on-premises, cloud (IaaS), and hybrid networks. Please read more about it here: https://blogs.technet.microsoft.com/msoms/2016/08/30/monitor-on-premises-cloud-iaas-and-hybrid-networks-using-oms-network-performance-monitor
Anonymous
February 09, 2017
The comment has been removed
- Anonymous
  March 09, 2017
  The comment has been removed
  - Anonymous
    March 10, 2017
    @Mike, @Mathew, thank you for highlighting this issue. Would you be willing to share the details in the form here: https://aka.ms/npmcohort We will investigate this and let you know.
Anonymous
June 30, 2017
What ports need to be opened in a hardware firewall between subnets to allow NPM to work?
