Network Infrastructure: Development

Developing for the Network Diagnostics Framework (NDF)

Windows Vista® greatly improves the reliability of the network experience. NDF is an extension of the Windows Diagnostics Infrastructure (WDI) that supplies the infrastructure that networking components use to convey status information. Before NDF, there was no standard way to obtain such diagnostic information in user-mode code. NDF uses the WDI Event Tracing for Windows (ETW) to report network events and, when appropriate, to automatically start a diagnostics session. By working with NDF, network components can:

  • Report error conditions that are raised as error events by WDI and entered into the Windows Event Log by ETW.

  • Enable NDF to trace dependent components, and determine the status of those components. If the components are offline or malfunctioning, Windows Vista might be able to automatically repair the problem.

  • Empower users, aided by network troubleshooters, to diagnose and resolve network issues themselves.

  • Provide users with clear network status information and sufficient data to work effectively with product support teams.

  • Create custom application-specific network diagnostic and repair functionality through helper classes.

NDF is enabled for a network component through the creation of a corresponding network troubleshooter or helper class. The APIs for these classes are provided for COM/C++ developers. It is this helper class that monitors the status of the network component. Microsoft supplies NDF helper classes for all the key components in the Windows Vista network subsystem. ISVs and IHVs should supply helper classes for network components that they distribute, such as network card device drivers.

The following concepts apply to NDF:

  • Network Diagnostics Session — the process of gathering and diagnosing data through NDF in an effort to identify a set of resolutions

  • Health — the current status of the associated network component: offline, healthy, low-health, high-utilization, or failed

  • Network Root Cause — the originating cause of a network issue or malfunction

  • Network Resolution — the steps necessary to repair or ameliorate each network root cause

Each network helper class maintains a list of root causes that it knows how to detect, as well as a list of resolution steps associated with each root cause. When queried, it should supply a user-friendly description of these causes and their possible resolutions.
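The relationship between root causes and resolutions can be pictured as a small lookup structure. The following is a minimal sketch in portable C++, not NDF code: RootCauseCatalog, RootCause, and HealthName are hypothetical names introduced for illustration, and only the five health states come from the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Health states taken from the concept list above.
enum class Health { Offline, Healthy, LowHealth, HighUtilization, Failed };

// Returns a display name for a health state.
const char* HealthName(Health h) {
    switch (h) {
        case Health::Offline:         return "offline";
        case Health::Healthy:         return "healthy";
        case Health::LowHealth:       return "low-health";
        case Health::HighUtilization: return "high-utilization";
        case Health::Failed:          return "failed";
    }
    return "unknown";
}

// A root cause with a user-friendly description and its resolution steps.
struct RootCause {
    std::string description;
    std::vector<std::string> resolutions;
};

// Hypothetical catalog type (not part of NDF): maps a root-cause id to its entry.
class RootCauseCatalog {
public:
    void Add(const std::string& id, RootCause cause) {
        assert(!id.empty());  // every root cause needs an identifier
        causes_[id] = std::move(cause);
    }

    // Returns the description followed by one line per resolution,
    // or an empty string when the id is unknown.
    std::string Describe(const std::string& id) const {
        auto it = causes_.find(id);
        if (it == causes_.end()) return std::string();
        std::string text = it->second.description;
        for (const std::string& step : it->second.resolutions) {
            text += "\n  - " + step;
        }
        return text;
    }

    std::size_t Count() const { return causes_.size(); }

private:
    std::map<std::string, RootCause> causes_;
};
```

A helper class built along these lines would register each root cause once and answer description queries from the catalog. A real helper class reports this information through the NDF interfaces rather than through plain strings.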

NDF also supplies a public invocation interface so that a network diagnostics session can be initiated by another component, service, application, or by the operating system itself. This interface is typically accessed through one of the new sets of network diagnostics API extensions added to the following interfaces: WinHTTP, WinINet, Winsock, RPC, or the .NET Framework.

For more information about NDF, see the Network Diagnostics Framework section in the Windows SDK.

Architecture and Flow

From an architectural perspective, NDF performs the following main functions:

  • NDF extends WDI to encompass the network subsystem. From the WDI perspective, NDF is the network troubleshooter. It acts as an intermediary between WDI and the helper classes. During a network diagnostic session, NDF also provides a mechanism for these helper classes to communicate results with each other.

  • NDF organizes and coordinates the smaller network troubleshooting units, the helper classes. It organizes the helper classes by discovering their dependencies and representing these dependencies as trees. During a network diagnostic session, NDF invokes the necessary helper classes when a networking incident is created or when following a dependency path during a diagnosis.

  • NDF supplies an invocation interface so that applications can initiate a network diagnostic session. This interface is exposed through extensions to the following standard network APIs: WinHTTP, WinINet, Winsock, RPC, and the .NET Framework. As the diagram shows, calls to this interface are first routed through the WDI client.
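The dependency trees mentioned in the second item can be illustrated with a short model. This is a sketch under stated assumptions, not the NDF engine's algorithm: HelperNode, MakeNode, and FindRootCauses are hypothetical names, and the lowHealth flag stands in for whatever a real helper class would compute. The walk follows unhealthy components down the tree and reports the deepest ones as root-cause candidates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a helper class and the component it monitors.
struct HelperNode {
    std::string name;
    bool lowHealth = false;                              // result a real helper would compute
    std::vector<std::unique_ptr<HelperNode>> dependsOn;  // components this one relies on
};

// Convenience constructor for tree nodes.
std::unique_ptr<HelperNode> MakeNode(std::string name, bool lowHealth) {
    assert(!name.empty());  // every component needs a name to report
    auto node = std::make_unique<HelperNode>();
    node->name = std::move(name);
    node->lowHealth = lowHealth;
    return node;
}

// Depth-first walk. A healthy node ends its path. An unhealthy node is
// reported only when none of its dependencies explains the problem.
void FindRootCauses(const HelperNode& node, std::vector<std::string>& out) {
    if (!node.lowHealth) return;
    bool blamedDependency = false;
    for (const auto& dep : node.dependsOn) {
        const std::size_t before = out.size();
        FindRootCauses(*dep, out);
        if (out.size() > before) blamedDependency = true;
    }
    if (!blamedDependency) out.push_back(node.name);
}
```

With a tree in which a web component depends on DNS, and DNS depends on a failed adapter, the walk reports only the adapter, because every unhealthy component above it is explained by that dependency.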

The following illustration shows the NDF architecture, including access points for third-party applications. Note that there are two APIs associated with this technology: the Invocation API and the Helper Class API.

[Illustration: NDF architecture, showing the Invocation API and the Helper Class API]

The Network Diagnostics Engine is the central base for network troubleshooting. It accepts the initial request from WDI, collects the information necessary to create a dependency tree, provides the necessary handoff to individual helper classes, and tracks the progress of the incident. Windows Vista supplies special extensible helper classes; as their name implies, these components are designed to aid in the development of vendor-specific helper classes.

The Diagnostics Engine moves through the following seven major states during a network problem:

  1. Problem suspected — because of an encountered malfunction or a suspected issue, the NDF invocation API is invoked. As already described, this first calls into WDI.

  2. Incident creation — an NDF session for the problem is created through the following sub-steps:

    1. WDI checks local policies to see what operating parameters are associated with the incident. If diagnostics is not allowed, WDI will simply return to the caller. If diagnostics is allowed, WDI creates an incident or container that stores and tracks the data associated with a diagnostics session.

    2. WDI identifies and classifies the problem. Only network problems are handed over to NDF.

    3. A Watson schema is created and attached to the container, which makes it accessible by Windows Error Reporting (WER).

  3. UI notification — the user is notified that a diagnostics session has begun.

  4. Problem diagnosis — NDF diagnoses the incident by walking the dependency tree for the network process and, at each step, interacting with the helper class to identify the root cause. The UI is updated to reflect this new information.

  5. Problem resolution(s) — NDF collects the appropriate root cause resolutions and workarounds from helper classes, characterizes and prioritizes them, and then applies each prospective network resolution until the problem is solved. The UI is updated to reflect the problem resolution.

  6. Problem reporting — WDI is responsible for the data retention and reporting policy for diagnostics incidents. Throughout the previous steps, NDF was updating WDI on its progress. After resolution, the incident report can be sent to the local administrator, to a specific individual, or to Microsoft as controlled by the Event Reporting Console (ERC).

  7. Problem resolved — NDF cleans up and unloads until it is needed again.
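The progression through these states can be modeled as a simple linear state machine. The sketch below is illustrative only: DiagState, NextState, and RunSession are hypothetical names, and the early exit models the two cases described under incident creation (policy forbids diagnostics, or the problem is not a network problem).

```cpp
#include <cassert>
#include <vector>

// The seven states listed above, in order.
enum class DiagState {
    ProblemSuspected,
    IncidentCreation,
    UiNotification,
    ProblemDiagnosis,
    ProblemResolution,
    ProblemReporting,
    ProblemResolved
};

// Returns the state that follows `s` in the uninterrupted flow.
DiagState NextState(DiagState s) {
    assert(s != DiagState::ProblemResolved);  // the last state has no successor
    return static_cast<DiagState>(static_cast<int>(s) + 1);
}

// Returns the states a session visits. The session stops at incident
// creation when policy forbids diagnostics or the problem is not a
// network problem; otherwise it runs through to ProblemResolved.
std::vector<DiagState> RunSession(bool diagnosticsAllowed, bool isNetworkProblem) {
    std::vector<DiagState> visited{DiagState::ProblemSuspected,
                                   DiagState::IncidentCreation};
    if (!diagnosticsAllowed || !isNetworkProblem) return visited;
    DiagState s = DiagState::IncidentCreation;
    while (s != DiagState::ProblemResolved) {
        s = NextState(s);
        visited.push_back(s);
    }
    return visited;
}
```

Calling RunSession(true, true) yields all seven states, while RunSession(false, true) stops after the second. The real engine also updates the UI and WDI along the way, which this model leaves out.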

Network Diagnostics Framework Invocation

The UI for NDF user messages can be provided either by WDI/Windows or by the process that invoked the session. The operating system will also initiate network diagnostics sessions through the following common entry points:

  • Applications or shell processes — this is the preferred method of initiating a network diagnostics session because it provides the user with proper context with respect to the current task.

  • Network-related tasks — within the Windows Vista GUI allow the user to invoke a session to troubleshoot a network problem. This might be necessary if the malfunctioning network component was not written to support NDF. For example, the following Control Panel path invokes NDF:

    Control Panel\Network and Internet\Network Connections\Diagnose Connection Problems

    Direct linkage to NDF is supplied by all relevant network support tasks, including the Help and Support Center, Network Status Indicator, Network Map, Network Errors, Network Connections folder, Network Neighborhood, and the Network Printer Connections.

  • Network components — invoke diagnostics outside of the context of a user application or network support task. If a network component determines it is malfunctioning, the component will be able to call diagnostics to help understand the problem. Based on the root cause and resolutions returned, the component can make appropriate corrective actions or alert the user through standard WDI actions.

  • Command-line or NetShell command — provide direct user invocation of NDF. The Netsh command uses the diagnostics netsh provider. Users can perform manual diagnostics themselves or write batch files or scripts that call into NDF and parse the output. To investigate a specific network stream or component, the user must supply the contextual data needed to perform a diagnosis (for example, the destination or port).

NDF Invocation API

NDF sessions can also be initiated and controlled through an invocation API, which has the following Win32 functions declared in Ndfapi.h.

Function                        Description
NdfCreateIncident               Used internally by application developers to test the NDF functionality incorporated into their application.
NdfCreateConnectivityIncident   Diagnoses generic internet connectivity problems.
NdfCreateDNSIncident            Diagnoses name resolution issues in resolving a specific host name.
NdfCreateSharingIncident        Diagnoses network problems in accessing a specific network share.
NdfCreateWebIncident            Diagnoses web connectivity problems concerning a specific URL.
NdfCreateWinSockIncident        Provides access to the Winsock Helper Class provided by Microsoft.
NdfExecuteDiagnosis             Diagnoses the root cause of the incident that has occurred.
NdfCloseIncident                Closes an NDF incident following its resolution.
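A typical call sequence creates an incident, executes the diagnosis, and closes the incident. The sketch below assumes that NdfCreateWebIncident takes a URL and returns an incident handle, and that NdfExecuteDiagnosis takes that handle plus a parent window; check Ndfapi.h for the exact declarations. DiagnoseWebUrl is a hypothetical wrapper name. The stubs in the non-Windows branch are stand-ins that diagnose nothing and exist only so the sketch compiles on other platforms.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <ndfapi.h>
#pragma comment(lib, "ndfapi.lib")  // MSVC-specific way to link the NDF import library
#else
// Hypothetical stand-ins so the sketch compiles off Windows.
typedef long HRESULT;
typedef void* NDFHANDLE;
typedef void* HWND;
typedef const wchar_t* LPCWSTR;
#define FAILED(hr) ((hr) < 0)
static int g_openIncidents = 0;  // counts stub incidents that are still open
static HRESULT NdfCreateWebIncident(LPCWSTR, NDFHANDLE* handle) {
    *handle = &g_openIncidents;
    ++g_openIncidents;
    return 0;
}
static HRESULT NdfExecuteDiagnosis(NDFHANDLE, HWND) { return 0; }
static HRESULT NdfCloseIncident(NDFHANDLE) {
    --g_openIncidents;
    return 0;
}
#endif

// Creates a web incident for `url`, runs the diagnosis, and closes the
// incident whether or not the diagnosis succeeded.
HRESULT DiagnoseWebUrl(LPCWSTR url, HWND parent) {
    assert(url != nullptr);  // a web incident needs a URL to diagnose

    NDFHANDLE incident = nullptr;
    HRESULT hr = NdfCreateWebIncident(url, &incident);
    if (FAILED(hr)) return hr;

    hr = NdfExecuteDiagnosis(incident, parent);  // on Windows this can show UI
    const HRESULT closeHr = NdfCloseIncident(incident);

    // Report the diagnosis failure first; otherwise report the close result.
    return FAILED(hr) ? hr : closeHr;
}
```

An application might call DiagnoseWebUrl(L"http://www.example.com/", hwndMain) after a failed request, where hwndMain is the application's own window handle. Closing the incident on every path keeps the session from leaking when the diagnosis fails.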

NDF Helper Class API

All network components should have an NDF helper class associated with them. A helper class component is only loaded upon demand, when a network incident that possibly involves the associated component occurs. Creating a helper class involves two steps:

  • Create the helper class DLL using the C++/COM APIs. An individual DLL can contain one or more NDF helper classes.

  • Create an associated manifest that registers the helper class by entering the following information into the Windows registry: the name of the helper class, the dependency files for the class, the CLSID of the object that implements it, and the standard NDF interfaces INetDiagHelperInfo and INetDiagHelper.

After successfully creating or obtaining a helper class instance, the NDF validates the input key attributes by invoking the INetDiagHelperInfo interface. This is necessary to be sure the NDF has created an instance of the correct helper class and that this class has the information NDF needs to diagnose the problem. The INetDiagHelper interface is called by the NDF engine during most of the activities that happen during a diagnosis.

It is strongly suggested that helper classes follow the friendly naming convention: <Vendor>.<Application>.<Component>. For example, Microsoft Windows helper classes will use a prefix of microsoft.windows followed by the component feature name (for example, microsoft.windows.wireless).
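A registration tool or build step could check names against this convention. The following sketch is one possible check, not an NDF requirement: IsFriendlyHelperName and MakeFriendlyHelperName are hypothetical names, and the rule that parts may not contain whitespace is an assumption added for illustration. The only rule taken from the text is that a name has three dot-separated parts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns true when `name` has exactly three non-empty, dot-separated
// parts and contains no whitespace (the whitespace rule is an assumption).
bool IsFriendlyHelperName(const std::string& name) {
    int parts = 0;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = name.find('.', start);
        const std::size_t end = (dot == std::string::npos) ? name.size() : dot;
        if (end == start) return false;  // empty part
        for (std::size_t i = start; i < end; ++i) {
            if (std::isspace(static_cast<unsigned char>(name[i]))) return false;
        }
        ++parts;
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return parts == 3;
}

// Joins the three parts into <Vendor>.<Application>.<Component>.
std::string MakeFriendlyHelperName(const std::string& vendor,
                                   const std::string& application,
                                   const std::string& component) {
    std::string name = vendor + "." + application + "." + component;
    assert(IsFriendlyHelperName(name));  // parts must be non-empty, without dots or spaces
    return name;
}
```

For example, MakeFriendlyHelperName("microsoft", "windows", "wireless") produces the name used in the text, while a two-part name such as "microsoft.windows" fails the check.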

INetDiagHelperInfo methods  Description
GetAttributeInfo            Retrieves the list of key parameters needed by the helper class.

INetDiagHelper methods      Description
Cancel                      Invoked by NDF to cancel an ongoing diagnosis or repair. Optional.
GetAttributes               Retrieves additional information about the problem that the helper class has diagnosed. Optional.
GetCacheTime                Called by NDF after diagnosis and repair to indicate how long NDF will retain the diagnosis/repair results in cache.
GetDiagnosticsInfo          Asks the helper instance for an estimate of how long the diagnosis might take and whether it requires impersonation of the original user context.
GetDownStreamHypotheses     Asks the helper class to generate hypotheses for possible causes of low health in the out-of-box components that it depends on.
GetHigherHypotheses         Asks the helper class to generate hypotheses for possible causes of high utilization in the local components that depend on it. Optional.
GetKeyAttributes            Retrieves the key attributes of the helper class.
GetLifeTime                 Retrieves the lifetime of the helper class instance. Optional.
GetLowerHypotheses          Asks the helper class to generate hypotheses for possible causes of low health in the local components that it depends on.
GetRepairInfo               Retrieves the resolution information that the helper class has for a given problem type. Optional.
GetUpStreamHypotheses       Asks the helper class to generate hypotheses for possible causes of high utilization in the out-of-box components that depend on it. Optional.
HighUtilization             Performs the diagnosis to check whether the corresponding component is highly utilized. Optional.
Initialize                  NDF calls this method to send the key parameters to the helper class to initialize its instance state.
LowHealth                   Performs the diagnosis to check whether the corresponding component is in low health.
Repair                      Asks the helper class to perform the specified repair. Optional.
SetLifeTime                 Invoked by NDF to set the start and end time of a problem instance so that the helper class can limit its diagnosis to events within that time period. Optional.
Validate                    Invoked by NDF after a repair is successfully completed in order to validate that a previously diagnosed problem is fixed. Optional.

For more information, see the NDF Reference in the Windows SDK.
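The method descriptions suggest an order in which an engine might drive a helper: initialize it, check for low health, fetch repair information, attempt a repair, and validate the result. The sketch below models that order in portable C++. It is not the COM interface: SimpleHelper borrows a few INetDiagHelper method names but uses simplified, hypothetical signatures, and DiagnoseAndRepair, Outcome, and FakeAdapterHelper are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Attributes = std::map<std::string, std::string>;

// Simplified, hypothetical mirror of a few INetDiagHelper method names.
// The real COM signatures differ.
class SimpleHelper {
public:
    virtual ~SimpleHelper() = default;
    virtual bool Initialize(const Attributes& keyAttributes) = 0;
    virtual bool LowHealth() = 0;  // true when the component is in low health
    virtual std::vector<std::string> GetRepairInfo() = 0;
    virtual bool Repair(const std::string& repair) = 0;
    virtual bool Validate() = 0;   // true when the diagnosed problem is fixed
};

enum class Outcome { InitFailed, Healthy, Repaired, Unresolved };

// Drives one helper through diagnosis and repair in the order implied
// by the method descriptions above.
Outcome DiagnoseAndRepair(SimpleHelper& helper, const Attributes& keys) {
    if (!helper.Initialize(keys)) return Outcome::InitFailed;
    if (!helper.LowHealth()) return Outcome::Healthy;
    for (const std::string& repair : helper.GetRepairInfo()) {
        if (helper.Repair(repair) && helper.Validate()) return Outcome::Repaired;
    }
    return Outcome::Unresolved;
}

// Fake helper for a hypothetical adapter that stays disabled until repaired.
class FakeAdapterHelper : public SimpleHelper {
public:
    bool Initialize(const Attributes& keys) override {
        initialized_ = keys.count("adapter") == 1;  // requires an "adapter" key attribute
        return initialized_;
    }
    bool LowHealth() override { return !enabled_; }
    std::vector<std::string> GetRepairInfo() override { return {"enable adapter"}; }
    bool Repair(const std::string& repair) override {
        assert(initialized_);  // repairs are only valid after Initialize succeeds
        if (repair != "enable adapter") return false;
        enabled_ = true;
        return true;
    }
    bool Validate() override { return enabled_; }

private:
    bool initialized_ = false;
    bool enabled_ = false;
};
```

Running DiagnoseAndRepair on a fresh FakeAdapterHelper with an "adapter" key attribute returns Outcome::Repaired, and a second run on the same instance returns Outcome::Healthy. The hypothesis methods, which let the engine move on to dependent components, are left out of this model.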

Developing for the Windows Filtering Platform (WFP)

The new Windows Vista WFP architecture provides a standard infrastructure that enables third-party applications and components to integrate into the TCP/IP stack.

Architecture

The following illustration shows the WFP architecture, including access points for third-party applications, services, and drivers.

[Illustration: WFP architecture, showing access points for third-party applications, services, and drivers]

The WFP architecture consists of the following components:

  • WFP API — a set of user-mode Win32 C-Level functions and structures that enable an application or component to plug into the TCP/IP stack to perform packet filtering and processing through the Base Filtering Engine.

  • Base Filtering Engine — the user-mode component of WFP that implements the filter requests made by filtering applications by plumbing filters into the Generic Filter Engine.

  • Generic Filter Engine — this kernel-mode component within the new TCP/IP protocol stack stores the filters created by filtering applications through the Base Filtering Engine, and interacts with the various layers of the new TCP/IP stack and the set of installed callout drivers. For example, as a packet is being processed up the new TCP/IP stack, each layer encountered contacts the Generic Filter Engine to see whether the packet is to be permitted or dropped. The Generic Filter Engine checks the configured filters and the installed callout modules to verify whether the packet is permitted or should be dropped.

  • Application Layer Enforcement (ALE) — this operates on Windows Sockets API events. It provides enforcement for WinSock communication by determining the application that generated a packet and asking the Generic Filter Engine to determine whether the communication should be filtered or blocked. ALE can also dynamically generate filters at other layers, based on the result of a classification. For example, it can generate dynamic IPSec filters to cover the sockets that the Ftp.exe application opens.

  • Callout modules — these kernel-mode components are supplied by third-party drivers. They are used when simply checking a packet against filtering criteria is not enough to decide whether it should be permitted or dropped, that is, when deep inspection of packet contents or data modification is required. For example, anti-virus software must inspect application layer data to ensure that no viruses or worms are present in the incoming data stream.

The Filtering Platform performs its tasks by integrating three basic types of entities:

  • Shims are entities that make filtering decisions by classifying against a Filter Engine. Some examples of shims are: ALE, Transport Layer Module, Network Layer Module, and RPC Runtime.

  • Layers are objects managed by the Filter Engine to contain a set of filters. Each shim classifies against one or more layers. For example, the Transport Layer Module shim classifies against the Inbound Transport layer and the Outbound Transport layer. The FWPS_BUILTIN_LAYERS enumeration lists the types of layers, most of which reside in kernel mode.

  • Callouts are modules that get invoked by the Filter Engine when a corresponding callout filter is matched at a given layer.

As packets, streams, and events traverse the system, the shims call into the Filter Engine to evaluate them against the filters in a given layer. The Filter Engine can invoke one or more callout modules, as required by the evaluation. The shims do the actual dropping of packets, streams, and events, based on the result of the classification performed by the Filter Engine.
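The interplay of shims, layers, filters, and callouts can be modeled in a few lines. The sketch below is a conceptual model, not the WFP API: Packet, Filter, Layer, and ShimDelivers are hypothetical names, and two behaviors are modeling assumptions rather than statements from the text, namely that higher-weight filters are evaluated first and that a packet matching no filter is permitted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Action { Permit, Block, Callout };

// Hypothetical packet summary used by the model.
struct Packet {
    std::string application;
    unsigned short remotePort;
};

// A filter: a match condition plus an action. When the action is
// Callout, the callout function makes the final decision.
struct Filter {
    int weight;  // assumption: higher weights are evaluated first
    std::function<bool(const Packet&)> matches;
    Action action;
    std::function<Action(const Packet&)> callout;
};

// A layer holds filters and classifies packets against them.
class Layer {
public:
    void AddFilter(Filter f) {
        assert(f.action != Action::Callout || f.callout);  // callout filters need a callout
        filters_.push_back(std::move(f));
        std::stable_sort(filters_.begin(), filters_.end(),
                         [](const Filter& a, const Filter& b) { return a.weight > b.weight; });
    }

    // Returns the decision of the first matching filter, invoking its
    // callout when required. Assumption: no match means Permit.
    Action Classify(const Packet& p) const {
        for (const Filter& f : filters_) {
            if (!f.matches(p)) continue;
            return f.action == Action::Callout ? f.callout(p) : f.action;
        }
        return Action::Permit;
    }

private:
    std::vector<Filter> filters_;
};

// The shim enforces the result: it drops the packet when the layer blocks it.
bool ShimDelivers(const Layer& layer, const Packet& p) {
    return layer.Classify(p) != Action::Block;
}
```

In this model, a plain Block filter on port 23 drops Telnet traffic without any callout, while a Callout filter on port 80 defers to inspection code that can block one application and permit another. That split mirrors the text: the engine decides, callouts inspect, and the shim does the dropping.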

WFP API

There are two main ways that third-party ISVs can use the WFP architecture to build applications or services:

  • If only filtering functionality is required, then a user-mode application or service that uses the WFP APIs to set filters at the appropriate layers in the new TCP/IP stack is required. No kernel-mode callout drivers are needed.

  • For applications and services that perform deep packet content inspection or modification, a user-mode application or service, and one or more callout drivers are required. The user-mode application or service sets filters at the appropriate layers in the new TCP/IP stack, subject to further inspection by a specified callout driver. When incoming or outgoing traffic matches these filters, the Generic Filter Engine hands the packet to the callout driver, which performs inspection or modification before handing the packet back to the Generic Filter Engine.

The WFP Win32 API can be divided into 11 functional categories: two deal with the discovery of security tokens, and the rest with management of the interaction between third-party supplied components and the WFP platform. All the main entities in WFP have associated security credentials. Certain objects (for example, filters, callout modules, providers, and provider contexts) can send change notifications to subscribed components.

WFP API category              Description
Internet Key Exchange (IKE)   Enables enumeration of IKE SA objects. IKE is the standard keying algorithm for IPSec. It provides computer-level authentication, some authorization, and end-to-end encryption through IPSec.
IPSec                         Enables enumeration of available IPSec SA objects.
Callout Management            Adds, deletes, enumerates, locates, and describes callout modules, and manages associated subscription and security information.
Filter Management             Adds, deletes, enumerates, and locates filters, and manages associated subscription and security information.
Layer Management              Adds, deletes, enumerates, and locates layers, and manages their sub-layers and security information.
Sub-layer Management          Adds, deletes, enumerates, locates, and describes sub-layers, and manages their security information.
Memory Management             Releases memory allocated by other functions.
Provider Management           Adds, deletes, enumerates, and locates providers, and manages associated subscriptions and security information.
Provider Context Management   Adds, deletes, enumerates, and locates provider contexts, and manages associated subscriptions and security information.
Transaction Management        Begins, aborts, and ends network session transactions.
Session Management            Opens, closes, and enumerates current network sessions.

For more information about the WFP API, see the Windows Filtering Platform API Functions section of the Windows SDK.

Best Practices for WFP

When contrasted with basic filtering approaches, solutions that deeply interact with the TCP/IP stack (for example, firewalls) will typically require much more effort to develop. A typical approach for developing a firewall might include:

  1. Develop callout modules to do specialized filtering. For example, a firewall ISV might develop an HTTP callout module to enforce parental controls.

  2. Design the firewall policies.

  3. Build a policy manager that will load the firewall policies from local or remote sources, convert those policies into the WFP API parameters, and call the WFP API.

  4. Build UI to configure the firewall.

  5. Build UI to monitor the status of the firewall. Because all firewall activity is maintained in a common platform, the status UI for one firewall can display filters and policies added to the system by other applications or firewalls.

  6. Build modules to access, process, and possibly transmit firewall events.