Troubleshooting AD with Network Monitoring tools
In general, if you have an AD-related issue the following logs are useful:
- Event logs from the affected machine(s)
- Component-specific debug logs from the affected machine(s) (Netlogon logs, Userenv logs, IIS logs, etc.)
- Network traces taken while the problem is happening
- Procmon traces that show file activity on the affected machine(s) covering the same time as the Network trace
In this blog entry I want to focus on the third item: how to gather and analyze a useful network trace.
A trace by itself can be useful. For a trace to be REALLY useful, however, you need to make sure you are:
- capturing network traffic on the NIC where the problem is occurring and WHEN it is occurring (this is harder than it sounds)
- allocating a sufficiently large capture buffer so the frames containing the problem don't get overwritten
- tracing simultaneously from both endpoints where the problem occurs
- noting down what you're doing during the trace and which error messages you're seeing.
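To get a feel for what "sufficiently large" means, here is a rough back-of-the-envelope sketch in Python. It assumes a circular capture buffer and a steady average traffic rate, and it ignores per-frame capture overhead. The function names and the 2x headroom factor are my own illustration, not something any capture tool provides.

```python
def seconds_until_wrap(buffer_mb: float, avg_mbit_per_s: float) -> float:
    """Rough time (seconds) before a circular buffer starts overwriting old frames.

    Assumes 1 MB = 1,000,000 bytes and a constant average traffic rate.
    """
    if buffer_mb <= 0 or avg_mbit_per_s <= 0:
        raise ValueError("buffer size and traffic rate must be positive")
    buffer_bits = buffer_mb * 1_000_000 * 8
    return buffer_bits / (avg_mbit_per_s * 1_000_000)


def buffer_mb_needed(repro_seconds: float, avg_mbit_per_s: float,
                     headroom: float = 2.0) -> float:
    """Rough buffer size (MB) needed to keep a whole repro in the buffer.

    headroom is a hypothetical safety factor for bursts and for the time
    it takes you to notice the problem and stop the capture.
    """
    if repro_seconds <= 0 or avg_mbit_per_s <= 0 or headroom <= 0:
        raise ValueError("all inputs must be positive")
    bits = repro_seconds * avg_mbit_per_s * 1_000_000 * headroom
    return bits / 8 / 1_000_000
```

For example, a 512 MB buffer on a NIC averaging 10 Mbit/s holds roughly 410 seconds of traffic, so if it takes you ten minutes to reproduce the problem, the interesting frames are probably gone by the time you stop the trace.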
A solitary trace without any description of what's happening in it is like a box of chocolates - "you never know what you're gonna get" :-)
A trace taken from both ends of the conversation, together with the event logs and component-specific logs for the problem you're troubleshooting, is worth its weight in gold, however (how much does a megabyte weigh anyway?).
At any rate, once you have a usable trace you can start filtering and drilling down to specifics such as particular protocols or ports.
The most useful filters to put in from the AD perspective are:
- dcerpc (RPC traffic)
- kerberos (Kerberos ticket requests and other traffic containing Kerberos information)
- ldap or cldap (LDAP searches and writes over TCP or UDP)
- dns (not much goes on in AD without a preceding DNS query - make sure you flush the DNS cache of the client before starting, though)
- smb (Group Policy being applied from Sysvol on a DC for example)
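The filter names above can be combined into a single expression. The sketch below assumes Wireshark display filter syntax (`||`, `&&`, `ip.addr`); Network Monitor uses its own filter language, so treat this as an illustration only. The helper name `build_display_filter` and the sample address are hypothetical.

```python
# Protocol names taken from the list above, in Wireshark display filter form.
AD_PROTOCOLS = ["dcerpc", "kerberos", "ldap", "cldap", "dns", "smb"]


def build_display_filter(protocols, host=None):
    """Join protocol names with OR and optionally restrict to one IP address."""
    if not protocols:
        raise ValueError("at least one protocol is required")
    proto_expr = " || ".join(protocols)
    if host is None:
        return proto_expr
    # Parenthesize the OR group so the host restriction applies to all of it.
    return f"({proto_expr}) && ip.addr == {host}"


# Example: only name resolution and ticket traffic for one client.
example = build_display_filter(["dns", "kerberos"], host="192.0.2.10")
```

Here `example` ends up as `(dns || kerberos) && ip.addr == 192.0.2.10`, which you can paste into the display filter bar. On newer Windows versions the file traffic is likely to show up as `smb2` rather than `smb`, so you may want to add that to the list.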
Other things to look for in the network traces are:
- Retransmissions of packets (if you have a trace from both sides, you can see whether the packet reaches the other side or is being dropped by a firewall in between)
- Packets leaving one end but never arriving at the other end
- Excessive Resets of TCP connections
- Excessive traffic coming from specific clients
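The checks above can be sketched in a few lines of Python once the TCP segments from both traces have been exported into simple records. The `Segment` record, its fields and the sample data are hypothetical; how you export them depends on your analyzer. The matching also assumes no NAT between the endpoints, since NAT rewrites addresses and ports.

```python
from collections import Counter, namedtuple

# Hypothetical pre-parsed record for one TCP segment seen in a trace.
Segment = namedtuple("Segment", "src dst sport dport seq length flags")


def segment_key(seg):
    """Identify a segment independently of which trace it was captured in."""
    return (seg.src, seg.dst, seg.sport, seg.dport, seg.seq, seg.length)


def lost_in_transit(sender_trace, receiver_trace):
    """Segments that left the sender but never show up in the receiver's trace."""
    seen = {segment_key(s) for s in receiver_trace}
    return [s for s in sender_trace if segment_key(s) not in seen]


def retransmissions(trace):
    """Data segments that appear more than once in one trace, with their counts."""
    counts = Counter(segment_key(s) for s in trace if s.length > 0)
    return {key: n for key, n in counts.items() if n > 1}


def resets_per_source(trace):
    """Number of TCP resets sent by each source address."""
    return Counter(s.src for s in trace if "R" in s.flags)


# Made-up sample: the client sends two LDAP segments; the second one is
# retransmitted and never reaches the DC, which eventually resets.
client_trace = [
    Segment("10.0.0.5", "10.0.0.1", 49152, 389, 1, 120, "PA"),
    Segment("10.0.0.5", "10.0.0.1", 49152, 389, 121, 80, "PA"),
    Segment("10.0.0.5", "10.0.0.1", 49152, 389, 121, 80, "PA"),
]
dc_trace = [
    Segment("10.0.0.5", "10.0.0.1", 49152, 389, 1, 120, "PA"),
    Segment("10.0.0.1", "10.0.0.5", 389, 49152, 1, 0, "R"),
]
```

With this sample, `lost_in_transit(client_trace, dc_trace)` returns both copies of the segment with sequence number 121, which points at something between the two machines rather than at either endpoint.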
At this point, you really need to have a good idea of what the component you're troubleshooting is doing. With that in place you effectively have a triangulating device to zoom in on the problem, i.e. “What's happening on the wire” (the network traces) + “What's happening on the machine” (the component logs/event logs/procmon logs) + “What should be happening” (your knowledge of how the component should behave).
With all three in hand, the majority of issues should be solvable with time, patience and good old troubleshooting intuition (“troubleshooting with your fingertips”).
Network Monitor Team blog:
http://blogs.technet.com/netmon/
Intro to filtering with Network Monitor 3.0
http://blogs.technet.com/netmon/archive/2006/10/17/into-to-filtering-with-network-monitor-3-0.aspx
Capturing network traffic in Windows 7 with NetSH
http://blogs.technet.com/mrsnrub/archive/2009/09/10/capturing-network-traffic-in-windows-7-server-2008-r2.aspx
Wireshark Network Protocol Analyzer
http://www.wireshark.org/
Troubleshooting Replication
http://technet.microsoft.com/en-us/library/cc755349(WS.10).aspx
Troubleshooting IEEE 802.11 Wireless Access with Microsoft Windows
http://technet.microsoft.com/en-us/library/bb457017.aspx
Troubleshooting the “RPC server is unavailable” error
http://blogs.technet.com/abizerh/archive/2009/06/11/troubleshooting-rpc-server-is-unavailable-error-reported-in-failing-ad-replication-scenario.aspx