Group Policy Debug Troubleshooting: A Real World Example
Hi My name is Lakshman Hariharan and I work for Microsoft as a Premier Field Engineer supporting Active Directory.
One of the things I love about what we do on a daily basis is the fact that we get to work on some of the strangest issues. This is also one of the things that can be the most frustrating.
I recently worked an interesting issue. The issue as described to me was that on all Windows Server 2008 R2 domain controllers Group Policy updates have been happening every two minutes, on the dot. The default Group Policy update interval on domain controllers is five minutes. We found that the Group Policy operational log on all domain controllers contained the following event pictured in the screenshot below.
A Group Policy Object (GPO) was created as a stop gap mitigation strategy until we could find root cause for the Group Policy update spam. We configured the GPO to only update policy every 90 minutes and applied it to the domain controllers Organizational Unit (OU). But the Group Policy update still continued to happen every two minutes.
While this may sound like a corner case… which admittedly it is, the goal is to use this issue as an example to demonstrate how to start tracing and troubleshooting things such as seemingly random Group Policy updates being initiated from out of nowhere. Additionally, we can leverage this case study as an initiation on how to decipher Group Policy Service (gpsvc) debug logging.
Here is a screenshot of the event:
We the Resultant Set of Policy (RSOP) tools using a couple of different methods such as Group Policy Management Console (GPMC) and RSOP.msc using the Microsoft Management Console (MMC) to see what policies are being applied to the domain controllers and ensuring that there were no other policies or settings superseding or otherwise trumping the default settings for domain controllers to refresh policy.
Having ruled out any other conflicting settings to further troubleshoot the issue we first needed to find out what the PID for the Group Policy service (gpsvc) was.
One interesting item to note here is that the gpsvc service is hosted under a particular instance of svchost. There are several ways to find out what specific services are hosted under a svchost instance (Resource Monitor being another way) but the easiest way is to run the following command from a command line
tasklist /svc
The output will look similar to this, and we can see that the svchost instance with a Process ID (PID) of 860 is the one that hosts the gpsvc service.
Armed with this knowledge of the gpsvc PID we went about what we knew best to do when trying to find clues on what may be initiating a certain process or behavior on a machine. And that was to run Process Monitor (or ProcMon for short). So we monitored the PID of 860 for a few minutes.
Even though we could see clearly that every two minutes Group Policy update was being initiated and a detailed view using another tool called Process Explorer we could see that it was indeed processing Group Policy, (initiating an ldap connection to itself, enumerating the policies, pulling down settings etc.) we couldn’t find the caller Process Name or PID calling in to PID 860.
We also ran a network trace on the domain controller but since the connection was from the domain controller to itself we couldn’t necessarily see anything in the trace. Even if the network trace contained useful data, it would have been akin to finding the proverbial needle in a haystack.
The solution to our problem lay in finding the caller PID. And none of these tools were going to help us do that. After posting a question to an internal Microsoft distribution list that discusses Group Policy issues I was given the suggestion of turning on gpsvc debug logging, the steps for which are documented here.
For folks that are familiar with having to troubleshoot Group Policy issues from the Windows XP/Windows Server 2003 days, the log file generated by this logging technique should be very familiar to the User Environment Debug (UserEnv) log. Several excellent blogs have been written on deciphering UserEnv logs which I will refrain from referencing here, in the interest of brevity, but one specific point of finding what process is doing what at a particular instance is very useful.
We enabled gpsvc debug logging and waited the requisite two minutes at which point, sure enough, we saw the event ID 8004 being logged in the Group Policy operational log.
Below is the snippet from the Group Policy debug log. Notice the timestamps 10:17 a.m. and 10:19 a.m. in the log, highlighted in green. This is the two minute gap after which the Group Policy update happens.
Also notice the numbers right after “GPSVC”, highlighted in red. These are the process IDs, in hexadecimal format of the processes active during the timeframe, and as geek trivia, the numbers in hex after the period (2b98 and ca8 in this case) are the thread IDs, or the worker bees if you will, doing the actual work within those processes. Again, this should be quite familiar to anyone that has read UserEnv logs in the past.
GPSVC(35c.2b98) 10:17:44:850 ConnectToNameSpace: ConnectServer returned 0x0
GPSVC(35c.2b98) 10:17:44:850 ProcessGroupPolicyCompletedExInternal:
GPSVC(35c.2b98) 10:17:44:850 ProcessGroupPolicyCompletedExInternal: Finished processing extension <Audit Policy Configuration> at 634181894 ticks (ms)
GPSVC(35c.2b98) 10:17:44:850 ProcessGroupPolicyCompletedExInternal: Leaving GPSVC(1694.ca8) 10:19:43:311 GetGPOList: Entering.
GPSVC(1694.ca8) 10:19:43:311 GetGPOList: hToken = 0x310
GPSVC(1694.ca8) 10:19:43:311 GetGPOList: lpName = <NULL>
GPSVC(1694.ca8) 10:19:43:311 GetGPOList: lpHostName = <NULL>
GPSVC(1694.ca8) 10:19:43:311 GetGPOList: dwFlags = 0x1
GPSVC(35c.b10) 10:19:43:311 GetGroupPolicyObjectListInternal: Queried lpDNName = <CN=CONTOSODC1,OU=Domain Controllers,DC=contoso,DC=com>
GPSVC(1f8.b10) 10:19:43:311 GetLdapHandle: Getting ldap handle for host: contoso.com in domain: <Unspecified>.
GPSVC(35c.b10) 10:19:43:311 GetLdapHandle: Server connection established.
GPSVC(35c.b10) 10:19:43:311 GetLdapHandle: Binding using only kerberos.
GPSVC(35c.b10) 10:19:43:311 GetLdapHandle: Bound successfully.
From our previous tasklist /svc output we know that process 35c (860 in decimal) is the svchost instance running gpsvc.
From the log it is clear that right before the process with PID of 860 is called a process with PID 5780 (1694 in hexadecimal) is called.
Running tasklist /svc again we see that the process with PID of 5780 is PolicyAgent (screenshot below).
For those unfamiliar, PolicyAgent is the IPSec Policy Agent service. This was our caller process and corresponding PID.
Armed with this knowledge we ran another RSOP report against one of the domain controllers, and lo and behold we found out that there was a legacy* IPSec policy that was assigned to the domain controllers. Screenshot below of the legacy IPSec policy.
Investigating further regarding this policy I was informed that right around when this issue of Group Policy update every two minutes cropped up was when some changes were made to an old IPSec policy that was in place years ago per a directive from another business unit.
When the changes were made, the intent was to delete all the settings within the policy but just one of the policies ended up inadvertently being active. Upon further investigation we found out that because of invalid settings the policy was constantly failing and was triggering Group Policy update every two minutes.
Once we disabled the offending policy the Group Policy updated every two minutes stopped and life was back to normal.
*As an aside, the settings under Windows Settings à IP Security Policies are considered legacy policies. They are deprecated and should not be used anymore to configure IPsec settings. The recommended and supported way to configure IPsec settings on up level (read 2008/Vista and above) is to use Windows Firewall with Advanced Security (WFAS)
-Lakshman Hariharan