The case of the blocked FRS replication
So, I haven't written anything in quite a while, but I recently ran into something I felt compelled to write about. A customer of mine is in the process of implementing Windows Server 2008 RODCs in their branch offices. To do that, they plan to in-place upgrade the W2K3 DCs at the hub sites to W2K8.
The customer did an in-place upgrade of a W2K3 DC in their lab and found that while the upgrade succeeded, the DC didn't appear to be fully functional. Specifically, FRS replication appeared to be broken. The W2K8 DC logged event 13508, reporting that it couldn't replicate, and the W2K3 DCs likewise reported the W2K8 DC as unavailable. The customer even turned off the W2K8 firewall, to no avail. I asked the customer for FRSDiag output, and what I got back was incomplete: there was no connstat.txt or any other ntfrsutl output, only RPC errors scattered throughout the FRSDiag data.
The customer in question has a very large global AD implementation, and site-to-site communication is secured with firewalls. To make AD and FRS replication work through those firewalls, the customer has configured static ports for AD and FRS replication, and firewall rules open these ports so that replication traffic between DCs is allowed. https://support.microsoft.com/kb/224196/ (AD replication) and https://support.microsoft.com/kb/319553/ (FRS) describe how to lock these ports down.
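For readers who haven't set these up before, here is a minimal sketch that renders the reg.exe commands for the two static-port settings. It assumes the registry locations described in the two KBs above (the NTDS "TCP/IP Port" value for AD, the NTFRS "RPC TCP/IP Port Assignment" value for FRS); double-check them against the KBs before using them. The function name reg_add_command and the port numbers in the example are hypothetical, chosen only for illustration. In this customer's case the FRS value was pushed through Group Policy rather than set by hand.

```python
"""Sketch: render 'reg add' commands that pin AD and FRS replication to static ports.

Assumption: the registry locations below are the ones described in KB 224196
(AD replication) and KB 319553 (FRS); verify them against the KBs before use.
reg_add_command is a hypothetical helper, and the ports are illustrative only.
"""

# service -> (registry key, value name) for the static-port settings
STATIC_PORT_SETTINGS = {
    "NTDS": (r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters", "TCP/IP Port"),
    "NTFRS": (r"HKLM\SYSTEM\CurrentControlSet\Services\NTFRS\Parameters",
              "RPC TCP/IP Port Assignment"),
}


def reg_add_command(service: str, port: int) -> str:
    """Return a reg.exe command line that stores the static port as a REG_DWORD."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    key, value = STATIC_PORT_SETTINGS[service]
    # reg.exe treats a /d value without a 0x prefix as decimal for REG_DWORD.
    return f'reg add "{key}" /v "{value}" /t REG_DWORD /d {port} /f'


if __name__ == "__main__":
    # Hypothetical port choices; 49153 matches the FRS port in this case.
    for svc, port in (("NTDS", 49152), ("NTFRS", 49153)):
        print(reg_add_command(svc, port))
```

Whatever ports you pick, the firewall rules between sites have to allow exactly those ports, which is why changing them later is painful.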
Analysis of the FRSDiag data showed that FRS was trying to listen on TCP port 49153, a static port enforced through Group Policy, but couldn't because the port was already in use. We used "portqry -n <server> -e 135" to query the RPC endpoint mapper, which showed that the Event Log TCPIP endpoint was using that port. In W2K8 and Vista, we changed the default dynamic (ephemeral) port range to start at 49152 (instead of 1025) and end at 65535 (instead of 5000). The Event Log on W2K8 starts early and grabs one of the first ports in the new range, in this case 49153. On W2K3 it would have grabbed a port early in the 1025-5000 range instead, which is why the customer didn't see an issue until the upgrade to W2K8.
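The root cause boils down to a simple range check: a static port that sits inside the OS's dynamic range can be taken by whichever service asks for a dynamic port first. This sketch encodes the two default ranges stated above; the dictionary and the function name are illustrative, not part of any Windows tool.

```python
"""Sketch: does a static port fall inside an OS's default dynamic port range?

Ranges are the defaults described above: 1025-5000 on W2K3 and
49152-65535 on W2K8/Vista. Names here are illustrative only.
"""

DEFAULT_DYNAMIC_RANGES = {
    "W2K3": range(1025, 5000 + 1),    # inclusive of 5000
    "W2K8": range(49152, 65535 + 1),  # inclusive of 65535
}


def conflicts_with_dynamic_range(port: int, os_name: str) -> bool:
    """True if the port lies inside the OS's default dynamic port range."""
    return port in DEFAULT_DYNAMIC_RANGES[os_name]


if __name__ == "__main__":
    frs_port = 49153  # the static FRS port from this case
    for os_name in DEFAULT_DYNAMIC_RANGES:
        print(os_name, conflicts_with_dynamic_range(frs_port, os_name))
```

Running it prints False for W2K3 and True for W2K8, which matches what the customer saw: the same static port was safe before the upgrade and contested after it.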
Changing FRS to use a dynamic port (by deleting the registry setting that enforced the static port) and restarting FRS fixed the issue temporarily, which confirmed the cause. The problem was that the customer did not want to update firewall rules just to make in-place upgrades of W2K3 DCs to W2K8 work, so he wanted to know whether we could make the Event Log use another port. From the research I've done, that does not appear to be possible. However, in W2K8 you can move the start of the dynamic port range from the default of 49152 to another value. So I pointed the customer to the fine article on Ned's AskDS blog (https://blogs.technet.com/askds/archive/2007/08/24/dynamic-client-ports-in-windows-server-2008-and-windows-vista-or-how-i-learned-to-stop-worrying-and-love-the-iana.aspx) and advised him to move the range to start at 49160 (which frees 49152-49159 for the customer's static ports) and test!
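Moving the range start is done with netsh, which takes a start port and a port count rather than an end port, so the count has to be worked out. The sketch below does that arithmetic and also lists the ports freed by the move. It assumes the "netsh int ipv4 set dynamicport tcp start=<n> num=<count>" syntax and the limits described for it (start of at least 1025, at least 255 ports, end no higher than 65535); confirm both against the AskDS article before running anything. The helper names are hypothetical.

```python
"""Sketch: compute the netsh command that moves the W2K8 TCP dynamic port range.

Assumptions: 'netsh int ipv4 set dynamicport tcp start=<n> num=<count>' syntax,
with start >= 1025, num >= 255 and start + num - 1 <= 65535.
dynamicport_command and freed_ports are hypothetical helpers.
"""

MAX_PORT = 65535


def dynamicport_command(start: int, end: int = MAX_PORT) -> str:
    """Return the netsh command that sets the TCP dynamic range to start..end."""
    num = end - start + 1  # netsh wants a count, inclusive of both ends
    if start < 1025 or end > MAX_PORT or num < 255:
        raise ValueError(f"range {start}-{end} violates netsh limits")
    return f"netsh int ipv4 set dynamicport tcp start={start} num={num}"


def freed_ports(old_start: int, new_start: int) -> range:
    """Ports that leave the dynamic range when its start moves upward."""
    return range(old_start, new_start)


if __name__ == "__main__":
    print(dynamicport_command(49160))
    freed = freed_ports(49152, 49160)
    print(f"freed: {freed.start}-{freed[-1]}")
```

For a start of 49160 this yields num=16376 and frees 49152-49159. You can check the result with "netsh int ipv4 show dynamicport tcp", and IPv6 has its own separate setting under "netsh int ipv6". A service that already holds a port keeps it until it restarts, so the Event Log will presumably only move off 49153 after a restart of the DC.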
FRSDiag is such a great tool. For more details on its usage, head over to Ned's blog for some quality articles, such as How to get the most from your FRSDiag…and tips.