How to remove a failed server from DFS in Windows Server 2003 R2
This week has been a little strained, hence quiet on the blogging front. Apart from a hectic week at work (more to follow on that shortly), the reason was a "disaster" which happened late last Sunday evening - everything was working at home one moment, and dead the next.
Since the move over from the UK, I'm still in temporary accommodation. To save space, although my servers were couriered over, I didn't bring a monitor, as all the monitors I owned only ran on 240 V. The servers arrived a little shaken, but not too stirred - a few cards were loose, but no failed disks. Just a little prodding into place and they came back perfectly, humming away for six weeks or so without fault.
If you've ever tried to figure out why a machine won't boot without a monitor attached, I know where you're coming from. Short answer is, it's next to impossible. It also happened that this machine was not just any machine, but a Domain Controller. And not just any domain controller, the domain controller holding all the FSMO roles for my home domain. Of course, it will probably come as no surprise to you it's also running a further 5 virtual machines including my website hosting, ISA and Exchange. So yes, it was somewhat of a disaster.
On Monday morning, I took the machine into the office and, with a monitor attached, it was obvious it was continually rebooting (off both plexes in the boot mirror) before the GUI portion of the boot came up. Safe mode and Last Known Good gave the same symptoms. Similarly, boot logging didn't help, as the boot log doesn't get written to disk until the GUI part of the boot comes up.
I borrowed another disk from Ben, plugged it in and installed XP SP2 (only 32 bit OS immediately to hand). However, during the first boot, it blue-screened. Sure enough, there was a problem with the hardware - either motherboard or memory.
Running a memory tester showed something wrong with one or more of the (expensive!) ECC memory modules. I saw a big bill coming :(. It was a tedious process of elimination, swapping DIMMs around until the failed module or modules were identified. That at least got to the point of XP booting. Booting back into the original installation with the failed DIMM removed (actually a pair, as the system needs matched pairs) gave the same symptoms as before. At this point, going back to XP, I discovered XP didn't have drivers for the RAID SCSI controller for the system boot disk and, worse, none were available. Onto plan B for recovery.
I installed Windows Server 2003 on the loan disk with the recovery console enabled to attempt to see what was going on. Chkdsk showed the SCSI disks as corrupt and the mirror as needing repair. Even after fixing those, the boot still wasn't getting past the text mode part.
Not being one to give up, I took the machine home on Monday night. During the day, my wife had bought a second hand 17" monitor for $20.00 - given it's in next to perfect condition, I thought that was pretty good value.
From the recovery console of Windows Server installed on the loan disk, I spent two very long and tedious evenings disabling drivers one-by-one in the hope I'd find the driver failing to load. Every time, I got the same stop 0x0000007b with 0xc000007b in the parameter list: INACCESSIBLE_BOOT_DEVICE.
Well, two days later I did give up. In some ways I'm glad I did - when I took the decision to blow away the machine for real, I discovered the disks were also corrupt in some way - both of them. Blue screens on reinstall. Possibly the RAID controller? Nope, tried a spare one too :( Anyway, I've more disks on order and more memory on order - at least they're much cheaper in the US than in the UK.
In the meantime, with reduced RAM, on the loan disk I at least got the ISA server and the Exchange server back running. Cleaning up AD and seizing the FSMO roles which were held by the previous installation is easy enough (https://support.microsoft.com/?id=255504). They're now safely on a virtual domain controller running on another server.
However, there was one interesting side effect relating to DFS in Windows Server 2003 R2. Yes, the machine was also a file server, replicating to another server through domain-based DFS with RDC. Some of the DFS roots had the now decommissioned server as the preferred target. What this unfortunately means is that when you go into the DFS console from another machine (either another server or an XP machine with the console installed) and examine the DFS root, you get the error below:
\\domain.com\share: The namespace cannot be queried. The RPC server is unavailable.
This only happens on roots which were configured to have the failed server as the preferred target. Clients were still OK accessing the still-working server, as they failed over automatically.
So, from the File Server Management Console, you're stuck - you can't remove the failed server. However, you can use the command line utility dfsutil to forcibly remove it.
First, run
dfsutil /root:\\domain.com\share /export:share.txt
Share.txt will look something like:
<?xml version="1.0"?>
<Root Name="\\DOMAIN\Share" State="1" Timeout="300" >
<Target Server="FAILEDSERVER" Folder="Share" State="2"/>
<Target Server="GOODSERVER" Folder="Share" State="2"/>
</Root>
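Before removing anything, it can help to confirm exactly which root targets the export lists. The following is a minimal Python sketch, not part of dfsutil, that parses an export shaped like the sample above and prints each root target. The helper name list_root_targets is hypothetical, the sample text is modelled on the listing above with placeholder server names, and it assumes the export contains only the Root and Target elements shown there.

```python
import xml.etree.ElementTree as ET

# Sample export modelled on the listing above; server names are placeholders.
SAMPLE_EXPORT = r"""<?xml version="1.0"?>
<Root Name="\\DOMAIN\Share" State="1" Timeout="300" >
<Target Server="FAILEDSERVER" Folder="Share" State="2"/>
<Target Server="GOODSERVER" Folder="Share" State="2"/>
</Root>
"""


def list_root_targets(export_text):
    """Return (root_name, [(server, folder), ...]) for the root targets.

    Hypothetical helper: only direct <Target> children of <Root> are read,
    which is all the sample in this post contains.
    """
    root = ET.fromstring(export_text)
    targets = [(t.get("Server"), t.get("Folder")) for t in root.findall("Target")]
    return root.get("Name"), targets


if __name__ == "__main__":
    name, targets = list_root_targets(SAMPLE_EXPORT)
    print(name)
    for server, folder in targets:
        print(f"  server={server} share={folder}")
```

The Server and Folder values it prints are the ones that go into the /server and /share parameters of the removal command.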
To delete the failed server (and remember, this is a last-ditch thing), run:
dfsutil /unmapftroot /root:\\domain\share /server:failedserver /share:share
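Since the command is easy to get subtly wrong, here is a small Python sketch that assembles the same arguments and rejects the most likely mistake. It only builds the command line and does not run dfsutil. The function name build_unmap_command and its checks are my own illustration; the flags are exactly those used in the command above, with the root written as a UNC path and /share given as the bare share name.

```python
def build_unmap_command(root, server, share):
    """Assemble the dfsutil /unmapftroot arguments used in this post.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if not root.startswith("\\\\"):
        raise ValueError("root should be a UNC path such as \\\\domain\\share")
    if "\\" in share or "/" in share:
        # The command in this post passes the share name only, not a path.
        raise ValueError("share should be the bare share name, not a path")
    return [
        "dfsutil",
        "/unmapftroot",
        f"/root:{root}",
        f"/server:{server}",
        f"/share:{share}",
    ]


if __name__ == "__main__":
    args = build_unmap_command(r"\\domain\share", "failedserver", "share")
    print(" ".join(args))
```

Printing the joined arguments gives a single line to paste into a command prompt, which avoids accidentally splitting the command in two.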
You're now close. To make this work, you must have access to the share on a good server. You must also bounce (at least I had to) the DFS Replication service on the good server AND restart the File Server Management Console. However, once done, everything will be good again. I just need to re-introduce the new server once the new disks arrive.
So now you know one reason why it's been a quiet week of blogging!
Cheers,
John.
Comments
Anonymous
May 22, 2006
John, sorry to hear your dilemma. I think in the future you ought to have WinPE on hand. I actually use BartPE; much better. I have the Windows 2003 I386 / BartPE source files staged on a machine and when I need to create a custom Windows 2003 boot disk (WinPE) with certain drivers, I just add what I need, create a CD, and boot the troubled machine. This saves a ton of time versus installing a parallel OS. I actually created a custom BartPE CD with over 30 RAID / SATA / NIC drivers integrated for this type of reason. You may also include your tools for diagnosis on this CD - instead of hours or even days of troubleshooting, it would have been just minutes. By the way, it's also great for RIS tools and VS migration (P2V). Just a thought...good luck.
Anonymous
June 14, 2006
Wow sounds like a bit of a disaster there. Did you have backups, or did you lose everything that was on that machine?
And I agree, BartPE would have been useful.
Anonymous
January 17, 2009
I had this problem -- I retired the domain controller that was hosting my DFS namespaces, and didn't move them across. In the end, I had to use ADSI Edit to delete the namespaces (under ..., CN=System, CN=Dfs-Configuration). Then I had to flush the caches, using
dfsutil cache domain flush
dfsutil cache referral flush
...then I could recreate the namespaces.
Anonymous
March 29, 2010
Thanks for sharing. I was unsuccessfully trying to remove a root target DC after it was demoted without first removing the DC from the root targets. Found out by reading your post that the /share parameter of dfsutil /unmapftroot should be the name of the share and not \\servername\share as it is listed in the Help and Support Center and the TechNet website and other places.
Anonymous
August 18, 2010
I too have had a similar problem. In trying your solution, I receive the following:
dfsutil /UnmapFtRoot /Root:\\domain.local\namespace /Server:deadserver /Share:(Share to be removed)
reports system error 1169 "There was no match for the specified key in the index"
Anyone else seen this error? I can't find any information on it. Thanks