ISTG: what happens when it fails?
I had an interesting investigation the other day with a customer of mine concerning the ISTG (Inter-Site Topology Generator) role and how it fails over.
In the customer's scenario they switch off the ISTG and several other servers in a very busy site to simulate an outage. This site is a main hub site in a very large enterprise forest and has in excess of 200 sites hanging off it. Many of these sites are empty; many have just one or two Domain Controllers in them.
So what should happen?
When the server that holds the ISTG role is unavailable for a period of time, a built-in automated process runs on the servers within the site to reallocate the role to another server. The check for the existence of an ISTG effectively happens every 15 minutes as part of the KCC process.
So in the example I am about to make up we have 10 Domain Controllers in one site, all servicing the same domain.
DC1, DC2, DC3, DC4, DC5, DC6, DC7, DC8, DC9, DC10 - all are Windows 2003 SP1 and we are running at 2003 FFL (Forest Functional Level).
DC1 is the ISTG.
DC3 is the bridgehead server to a second site.
For the failover test we isolate part of the network, which means that
DC1, DC3, DC5, DC6 and DC9 are all isolated.
DC2, DC4, DC7, DC8 and DC10 are still on the live network and can see the second site (network-wise). However, we now have no working replication links, and the ISTG is also unavailable because it is part of the group that has been isolated.
So what is the process by which the ISTG is reallocated, and how long should it take?
1. Generate a list of all Domain Controllers in the same site, in ascending order by GUID, and, by evaluating the msDS-Behavior-Version attribute on the NTDS Settings object of each DC, determine which ones have a version number greater than or equal to .NET Interim Forest Mode.
2. With this list, determine which election algorithm to use by evaluating whether we are at 2003 or 2000. In the example above all DCs are W2003 and at the correct forest level, so the 2003 algorithm is used.
3. Read the interSiteTopologyGenerator attribute. In this example it returns DC1, which is no longer available because it has been isolated.
4. Determine whether that DC (DC1) is in the list of valid DCs from step 1. In my example it is.
The last synchronisation time is then checked and compared to the following parameter:
interSiteTopologyFailover
CN=NTDS Site Settings,CN=SITENAME,CN=Sites,CN=Configuration
Default -> “not set” = 120min (W2003)
Eventually the 120-minute period will elapse, and then:
5. We need to change the ISTG, as too much time has passed since the last successful synchronisation. (If the ISTG could not be determined from the NTDS Site Settings object, start at the beginning of the DC list created in step 1. If the ISTG could be determined but was deemed invalid, start at that DC's position in the list. From either position, for each interval that has passed, skip one Domain Controller in the list.)
6. The new ISTG is set.
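The steps above can be sketched in a few lines of Python. This is only an illustration of the selection rule as described here, not the KCC's actual code: the function name, the input dictionary and the use of Python's UUID ordering as a stand-in for the directory's GUID ordering are all assumptions made for the example.

```python
import uuid

def elect_istg(dc_guids, current_istg, minutes_since_update, failover_minutes=120):
    """Return the DC that each KCC in the site would treat as the ISTG.

    dc_guids: dict of DC name -> GUID string (hypothetical input shape).
    current_istg: name read from interSiteTopologyGenerator, or None.
    minutes_since_update: minutes since the last successful synchronisation.
    """
    # Step 1: order the valid DCs by GUID, ascending (assumed comparison).
    ordered = sorted(dc_guids, key=lambda name: uuid.UUID(dc_guids[name]))

    # Steps 3-4: start at the recorded ISTG if it is in the list,
    # otherwise start at the beginning of the list.
    start = ordered.index(current_istg) if current_istg in ordered else 0

    # Step 5: skip one DC for every whole failover interval that has passed.
    intervals = minutes_since_update // failover_minutes
    return ordered[(start + intervals) % len(ordered)]

# Made-up GUIDs, chosen only so that the sort order is DC1, DC2, DC3.
example = {
    "DC1": "00000000-0000-0000-0000-000000000001",
    "DC2": "00000000-0000-0000-0000-000000000002",
    "DC3": "00000000-0000-0000-0000-000000000003",
}
print(elect_istg(example, "DC1", 119))  # DC1: the interval has not yet elapsed
print(elect_istg(example, "DC1", 120))  # DC2: one interval has passed
```

Because every DC evaluates the same ordered list and the same elapsed time, they should all arrive at the same candidate without talking to each other, which is the point of ordering by GUID.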
Where we have a potential weakness is the last statement in step 5:
“If the ISTG could be determined but was deemed invalid, start at that DC's position in the list. From either position, for each interval that has passed, skip one domain controller in the list.”
Left at the defaults, the algorithm works fine, with a maximum failover time of 2 hours (120 mins) to the next Domain Controller in the GUID list. However, if several Domain Controllers are switched off and they sit next to each other in the GUID list created in step 1, then the transfer of the ISTG role could take considerably longer, because each unavailable DC in the sequence costs a further interval.
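To put a rough number on that, here is a small Python sketch of the worst case. It assumes the skip-one-DC-per-interval rule quoted above, and it assumes a GUID ordering in which the isolated DCs happen to sort together; the real ordering depends on the GUIDs in your own site, and the function name is made up for the example.

```python
def minutes_until_live_istg(ordered_dcs, failed_istg, live_dcs, failover_minutes=120):
    """Minutes until the skip-one-per-interval rule lands on a reachable DC."""
    start = ordered_dcs.index(failed_istg)
    count = len(ordered_dcs)
    for intervals in range(1, count):
        candidate = ordered_dcs[(start + intervals) % count]
        if candidate in live_dcs:
            return intervals * failover_minutes, candidate
    return None  # no reachable DC left in the site

# Hypothetical GUID order: the five isolated DCs sort next to each other.
ordered = ["DC1", "DC3", "DC5", "DC6", "DC9", "DC2", "DC4", "DC7", "DC8", "DC10"]
isolated = {"DC1", "DC3", "DC5", "DC6", "DC9"}
live = set(ordered) - isolated

print(minutes_until_live_istg(ordered, "DC1", live))      # (600, 'DC2')
print(minutes_until_live_istg(ordered, "DC1", live, 60))  # (300, 'DC2')
```

With that ordering the role would take five intervals, or 10 hours at the default, to reach a DC that is still on the network. If a live DC happened to sort directly after DC1, it would take a single interval.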
One approach would be to reduce this failover interval from its 120-minute default:
interSiteTopologyFailover
CN=NTDS Site Settings,CN=SITENAME,CN=Sites,CN=Configuration
Default -> “not set” = 120min (W2003)
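As a sketch of how that change might be prepared, the Python below builds an LDIF modify record for the attribute. The site name, the forest root DN (DC=example,DC=com) and the value of 60 are placeholders, and it assumes the value is expressed in minutes, as the default above suggests. Check the resulting record against your own forest before importing it with a tool such as ldifde.

```python
def build_failover_ldif(site_name, forest_root_dn, minutes):
    """Build an LDIF 'modify' record that sets interSiteTopologyFailover."""
    if minutes <= 0:
        raise ValueError("failover interval must be a positive number of minutes")
    # The DN follows the path shown above, with the forest root DN appended.
    dn = (f"CN=NTDS Site Settings,CN={site_name},CN=Sites,"
          f"CN=Configuration,{forest_root_dn}")
    return "\n".join([
        f"dn: {dn}",
        "changetype: modify",
        "replace: interSiteTopologyFailover",
        f"interSiteTopologyFailover: {minutes}",
        "-",
        "",
    ])

# Placeholder site and forest names, for illustration only.
print(build_failover_ldif("SITENAME", "DC=example,DC=com", 60))
```

Generating the record rather than typing it by hand makes it easier to apply the same value consistently if more than one hub site needs it.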
Also, considering that the remaining Domain Controllers will be very busy recreating KCC connection objects and, as a consequence of that re-creation, carrying out VVJOINs, it would be worth investigating the use of the following:
Redundant connection mode or Branch Office Mode in Active Directory (AD)?
Normally, only one replication object is created per namespace between sites, which achieves the most efficient replication. In situations in which branch offices all connect to a hub location, if a Domain Controller (DC) at the hub goes down, all the remote locations must recalculate their replication objects. This results in a huge number of changes, which are not reverted when the DC comes back.
This mode requires two steps. Step one is to enable the redundant connection mode, so that there are two connection objects to the hub location. Step two is to disable detection of failed connection objects, because you are assuming a failed DC will come back, so there is no need to modify the connection objects. You need to run the commands on all remote locations that will require the redundant connections.
C:\>repadmin /siteoptions /site:London +IS_REDUNDANT_SERVER_TOPOLOGY_ENABLED
Branch10
Current Site Options: (none)
New Site Options: IS_REDUNDANT_SERVER_TOPOLOGY_ENABLED
C:\>repadmin /siteoptions /site:London +IS_TOPL_DETECT_STALE_DISABLED
Branch10
Current Site Options: IS_REDUNDANT_SERVER_TOPOLOGY_ENABLED
New Site Options: IS_TOPL_DETECT_STALE_DISABLED IS_REDUNDANT_SERVER_TOPOLOGY_ENABLED
Information on KCC Branch Office Mode
KCC Branch Office mode was created to provide an easily managed redundant topology for branch office deployments. This mode reduces VV join load on FRS by maintaining a relatively static topology between hub and branch DCs. KCC Branch Office mode can be enabled on a per-site basis after the Forest Functional Level has been raised to Windows Server 2003. Under this mode the KCC will build 2 redundant connections between a DC in a branch site and 2 DCs in the hub site. KCC Branch Office connections are created on preferred bridgeheads if defined; otherwise a random DC will be selected.
Also, when these connections are made they are given staggered schedules. Once KCC creates these connections it treats them as though they were created manually and disables KCC failover as long as the metadata for the preferred Bridgehead or randomly selected server remains in AD. Gracefully demoting a DC using DCPROMO or removing its metadata with NTDSUTIL "remove selected server" will cause KCC to re-evaluate its redundancy requirements. A new DC will be considered in the redundant topology when promoted into the forest.