Managing RID Pool Depletion

Hiya folks, Ned here again. When interviewing a potential support engineer at Microsoft, we usually start with a softball question like “what are the five FSMO roles?” Everyone nails that. Then we ask what each role does. Their face scrunches a bit and they get less assured. “The RID Master… hands out RIDs.” Ok, what are RIDs? “Ehh…”

That’s trouble, and not just for the interview. Poor understanding of the RID Master prevents you from adding new users, computers, and groups, which can disrupt your business. Uncontrolled RID creation forces you to abandon your domain, which will cost you serious money.

Today, I discuss how to protect your company from uncontrolled RID pool depletion and keep your domain bustling for decades to come.

Background

Relative Identifiers (RID) are the incremental portion of a domain Security Identifier (SID). For instance:

S-1-5-21-1004336348-1177238915-682003330-2100

==>

S-1-5-Domain Identifier-Relative Identifier
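If you want to pull those two pieces apart yourself, here is a minimal sketch. Split-DomainSid is a made-up helper name, not a built-in cmdlet, and it assumes a well-formed domain SID string like the one above, where the RID is the last dash-separated field.

```powershell
# Hypothetical helper: split a domain SID string into the domain part and the RID.
function Split-DomainSid {
    param ([string]$Sid)

    $parts = $Sid.Split('-')

    # The RID is the final field; everything before it identifies the domain.
    $rid = [int64]$parts[-1]
    $domainSid = $parts[0..($parts.Length - 2)] -join '-'

    New-Object PSObject -Property @{ DomainSid = $domainSid; Rid = $rid }
}

$result = Split-DomainSid 'S-1-5-21-1004336348-1177238915-682003330-2100'
Write-Output "Domain part: $($result.DomainSid)"
Write-Output "RID: $($result.Rid)"
```

Every trustee in the same domain shares the domain part; only the RID changes.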

A SID represents a unique trustee, also known as a "security principal" – typically users, groups, and computers – that Windows uses for access control. Without a matching SID in an access control list, you cannot access a resource or prove your identity. It’s the lynchpin.

Every domain has a RID Master: a domain controller that hands each DC a pool of 500 RIDs at a time. A domain contains a single RID pool which generates roughly one billion SIDs (the RID space is 30 bits long, so the ceiling is 2^30 - 1, or 1,073,741,823 RIDs). Once issued, RIDs are never reused. You can’t reclaim RIDs after you delete security principals either, as that would lead to unintended access to resources that contained previously issued SIDs.

Anytime you create a writable DC, it gets 500 new RIDs from the RID Master. Meaning, if you promote 10 domain controllers, you’ve issued 5000 new RIDs. If 8 of those DCs are demoted, then promoted back up, you have now issued 9000 RIDs. If you restore a system state backup onto one of those DCs, you’ve issued 9500 RIDs. The balance of any existing RIDs issued to a DC is never saved – once issued they’re gone forever, even if they aren’t used to create any users. A DC requests more RIDs when it gets low, not just when it is out, so when it grabs another 500 that becomes part of its "standby" pool. When the current pool is empty, the DC switches to the standby pool. Repeat until doomsday.
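The running total in that example is easy to reproduce. This is only a sketch of the arithmetic, assuming the default block of 500 RIDs per allocation; the operation names are labels for illustration, not anything read from a DC.

```powershell
# Tally RIDs issued by DC lifecycle operations, assuming the default block size of 500.
$blockSize = 500

# Each operation costs one fresh block per DC involved.
$operations = @(
    @{ Name = 'Promote 10 DCs';               Blocks = 10 },
    @{ Name = 'Demote and re-promote 8 DCs';  Blocks = 8 },
    @{ Name = 'System state restore on 1 DC'; Blocks = 1 }
)

$issued = 0
foreach ($op in $operations) {
    $issued += $op.Blocks * $blockSize
    Write-Output ("{0}: {1} RIDs issued so far" -f $op.Name, $issued)
}
```

Note that no users were created in any of those steps; the RIDs are spent anyway.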

Adding more trustees means issuing more blocks of RIDs. When you’ve issued the one billion RIDs, that’s it – your domain cannot create users, groups, computers, or trusts. Your RID Master logs event 16644: “The maximum domain account identifier value has been reached.” Time for a support case.

You’re now saying something like, “One billion RIDs? Pffft. I only have a thousand users and we only add fifty a year. My domain is safe.” Maybe. Consider all the normal ways you “issue” RIDs:

  • Creating users, computers, and groups (both Security and email Distribution) as part of normal business operations.
  • The promotion of new DCs.
  • Gracefully demoting a DC forfeits whatever remains of its RID pool.
  • System state restore on a DC invalidates the local RID pool.
  • Active Directory domains upgraded from NT 4.0 inherit all the RIDs from that old environment.
  • Seizing the RID Master FSMO role to another server.

Now study the abnormal ways RIDs are wasted:

  • Provisioning systems or admin scripts that accidentally bulk create users, groups, and computers.
  • Attempting to create enabled users that do not meet password requirements.
  • DCs turned off longer than tombstone lifetime.
  • DC metadata cleaned.
  • Forest recovery.
  • The InvalidateRidPool operation.
  • Increasing the RID Block Size registry value.

The normal operations are out of your control and unlikely to cause problems even in the biggest environments. For example, even though Microsoft’s Redmond AD dates to 1999 and holds the vast majority of our resources, it has only consumed ~8 million RIDs - that's 0.7%. In contrast, some of the abnormal operations can lead to squandered RIDs or even deplete the pool altogether, forcing you to migrate to a new domain or recover your forest. We’ll talk more about them later; regardless of how you are using RIDs, the key to avoiding a problem is observation.

Monitoring

You now have a new job, IT professional: monitoring your RID usage and ensuring it stays within expected patterns. KB305475 describes the attributes for both the RID Master and the individual DCs. I recommend giving it a read, as the data storage requires conversion for human consumption.

Monitoring the RID Master in each domain is adequate, and we offer a simple command-line tool I’ve discussed before: DCDIAG.EXE. It is part of Windows Server 2008 and later, or a free download for Windows Server 2003. Its RidManager test translates the rIDAvailablePool attribute into a readable count of allocated RIDs:

Dcdiag.exe /test:ridmanager /v

For example, my RID Master has issued 3100 RIDs to my DCs and itself:

[Screenshot: DCDIAG RidManager test output showing the Available RID Pool for the Domain]

If you just want the good bit, perhaps for batching:

Dcdiag.exe /TEST:RidManager /v | find /i "Available RID Pool for the Domain"

For PowerShell, here is a slightly modified version of Brad Rutkowski's original sample function. It converts the high and low parts of rIDAvailablePool into readable values:

function Get-RIDsRemaining
{
    param ($domainDN)

    # Bind to the RID Manager$ object, which holds rIDAvailablePool for the domain
    $de = [ADSI]"LDAP://CN=RID Manager$,CN=System,$domainDN"
    $return = new-object system.DirectoryServices.DirectorySearcher($de)
    $property = ($return.FindOne()).properties.ridavailablepool

    # rIDAvailablePool is a 64-bit value: the high 32 bits hold the total RIDs
    # the domain can issue, the low 32 bits hold the RIDs issued so far
    [int32]$totalSIDS = $($property) / ([math]::Pow(2,32))
    [int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))
    [int32]$currentRIDPoolCount = $($property) - $temp64val
    $ridsremaining = $totalSIDS - $currentRIDPoolCount

    Write-Host "RIDs issued: $currentRIDPoolCount"
    Write-Host "RIDs remaining: $ridsremaining"
}


Another sample, if you want to use the Active Directory PowerShell module and target the RID Master directly:

function Get-RIDsremainingAdPsh
{
    param ($domainDN)

    # Read rIDAvailablePool straight from the DC that holds the RID Master role
    $property = get-adobject "cn=rid manager$,cn=system,$domainDN" -property ridavailablepool -server ((Get-ADDomain $domaindn).RidMaster)
    $rid = $property.ridavailablepool

    # High 32 bits: total RIDs the domain can issue. Low 32 bits: RIDs issued so far
    [int32]$totalSIDS = $($rid) / ([math]::Pow(2,32))
    [int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))
    [int32]$currentRIDPoolCount = $($rid) - $temp64val
    $ridsremaining = $totalSIDS - $currentRIDPoolCount

    Write-Host "RIDs issued: $currentRIDPoolCount"
    Write-Host "RIDs remaining: $ridsremaining"
}


Turn one of those PowerShell samples into a script that runs as a scheduled task that updates a log every morning and alerts you to review it. You can also use LDP.EXE to convert the RID pool values manually every day, if you are an insane person.
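For the alerting half of that scheduled task, something like the following could work. It is a sketch only: Test-RidGrowth is a hypothetical name, the 2000-RID daily threshold is an assumption you would tune to your own environment, and the two input numbers would come from yesterday's log entry and from one of the samples above, changed to return the issued count instead of writing it to the host.

```powershell
# Hypothetical helper: compare today's issued-RID count with the previous reading.
function Test-RidGrowth {
    param (
        [int64]$PreviousIssued,
        [int64]$CurrentIssued,
        [int64]$MaxDailyGrowth = 2000   # assumed threshold, not a Microsoft recommendation
    )

    $delta = $CurrentIssued - $PreviousIssued

    New-Object PSObject -Property @{
        Delta = $delta
        Alert = ($delta -gt $MaxDailyGrowth)
    }
}

# Example: 3100 RIDs issued yesterday, 53100 today.
$check = Test-RidGrowth -PreviousIssued 3100 -CurrentIssued 53100
if ($check.Alert) {
    Write-Warning "RID issuance jumped by $($check.Delta) since the last run"
}
```

A single extra block of 500 would pass quietly; a jump of 50,000 would not.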

You should also consider monitoring the RID Block Size, as any increase exhausts your global RID pool faster. Object Access Auditing can help here. There are legitimate reasons to increase this value on certain DCs: for example, if you are the US Marine Corps and your DCs are in a warzone where they may not be able to talk to the RID Master for weeks. Be smart about picking values - you are unlikely to need five million RIDs before talking to the master again; when the DC comes home, lower the value back to default.

The critical review points are:

  1. You don’t see an unexpected rise in RID issuance.
  2. You aren’t close to running out of RIDs.
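To put numbers on both review points, here is a rough sketch that turns an issued count and a burn rate into a percentage used and a runway. Get-RidRunway is a hypothetical name, and the burn rate is whatever you measure in your own logs; the figures in the example are the ~8 million issued RIDs and the 4-million-a-month runaway case mentioned in this post.

```powershell
# Hypothetical helper: estimate how much of the RID pool is used and how long the rest lasts.
function Get-RidRunway {
    param (
        [int64]$Issued,
        [double]$IssuedPerMonth,
        [int64]$Total = 1073741823   # ceiling of the domain RID pool
    )

    $remaining = $Total - $Issued
    $percentUsed = [math]::Round(100 * $Issued / $Total, 2)

    # No runway estimate if nothing is being consumed.
    $monthsLeft = if ($IssuedPerMonth -gt 0) { [math]::Floor($remaining / $IssuedPerMonth) } else { $null }

    New-Object PSObject -Property @{
        Remaining   = $remaining
        PercentUsed = $percentUsed
        MonthsLeft  = $monthsLeft
    }
}

# Example: 8 million RIDs issued, something burning 4 million a month.
$runway = Get-RidRunway -Issued 8000000 -IssuedPerMonth 4000000
Write-Output "Used: $($runway.PercentUsed)%  Months left at this rate: $($runway.MonthsLeft)"
```

Decades of normal use become a couple of decades, or far less, once a runaway process is in play, which is why the trend matters more than the absolute number.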

Let’s explore what might be consuming RIDs unexpectedly.

Diagnosis

If you see a large increase in RID allocation, the first step is finding what was created and when. As always, my examples are PowerShell. You can find plenty of others using VBS, free tools, and whatnot on the Internet.

You need to return all users, computers, and groups in the domain – even if deleted. You need the SAM account name, creation date, SID, and USN of each trustee. There are going to be a lot of these, so filter the returned properties to save time and export to a CSV file for sorting and filtering in Excel. Here’s a sample (it’s one wrapped line):

Get-ADObject -Filter 'objectclass -eq "user" -or objectclass -eq "computer" -or objectclass -eq "group"' -properties objectclass,samaccountname,whencreated,objectsid,uSNCreated -includeDeletedObjects | select-object objectclass,samaccountname,whencreated,objectsid,uSNCreated | Export-CSV riduse.csv -NoTypeInformation -Encoding UTF8

Here I ran the command, then opened in Excel and sorted by newest to oldest:

[Screenshot: riduse.csv in Excel, sorted newest to oldest]
Errrp, looks like another episode of “scripts gone wild”…

Now it’s noodle time:

  • Does the user count match actual + previous user counts (or at least in the ballpark)?
  • Are there sudden, massive blocks of object creation?
  • Is someone creating and deleting objects constantly – or was it just once and you need to examine your audit logs to see who isn’t admitting it?
  • Has your user provisioning system gone berserk (or is it run by someone who needs… coaching)?
  • Have you changed your password policy, so that you are now trying to create enabled users that do not meet password requirements? This uses up a RID during each failed creation attempt.
  • Do you use a VDI system that constantly creates and deletes computer accounts when provisioning virtual machines? We’ve seen those too: in one case, a third party solution was burning 4 million computer RIDs a month.
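If you would rather not eyeball Excel for those massive blocks of creation, a sketch like this groups the exported rows by creation day and keeps only the busy days. Get-CreationBurst and its threshold are assumptions for illustration. It expects objects with a whenCreated property, such as the output of Import-Csv riduse.csv, and it assumes the date strings parse with a plain [datetime] cast, which may not hold for every regional format.

```powershell
# Hypothetical helper: count created trustees per day and return days at or above a threshold.
function Get-CreationBurst {
    param (
        $Rows,                   # objects with a whenCreated property
        [int]$Threshold = 1000   # assumed cutoff for a "suspicious" day
    )

    $Rows |
        Group-Object -Property { ([datetime]$_.whenCreated).ToString('yyyy-MM-dd') } |
        Where-Object { $_.Count -ge $Threshold } |
        Sort-Object -Property Count -Descending |
        Select-Object -Property Name, Count
}

# Small in-memory example instead of a real export: three objects on one day, one on the next.
$rows = @('2011-09-10 08:00', '2011-09-10 08:01', '2011-09-10 08:02', '2011-09-11 09:00') |
    ForEach-Object { New-Object PSObject -Property @{ whenCreated = $_ } }

$bursts = Get-CreationBurst -Rows $rows -Threshold 2
$bursts | ForEach-Object { Write-Output "$($_.Name): $($_.Count) objects created" }
```

Against a real export you would call it as Get-CreationBurst -Rows (Import-Csv riduse.csv) and then go find out who was running what on the days it returns.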

If the RID allocations are growing massively, but you don’t see a corresponding increase in new trustees, it’s likely someone increased RID Block Size inappropriately. Perhaps they set hexadecimal rather than decimal values – instead of the intended 15,000 RIDs per allocation, for example, you’d end up with 86,016!
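The hexadecimal mix-up is easy to demonstrate without touching a registry. This sketch only does the conversion; the variable names are mine.

```powershell
# What you meant to set.
$intended = 15000

# What gets stored if "15000" is typed while the value editor is in hexadecimal mode.
$typedAsHex = [Convert]::ToInt32('15000', 16)

# What you would have to type in hexadecimal mode to really get 15000.
$correctHex = '{0:X}' -f $intended

Write-Output "Intended block size: $intended"
Write-Output "Stored when 15000 is read as hex: $typedAsHex"
Write-Output "Hex digits that equal 15000: $correctHex"
```

Every allocation after that mistake hands out almost six times as many RIDs as planned.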

It may also be useful to know where the updates are coming from. Examine each DC’s rIDAllocationPool for increases to see if something is running on - or pointed at - a specific domain controller.
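Like the domain-wide value, a DC's rIDAllocationPool is stored as one large integer. The sketch below assumes the layout as I read it in KB305475, with the start of the range in the low 32 bits and the end of the range in the high 32 bits; verify that against the KB before relying on it. ConvertFrom-RidAllocationPool is a hypothetical name and the sample value is built by hand, not read from a DC.

```powershell
# Hypothetical helper: decode a rIDAllocationPool value into the RID range it describes.
# Assumption: low 32 bits = first RID of the pool, high 32 bits = last RID of the pool.
function ConvertFrom-RidAllocationPool {
    param ([int64]$Value)

    $first = $Value % 4294967296                        # low 32 bits
    $last = [int64](($Value - $first) / 4294967296)     # high 32 bits

    New-Object PSObject -Property @{
        First = $first
        Last  = $last
        Size  = $last - $first + 1
    }
}

# Hand-built sample: a pool covering RIDs 1100 through 1599.
$sample = [int64]1599 * 4294967296 + 1100
$pool = ConvertFrom-RidAllocationPool -Value $sample
Write-Output "Pool covers RIDs $($pool.First) to $($pool.Last) ($($pool.Size) RIDs)"
```

A DC whose range keeps leaping forward between checks, or whose Size is far from 500, is the one to look at first.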

Recovery

You know there’s a problem. The next step is to stop things getting worse (as you have no way to undo the damage without recovering the entire forest).

If you identified the cause of the RID exhaustion, stop it immediately; your domain’s health is more important than whatever that system is doing. If it continues at high enough volume, it’s going to force you to abandon your domain.

If you can’t find the cause and you are anywhere near the end of your one billion RIDs, get a system state backup on the RID Master immediately. Then transfer the RID Master role to a non-essential DC and shut that DC down to prevent further issuance. The allocated RID pools on your DCs will run out, but that stops further damage. This gives you breathing space to find the bad guy. The downside is that legitimate trustee creation stops also. If you don’t already have a Forest Recovery process in place, you had better get one going. If you cannot figure out what's happening, open a support case with us immediately.

No matter what, you cannot let the RID pool run out. If you see:

  • SAM Event 16644
  • rIDAvailablePool is “4611686015206162431”
  • DCDIAG “Available RID Pool for the Domain is 1073741823 of 1073741823”

... it is too late. It’s like having a smoke detector that only goes off when the house has burned down. Now you cannot even create a trust for a migration to another domain. If you reach that stage, open a support case with us immediately. This is one of those “your job depends on it” issues, so don’t try to be a lone gunfighter.
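To see why that particular rIDAvailablePool number means the pool is gone, split it into its two halves: issued RIDs in the low 32 bits, total RIDs in the high 32 bits. This is a sketch with a hypothetical function name; the healthy sample value is hand-built from a total of 1,073,741,823 and 3100 issued.

```powershell
# Hypothetical helper: returns $true when every RID in the domain pool has been issued.
function Test-RidPoolExhausted {
    param ([int64]$RidAvailablePool)

    $issued = $RidAvailablePool % 4294967296                          # low 32 bits
    $total = [int64](($RidAvailablePool - $issued) / 4294967296)      # high 32 bits

    return ($issued -ge $total)
}

$dead = Test-RidPoolExhausted -RidAvailablePool 4611686015206162431      # the value quoted above
$healthy = Test-RidPoolExhausted -RidAvailablePool 4611686014132423708   # 3100 of 1073741823 issued

Write-Output "Quoted value exhausted: $dead"
Write-Output "Healthy sample exhausted: $healthy"
```

By the time this returns true you are already in the burned-down house, so use it as a sanity check on your decoding, not as your monitor.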

Many thanks to Arren “cowboy killer” Connor for his tireless efforts and excellent internal docs around this scenario.

Finally, a tip: know all the FSMO roles before you interview with us. If you really want to impress, know that the PDC Emulator does more than just “emulate a PDC”. Oy vey.

 

UPDATE 11/14/2011:

Our seeds to improve the RID Master have begun growing and here's the first ripe fruit - https://support.microsoft.com/kb/2618669

 

 

Until next time.

Ned “you can’t get RID of me that easily” Pyle

Comments

  • Anonymous
    September 12, 2011
    I wrote a few related blogs recently that may be of interest to the readers: AD Internals: Display RID Allocation Pools (www.remkoweijnen.nl/.../ad-internals-display-rid-allocation-pools) and AD Internals: Reset RID Allocation Pool (www.peppercrew.nl/.../ad-internals-reset-rid-allocation-pool)

  • Anonymous
    September 12, 2011
    How weird - thanks for catching that. Not sure what happened there, it's just a copy and paste from my Word doc... :(

  • Anonymous
    September 12, 2011
    Well, I reviewed my draft and at some point I managed to remove those asterisks (after testing carefully and getting the screenshot, no less!). So not Word's fault, purely my own.  All fixed. Thanks again :)

  • Anonymous
    September 12, 2011
    How about a Microsoft promotional give away for the most and least RIDs issued in a production domain?  :)  We wouldn't even come close to either end, but now I'm curious what the average is for some reason.

  • Anonymous
    September 12, 2011
    ...a bunch of wet suits? ...sharks swimming with sharks? Ok, yes I know the "real" answer too..  :)

  • Anonymous
    September 12, 2011
    On a more serious note, I would suggest the following modification to both scripts (or something similar): param ($domainDN=([ADSI]"").distinguishedName) That way the parameter would default to the DN of the domain the machine is currently a member of.  That is assuming the system you are running the script on is a member of the necessary domain.  Otherwise the param can still be passed as needed.   And if I didn't say it earlier, another great post!

  • Anonymous
    September 12, 2011
    Phew, 1073594723 remaining.  wipes sweat from brow I meant to comment this morning that the * was missing but I never hit post.  Glad to see the update. Microsoft promotions are great...sure, sometimes there are forms to complete (even on our end!) but usually they're worth it... ;-)

  • Anonymous
    September 12, 2011
    Get-RIDsRemaining ([adsi]"").distinguishedName OR function Get-RIDsRemaining{   Param(         $domainDN=([adsi]"").distinguishedName     )( .....

  • Anonymous
    September 12, 2011
    Ok. after reviewing all of the comments all I have to say i t- gret bolg and good informantion. very usable.

  • Anonymous
    September 13, 2011
    ...or, in the case of the AD PowerShell Module     (Get-ADDomain).DistinguishedName Ahh... PowerShell, thank you for so many options!  :D

  • Anonymous
    September 13, 2011
    Awesome stuff guys. I need to do more PowerShell posts, they bring out the commenters and the techniques.

  • Anonymous
    September 13, 2011
    I was talking to a customer last week about this on an ADRAP and thought....hmm I should write a post about that later. Then you destroyed me. Though I was going to name it "RID Pool Depletion, The Silent Domain Killer...Wait What?" Back to XPerf for me.

  • Anonymous
    September 14, 2011
    You suffered blog pool depletion. Maybe System Center 2012 has a monitor for that?

  • Anonymous
    September 16, 2011
    That's curious. In the past I had the chance to change the 500-block of assigned RIDs via registry key, because of a problem at my customer... so, you can actually choose how many RIDs are assigned to each DC. Now that I read this article, since assigning 500-RID blocks could be risky after ten years of AD working in large enterprises, I don't understand why the assigned pool is still made of 500 RIDs. Why did MSFT not change to 50 RIDs, for example? I am pretty sure that there would not be performance or network traffic issues, and reducing the pool size to one tenth could actually make the difference in terms of waste (I just think about how many DCs I decommissioned in enterprises).

  • Anonymous
    September 17, 2011
    500 per DC by default is not so bad. Let's say you were issuing 500 RIDs each day on each DC, and you had 100 DCs. That is 50,000 new users/groups/computers you created every day, or 18,250,000 security principals a year. Which would make you by far the largest AD domain in the world. You still would not run out for 58 years. :) The main idea is not to protect the network from traffic - as you're right, the actual data sent is minuscule - but instead to allow a DC some autonomy when there is no network. If you are a Marine comm platoon in Afghanistan with a domain controller in your firebase, it may not be able to talk to other DCs in the domain for a few weeks at a time. So to create new groups or add laptops to the domain in that location, you'll need a RID pool that works offline for a while.

  • Anonymous
    October 10, 2011
    Why not put in a "fix" in a service pack or something which stops issuing when there is x% left, ie by default 10%.  Then if a customer DOES get into that situation, they are offline only temporarily, they can identify the problem, stop whatever is automatically generating new ids, lower the block to 5% or something, and then have time to migrate to a new domain structure? If an exchange disk runs out of space, the store dismounts, but once you do something to alleviate the issue, you can be up and running again.  Say its on a SAN, add 50 GB of space and it'll give you a month to figure out where to relocate the storage, break up the database mailboxes, etc.

  • Anonymous
    October 10, 2011
    I'd love to talk about future plans, but cannot. Rest assured that they exist.

  • Anonymous
    December 05, 2011
    We need customers to open a support case and ask for it. So far, only R2 customers have asked. So the real question is "why haven't YOU made us make one yet, Brian?" :)

  • Anonymous
    December 07, 2011
    Just one question regarding the hotfix 2618669: does it produce any kind of warning or an event to monitor? Or does it just fix a known issue but not prevent anyone from wasting RIDs through some stupid action? Thanks Chris

  • Anonymous
    December 07, 2011
    Not this one, no. It's just a bug fix for Null RID set references to prevent accidentally gobbling up RIDs unnecessarily. DCDIAG will tell you if you have the issue though, so periodically checking with that would give you the info you need. It will show: Starting test: RidManager Warning: attribute rIdSetReferences missing from CN=name,OU=Domain Controllers,DC=name,DC=name,DC=name,DC=name Could not get Rid set Reference :failed with 8481: The search failed to retrieve attributes from the database. If you are running DCDIAG every so often to track RID consumption, you'd know of the issue.
