Domain not available when trying to TS onto a Windows 2003 server.

An issue came in this week: when you attempted to log on to a server, it would not authenticate your request and would give you a message indicating that the "domain is not available".  If you tried logging on via your UPN, it would give a slightly different error message indicating that "there is not enough storage to complete this operation".

After ruling out DNS and routing, I had the person run nltest /sc_query:BRADFOREST to see which DC the server was pointing at. It turned out the server did not have a secure channel to a DC, which might be a reason we couldn't authenticate to the server. :) When we tried to reset the secure channel, it failed with error code 8 (ERROR_NOT_ENOUGH_MEMORY). So we cranked up netlogon debug logging and I repro'd the issue.  We could then see this in the netlogon debug log:

08/14 22:55:06 [SESSION] BRADFOREST: NlSetServerClientSession: New DC is an NT 5 DC: \\brad-dc-01.bradforest.local
08/14 22:55:06 [SESSION] BRADFOREST: NlSetServerClientSession: New DC is in closest site: \\brad-dc-01.bradforest.local
08/14 22:55:06 [SESSION] BRADFOREST: NlSetServerClientSession: New DC runs the time service: \\brad-dc-01.bradforest.local
08/14 22:55:06 [SESSION] BRADFOREST: NlSetServerClientSession: New discovery flags: 0x1dc; Old flags: 0x0
08/14 22:55:06 [SESSION] BRADFOREST: NlDiscoverDc: Found DC \\brad-dc-01.bradforest.local
08/14 22:55:06 [SESSION] BRADFOREST: NlStartApiClientSession: Bind to server \\brad-dc-01.bradforest.local (TCP) 0 (Retry: 0).
08/14 22:55:06 [MAILSLOT] Going to wait on mailslot. (Timeout: 45000)
08/14 22:55:06 [CRITICAL] NlPrintRpcDebug: Dumping extended error for I_NetServerReqChallenge with 0xc0000017
08/14 22:55:06 [CRITICAL] [0] ProcessID is 780 <-------------------------LSASS.exe
08/14 22:55:06 [CRITICAL] [0] System Time is: 8/14/2007 21:55:6:372
08/14 22:55:06 [CRITICAL] [0] Generating component is 8
08/14 22:55:06 [CRITICAL] [0] Status is 14
08/14 22:55:06 [CRITICAL] [0] Detection location is 313
08/14 22:55:06 [CRITICAL] [0] Flags is 0
08/14 22:55:06 [CRITICAL] [0] NumberOfParameters is 0
08/14 22:55:06 [CRITICAL] [1] ProcessID is 780
08/14 22:55:06 [CRITICAL] [1] System Time is: 8/14/2007 21:55:6:372
08/14 22:55:06 [CRITICAL] [1] Generating component is 8
08/14 22:55:06 [CRITICAL] [1] Status is 10055
08/14 22:55:06 [CRITICAL] [1] Detection location is 311
08/14 22:55:06 [CRITICAL] [1] Flags is 0
08/14 22:55:06 [CRITICAL] [1] NumberOfParameters is 3
08/14 22:55:06 [CRITICAL] Long val: 1025
08/14 22:55:06 [CRITICAL] Pointer val: 0
08/14 22:55:06 [CRITICAL] Pointer val: 0
08/14 22:55:06 [CRITICAL] [2] ProcessID is 780
08/14 22:55:06 [CRITICAL] [2] System Time is: 8/14/2007 21:55:6:372
08/14 22:55:06 [CRITICAL] [2] Generating component is 8
08/14 22:55:06 [CRITICAL] [2] Status is 10055
08/14 22:55:06 [CRITICAL] [2] Detection location is 315
08/14 22:55:06 [CRITICAL] [2] Flags is 0
08/14 22:55:06 [CRITICAL] [2] NumberOfParameters is 0
08/14 22:55:06 [CRITICAL] BRADFOREST: NlSessionSetup: Session setup: cannot I_NetServerReqChallenge 0xc0000017
08/14 22:55:06 [MISC] Eventlog: 5719 (1) "BRADFOREST" 0xc0000017 c0000017 ....

 

There are some interesting things to look at here. First off, what is 0xc0000017?  Well, we can use err.exe to see what that translates to.

C:\Windows\system32>err 0xc0000017
# for hex 0xc0000017 / decimal -1073741801
STATUS_NO_MEMORY
# {Not Enough Quota}
# Not enough virtual memory or paging file quota is available
# to complete the specified operation.
 

Well, that pretty much jibes with what I was seeing when trying to log on via UPN.  We can also see two status codes being returned during the secure channel setup: 14 and 10055.

C:\Windows\system32>err /winerror.h 14
# winerror.h selected.
# for decimal 14 / hex 0xe
ERROR_OUTOFMEMORY
# Not enough storage is available to complete this operation. <-- This is what I was getting when trying to TS via UPN.

C:\Windows\system32>err /winerror.h 10055
# winerror.h selected.
# for decimal 10055 / hex 0x2747
WSAENOBUFS <--------------------HMMMMM?
# An operation on a socket could not be performed because the
# system lacked sufficient buffer space or because a queue
# was full.
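If the debug log is long, the same lookup can be scripted. The following is a minimal sketch, assuming the "Status is N" line shape seen in the log excerpt above; the function name and the small lookup table are hypothetical, and the table only holds the two names err.exe returned here.

```python
import re
from collections import Counter

# Names taken from the err.exe output above; any other code is reported as "unknown".
KNOWN_STATUS = {
    14: "ERROR_OUTOFMEMORY",
    10055: "WSAENOBUFS",
}

# Matches extended-error lines such as: "[CRITICAL] [1] Status is 10055"
STATUS_LINE = re.compile(r"\[CRITICAL\] \[(\d+)\] Status is (\d+)")


def summarize_status_codes(log_text):
    """Count each extended-error status code found in netlogon debug log text."""
    counts = Counter()
    for match in STATUS_LINE.finditer(log_text):
        counts[int(match.group(2))] += 1
    return {
        code: (KNOWN_STATUS.get(code, "unknown"), n)
        for code, n in sorted(counts.items())
    }


# Three lines copied from the log excerpt above.
sample = """\
08/14 22:55:06 [CRITICAL] [0] Status is 14
08/14 22:55:06 [CRITICAL] [1] Status is 10055
08/14 22:55:06 [CRITICAL] [2] Status is 10055
"""

for code, (name, n) in summarize_status_codes(sample).items():
    print(f"{code:>6} {name:<18} x{n}")
```

Seeing 10055 repeated across the extended error records is the hint that the failure is on the socket side rather than a true memory shortage.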

Now that is interesting. The next thing I did was run netstat -s and look at the statistics, but I didn't see anything obvious. I then added the Handles column in Task Manager and noticed that their custom application had 17,000 handles open.  It turned out that most of those handles were for outgoing calls, which had used up all the ephemeral ports.  We had to set the MaxUserPort value in the registry (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters) to allow more ports to be used; once we did that, everything returned to normal.
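Another way to spot the process holding the ports is to tally TCP connections by owning PID. This was not the method used on this case; it is a sketch that assumes text in the shape produced by netstat -ano (Proto, Local Address, Foreign Address, State, PID). The function name is hypothetical and the sample rows are made up for illustration.

```python
from collections import Counter


def connections_per_pid(netstat_text):
    """Count TCP rows per owning PID in `netstat -ano` style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # A TCP row has exactly five fields:
        # Proto, Local Address, Foreign Address, State, PID
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            counts[int(fields[4])] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:1025          10.0.0.9:1433          ESTABLISHED     2044
  TCP    10.0.0.5:1026          10.0.0.9:1433          TIME_WAIT       0
  TCP    10.0.0.5:1027          10.0.0.9:1433          ESTABLISHED     2044
  UDP    0.0.0.0:445            *:*                                    4
"""

# Print the busiest PIDs first.
for pid, n in connections_per_pid(sample).most_common():
    print(f"PID {pid}: {n} TCP connections")
```

On a live server the input could come from running netstat -ano and capturing its output. A PID with thousands of rows is a good candidate to check against the Handles column in Task Manager.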

Ephemeral Ports

The number of user-accessible ephemeral ports that can be used to source outbound connections is configurable using the MaxUserPort registry parameter. By default, when an application requests any socket from the system to use for an outbound call, a port between the values of 1024 and 5000 is supplied. The MaxUserPort parameter can be used to set the value of the uppermost port that the administrator chooses to allow for outbound connections. For instance, setting this value to 10,000 (decimal) would make approximately 9000 user ports available for outbound connections.
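The arithmetic behind "approximately 9000" can be checked quickly. This is a small sketch, assuming the range runs from 1024 up to the configured maximum with both ends counted; the constant and function names are hypothetical.

```python
FIRST_EPHEMERAL_PORT = 1024   # lower bound quoted in the paragraph above
DEFAULT_MAX_USER_PORT = 5000  # default upper bound quoted in the paragraph above


def ephemeral_port_count(max_user_port=DEFAULT_MAX_USER_PORT):
    """Number of ports from FIRST_EPHEMERAL_PORT through max_user_port, inclusive."""
    if max_user_port < FIRST_EPHEMERAL_PORT:
        raise ValueError("max_user_port must not be below the first ephemeral port")
    return max_user_port - FIRST_EPHEMERAL_PORT + 1


print(ephemeral_port_count())       # 3977 with the default ceiling of 5000
print(ephemeral_port_count(10000))  # 8977, i.e. roughly 9000
```

With the default ceiling there are fewer than 4,000 ports to go around, so an application holding 17,000 handles for outbound calls can easily claim all of them.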

Here is the KB article for the issue: https://support.microsoft.com/kb/196271

Another setting worth reading about is the TCP TIME-WAIT delay (TcpTimedWaitDelay), which is how long a port hangs around after the connection closes before it is released completely (4 minutes by default).  This can also cause issues: apps that make many outbound connections in a short time may use up all available ports before the ports can be recycled.
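A rough way to see why the TIME-WAIT delay matters is to estimate how many new outbound connections per second a server can sustain before it runs out of ports. The sketch below is a back-of-the-envelope model, not a measurement: it assumes every connection is closed locally and holds its port for the full delay, and the function name is hypothetical.

```python
def max_sustained_connect_rate(port_count, time_wait_seconds=240):
    """Upper bound on new outbound connections per second.

    Assumes each connection occupies one ephemeral port for the whole
    TIME-WAIT delay after it closes (240 seconds = 4 minutes by default).
    """
    if port_count <= 0 or time_wait_seconds <= 0:
        raise ValueError("port_count and time_wait_seconds must be positive")
    return port_count / time_wait_seconds


# 3977 = ports 1024 through 5000 inclusive (the default range).
print(round(max_sustained_connect_rate(3977), 1))   # 16.6 connections per second
# 8977 = ports 1024 through 10000 inclusive.
print(round(max_sustained_connect_rate(8977), 1))   # 37.4 connections per second
```

Under these assumptions, a busy application opening more than about 16 short-lived connections per second would eventually exhaust the default port range, which matches the kind of failure described here.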

Comments

  • Anonymous
    January 01, 2003
    Hope it was easy to find the answer when you ran into this issue.  If not, it's on the intertubes now...

  • Anonymous
    August 27, 2007
    Had a similar thing happening on a busy web server connecting to a SQL Server. It used up all ports before they could be reused due to the TCP TIME-WAIT delay.

  • Anonymous
    October 31, 2007
    Hello, use the registry fix on all your machines. Here it is: Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters] "MaxTokenSize"=dword:0000ffff "MaxPacketSize"=dword:00000001

  • Anonymous
    June 10, 2008
    Thanks for this information. We had this issue on several Windows 2000 Advanced Server machines (NetLogon error 5719). A third-party service that opens network connections automatically used up all available ports (approx. 18,000 handles), with the result that users could not log on to the domain.