Getting A Handle on Server Network Logon Statistics

Windows servers can run into situations where it may be mighty handy to get a better understanding of which computers are connecting to a server for services. What does “connecting to a server for services” mean?

Consider the Exchange server scenario. For this scenario let’s postulate that the client is a Windows Outlook client connecting to get email. To do that the client makes an RPC connection to the Exchange server. This RPC session is authenticated (hopefully using Kerberos, though there are scenarios where NTLM is used) and results in a network logon.

A network logon is simply a non-interactive logon to a computer from a remote client. Logon types are classified and tracked as Interactive, Network, RemoteInteractive and a few others. More information on auditing and logon type descriptions can be found in this TechNet article.

Auditing can give you a 1:1 idea of which users are logging onto a server and from where, but collating that into a more holistic view is more difficult using in-box tools. If you need to track specific network logons over time, and if you have a requirement to keep forensic data about which users and computers are connecting to a server, then auditing is the way to go.

The network logon can be tracked via user logon auditing, if the server has that enabled, but it may also be tracked by SamLogon entries in a NetLogon debug log. SamLogon entries are roughly analogous to network logons.

More to the point, there are plenty of times where you simply need to answer questions like “what domains do the clients connecting to the server belong to?” or “are any of the clients submitting malformed NTLM authentication where the domain portion of the domain\user is missing?” or “did any of the clients connecting get timeout errors?”

Worse still are performance bottlenecks where a server has to perform so many NTLM password validations or Kerberos PAC validations that some of them time out and return errors. Users typically see these as credential prompts or simple “spinning donut” waits.

Errors like that are the result of the dreaded MaxConcurrentApi issue, where more NTLM or Kerberos PAC validations are coming in than can be serviced before they time out. More info on MaxConcurrentApi issues can be found in this TechNet wiki article.

If you need to run a diagnostic in your environment to see if MaxConcurrentApi issues are happening then you can run the Microsoft CSS MaxConcurrentApi Diagnostic from the Self Help Diagnostic Center. More info on the diagnostic is available in this KB article:

[SDP 3][52a5f43b-2db9-40b7-88ae-fd9842cca85f] MaxConcurrentApi Diagnostic

No Microsoft support case is needed to run it; you simply need a Live account. Alternatively you can run the MaxConcurrentApi script from the TechNet Script Center (available for download from this link).

Now, let’s get to the point of this article. A new PowerShell script is available which can give you statistics about the SamLogon events occurring on a server. It is intended to give you supplemental information so that you can know what is happening to your servers in aggregate. The script parses NetLogon debug log files to get that info. NetLogon debug logging can be enabled by following the KB article:

 Enabling Debug Logging for the Net Logon Service
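As a quick sketch of what that article covers: debug logging is typically toggled with nltest from an elevated prompt. The flag value below is the one commonly cited for full verbosity; treat it as an assumption and confirm it against the KB article for your OS version, since older versions of Windows may also need the Netlogon service restarted.

```powershell
# Enable verbose Netlogon debug logging (entries go to %windir%\debug\netlogon.log).
nltest /dbflag:0x2080ffff

# Turn debug logging back off once enough data has been collected.
nltest /dbflag:0x0
```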

The PowerShell script can be downloaded from the TechNet Script Center at this link.

An example of running the script with its switches (here pointed at the default log location):
PS C:\Scripts> .\MaxConcurrentApiNetlogonParser.ps1 -NetlogonFilePath C:\Windows\debug\netlogon.log -DomStats $true -AuthFailureDetails $true

More information about using the script:

  • The script takes a string parameter of -NetlogonFilePath. This is the path to the NetLogon debug log to analyze. The default is to look in the default location of the NetLogon log, which is c:\Windows\debug\netlogon.log.
  • The script will display the start time and end time of the log as indicated by the time and date stamp of the first and last entry to the log.
  • The script takes approximately 5-10 minutes to run all parsing options against a full-sized (20 MB) NetLogon log.
  • The script will display results in the PowerShell console and also create a text output file in c:\windows\temp. The output file will be named NLParseResults plus the date.
  • The script has four switches: GetDomainsOnly, DomStats, AuthFailureDetails, and FileSizeOverride
    • [bool]AuthFailureDetails: This switch tells the script to parse through the NetLogon log and find all entries that indicate a user received a timeout error (typically a credential prompt) due to MaxConcurrentApi bottlenecks. This switch defaults to False.  
      • If there are timeout errors the script will parse the log and find whether any of the entries are from computers that submitted malformed authentication requests, meaning requests which were missing the domain name or had the user name in the incorrect format.
      • If malformed requests are found the user name and the computer name which submitted the request are returned for review.
      • NOTE: This switch requires that the NetLogon debug log was gathered from a Windows Server 2012 or later server. The reason is that Windows Server 2012 added the thread ID of the caller to the NetLogon debug log. The script uses that thread ID to tie the failure error code to the user/computer/domain of the failing user’s initial SamLogon entry.
    • [bool]FileSizeOverride: Processing very large Netlogon debug logs may lead to processor utilization and memory performance problems. For that reason logs which are 50 MB or greater are automatically prevented from being processed. This switch, when passed as $True, will override that and allow the processing to continue. NOTE: Perform the log processing on non-production computers only!
    • [bool]GetDomainsOnly: This switch will tell the script to simply parse the log for the domain names in it and return them in an array. The switch defaults to false.
    • [bool]DomStats: This switch tells the script to analyze the NetLogon debug log and give a summary of successful logons. This switch defaults to True. The summary reports:
      • How many total SamLogon entries were in the log. These entries are roughly analogous to (though not exactly the same as) network logons of users.
      • How many SamLogon entries there were per domain of the user logging on, presented as a percentage table.
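
The thread ID correlation described under AuthFailureDetails could be sketched roughly as follows. This is not the script's actual code. The function name Get-TimeoutLogon, the regular expressions, and the sample log lines are all assumptions made for illustration, and real NetLogon debug log entries may be worded differently.

```powershell
# Illustrative sketch only; not taken from MaxConcurrentApiNetlogonParser.ps1.
function Get-TimeoutLogon {
    param([string[]]$Lines)

    # Assumed line shapes: a [LOGON] "Entered" entry and a [CRITICAL] timeout
    # entry, both carrying the caller's thread ID in square brackets.
    $enteredPattern = '\[LOGON\] \[(?<tid>[^\]]+)\] .*SamLogon: .*logon of (?<domain>[^\\]+)\\(?<user>.+?) from (?<computer>\S+).* Entered$'
    $timeoutPattern = '\[CRITICAL\] \[(?<tid>[^\]]+)\] .*NlAllocateClientApi timed out'

    $lastEntered = @{}   # thread ID -> details of that thread's latest SamLogon
    foreach ($line in $Lines) {
        if ($line -match $enteredPattern) {
            $lastEntered[$Matches['tid']] = [pscustomobject]@{
                Domain    = $Matches['domain']
                User      = $Matches['user']
                Computer  = $Matches['computer']
                # A "(null)" domain marks a request submitted without a domain name.
                Malformed = ($Matches['domain'] -eq '(null)')
            }
        }
        elseif ($line -match $timeoutPattern) {
            # Tie the timeout back to the logon this thread was servicing.
            $tid = $Matches['tid']
            if ($lastEntered.ContainsKey($tid)) { $lastEntered[$tid] }
        }
    }
}

# Hypothetical sample entries (not copied from a real log).
$sample = @(
    '04/10 06:55:31 [LOGON] [4012] CONTOSO: SamLogon: Network logon of (null)\shakala from shakalapc-1 Entered'
    '04/10 06:55:31 [LOGON] [4020] CONTOSO: SamLogon: Network logon of CONTOSO\alice from PC-02 Entered'
    '04/10 06:56:16 [CRITICAL] [4012] CONTOSO: NlAllocateClientApi timed out: 0 258'
)
Get-TimeoutLogon -Lines $sample | Format-Table -AutoSize
```

Keying the lookup on the thread ID is what lets a timeout entry that names no user be attributed to a specific user and computer, which is presumably why the switch depends on logs from Windows Server 2012 or later.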

Here’s an example of the script’s output:

Netlogon Analysis of SamLogon Entries

Analysis will show a percentage breakdown of the domains from which users are logging onto the server.

The logon sessions are Network logon sessions.

NOTE: Computers may show up as domains if the auth is from non-domain joined clients or apps which submit authn improperly.

Analysis of Netlogon Log File C:\windows\debug\netlogon.log

Log start time 04/10 06:55:31

Log end time 04/10 07:38:39

**********************************************************

Total Samlogon (network logon) entries were: 58482

Domain CHILD.TREYRESEARCH.COM : 14 %

Domain (null) : 1 %

Domain TREYRESEARCH.COM : 30 %

Domain CHILD.CONTOSO.COM : 30 %

Domain CONTOSO.COM : 2 %

Domain TAILSPIN : 2 %

Domain TAILSPINTOYS.COM : 13 %

Domain RESEARCH.TAILSPINTOYS.COM : 1 %

Domain CONTOSO : 4 %

Analysis of Netlogon Log File C:\windows\debug\netlogon.log
Log start time 04/10 06:55:31
Log end time 04/10 07:38:39
*******************************************
Summary of User Failures by Domain (estimated):
Note: Computers names may appear as domains if the authentication is improperly submitted by the client.

NTLM user auth failures for domain:REGIONAL
NTLM user auth failures count:6

NTLM user auth failures for domain:CHILD.TREYRESEARCH.COM
NTLM user auth failures count:4044

NTLM user auth failures for domain:(null)
NTLM user auth failures count:5

NTLM user auth failures for domain:TREYRESEARCH.COM
NTLM user auth failures count:164

NTLM user auth failures for domain:RESEARCH.TAILSPINTOYS.COM
NTLM user auth failures count:7942

NTLM user auth failures for domain: CONTOSO
NTLM user auth failures count:7932

NTLM user auth failures from Problematic NTLM Auth
Count of NTLM user timeouts from (null)\ domain:5

User names of users whose computers or devices submitted problematic NTLM auth (without a domain name)
Null user is shakala
Null user is billy.bob@contoso.com
Null user is junie.may@treyresearch.com
Null user is eeriend
Null user is sally.sue@tailspintoys.com

Computers or devices from where users submitted problematic NTLM auth (without a domain name).
Null computer is shakalapc-1
Null computer is CCFNG-WIN7-2
Null computer is RSRCH2008
Null computer is EMEA-WS66342
Null computer is JIMMYSTABLET
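
For readers curious how a per-domain percentage breakdown like the one in the sample output might be produced, here is a minimal sketch. It is not the script's actual implementation. The function name Get-SamLogonDomainStat, the regular expression, and the sample log lines are assumptions for illustration only.

```powershell
# Illustrative sketch only; not taken from MaxConcurrentApiNetlogonParser.ps1.
function Get-SamLogonDomainStat {
    param([string[]]$Lines)

    # Count only the "Entered" entry so each logon attempt is tallied once.
    $pattern = 'SamLogon: .*logon of (?<domain>[^\\]+)\\.+? from .* Entered$'
    $domains = New-Object 'System.Collections.Generic.List[string]'
    foreach ($line in $Lines) {
        if ($line -match $pattern) { $domains.Add($Matches['domain'].ToUpperInvariant()) }
    }
    if ($domains.Count -eq 0) { return }

    # Group by domain and express each group as a share of all SamLogon entries.
    $total = $domains.Count
    $domains | Group-Object | Sort-Object Count -Descending | ForEach-Object {
        [pscustomobject]@{
            Domain  = $_.Name
            Count   = $_.Count
            Percent = [math]::Round(100.0 * $_.Count / $total)
        }
    }
}

# Hypothetical sample entries (not copied from a real log).
$sampleLines = @(
    '04/10 06:55:31 [LOGON] [4012] SamLogon: Network logon of CONTOSO\alice from PC-01 Entered'
    '04/10 06:55:31 [LOGON] [4012] SamLogon: Network logon of CONTOSO\alice from PC-01 Returns 0x0'
    '04/10 06:55:32 [LOGON] [4020] SamLogon: Network logon of TAILSPIN\bob from PC-02 Entered'
    '04/10 06:55:33 [LOGON] [4024] SamLogon: Network logon of (null)\carol from PC-03 Entered'
    '04/10 06:55:34 [LOGON] [4028] SamLogon: Network logon of contoso\dave from PC-04 Entered'
)
Get-SamLogonDomainStat -Lines $sampleLines | Format-Table -AutoSize
```

Against a real log the same function could be fed with Get-Content, though reading a whole log into memory is exactly the kind of load the FileSizeOverride note warns about, so it is best tried on a non-production machine.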

If you are ramping up the load on a new server or server farm, if you suspect you are seeing load-based issues related to authentication on a server, or if you simply want to know more about the clients connecting to a server in a holistic way, then this script can help you out.

Comments

  • Anonymous
    April 29, 2014
    Really great tool but unfortunately detailed authentication timeout analysis only works for windows 2012.
    Thanks anyway. You rock !!