SharePoint 2016: ULS logging has stopped on all farm servers
Problem
Discovered that ULS logging had apparently stopped on all servers in a SharePoint 2016 farm; no new ULS logs had been generated for several weeks. No farm health rule violations for the tracing service were found. Immediately began analyzing.
Analysis
01) Identified the date and time of the most recent ULS log file. It had been created immediately after the upgrade operation performed post-CU installation. Verified this observation on all of the farm's SharePoint 2016 servers.
02) Reviewed most recent ULS log on each SharePoint 2016 server for any clues and found none: last line entry in each ULS log gave no indication of any issues.
03) Reviewed upgrade and diagnostics files and also found no entries associated with ULS logging failure.
04) Reviewed Services administrative tool on farm servers and found SharePoint Tracing Service disabled. Noted down the service account used as the identity for the tracing service in this farm.
05) Performed Internet search and found this posting: SharePoint trace service keeps getting disabled.
06) Executed this script:
((Get-SPFarm).Services | ? {$_.Name -match "SPTraceV4"}).Instances | ft Server,TypeName,Status,ID
This returned the IDs of each service instance in the farm and the servers that those instances were associated with. All SharePoint Tracing Service service instances had Status Disabled.
07) Then executed these scripts:
Start-SPServiceInstance -Identity [IDofFirstInstance]
Start-SPServiceInstance -Identity [IDofSecondInstance]
Start-SPServiceInstance -Identity [IDof3rdInstance...etc]
08) Reviewed Services administrative tool on each farm server again and found SharePoint Tracing Service now had Startup Type Automatic and Status [not running]. This result was somewhat different from what the authors of the posting reported. Expected the service to also start up immediately, but this didn't happen. Waited a few minutes to see if a timer job might launch it, but after several minutes the status still hadn't changed.
09) Clicked Start for the service and received this error:
Windows could not start the SharePoint Tracing Service service on Local Computer. Error 1069: The service did not start due to a logon failure.
10) Reviewed farm's Managed Accounts (CA > Security > General Security > Managed Accounts) and found that a password change had been pushed out by the farm (the password for this account was managed by the farm) at around the same date and time as the most recent ULS log, indicating a possible password issue.
11) In the SharePoint Tracing Service Properties dialog, updated the password, and then started the service. The service started. Repeated for each SharePoint server in farm.
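The manual checks in steps 01, 04 and 10 could probably be gathered in one pass from a SharePoint Management Shell. The following is only a rough sketch, not what was actually run during this analysis. It assumes an elevated shell, that remote CIM (WinRM) access to the other farm servers is available, and that the ULS logs are in the location configured for the farm. The log check covers only the local server.

```powershell
# Sketch only: assumes an elevated SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Step 01: newest ULS log on the local server.
# LogLocation may contain environment variables such as %CommonProgramFiles%.
$logPath = [Environment]::ExpandEnvironmentVariables((Get-SPDiagnosticConfig).LogLocation)
Get-ChildItem -Path $logPath -Filter *.log |
    Sort-Object LastWriteTime -Descending |
    Select-Object -Property Name, LastWriteTime -First 1

# Step 04: state, startup mode and logon account of the Windows service
# on every server that hosts a tracing service instance.
# Assumption: remote CIM access to each server is permitted.
$servers = ((Get-SPFarm).Services | Where-Object { $_.Name -match "SPTraceV4" }).Instances |
    ForEach-Object { $_.Server.Address }

foreach ($server in $servers) {
    Get-CimInstance -ClassName Win32_Service -Filter "Name='SPTraceV4'" -ComputerName $server |
        Select-Object PSComputerName, State, StartMode, StartName
}

# Step 10: when the farm last changed each managed account's password.
Get-SPManagedAccount |
    Format-Table Username, AutomaticChange, PasswordLastChanged, PasswordExpiration
```

If the PasswordLastChanged value for the tracing service account lines up with the timestamp of the newest ULS log, that points to the same password issue found in step 10.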
Solution
- Re-enable each service instance in the farm, update the service's logon password in its Properties dialog, and then start the service from within the Services administrative tool on each server.
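For farms with many servers, the same fix might be scripted rather than done through the Properties dialog. The sketch below is untested against this farm and is an assumption about how it could be done, not the method used above. It re-enables the disabled instances, then resets the logon password of the local SPTraceV4 Windows service through the Win32_Service Change method and starts it. The second part would need to be run on each server, and the credential entered at the prompt must be the existing tracing service account.

```powershell
# Sketch only: run in an elevated SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# 1. Re-enable every tracing service instance that is currently Disabled.
((Get-SPFarm).Services | Where-Object { $_.Name -match "SPTraceV4" }).Instances |
    Where-Object { $_.Status -eq "Disabled" } |
    ForEach-Object { Start-SPServiceInstance -Identity $_.Id }

# 2. On the local server: update the service logon password, then start it.
#    Enter the existing tracing service account (DOMAIN\user) and its current password.
$cred = Get-Credential -Message "Tracing service account"
$svc  = Get-CimInstance -ClassName Win32_Service -Filter "Name='SPTraceV4'"

$result = Invoke-CimMethod -InputObject $svc -MethodName Change -Arguments @{
    StartName     = $cred.UserName
    StartPassword = $cred.GetNetworkCredential().Password
}

# A ReturnValue of 0 means the change was accepted.
if ($result.ReturnValue -eq 0) {
    Start-Service -Name SPTraceV4
} else {
    Write-Warning "Win32_Service.Change returned $($result.ReturnValue); service not started."
}
```

Using Get-Credential keeps the password out of the script and the command history.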
References
Notes
- I didn't try changing the Startup Type through the service instance's Properties dialog. This might have worked too rather than using PowerShell to change it.