What data to collect from a SQL Server failover instance after a problem occurs
All too often, after a problem occurs, the administrators will not collect and save any logs from the issue, thinking that the logs are not getting recycled often.
This is untrue, especially about the cluster logs, which in some environments can be recycled in under 24 hours.
So what data do I need to collect and save, so that me or someone else can review them at a later time and understand what was the root cause of the issue?
Here we go:
1. Application and System Event logs (from each node, if this is a cluster). You can export these logs from inside the Eventviewer tool. Prefer to export them in .txt format, so that they can be reviewed easily at any version of Windows.
2. Cluster logs. These are very important and they can be easily recycled in a few hours. To export the cluster logs, you can use the steps from this article:
How to create the cluster.log in Windows Server 2008 Failover Clustering - https://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx
3. SQL Server ERRORLOG files. These are more difficult to be recycled, unless you do frequent reboots or frequent SQL Server service restarts. They are located at this folder by default:
The Default Location of the Microsoft SQL Server Error Logs - https://support.microsoft.com/kb/966659
4. The list of group policies that are currently running on this SQL Server:
gpresult /z
Troubleshoot Group Policy from the Command Line with GPRESULT - https://technet.microsoft.com/en-us/magazine/ff812646.aspx
5. If the issue is related to Analysis Services, you can also collect this set of data:
Data collection for troubleshooting Analysis Services issues - https://blogs.msdn.com/b/as_emea/archive/2012/01/02/initial-data-collection-for-troubleshooting-analysis-services-issues.aspx
6. If the issue is related to Reporting Services, you can also collect this set of data:
Log Files for SQL Reporting Services - https://technet.microsoft.com/en-us/library/cc512029.aspx