Udostępnij za pośrednictwem


10 Top SQL Server Issues Uncovered By The SQL Server Risk Assessment Program


Our very own editor-at-large by night and Microsoft Principal Premier Field Engineer by day (and night as well!) Arvind Shyamsundar takes us on a tour of some of the top issues detected across Microsoft’s broad customer base by the Microsoft Premier Support SQL Server Risk Assessment Program (a.k.a. SQL RAP) offering. Hopefully this info will help you prevent the “perfect storm” from happening in your workplace.


The ‘Perfect Storm’

The Perfect StormImagine this Monday morning scenario: you had a great weekend with your friends at the beach. Those bones are aching just a bit after all the football you played. Still, you drag yourself out of the bed, go to the office, hoping, just hoping that nothing major happens today.

BUT… when you arrive at the office, you find all hell is breaking loose. The phones are ringing all over the place, and your manager is shouting at you and your colleagues to do something. And exactly what is wrong? Well, your mission critical database, all 2TB of it, is suddenly offline and unreadable. You look at the error logs and find the dreaded words – database corruption. You look for the backups and then figure out that you were never taking transaction log backups for this database. And later during a root-cause analysis it comes to light that the first signs of corruption actually showed up 6 months ago! And you never knew about it because there were no proactive alerting or checks configured for this database instance.

PFE to the rescue!

None of us ever want to face this ‘perfect storm’ scenario.  Microsoft Services and the Premier Field Engineering (PFE) team, through our series of Premier Proactive Assessment Programs, will help you to proactively identify, remediate and mitigate risks proactively.

The SQL Server Risk Assessment Program (SQLRAP) is currently available for SQL Server versions 2000-2008 R2. It includes an onsite PFE visit (typically for 4-5 days) and a deep knowledge transfer element.

In October 2012, we also have released SQL RAP as a Service (here) which targets SQL Server versions 2005-2012. It is a remote offering with a PFE delivering the analysis through an online meeting.

Both options are currently available in most markets worldwide, and you can select the one which most suits your business needs.

We've got you covered!

In the last 4 years, Microsoft Premier Field Engineers have delivered over 6,500 individual RAPs for SQL Server, identifying an average of 140 issues in each engagement. That’s a lot of stuff! And it keeps getting better – the SQL RAP as a Service offering increases the scope and reach, including the latest SQL Server 2012 features such as AlwaysOn Availability Groups. There are too many issues to list in this single blog post, so here is a ‘tag cloud’ of the coverage in these offerings.

Tag Cloud of SQL RAP Technology Coverage

Now, wondering what are the most common issues which we find? Come on, let’s take a look!

The 10 most common issues uncovered by SQL RAPs (in no particular order)

1. Security Updates not applied

The most common critical issue we find is that Windows Updates have not been applied regularly. And not only do we identify the current issue with health (specific updates missing) but we go further and through a series of operational interview questions, identify that there is a risk because there is no comprehensive patching policy.

2. SQL is running under an administrative account

We find very commonly that while installing SQL Server, an ‘administrator’ or equivalent service account has been selected. This leads to an unwanted elevation of privilege security risk. We identify all instances of SQL Server which are running under such privileged accounts.

3. Xp_cmdshell is enabled

This one comes up in many instances, and in many cases, in combination with the above (admin account) issue. Many administrators do not realize that xp_cmdshell is not required to be turned on all the time. They can either enable and disable it on demand, or use the proxy account feature to mitigate the risk.

4. Database integrity checks not scheduled

Many times, we find that the DBCC CHECKDB command has never been executed, or is not executed on a regular basis. This can very well fail to alert us to impending catastrophic failure of the database due to corruption issues. We identify databases for which CHECKDB has either never run or has not run at least once in the last week. It is recommended that the CHECKDB be scheduled on a monthly basis and on-demand for significant events such as unexpected shutdown / failover of the server.

5. SQL Server has not been updated to the current service pack

Another very common issue is that SQL instances are not updated with the current service pack. Not only is this important from a supportability point of view, but it is also important to receive the latest security, performance and stability fixes. We identify all instances which have older service packs installed. The onsite PFE in the case of the SQL RAP offering will also advise you through a knowledge transfer exercise on the importance of testing this update prior to application.

6. The usage of NOLOCK

In the SQLRAP, we also perform a limited static code analysis of your T-SQL stored procedures, views and functions. During this analysis, we will identify objects which are using a NOLOCK table hint and using that data to perform a modification (INSERT or UPDATE typically.) As we know, the NOLOCK hint can result in inconsistent (a.k.a. ‘dirty’) data reads and in some cases can actually cause errors as well. Note that this static code analysis check is currently not available in the remote SQL RAP as a Service.

7. Large growth values (1GB+) for data / log files

This issue is typically the outcome of using the default 10% auto-growth setting for data / log files. SQL RAP detects all such files and provides us the remediation advice to set the growth increment to a fixed value, such as 300-500MB. Specifically for transaction log files, we also perform a related check to determine the number of virtual log files and flag if they are greater than a threshold.

8. Weak and blank passwords for SQL logins

This is one of many security checks which are performed in the SQL RAP. Specifically, this check uses the PWDCOMPARE function to determine if the password is the same as the login name. In addition the check also identifies passwords which are blank. We also provide you scripts so that you can perform this check on a regular basis.

9.  Disaster Recovery Plan is not documented or does not exist

The SQL RAP has a strong operational element to it. During the SQL RAP the DBA staff has to answer a questionnaire which aims to uncover issues such as those outlined above. Specifically this issue is uncovered after a conversation between the PFE and the DBA staff, and in many cases the business owners of the system. The SQL RAP also identifies related issues around the SLA (or lack thereof) of database backup / restore.

10.  Clustered MSDTC configuration issues

The SQL RAP inherits a lot of checks from the Cluster RAP. One of the most common issues we find therein is that the MSDTC resource has actually been created in the cluster resource group. This can cause performance and stability issues.

Conclusion

By now, you should have a pretty good idea of what the Proactive Premier Assessment Services can do for you. Contact your Microsoft Premier Technical Account Manager to book your SQL RAP or SQL RAP as a Service today!


Posted by Frank Battiston, MSPFE Editor

Comments

  • Anonymous
    May 09, 2013
    Wow! You need to get an EA to do a SQLRAP to be told this...

  • Anonymous
    May 12, 2013
    Only you were just told this, so whats your point?

  • Anonymous
    November 15, 2013
    Running a SQL RaaS on a clustered SQL Server 2005 SP4 and all the collection steps fail with "There were no Nodes found to iterate against. ViewName: SQLView, etc. I ran the Scoping tool and everything for prerequistes passed validation. It even identified the system as a cluster during the iniital configuration. What possible permissions issue can this be attributed, I am senior system administrator for my company and I know I have all the necessary permissions on servers.