Directly querying SharePoint databases
One of the first rules of SharePoint that you learn when you start diving in to the product is that you shouldn't ever touch the SharePoint databases - it's fairly well known that you should leave them alone, but I've been asked the question a few times over the years "Well what if I'm just reading from the databases? Surely that's ok because I'm not making any changes to the data", so what I wanted to cover off in this post is why it's not a great idea to read from the databases and what the impact of doing this is.
Is reading the from the databases supported?
To answer this question the best place to start is a knowledge base article we have published at https://support.microsoft.com/kb/841057. This article discusses what is supported in regards to executing queries against the SharePoint databases, which I won't bother reproducing here but this is where the "don't touch the database" advice is formally documented. There is a section at the bottom of the page though that specifically calls out making reads from the database though, stating that this should be avoided as it can cause unexpected locking within the databases which can adversely affect performance and cause issues within the product.
What is locking in SQL Server?
So what exactly is locking and why does it matter? The short version of the answer here is that locking is how SQL server keeps resources isolated during a transaction to ensure that two transactions won't impact on each other. SQL server can apply locks automatically at different levels based on the impact of the query - for example, a query might impact on only a few rows which might result in SQL locking only just those rows, it might decide locking off a page or even up to locking a whole table. While a resource is locked other transactions won't be able to get to them, which is why SQL automatically manages where these locks apply to minimise the impact based on the query you have executed. When you execute queries against SQL server you can use the NOLOCK or READUNCOMMITTED table hints to specify that you want to make what is called a "dirty read" from the database, which will ignore current locks but you run the risk of retrieving data that isn't current which may result in errors in your logic.
Why database locks impact on performance
Putting it bluntly, if SharePoint is making requests to SQL and the queries have to wait for other locks to be released (or worse they time out all together) then it's going to appear slower to your end users while it loads the content they have requested - that simple. At any given time SharePoint will be making any number of calls to the databases for various things and this is not something you want to interfere with in ways that have not been tested and assessed by the product team - this is why SharePoint has a number of API options to query it's data, so you can get the information you need in more consistent and tested ways so that overall performance and stability of the environment won't risk being impacted by outside queries.
But using the API makes my application slower than querying SQL directly!
Yes we have an API and it lets you get to lots of data, but the reality of this matter is that the end result of using this is that it might take longer to get your data than querying the database directly. This however is the trade off you take for using SharePoint for your application. As a development platform SharePoint brings a lot to the table that you don't need to develop for yourself, but that comes at the cost of relying on using the API and it's inbuilt querying methods to get the data from SQL so that the platform can remain stable. Now this isn't to say that you can't have a negative impact on performance through using the API poorly, it's entirely possible to cause issues that way as well - but thanks to some performance management and throttling features built in to SharePoint this can be mitigated and managed, at least more so than a direct SQL query would be where the onus to get things write is entirely on the person writing the queries.
So that's a quick run through of what you should know about writing queries against SharePoint content databases - you might think that your read only queries are completely harmless, but be aware of the potential impact of what you are doing on the performance of your system, and how that might impact on support during a case. I hope that clears things up a little for you!
Comments
- Anonymous
December 08, 2013
nice - Anonymous
December 30, 2013
If I ever have to do this, I restore my DB to dev environment and query away! - Anonymous
March 03, 2014
So, with all those words, is that a, yes, I can query directly? or no? - Anonymous
March 05, 2014
It's a "You shouldn't" - if you are going to do it, be smart about it. Restore databases to other locations and query that way if you can, or at minimum use the nolock statement in your queries. But there are very few scenarios I can think of that should be addressed by direct database queries, so this really should be avoided. - Anonymous
January 21, 2015
Brian,I'll admit that my understanding of NOLOCK isn't extensive but here's what I've gathered. I'm looking into this now since direct query has been recommended recently by well meaning folk as a way to get around problems with direct query of SP databases.The NOLOCK hint is only a hint and is not guaranteed to be honored in all cases at runtime by SQL Server. NOLOCK is deprecated in SQL Server 2012 so going forward using it is not ideal. Even if this were not the case employing a read query on a production database as part of a solution will make getting support from Microsoft difficult to impossible depending on the implementation and whether or not it can be removed should it be discovered. Using this hint does not change the guidance provided in KB841057. It is clearly unsupported. Short term quickly solutions, not matter how much time they save, are not worth losing support over or corrupting client data. I have worked on solutions that didn't follow best practices and they were maddening for admins and end users alike.The important thing to note here is that we are talking about production databases. I have zero qualms about querying, for my own education, SharePoint databases in a development environment but my client and their users will not suffer if my querying goes awry there.The NOLOCK hint as described in MSDN:Do not issue shared locks and do not honor exclusive locks. When this option is in effect, it is possible to read an uncommitted transaction or a set of pages that are rolled back in the middle of a read. Dirty reads are possible. Only applies to the SELECT statement.Cheers! - Anonymous
March 02, 2015
So how does one get large data set from project subsides?How can you do this if list data love within content database - Anonymous
March 02, 2015
Hi Bharat,You can get to the data through any of our supported API's. For SharePoint 2013 I would recommend looking at our client side object models, or our REST endpoints.Have a look at this to get started with the client side object models - msdn.microsoft.com/.../fp179912.aspxYou can get started with the REST endpoints at msdn.microsoft.com/.../jj860569.aspxI hope that helps - Anonymous
May 14, 2015
The comment has been removed - Anonymous
May 14, 2015
Why are you trying to edit the database? You should not be making changes to databases. I would try posting some specifics in to the MSDN forums to see if you can get some more support that way as a starting point to investigate the cause. social.msdn.microsoft.com/.../home