Query performance
Understanding the reason for slowness of a SQL query and then tuning to boost it is a slightly complicated process, but it can be extremely rewarding in some cases. In this post, I am going to list down some ways in which you can tune the query performance for SQL Server Compact, along with pointers to relevant online resources. If you are a seasoned database developer, you probably know most of the tricks already.
Before I jump into it, an overview of the basics first.
Basics:
- Displaying actual and estimated Execution plans for SQL Server Compact: You should be familiar with Execution Plans (also called as query plans and showplans).
- For non-parameterized queries: There are several easy ways to do this
- When database is on a docked device or desktop: Generate a query plan using SQL Server Management Studio (SSMS) that gives a nice graphical view of the query plans. (You can connect to the database on device as long as it is docked. No need to copy the file to desktop.)
- When database is on a device:Generate a query plan using Query Analyzer. This generates a .sqlplan file that can be viewed graphically in SQL Server Management Studio as before (preferable) or opened as a plain XML file.
- For Parameterized (as well as non-parameterized) queries:
- Use the TSQL Statements SET SHOWPLAN_XML (for estimated execution plan) and SET STATISTICS_XML (for actual execution plan), followed by SELECT @@SHOWPLAN to generate the query plans. Store the results as an XML file. (Tip: To view the query plans graphically, just rename the file with a .sqlplan extension and it is ready to be opened in SSMS like an ordinary non-parameterized query plan.)
- For non-parameterized queries: There are several easy ways to do this
- Indexes and statistics on indexes:
- One of the important tasks of the query optimizer is to select the right index to execute the query. An optimizer can make better decisions if it has histogram data (i.e. statistics) about the distribution of values for an index.
- CREATING/UPDATING/DELETING statistics on an index: SQL Server Compact 3.5 creates statistics on all indexes by default and updates them as required. (You can surely tweak these settings, but it is advisable not to. See CREATE STATISTICS /UPDATE STATISTICS/DROP STATISTICS).
- To view the statistics distribution, the built in stored procedure sp_show_statistics is pretty handy.
- Statistics were not created by default on primary key indexes till version 3.1, so you might want to check this and add them yourself :- ).
- One of the important tasks of the query optimizer is to select the right index to execute the query. An optimizer can make better decisions if it has histogram data (i.e. statistics) about the distribution of values for an index.
- ... A lot of other things too J, but this will do for the time being.
The Meat:
If you are trying to improve the performance of your application that uses SQL Server Compact, make sure you have checked the following resources on MSDN already (In this post I will not go into any of them):
- Database Design and Performance: This page talks in some detail about the following:
- Database denormalization
- Variable and fixed sized columns
- Effect of row and key lengths
- Query Performance Tuning: This page explains some of the following:
- Improving indexes by creating selective indexes, choosing the columns in a multi-column index, issues in indexing a small table etc.
- When to create indexes
- Using parameterized queries where possible.
- Rewrite queries where possible
- Using base table cursor (CommandType.TableDirect) mode for simple queries.
(The article on Query Process Overview and Performance Tuning approaches for SQL Server CE 2.0 covers the same points as above, but I found it better than the MSDN links J. It is relevant for SQL Server Compact 3.5 too.)
Now here are some additional ways you can improve performance:
1. Recompile Stale query plans
<StoryTime>Sometime back I got a complaint about slow performance of our database even though the user was using parameterized queries and had created the correct indexes everywhere. The sequence in which he was performing operations was something like this:
- Create the full database schema (tables, indexes etc).
- Prepare SqlCeCommand objects for *all* queries that will ever be used by the application.
- Populate the database with actual data. Until now all tables were empty.
- Execute the queries using previously prepared command objects
What's wrong with this?
A prepared command object (SqlCeCommand) holds a query plan. The query optimizer generated this query plan and had optimized it for the state of database when the plan was generated. In this user's case, the database had just empty tables at that point in time. The optimizer therefore inferred that using a table scan is the best way of executing the query (as opposed to using any index). As a result, all his query plans used just table scans instead of the right indexes even when the table size increased!
</StoryTime>
Moral of the story: Caching query plans is good, but only as long as the state of database when they are compiled is representative of the average state of database during execution. If your data is changing rapidly, it is better to recompile the queries every once in a while.
So, beware of stale query plans!
Note that query plans can be cached for both parameterized and non-parameterized queries and you can run into this problem in either case. (Another insider hack: SqlCeCommand.Prepare doesn't really prepare the query plan. It just marks the command object for plan (re)compilation and the plan is compiled when SqlCeCommand.ExecuteXXX is called next time.)
2. Impression of using parameterized queries without really using them
Question: What's wrong (rather sub-optimal) with the following code:
SqlCeCommand cmd = con.CreateCommand();
cmd.CommandText = "SELECT * FROM table1 WHERE C1_DECIMAL = @p1";
cmd.Parameters.Add("@p1", SqlDbType.Decimal);
cmd.Parameters["@p1"].Value = 3.5;
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read()) { /* read the data */}
rdr.Close();
cmd.Parameters["@p1"].Value = 335.01;
rdr = cmd.ExecuteReader();
while (rdr.Read()) { /* read the data */ }
rdr.Close();
Answer: The query plan gets compiled twice, once for every ExecuteReader call!
Reason: First time the plan is compiled, the parameter's precision is 2 and scale is 1 (as inferred from its value ‘3.5'). The query plan therefore uses this precision and scale values. The only other values it can accept are those that fit within this range.
When the query is executed again, the parameter's precision is 5 and scale is 2 (as inferred from its value ‘335.01'). Since it doesn't fit in the parameter as inferred during first plan compilation, the plan is recompiled silently!
Solution: If you can anticipate the range (size, precision or scale) of the parameter values, then specify it explicitly. The plan will be generated based on the specified range then. This holds true for all character, binary and numeric types. (The flip side is that for any parameter value that does not fit into the specified range, an error will be thrown.) So the above code can be modified as follows:
SqlCeCommand cmd = con.CreateCommand();
cmd.CommandText = "SELECT * FROM [TABLE1] WHERE [C1_DECIMAL] = @p1";
cmd.Parameters.Add("@p1", SqlDbType.Decimal);
cmd.Parameters["@p1"].Precision = 10; // Playing safe by taking a larger range!
cmd.Parameters["@p1"].Scale = 5;
cmd.Parameters["@p1"].Value = 3.5;
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read()) { /* read the data */}
rdr.Close();
cmd.Parameters["@p1"].Value = 335.00;
rdr = cmd.ExecuteReader();
while (rdr.Read()){ /* read the data */ }
rdr.Close();
3. Query optimizer did not choose the best index
The query optimizer usually does a good job of choosing the index. However, it does it on a best effort basis and there can be cases where it doesn't pick the right index. In such cases, it makes sense to use index hints and outsmart the optimizer:
E.g. ...FROM [TABLE1] AS T1 JOIN [TABLE2] AS T2... can be rewritten with index hints as
...FROM [TABLE1] AS T1 WITH (INDEX(IX_On_Table1))
JOIN [TABLE2] AS T2 WITH (INDEX(IX_On_Table2))...
How do you know if your index is indeed better than the index chosen by optimizer?
Well, the easiest and brute force method is to run the query with and without an index hint and see which is better. I often use the Query Analyzer on device and SSMS on desktop to get the execution times. Another way is to generate query plans for both queries and study them for better predicate matches, index ordering etc.
4. Query optimizer did not choose the best join order
As in the case of choosing a wrong index, an optimizer can sometimes choose a wrong join order too. Again a corner case, but not impossible to run into. In such cases, you can specify the join order explicitly:
E.g. ...FROM [TABLE1] AS T1 INNER JOIN [TABLE2] AS T2 ON ... can be rewritten with a forced join order as
...FROM [TABLE1] AS T1
INNER JOIN [TABLE2] AS T2 ON ...
OPTION (FORCE ORDER)
As with any sort of query hint, you should be extra careful that you are doing the right thing. For instance, the join order chosen for a query can change between multiple runs due to several reasons. The cardinality of the join tables could have changed significantly, or the data distribution could have changed. So even though the join order that you selected at the time of deployment was optimal, it may not remain so all the time. Ordinarily the optimizer would decide the join order on fly and therefore can adjust to such changes.
Not so when the join order has been forced.
That's all for the time being.
-Pragya Agarwal
Comments
Anonymous
April 29, 2008
PingBack from http://microsoftnews.askpcdoc.com/performance/query-performanceAnonymous
May 01, 2008
Good post! Nicely written, thanks!