SQL Server 2008 Indexing for JOINs
Here are some notes on "SQL Server 2008 indexing for JOINs" I took during an advanced class I attended taught by Kimberly Tripp (https://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Tables per select
- Old limit of 16 tables per select (before SQL Server 7) went up to 256
- Lots of people needing large join worked around using temp tables
- The limit is gone, but people still do it, sometime for simplicity
- Try turning temp tables into views and joining the views
- Maximum Capacity Specifications for SQL Server at https://msdn.microsoft.com/en-us/library/ms143432.aspx
Main ideas
- Each table has a join condition and search arguments
- For large joins, start with the most expensive table, extract and optimize it
- one of the tables becomes the "driving table"
Finding the "driving table"
- For the most selective of the two tables, index on the search argument
- For the least selective of the two tables, index on the join column foreign key
- If you don't know which will be more selective, how to do it?
- You could have indexes on both search argument and join column for both.
- Then SQL can have the best indexes not matter which is more selective.
Combine the two
- Another step: Cover the combination of the two
- Index on "search argument, join column" or "join column, search argument"
- Index on "join column, search argument" is probably more useful in other cases
- When in doubt: create both, see which one is used, drop the other one.
- Helps when "join column, search argument" is selective, but each one is not
Covering index
- Ultimate step: Create a covering index (INCLUDE all data columns).
- You will not need any bookmark lookups since all the data is in the index.
- This is the most expensive index. Might not be worth the trouble.
Statistics
- SQL cannot have perfect statistics for a table at all times.
- Think orders table with hundreds of millions of rows.
- Average of 5 orders/customers, some corporate customers will have thousands.
- Statics are calculated in steps, so it's possible that SQL won't know enough.
- Statistics are by table, not by partition.
- Having multiple partition tables with a partition view works around that.
- You could also use filtered statistics to help SQL make the right decision.
Force?
- You might end up thinking about forcing the plan to use an index
- You could make a mistake (force for small customers, query hits corporate ones)
- Avoid forcing SQL to do something if at all possible.
DTA
- Database Tuning Advisor (DTA) will help here. DTA is all about coverage.
- DTA is great for testing specific queries, suggesting options.
- Grab a backup of your production database, don't run DTA in production.
- Evaluate for recommendations AND select "Do not keep any existing PDS"
- Be careful when you're dropping things you don't want at the end of DTA...
- DTA hypothetical indexes are actually statistics.
- You need real data in your set, or else DTA can't figure statistics correctly.
- Consider taking the recommended statistics, it might help in other places.
- DMVs could be good enough, if you don't want to spend the time on DTA.
- How much time do you want to spend on optimizing a query? It depends...