Jaa


Cautions with Indexed Views

If you’ve been following my blog, you may remember an article I did extolling the virtues of indexed views - https://blogs.msdn.com/microsoftbob/archive/2009/05/13/sql-data-warehousing-tip-2-indexed-views.aspx.  I stand behind that and have found this feature extremely useful.  In fact in my last post the 11 seconds it was taking to perform a few thousand transactions against a 10-million row history table with 3 million transaction went down to just 2 seconds using an indexed view.

However, before you go out and make every view into an indexed view, it is important to understand what an indexed view is and what are the consequences.  I was reminded of this very sharply when I did the following “routine” update command on my table containing all of the trading symbols used for my simulation processing:

 /*------------------------
update dbo.EquitySymbol 
set Shortable = 1 
from Load.ShortableList l
where l.Symbol = dbo.EquitySymbol.TradingSymbol
------------------------*/ 

(6334 row(s) affected)

Now, normally this had been taking less than 2 seconds.  But, this was before I added the indexed view that happened to include the “Shortable” column.  So, how long did it take with the indexed view in place?

image

Almost 19 minutes?  Yes, 18 minutes and 51 seconds?  That is actually about twice as long than it takes to create the indexed view along with all of it’s alternate indexes!  Pardon the drama, but my point is that Indexed views are dramatic by their nature.  They are dramatic in that on one hand you can analyze millions of rows and help enabling performing something as complex as generating 7 years of trading simulations consisting of over 3 million orders for 10 million rows in a space of a couple of hours, while on the other hand causing a small update to take over a thousand times longer than usual…

So, how do we explain this?  First, we need to understand what happens when you create an indexed view.  The first step is to enable schemabinding on the view.  This ensures that SQL can tag a dependency between the table and the view so it can prevent and detect changes that might cause the view to stop functioning, that isn’t where the performance hit comes, (see https://msdn.microsoft.com/en-us/library/ms187956.aspx for an explanation of the process for creating an indexed view including the schemabinding step).  The key threshold happens (pun intended) when you create the primary key for the view which is the unique clustered index for the view and required before you can add secondary indexes to the view.  What that does it “materializes” the view such that it is no longer just a query into your tables, but it becomes a table itself with real data.  And of course, that data must be maintained to match the underlying tables. 

So, what was the situation in my case?  Well, the “shortable” flag was part of the indexed view that joined the equity symbol table (which has about 9,000 rows) to the Equity History table, which happens to have about 10 million rows.  My “little” update of the 6,000 or so Equity Symbol rows cascaded into about 7 million updates necessary to maintain all of the rows in the indexed view.  Here you can see the major time-consuming aspects of the query plan with the indexed view in place:

image 

Notice the large time for the sorts, these were apparently needed to prepare for secondary index updates that included the shortable column as an included column.  And notice that the update time for the clustered index for the table actually containing the column ([EquitySymbol].[PK_EquitySymbol]) is estimated to take virtually 0 percent of the total query time.

So, what does this mean – avoid indexed views?  Not at all!  It just means as the title suggest: use caution.  So what could I have done differently to avoid this issue.  To start with, my update command is flawed.  I told SQL to update every row and didn’t even look to see if the rows needed to be updated.  SQL Server will happily do what you tell it to, it doesn’t have a built-in mechanism that says - “Hey, you don’t need to update that column, it already contains the value you specified”.  It doesn’t do that for a very good reason.  For example, what if I had a trigger on a table that checked to see if a particular column was updated.  Since SQL trigger processing with the “IF UPDATE(column)” doesn’t check to see if the value is actually changed, just whether it is updated, it is easy for me to force a trigger to execute just by updating the column to its current value.  So, what I should have done was:

 update dbo.EquitySymbol 
set Shortable = 1 
from Load.ShortableList l
where l.Symbol = dbo.EquitySymbol.TradingSymbol
AND dbo.EquitySymbol.Shortable <> 1

Since there were only a handful of rows that had changed since the last time this update command was run, doing it this way, should have taken only a few seconds, albeit it still would have required a couple thousand updates to cascade the updates to the related rows the view. Another option is that if I really did need to update a few thousand rows, would have been to simply drop the indexed view, perform the update, and recreate the indexed again. There may be some more exotic approaches such as disabling the indexes on the indexed view, but effectively the approach is the same.

So, here are a few suggestions when dealing with indexed views:

  1. Do not include more columns than you really need, particularly if the columns may be subject to change.
  2. Consider the cardinality of the values in the indexed view.  If your indexed view ends up repeating a few value thousands of times, consider the source of the data.  If that data is often changed, then expect those updates to take many times longer than without the indexed view.  Be very careful when joining a table with relatively few rows (i.e. 6,000) that is frequently updated to a much larger table (i.e. 10 million), even if the larger table is infrequently updated.
  3. Consider carefully the use of low-cardinality items in the indexes either as segments of the indexes or as included columns, as this will add to the update time for the view.
  4. Don’t create an indexed view unless there is at least one critical query that can benefit significantly and measure that benefit against the cost of other queries that may need to update the table. 
  5. When considering the costs/benefit of an indexed view, don’t forget to factor in that SQL will use the indexed view for queries that do not directly reference the indexed view.  Absent a query hint to indicate otherwise, SQL Server will leverage the indexed view in query plans that access any tables related to the indexed view.  So, when you create an indexed view intended to help one query, you may find it helping other queries as well.

So, the bottom line is that like just about everything else in SQL Server 2008, indexed views have their place, but use them wisely…

Technorati Tags: Indexed Views,SQL Server 2008,SQL Server,SQL Server Performance