次の方法で共有


Understanding SQL Server Fast_Forward Server Cursors

SQL Server's server cursor model is a critical tool to many application writers. Fast_forward cursors are very popular as an alternative to read_only forward_only cursors, but their inner workings are not well-publicized. So I thought I'd give it a go.

Background

A server cursor is a cursor managed by SQL Engine. It consists of a query execution and some runtime state, including a current position. SQL clients can use a server cursor to fetch results of a query one-at-a-time, or in batches of N-at-a-time, rather than by the usual all-at-a-time, firehose, default query result set. Many applications and client libraries have iteration models, but server cursors are the only way for SQL Server engine to do incremental query processing.

Server cursors' popularity is derived from three advantages:

1.       Matching very well the iterative model of programming that non-database programmers are most comfortable with. This advantage is more perceived than actual, and does more harm than good. Books Online puts it well:

           

            If the data that is to be retrieved will be consumed all at once, and there is no need for positioned updates or scrolling, default result sets are recommended.

2.       Reducing unnecessary processing of unused data when a client application stops reading data early. Because in many cases (though not all), server cursor processing is incremental and on-demand, server load and data transmission can be reduced.

3.       Mapping very well to screen-at-a-time or window-at-a-time viewing and scrolling through large data sets. There is simply no better way to tell SQL Server to "give me the next row(s)."

SQL Server engine has four server cursor models: static, keyset, dynamic, and fast_forward. [Apologies for not using the ADO.NET terminology -- it tends to confuse me. How various settings on ADO.NET and ODBC commands translate to server cursor models is a topic for another time.] There is lots of further background on the models and their differences in SQL Server 2008 Books Online's section on cursors. But my concern here is to demystify fast_forward cursors, about which Books Online say very little, perhaps worse than nothing:

FAST_FORWARD

Specifies a FORWARD_ONLY, READ_ONLY cursor with performance optimizations enabled.

What's a fast_forward cursor in a nutshell?

It's a cursor model equivalent to read_only, forward_only that compiles to a static-like or dynamic-like cursor plan. It does not downgrade to other cursor models, like dynamic and keyset do.

Aren't fast_forward cursors redundant? We already have read_only forward_only cursors. Why do we need them?

Fast_forward is redundant. Read_only forward_only cursors are sufficient for many apps. But some apps got poor query plans. Read_only forward_only cursors are dynamic cursors, and dynamic cursors use a dynamic plan if one is available. The reason this is a problem is that sometimes, the best dynamic plan is much, much worse than a static one. Fast_forward cursors take a more balanced approach, choosing a static plan if it looks better.

When should I use fast_forward vs. read_only forward_only?

On balance, fast_forward is a little better. [1][1] However, performance testing should be done before a final decision is made for a particular application. This is because the way the decision is made to use a dynamic-like or a static-like plan is fundamentally different (see below). Either way, index tuning and plan hints may be an important part of tuning your cursors, whichever cursor model you choose.

What's a dynamic plan?

A dynamic plan can be processed incrementally. In SQL Server we do this by serializing the state of the query execution into what we call a marker. Later, we can build a new query execution tree, use the marker to reposition each operator. Moreover, a dynamic plan can move forwards and backwards relative to its current position. Dynamic plans are used by both dynamic and some fast_forward cursors.

A dynamic plan consists only of dynamic operators -- operators that support markers and moving forwards and backwards. This corresponds closely, but not exactly, to the query processing notion of streaming operators (vs. stop-and-go). But not every streaming operator is dynamic. In SQL Server, dynamic means:

1.       The operator can be repositioned to its current position using a marker, or to a relative position (either next or previous) from its current one.

2.       The operator's state has to be small, so the marker can be small. No row data can be stored in the operator. In particular, no sort table, hash table, or work table. Not even one row can be stored, since a single row can be very large.

Without a dynamic plan, the cursor would need temporary storage to keep the query result set (or keyset thereof). A dynamic plan does no such thing! However, certain operators are disqualified -- hash join, hash agg, compute sequence, and sort, for example. This leads to sub-optimal plans.

When is a dynamic plan worse than a static one?

In some queries, like those that use row_number, a dynamic plan is simply not available. The trouble comes when there are both dynamic and static alternatives. This can happen when an operator, e.g. join, has dynamic (nested loops) and non-dynamic (hash) implementations. This can also happen when some indexes support interesting orders and some do not.

Here is an example of the latter drawn from a real customer case. Table ORDERS has an index on DATE and an index on SUBTOTAL (clustered/non-clustered is not relevant for the example). The application wants to see very large orders:

      SELECT DATE, SUBTOTAL, ORDERID, CUSTOMERID

      FROM ORDERS where SUBTOTAL > 10000

      ORDER BY DATE

For argument's sake, say there are 100 orders this large out of a table of 100 million orders. A dynamic cursor cannot sort, so it must use the DATE index, touching all the rows, filtering on SUBTOTAL. A static cursor has the option of seeking the SUBTOTAL index for the range >10000, sorting the rows, and storing them in the cursor.

How do fast_forward cursors solve this problem?

Fast_forward cursors under certain conditions[2][2] choose the cheaper of the best static and best dynamic plans. In an extreme case like the above, it can use a static plan. It should be the best of both worlds.

What's the hitch?

The optimizer relies on cost estimates to decide on the cursor model. Sometimes, when it picks static rather than dynamic, it will simply be wrong, and the application will be very slow. There are three reasons why this happens.

1.       The cost of a query depends on the "row goal": the number of rows estimated to be fetched. In many cursor applications, user are viewing a screen with a few rows and a next button or scroll bar. If the user is likely to scroll through all the data, the plan cost is different than if the user is likely to quit after viewing the first page. We have to pick a row goal at compile time to estimate a cost, but the user's behavior is unknown at that time. Typically, static plans are not sensitive to row goal, while dynamic plans are critically sensitive. In the case above, the dynamic plan cost scales linearly with the row goal (and when there are joins, it's super-linear). That makes the dynamic plan superior by an arbitrary factor for low row goals, and inferior by an arbitrary factor for high row goals. The factors depend on query and data parameters (such as the 10000, and the data distribution, in the example). Application writers can control row goals by specifying OPTION (FAST <N>) on their cursor queries, but often they cannot predict the correct value either.

2.       Parameterized queries are compiled based on the parameters at hand, and then reused for different parameters. In the example above, it makes a big difference whether the cut-off is at 10000 or at 10. This is no different from non-cursor queries.

3.       The choice of plan is subject to estimation errors, which can be extreme, especially when combined with the previous factors. A typical flavor of disaster involves two indexes like in the example, where the server estimates that the predicate is super selective, and therefore will scan half the table before finding the first qualifying row, therefore picking the static plan. However, if there is a qualifying row on the first page, and the user only wants the first row, then the dynamic plan is very, very cheap.

What can an application developer do?

For canned queries, via trial and error, you can use hints to get you the cursor plan you want. For query generators, it is much more difficult, and there's no one solution. You do have a few levers at your disposal, though.

1.       Picking a neutral value for OPTION (FAST <N>).

2.       Avoiding plan reuse for different parameters by calling sp_cursorprepare or by using OPTION (RECOMPILE).

3.       Avoiding ORDER BY in cursors when not really needed.

4.       Use equality predicates and multi-column indexes to support interesting orders efficiently. Modifying the example slightly, if we had a computed column SIZE with values {S,M,L,XL} for ranges of SUBTOTAL, we could query WHERE SIZE = 'XL' ORDER BY DATE, and index on <SIZE, DATE>. In general, extend the index columns to include the equality columns followed by the ordering columns.

By Marc Friedman, 8/2009

Marc.friedman@donotreply.microsoft.com

 

Note to self: ref: VSTS# 273442

 


 


[1][1] For SQL Server 2005 users, it's worth mentioning that there is a performance enhancement to fast-forward prefetched I/O in SP3 Cumulative Update 5 that brings fast_forward up to parity. SQL Server 2008 and beyond, this is a non-issue.

[2][2] In situations where the dynamic plan looks promising, the cost comparison may be heuristically skipped. This occurs mainly for extremely cheap queries, though the details are esoteric. In that case no difference between dynamic and fast_forward will be noticed.

Comments

  • Anonymous
    August 20, 2009
    Unless the order of the data is critical to calculation (such as with cumulative total), a cursor is the #worst# possible practice. For 99.9% of all problem the right solution is to write a set-based query.

  • Anonymous
    September 24, 2009
    i am a learner to ms sql server 2008. i would like to learn more about the sql server 2008 from this article.

  • Anonymous
    February 20, 2010
    Hi , I need to pull data in batches and populate ui asnynchronously. I am looking for something like google suggests but with  thousands of records and populate UI . Do you thing cursor processing will meet my requrirement.

  • Anonymous
    October 14, 2012
    @Paul, the author here is not saying that one should CURSORS every time and instead of SET-Based approaches. Yes, CURSORS do have a cost compared with SET-Based queries. But there are times when you are not able to devise a SET-Based query for some complex scenarios and thus end up with CURSOR's. Author's main intention is to let people know the best options available and to use with CURSORS to make them more optimal.