Jaa


No Seatbelt - Expecting Order without ORDER BY

One of the greatest lessons I've learned in building software is that every technical implementation has a human element to how it is used, perceived, and interpreted.  While I've seen my share of poor technical solutions to problems, sometimes the most interesting "failure" point is the human operator and how he or she uses/perceives/interprets a software feature.  Some of them are easy to understand (trying to sell a Chevy Nova in Mexico, "All your base are belong to us" from the poor translation in Zero Wing, etc).  Others are harder for me to grok.

Occasionally I get asked about ordering guarantees in SQL Server.  In some cases, SQL Server had to break previously undocumented but somewhat consistent behavior (TOP 100 Percent... ORDER BY, as I blogged a few years ago here).  In other cases, I can find it difficult to reason through the logic ;).  Today's question has to do with not specifying an ORDER BY but still getting rows back in sorted order.  In their zeal to keep me employed with interesting corner cases in the SQL language, some have concluded:

"I don't have to specify the ORDER BY because I *know* that it will return rows in sorted order.  It always does! every time! I promise!"

So, we're talking about the top-level ORDER BY clause in a query that defines the "presentation order" in ANSI SQL.  The common reason for this conclusion is that, during development, the database developer found that a query plan always came back with the same plan.  So, if I get out the "Jump to Conclusions" game and start playing, I'll decide that everyone will get the same plan from now until the end of the universe.  The primary reason for this is that the query probably scanned an index that happens to be in the order that you desired:

CISCAN

Will that query plan change?  Well, if keep the same machine configuration (CPUs, Memory, etc) and don't ever insert data into the table, perhaps you can get lucky and keep the same plan... perhaps.  Of course, when you upgrade to the next release of SQL Server, perhaps the server has changed, new optimizations have been added, or someone adds a column to the clustered index that makes it slower than before.  All of these could change the plan, even if you didn't change a thing. 

Amazingly, if I happen to just add an "ORDER BY col1" into this query, I'll note that the query plan stays the same.  So, no runtime cost to pay to get that guarantee - perhaps just a few more characters to type.

ciorderby

So the "human" part of these kinds of problems arise when I get questions like "Where does it SAY that the system will not return the rows in order?"  It's difficult to undo the conclusion in your head when many experiments have shown, for a certain set of conditions, that the query will return rows in sorted order in "all" cases.  So, I'll try to help resolve a few of these discussions today by giving an example when SQL Server will indeed stop returning rows in sorted order.

I'm running this on a Developer Edition build of SQL Server 2008 on my personal machine, a dual-core machine with 8 GB of RAM, FWIW.  For this test, you'll need a machine that has more than one CPU available for SQL Server to use.

 CREATE TABLE [dbo].[orderingtest1](
    [col1] [int] NULL,
    [col2] [int] NULL,
    [col3] [int] NULL,
    [col4] [int] NULL,
    [col5] [binary](4000) NULL
) ON [PRIMARY];
create clustered index i1 on orderingtest1(col1)

DECLARE @i INT=0;
set nocount on
begin transaction
WHILE @i<20000
BEGIN
INSERT INTO [orderingtest1](col1, col2, col3, col4) VALUES (rand()*1000, rand()*1000, rand()*1000, rand()*1000);
SET @i+=1;
END;
commit transaction

I've created a table with 20,000 rows in it.  Perhaps this is your inventory table for the database backing your web site.

So let's go run an query that yields a plan that happens to give me results in a specific order:

 select * from orderingtest1 where col2 = 5

 

output1

Hey, that's ordered on col1!  Let's add some more rows just to be sure.

 DECLARE @i INT=0;
set nocount on
begin transaction
WHILE @i<20000
BEGIN
INSERT INTO [orderingtest1](col1, col2, col3, col4) VALUES (rand()*1000, rand()*1000, rand()*1000, rand()*1000);
SET @i+=1;
END;
commit transaction

Another 20,000 rows, and sure enough my query is fine.. There's only the one index anyways, so what could happen?  Let's ship this guy.

Then we get a new set of inventory and add yet another 20,000 rows into our table and run our query again:

 select * from orderingtest1 where col2 = 5

parallel

Whoa! we just got a fancy parallel query plan.  It uses multiple threads to scan the index and apply the filter condition, only sending up rows to the main thread if they qualify.  This can be faster on larger data sets.

(You may need to run DBCC FREEPROCCACHE if you aren't getting this plan - since the data is random, you may also need to try another value for col2.  Assuming you have enough CPUs and are running a SKU that can use them in SQL, you should get this plan eventually, just keep adding rows ;)

unordered1

Oh my - it's not ordered!

The hard part here is that there is no reasonable way for any external user to know when a plan will change .  The space of all plans is huge and hurts your head to ponder.  SQL Server's optimizer will change plans, even for simple queries, if enough of the parameters change.  You may get lucky and not have a plan change, or you can just not think about this problem and add an ORDER BY.  Interestingly enough, once you have enough rows *and* add an ORDER BY, SQL Server still generates a plan that looks like this:

parallelorder

However, if you run the query, you'll notice that the rows DO come back in order... The Exchange operator can run in a mode where it preserves order, but it only does so if you ask it through an ORDER BY clause:

properties

So, adding an ORDER BY gives things back in order.  (I have visions of the two guys in the pub mentioning this and one guy saying "A way to order the rows that come back from a query? Brilliant!")

 

The example I've done is fairly simple - one table with a clustered index using a simple query with a predicate on only one column.  Even if there were only 2 possible plans considered for this query, one of them doesn't guarantee order unless you ask for it.

There are lots of situations where plans can change in the Optimizer - for more complex queries, there can be thousands of plan choices or more, and each of them has a case when it would likely be picked.  For each of those plans, the sort for that plan can be different if you don't specify it.  So, I hope this gives you something to go try - I find that examples help me learn, so perhaps it can help you as well.

So, my advice for the day:

If you need order in your query results, put in an ORDER BY. It's that simple. Anything else is like riding in a car without a seatbelt.

 

Happy Querying!

 

Conor Cunningham

Architect, SQL Server Core Engine

Comments

  • Anonymous
    August 29, 2008
    Thanks Conor - great post and very timely! Particularly like the example that proves the point.

  • Anonymous
    October 20, 2013
    Hi Connor, What are your recommendations for how to sort a GROUP BY WITH ROLLUP?  As you know, such a thing is a great example of a "natural order of return" regardless of the presence of any kind of indexes but the general concensus is that even these need to be sorted.  Would you consider them to be an exception to the rule or not?  And, yes, I realize that if you add something like a ROW_NUMBER() to the return, that will trash the "natural order" but, if no such thing were added, would you still recommend sorting (usually by GROUPING()) GROUP BY WITH ROLLUP?

  • Anonymous
    October 20, 2013
    And apologies for the misspelling of your name.  It seems that the letter "N" on my keyboard sometimes has a mind of its own.

  • Anonymous
    October 20, 2013
    no, group by and group by with rollup and all other operators do not guarantee a return order.  You ALWAYS need to use ORDER BY on the queryspec if you want the output order guaranteed.