Puzzle Pieces
I was investigating a query slowdown recently, and came across something kind of odd with windowing functions and order by.
Let’s talk about these three queries:
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotes; --Order by UpVotes
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY u.DownVotes; --Order by DownVotes
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotes; --Order by UpVotes
Goings On
If we’re going to generate row numbers on these columns, we need to sort them.
I know and you know, we can add indexes to put column data in the order we want it in, and that’ll cut down on the amount of work Our Server™ has to do to execute this query. But we can’t just index everything, that’d be insane. I know because I’ve seen your servers, and I’ve seen you try to do that.
Plus, they just get fragmented anyway.
Here are the execution plans. This is a big picture, because I want you to spot the difference.

Fascination Street
That first plan has an extra Sort operator in it. See it up there? Right next to the Select operator?

That sort is ordering by UpVotes ascending, which is a shame because we’ve already done that once. That sort doesn’t occur in the second two plans, because the row number function has already sorted data by them. If the optimizer were a little smarter here, it could reorder the sequence it generates row numbers in to avoid that, but it doesn’t.
If we rewrite the query to do that on our own, the data ends up in the right order. In case you’re wondering, we get the same results referencing the row numbers in the order by instead of the underlying column:
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotesWhatever; --Order by UpVotesWhatever
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY DownVotesWhatever; --Order by DownVotesWhatever
SELECT u.DisplayName,
ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first
ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotesWhatever; --Order by UpVotesWhatever
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Heh… and people continue to say that the order in which the query is built doesn’t matter. I never took the time to try to demonstrably prove that it does. This is certainly demonstrable proof that it does. Thanks, Eric.
Sometimes it’s harder than it looks, other times you stumble on it by accident.