Puzzle Pieces
I was investigating a query slowdown recently, and came across something kind of odd with windowing functions and order by.
Let’s talk about these three queries:
SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY UpVotes; --Order by UpVotes SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY u.DownVotes; --Order by DownVotes SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY UpVotes; --Order by UpVotes
Goings On
If we’re going to generate row numbers on these columns, we need to sort them.
I know and you know, we can add indexes to put column data in the order we want it in, and that’ll cut down on the amount of work Our Server™ has to do to execute this query. But we can’t just index everything, that’d be insane. I know because I’ve seen your servers, and I’ve seen you try to do that.
Plus, they just get fragmented anyway.
Here are the execution plans. This is a big picture, because I want you to spot the difference.
Fascination Street
That first plan has an extra Sort operator in it. See it up there? Right next to the Select operator?
That sort is ordering by UpVotes ascending, which is a shame because we’ve already done that once. That sort doesn’t occur in the second two plans, because the row number function has already sorted data by them. If the optimizer were a little smarter here, it could reorder the sequence it generates row numbers in to avoid that, but it doesn’t.
If we rewrite the query to do that on our own, the data ends up in the right order. In case you’re wondering, we get the same results referencing the row numbers in the order by instead of the underlying column:
SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY UpVotesWhatever; --Order by UpVotesWhatever SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY DownVotesWhatever; --Order by DownVotesWhatever SELECT u.DisplayName, ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second FROM dbo.Users AS u WHERE u.Reputation > 100000 ORDER BY UpVotesWhatever; --Order by UpVotesWhatever
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Heh… and people continue to say that the order in which the query is built doesn’t matter. I never took the time to try to demonstrably prove that it does. This is certainly demonstrable proof that it does. Thanks, Eric.
Sometimes it’s harder than it looks, other times you stumble on it by accident.