Learn T-SQL With Erik: ORDER BY
Video Summary
In this video, I delve into the intricacies of non-deterministic order by clauses and unexpected query results in T-SQL, drawing from my upcoming course that is currently on presale. With the material nearing completion and tech review underway, I’ve already started recording some sections, which will begin appearing on the training site shortly. This video serves as a sneak peek for those who might be attending the pre-cons at Past Data Summit in Seattle, where you’ll gain free access to this valuable content. For everyone else, now is the time to act if you want to avoid doubling your regret later—grab the presale deal before prices go up. We explore how non-deterministic order by clauses can lead to inconsistent query results and discuss best practices for ensuring stable outcomes in T-SQL queries. Through practical examples, I demonstrate the importance of incorporating unique columns as tiebreakers and highlight common pitfalls that developers might encounter.
Full Transcript
Erik Darling here with Darling Data, and in today’s video we are going to go over a little bit of the material from my T-SQL course. You know, the usual spiel about this stuff, it is on presale right now. The course is completely written at this point, just finishing up the last bits of tech review and tidying up and I’ve already started recording some of the already done content. So, that’ll start appearing over on the training site in the next week or so. Of course, this is the companion material to the stuff Kendra and I will be teaching at Past Data Summit. If you’re attending the pre-cons in Seattle, then you will get free access to the material. If you’re not attending the pre-cons in Seattle, you get nothing. You have to buy it just like everyone else. Once the material is fully published, of course, it will be going up to the full price, which of course will be double the presale price. So, get in now to have half the regret later on. Who knows what you could have done with the other half of the money that would have been good for you. You could have bought so many lottery tickets. Anyway, we’re going to talk a little bit about order buy. And if you find yourself watching this material and saying, yeah, Erik, I know, well, congratulations. This is the beginner material and you just might not be a beginner, which is wonderful for you, right? But for those who have not yet progressed to your lofty place in the T-SQL world, this sort of education is necessary. So, we’re going to talk a little bit about order buy and specifically non-deterministic order buys.
And then, of course, expecting queries to return in a specific order when there is no order buy, which is another thing that people tend to do. They observe a query running and returning results in a specific order. And then if it ever doesn’t return the results in the order, they freak out. They’re like, but it was all this way before. So, do that a little bit.
So, first, let’s look. Query plans are turned on. Look at that. Baked right into the demo. You’d think I’ve been practicing or something. If we run this query a few times, what we’re going to see is sort of in general, the results, like just over here in the ID column, are going to come back in all different orders. And, of course, the reason for that is that this particular sort order is non-deterministic.
The reputation at the beginning part of the index, or rather at the beginning part of the values, there’s no index on reputation currently, the beginning part of the values for the reputation column are all ones. Right. And so, all we’re doing is ordering by the ones, but the ones can come back in any order that they want. Right. As long as they’re just ordered over here. So, if we keep running this a few times, we’re going to notice that the ID column flips around quite a bit.
Right. This looks different from the last one. We were on this and just keep running it. Now negative one is at the top. Negative one is still at the top with some other stuff switched around. All right. We just keep doing this over and over again. We’re going to see just like different results depending on like threads and timing and all sorts of things.
So, if we want a deterministic order by, if we want to always present results in a specific way, we need to incorporate a unique column as sort of a tiebreaker to our order by elements in here. Right. So, we need to add in, in this case, the ID column from the users table, which is the clustered primary key. Right. So, if we run this, then not only are we ordered by reputation here, but in order to tiebreak the ordering to make sure this is deterministic, now we order by the ID column as a secondary element.
So, we order first by reputation, but then for duplicates and reputation, we order by ID. And this will return absolutely stable results every single time. We can’t, we can’t mess with this ordering because of that, because of the uniqueness of the ID column.
Now, this behavior can be sort of misleading to a lot of people. If their data doesn’t have a lot of duplicates, just some duplicates, right? So, if we flip the ordering of reputation and we just say order by reputation descending, then, you know, there’s not a lot of overlapping values at the top of the food chain here.
The reputation column is like, largely like, once you get up this high, not a lot of people have like, duplicate reputations where you would need a sort of tiebreaker thing. So, you could observe behavior like this working quite well for a while. But then, after a certain point, you are going to start hitting duplicates and things might start flipping around on you.
So, just for example, the first reputation at the high end that I could find that had a duplicate value when it was 160303. And sometimes this takes a few runs to kind of get to work correctly. But here you see the reputation 160303 at the very top here.
I don’t think there’s any other dupes down below. But you see IDs 206403 and 19679. If we just kind of run this enough times, you’ll see those two columns flip back and forth.
And that’s because, you know, they are duplicates. And so, SQL Server only has to return this thing in order to a certain point. But within duplicates in there, it can return those in any sort of flip-floppy way that it wants, right?
We don’t know which one we’re going to get back. This might not seem like a very big deal to you in this pretty narrow case. But I guarantee you, at some point in your T-SQL developer career, you are going to have someone file a bug report and say something like, Hey, I’m getting weird results back here.
And you’re going to have a non-deterministic order by somewhere in your query that is messing things up for everyone. So, let’s just look at a quick example of where you might see, like, just the results change if your database changes, right? So, let’s pretend that for years we had this table and this query.
And every time we ran this query, I’ve got a couple index hints on here just to make sure that, just to show you what happens when we use different indexes. And so, this first query is always going to be hinted to use the clustered primary key on the post table. And this query is always going to be hinted to use the nonclustered index last editor display name on the post table.
That’s the index that we created right here on last editor display name. It’s very, very explanatory. It’s almost nearly self-documenting if you want to feel special about it.
But let’s just say that for years we had this query running. And every time this query ran, we got results in a specific order and everyone was happy. And at some point, someone was like, oh, well, there’s a missing index request or something.
We’re going to add this index and SQL Server is going to use this index and we’re all going to be thrilled. And if we run this, just run these back to back. Let’s zoom it, getting a little weird on me.
You’ll see that these two queries return data in very different orders, right? If we just kind of make that about half and half, we just get this stuff back. Like these two result sets are just way different, right?
5, 2, 2, 2, 2, 3, 4, 2, 2, whatever. This one, 2, 2, 2, 2, 3, 2, 2, 4, 18, 4, 2, 16. So we just got way different results back running the same query just using different indexes.
If we look at the execution plans, we get some sense of why. Well, we do an index seek into the post table here and then we do a hash match aggregate. And of course, hashes, when you see a hash, there’s absolutely no ordering required for that, right?
Hashes just mush everything together and throw it along to the next thing. And then the second query, we do an index seek into the nonclustered index. And then we have a stream aggregate.
And stream aggregates do require sorted data. If we didn’t have an index on last set or display name and SQL Server chose a stream aggregate, you would have to pre-sort the data for us.
In this case, we have the index, though, so we’re good there. So SQL Server just streams the results into there and shoots it along to the next operator, the filter operator. But in both of these cases, like neither one has an order by, but they both return results in slightly different order because the query plan is different and they use a different object to initially access data in the table.
So this is the kind of stuff that you’re going to have to deal with as a T-SQL developer, as you’re writing queries, as you’re troubleshooting bugs and results and stuff, non-deterministic order buys or just expecting results to always return in the same order no matter what, without any order buy whatsoever.
Two very, very big mistakes that people make quite a bit. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope you will consider purchasing this course for the pre-sale price. The link is down in the video description.
If not, I will see everyone who is going to get this for free at Past Data Summit, I guess, in November. Anyway, thank you for watching. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.