A Little About Index Sort Order And The Order By Clause

A Little About Index Sort Order And The Order By Clause


Video Summary

In this video, I delve into the fascinating world of SQL Server indexes and query plans, specifically focusing on a phenomenon known as “surprise sorts.” You’ll see how different indexing strategies can lead to unexpected sorting operations in your queries. I explore this concept by running a series of queries against the Stack Overflow database, demonstrating how the order of columns in an index impacts whether or not SQL Server needs to sort data during execution. This video is perfect for anyone looking to deepen their understanding of query optimization and indexing techniques, as it provides practical insights into avoiding unnecessary sorts that can impact performance. Whether you’re a seasoned DBA or just starting out, this content offers valuable lessons on how to craft more efficient queries and indexes.

Full Transcript

Look at you. You look great. You look great, smell great, everything about you. Great, great, great. Erik Darling here with Darling Data. Me and my pal Bats been chatting. We had an interesting client call where, you know, when I talk with people about SQL Server, SQL Server performance. We talk about a lot of stuff. stuff to do with queries and indexes and query plans and the way different things you do in your query and design your indexes has an effect on the query plan that you ultimately end up getting, which should also should all be at least fairly clear to my distinguished audience. But, you know, some people need a little bit more help and guidance than others. That is what I’m here for. So in this video, we’re going to talk about a sort prize. Now, now, keep in mind, this is not like a sort sort prize, like, hey, you won. Congratulations. You get a thing. Here is your honorarium of some variety. This is a surprise sort. And we’re not, this isn’t going to cause a big performance issue today. This is, we’re just going to look at how index, like the intersection of indexes and querying and how you can sometimes end up with a sort that you might not expect in your query plan. But before we do that, Bats would like to remind you that you can become a loyal paying member of the channel to say thank you for all of the hard, diligent work that I do producing this content. There is a link down in the video description there where Bats is pecking away, where you can, for as few as $4 a month, support your local SQL Server enthusiast. If you have perhaps engorged your Bats engorged your Bats with Pez candies, you’ve spent all your money there, you can do other things to support the channel. You can like, you can subscribe, you can comment. And if you would like to ask me questions that I will answer during my officially branded Bats Maru office hours, you can go to that link, which is also in the video description.

You can ask me questions that I will answer. If you need help with SQL Server beyond what you can get from mere YouTube videos, you can do all, you can hire me to do all sorts of things that people find useful. Bats is being a little, a little cranky today. I can do health checks, performance analysis, hands-on tuning of your SQL Servers, helping you with SQL Server performance emergencies, and of course, training your developers so that you don’t have those performance emergencies anymore. These are all things that, according to BeerGut Magazine, I excel far beyond anyone else in the world at. So, you are free to hire me to do all of these things. And you can rest peaceably with the knowledge that you have hired a BeerGut Magazine certified SQL Server consultant.

And as always, my rates are reasonable. If you would like to get some high quality, low cost SQL Server training from me and you old bats here, you can get all 24 hours of mine to fill your brain with for about $150 US dollars. And that will last for the rest of your life. All you have to do is use the coupon code right there at the link up there, which is also down in the video description. And gosh darn it, you can, you can start being as good at SQL Server as batsmaru.

SQL Saturday, New York City, 2025. That is this year. That is just a couple months away now. Coming up, May the 10th, with a performance tuning pre-con on May the 9th. I highly suggest you attend both. And you hang out with me and become my best friend. Maybe, maybe that’ll be the start of something beautiful. Who knows?

But without it in the way, let’s talk about surprise sorts, because they’re interesting things. Now, we’re going to run a query with a few different things going on in it. And we’re going to use this query and the index that we have available.

And then we’re going to, I’m going to show you what happens with a slightly different index. So we’ve got right now this index up here. Zoom it would be so kind as to zoom. There we go.

We’ve got this one here where reputation is in ascending order. Now, for those of you who are new to this whole thing, because the users table in the Stack Overflow database has a clustered, the important part here is clustered primary key, on the ID column.

The ID column is a hidden key column. It is a hidden key column because this is a non-unique index. You notice that there is a distinct lack of a word, the word unique, in here.

So because this is a non-unique index, the ID column, Zoom it would be so kind as to erase the squares instead of just having me click buttons mindlessly. The ID column is hanging over here as an additional key column. If the index, well, that D was a little aggressive, huh?

Let’s dot that I. If this were a unique index, then the ID column would be hiding, well, somewhere in this region is an included column. But since this is a non-unique clustered index, it is an additional key column, which means that this index is sorted by the reputation column first.

Right? So all of the values for reputation are sorted from one to whatever John Skeet is, one million and something. And for all of the duplicates, right?

Because we index all the data. So let’s say for the million or so people who have a reputation of one, the ID column is in order for that. But as soon as we go to reputation two, the ordering of ID resets and we start from whoever has the lowest ID to the highest ID within the next range of values.

So we have an index where reputation is in order and an index where ID is in order within all of those ranges of reputation. So we would expect to be able to order things, have an order by in the query that helps with all sorts of stuff, helps us avoid sorting data. Now, like I said before, this isn’t going to show a big performance issue.

This is just going to show you some behavioral stuff. So if I select the top 1,000 rows from the users table and I order by reputation descending, right? Just reputation descending on its own here.

Then we get a backwards scan of this index, right? Scan direction is backwards. But we don’t have to sort any data.

We have a top and we have a scan. We do not have a sort operator in our query plan. SQL Server did not have to acquire any additional memory grant in order to put this data in the order that we have asked for it. If we run these two queries, well, the reason why we might do something like this is because the reputation column is not unique.

Remember, we talked about that when we were talking about the index definition. And if we have any ties in the reputation column, we might want to add a tiebreaker in the form of this unique ID column to our query so that we have a way to uniquely identify the top 1,000. Otherwise, if we could get duplicate reputation, not replication, we could get sort of unexpected results ordering by a non-unique column.

So let’s run these two queries. And let’s look at what happens. Now, you’ll notice that for the execution plan where we order by reputation descending and ID ascending, we have a top-end sort.

In the query plan where we have reputation, so let’s put a little square around that here, make it obvious. This is reputation descending ID ascending. In this query, we have a top-end sort.

In this one, we just have a top. We are back to our original plan. Now, the reason for this is somewhat complicated or maybe not incredibly intuitive to folks out there.

And let’s try to explain it well. If you look at the properties of this index scan, you’ll notice that we do not have a direction on this one. If you look at what happens, there’s no, like, we didn’t have like a backwards scan.

If we look at this one, the backwards scan will be back. So this one has the backwards property. This one is an unordered scan of the table.

That’s why we have to sort this over here. And, of course, the sort operation is ordering by reputation descending and ID ascending. So the question is, why, when we sort by reputation descending and ID ascending, do we need to sort the data?

But when we sort by reputation descending and ID descending, we do not need to do that. Well, what it really comes down to, and if we look at the results over here, it might be a little bit more obvious. I do have to do a little bit of surgery on this to get both of these query panes to the right place.

And, of course, we need to get to around row 276 for this to be incredibly obvious or for this to start to become obvious. So let’s look at both of those. Oh, come on, SSMS.

You’re really making a fool out of me here. So 276 or so has the first duplicate that I could find in the results. Yes, I did scroll through and look for them.

So we have the reputation 160303 here. Right? And that’s the same for both of these. But you’ll notice that the order of the IDs is slightly different for them. In the top query where we’re ordering by ID ascending, we have 1-9-6-7-9 first and then 2-0-6-4-0-3 second.

In the bottom query for 1-6-0-3-0-3, we have ID in descending order. So we have 2-0-6-4-0-3 first and 1-9-6-7-9 second. Right?

So clearly those two values flipped. Now, the way to think about why we need to sort the data in the top plan but not in the second plan does come back to the execution plan. So, again, the properties of the scan with the sort does not have a direction on it.

Right? There is no ordered. When you look at the ordered attribute, it says false. So we just read through stuff and found it.

The reason for that is because imagine that you are the SQL Server engine. Right? And you are reading through the index over here backwards.

Right? So you’re reading. You’re doing a backwards scan. So let’s just pretend that, like, this is our B-tree. And over here is the, like, ascending end of our B-tree.

And over here is the descending end of our B-tree. So the backwards scan starts over here and starts reading things this way. Right?

So we’d be reading through this index in descending order. And then all of a sudden we would get to, like, we’d be, like, you know, we’re ordering by reputation descending. We’re reading through the index in descending order. And we’re trying to, like, you know, we want to order by ID ascending.

We would get to that, like, 160303 reputation. And then we would have to, like, do a U-turn and be, like, no, now we need to sort it this way. Right?

So SQL Server, like, it can’t really do that. Like, that’s not really how B-trees work. We can’t just, like, start sort, like, reading the index and then, like, backwards order this way and then be, like, ah, but then order the other column this way. Like, SQL Server is not, like, an advanced enough product or rather doesn’t have that advancement in the product in order to do that.

So in the second query where we are, we just, we can just straight up backwards scan the whole thing. ID is already going to be, like, we’re reading through the index backwards. So ID, the ID, the order of the IDs is also going to be backwards reading through that.

So as we’re reading through the descending order reputations, ID is also going to be in descending order. So SQL Server can just chug along that and have all the data straight in the order that it wants for both of those. So that’s why we don’t need to sort there.

Of course, you can write a query that would do that for you if you were to break things up a little bit and do some manual phase separation, as the smart folks out there say. And we were to get the top thousand reputation ordered by reputation descending in an inner query. And then join, remember, we have to get, like, we’re doing the group by here so we get our unique set of reputations back.

And then out here we join back to the users on that reputation. Then we can get the data that we want in the order that we want without sorting it. Right?

And the reason for that is because we hit that index twice. Right? Look at that. We scan into the index over here and we seek into the index here. When we scan the index here, we are doing that.

We are back to doing a backward scan. So we’re reading the data from over here this way rather than over here this way from largest to smallest rather than smallest to largest. And so we do the backward scan here.

We produce the top thousand rows. And then this index seek is going forward. So we’re reading through this index.

We’re joining on reputation. We start with the lowest reputation that came out of there. And we move to the highest reputation that came out of there. So we’re in forward order on this one. And we can avoid having to sort data at all in this query plan because we have two separate reads of the index that happen in two separate directions.

SQL Server could do this for us if it felt like doing a bunch of extra work in the query plan. But I understand a bit why it doesn’t. Now, of course, if we really cared about like we really wanted an index to do that for us, we could create this index because this would be reputation descending and ID ascending.

So what we would have here is an index that you don’t need to read backwards in order to get data from this end to this end because we have the highest stuff over here. And reputation is going to be an ascending order from there. So we can get this plan over here without a sort now.

So depending on how you need to return data with your queries, you might need to think about how you create your indexes and which direction you store your data in. Because sometimes if you need to mix sort like sort orders like this, like reputation descending, ID ascending, you might need to change the way that your indexes are set up or write really much more complicated queries in order to in order to avoid having to sort data. Now, of course, like the client issue that I was dealing with was a big sort and it was taking up a lot of memory and it wasn’t using a lot of that memory.

It was getting a very big memory grant, not really using it. And so we had to think of some clever ways around that problem. And that is what brings us here today is solving problems in clever ways.

We could do the query rewrite and leave the indexing alone or we could change the index. So we have choices as far as making things work. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something about, well, either indexes or forward scans or backward scans or some kind of scans. I don’t know.

There’s a lot of scanning going on in here. Anyway, maybe that’s my problem. Anyway, thank you for watching. I will see you in another video another time. Toodle-

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.