Learn T-SQL With Erik: Indexing and Sorting And Searching

Chapters

00:00:00 – Introduction to Indexing and Sorting
00:02:34 – The Impact of Indexes on Query Performance
00:05:16 – Common Misconceptions About Indexes
00:07:28 – Creating an Imperfect Index for Specific Queries
00:09:23 – Understanding Row Goals and Their Effects
00:10:40 – Optimizing Query Plans with Shared Ordering
00:11:05 – Recap and Conclusion

Full Transcript

Erik Darling here with Darling Data, and today’s video we’re going to continue getting little whiffs and whispers and snippets of my Learn T-SQL course. This one is about indexing and sorting in a query. So, I know it’s not the snappiest title, but you’re just going to have to learn to live with the fact that I am not a snappy marketing person.

So, if that’s what it takes, sorry I don’t have it. Anyway, down in the video description, all sorts of helpful links, including one to purchase the entire corpus, the big body of this training, that is going to be down below in there. There’s even a coupon code attached to it if you celebrate coupons, many of you do.

There are other links as well, you can hire me for consulting, you can buy my other training, Lord knows I’ve got other stuff. And you can even choose, if you appreciate this and you don’t want to see me turn this into a tickling enthusiast channel, then you can also choose to support my efforts here for as few as $4 a month.

You can also ask me office hours questions, and if you want to be my friend, then you will like, subscribe, and tell your friends, so that the numbers associated with this channel continue to rise in the correct direction. Down in the video description as well, also, my goodness, you can also download my completely free SQL Server monitoring tool.

It is open source, you can see everything it does. It’s all the stuff. It’s all the stuff that I would care to look at if I were working for you and monitoring your SQL servers.

One thing I can’t vouch for entirely is if you want to have your robot friends look at your monitoring data, you can do that. There’s MCP tooling built in. But, you know, of course, the advice that you get will not be anywhere near as good as the advice you would get if you really did hire me.

But if you don’t have any money, then that’s going to be your silver medal. Upcoming events, down to, at this point, well, I mean, it’s, at this point, I will be on my way home from Poland, so Poland, I hope I had a great time.

I hope you enjoyed having me. And maybe I’ll see you again in the near future. I’m sure I was problem-free. But I will be at Data Saturday Croatia June 12th and 13th with an advanced T-SQL pre-con.

I will also be at PaaS Data Community Summit in Seattle, Washington from November 9th to 11th doing who knows what. The gods must be crazy. Erik Darling is going to Seattle again.

Maybe I’ll just get a barista job shortly before the baristas all get replaced with robots, too. But for now, it is May, and we will continue our May monitoring tool mogul month with creepy, gothy database skull whispers. Ah, man, it never gets old.

All right. So now you’re getting a preview of next month. You can’t see how grand it is in that. Maybe if you zoom enough. But I promise it’s going to be a banger.

When that thing goes full screen. But anyway, I’ve got two queries here. And the only difference between these two queries is the way that data gets sorted for them. This one is ordering by ID descending, and this one is ordering by ID ascending.

But they have very, very different execution times. If you look at the results for them, well, the results don’t matter. What matters is the query plans.

All right. The first query, which is ordering by ID descending. Go away, tooltip. You were supposed to zoom it. And you didn’t. And now you’re making me look foolish in front of all these nice people. The first one that orders by ID descending finishes instantly.

Zero milliseconds. The one that orders by ID ascending takes 18.6 seconds. Dear God, why? Why would you do that to a poor query?

All right. What did that query ever do to you? So what’s interesting here. So this is the votes table in Stack Overflow 2013, which means time ends on January 14th. January 1st of 2014.

And the votes table has about 53 million rows in it. And a lot of them are vote type ID 2, which is an upvote. The big thing here, though, is that ID being the clustered primary key of the votes table, it means that it shares something with the creation date column.

So that ID column is an identity, meaning it is monotonically increasing. Every time a new row comes in, it comes in with a higher value. Same thing with the creation date column, right?

Because you’re not creating like a higher ID with an older column. You’re creating a new row. You’re creating a new ID with a higher ID. And that’s just silly.

All the creation dates are going up too. So when we sort by ID descending, we get to the range. We get to the pages where we have the 2013, 2014 data a lot more quickly. All right.

These are both just scanning the clustered index. I’ll go back and show you that in a second. But the one that sorts by ID ascending, we have to go through a lot more of the table until we get to the rows where 2013, the last month of 2013 happened, right? So, you know, again, both just scanning the clustered end.

I do wish SSMS offered you an option to expand object names because I don’t like being left hanging. It’s a little obnoxious. But you can also see shades of that.

I generally hate this query plan, but it’s good to illustrate sort of the problem that we have here. So I’m getting the min ID and the max ID with the same where clause, right? And keep in mind, we don’t have to sort this data, right?

We don’t have to like… Rather… I’m not saying order by here, right? But implicitly, min and max have to sort of order by to find the lowest and the highest. If we look at the query plan for this, one of those branches finished very quickly.

You can bet this is the max branch. One of those branches finished pretty slowly. It took 6161. Look at that symmetry.

It took 6 seconds to finish or 6.161. 6 seconds to finish. You can bet this is getting the min. Well, I do find this query plan helpful to illustrate the problem. That we’re having here.

It is highly offensive. If you want to… If you see query plans like that and you are equally offended, you can add these hints. Index 0, which is in use even though we have a clustered index and a tablock hint. And if we run this, we will get just one single pass of the clustered index where we find our min and our max.

Fun things, SQL Server. What will you think of next? I don’t know. But like I was saying, ID and creation date have that sort of increasing value in common.

Now, what I want to do here is create what is probably sort of like maybe an imperfect index, right? But it’s to show you something a little bit later on. Now, will that create…

Actually, I think I have the query again down here. We don’t have to go all the way up. Now, we’re still going to stick with the query ordering by ID ascending, right? And like we would think that, you know, the index up there that leads with creation date would be able to find that last index.

We would be able to find that last month of data very, very quickly. And then we would be able to, after we locate that range of rows, then we would be able to locate the vote type ID 2s. And then maybe we’d be able to order by ID.

But B-tree indexes just don’t work like that because the initial greater than equal to is already messing up like the preserved order of the equality predicate. But just watch, right? Okay.

So let’s just run this thing now, right? We do this. We say, hey, go and run, right? And it’s still going. It’s still going. It finishes faster, right?

Notice it finishes faster, but it doesn’t use our index, right? We are still scanning the clustered index of the votes table, but it finishes much more quickly. Okay.

That’s interesting. Well, let’s focus on why here a little bit, right? So let’s run these two queries. And I’m going to run these.

I’m going to keep this query at max.1 just to keep the plan a little bit more simple. But in the first query, we’re letting it run as it is, right? In the second query, we’re telling SQL Server to use our index. So this is kind of what’s happening here.

Now, the first one that scans the clustered index still finishes in about six seconds now. So that’s cool, I guess. But the second one down here finishes like instantly, right?

Well, just about instantly. Much faster than, like, I mean, really, just like six seconds faster. We have 6.39 versus 300 milliseconds. Just chop off the six.

They’re basically equivalent. But if we look at the cost that SQL Server puts on this one, right? Very, very low cost. Estimated subtree cost, 0048194, I can read. And this one has a subtree cost of 125.232.

SQL Server costed the index seek into the votes table very, very highly, right? I believe this is where it pops up. But, yeah, if we look at this, we have a row goal set in this, right?

Because we’re using offset fetch. And the row goal is cheapening the clustered index access. Let’s see.

Properties. This query, if you look, does not have a row goal attached to it, right? We do not have that.

If we just flip back and forth between these two, the estimated row goal, estimate rows without row goal is on for this one, but not on for this one. So that’s a little strange and confusing there, right?

This query uses a top. This query uses a top and sort. So in reality, all the rows that we want to get need to get to this sort operator. And then we’ll do that.

So the query plan changed in a way where the row goal got removed from the second one. And so that query appears to be much cheaper to SQL Server because now SQL Server is like, well, with this one, I think I can do this in 10 rows.

With this other one, this is going to take me a lot more rows to do this. And man, that is a sad, sad bummer, right? So what we can do is change the optimizer’s mind by taking advantage of the shared ordering between creation date and ID, right?

Well, we do need to keep ID as a tiebreaker because creation date is not unique. It’s not guaranteed to be unique, right?

So what we’re going to do is say, let’s order this by creation date and ID. And this also buys us a little bit of extra juice, because now that we’re ordering by creation date, these inequality predicates don’t mess us up with preserving the sort of B-tree index ordering.

And this finishes absolutely instantly, just like we would want, right? And that’s seeking into our fancy index there. So like I said, this ultimately happens because of the row goal that offset fetch introduced.

If we run this query without… Why are you doing that, SQL Server? If we run the first query that we did without the… Sorry, with the row goal disabled, then we will get a pretty good query plan out of that as well.

But anyway, that’s all I had on this one. Index, searching, sorting, ordering, indexing, all the stuff that I promised would be in here.

I hope you enjoyed yourselves. I hope you learned something. And I will see you in Tuesday’s Office Hours video. All right. Thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Indexing and Sorting And Searching