A Little About Index Design Patterns In SQL Server

Thanks for watching!

Video Summary

In this video, I dive into index design strategies and their impact on query performance in SQL Server, specifically focusing on how indexes affect filtering and ordering. We explore why missing index requests often fall short of expectations and delve into the nuances of creating effective indexes for complex queries. By analyzing two sample queries and experimenting with different index configurations, I demonstrate that while some index designs can significantly improve query speed, others may not yield the desired results due to data distribution and query specifics. The video also touches on common misconceptions about index leading columns and the limitations of SQL Server’s missing index feature, providing valuable insights for database administrators looking to optimize their indexing strategies.

Full Transcript

Erik Darling right here with you, live, sort of. I mean, I feel pretty lively. Today, what are we going to talk about? I don’t know. How about some index design strategy? All right. We’re going to talk about how indexes affect filtering and ordering, and we’re going to talk about why, I think for the 10-bajillionth time, why SQL Server’s missing index requests are quite often underwhelming in their very existence. And how there is only one person to blame for that. He doesn’t care. All right. So, let’s make sure we got nothing going on here. Oh, God, I had some statistics. Look at these awful things. I bet they’re duplicates. I bet someone should delete those for performance. That was a callback to another video. Welcome. Welcome to my brain. It’s awful. It’s not a good place to be. It’s treachery. All right. So, we got two queries here. Let’s make sure query plans are enabled because those are important for everything that we do. They mostly function correctly, which is nice. Sort of like my brain. Mostly functional.

So, we’re going to run these. And we got no indexes right now at all. Zero indexes. Both of these queries return the required 5,000 rows. The problem is that neither one of these queries is great and fast and amazing. When we search for a sort of small amount of data, we scan a clustered index. We sort all our data, and it takes about 600 milliseconds. Not the end of the world, but you know what? Not every query to any extra exercise needs to start with something that takes 35 minutes and ends with someone being like, dude, I added an index. It was three seconds. It’s not every story. Not everything ends that way. It’s annoying. And when we search for more data, and we have to do some more sorting, we end up spilling a little bit here. Ouch. Ouch. I chihuahua. And then this one takes about 1.1 seconds. Almost twice as long as this one. Monocounted this sorty spilly thing over here. And I guess there’s like 100 milliseconds on that. Now, these queries have both asked for the exact same index. On post type ID, last activity date, as the key columns, because they’re in the where clause. And then include score and view count, because they are in the query, but they are not in the where clause. Our only columns in the where clause.

Right here. Right here. Post type ID and last activity date. We are ordering by score descending right here. Which seems important. It seems like something that an index would help with. Doesn’t it? Seems like we maybe had an index. Maybe we wouldn’t have to sort that data. Maybe we wouldn’t have to risk spilling that data to disk. Maybe we could have these queries run and not ask for any memory. Wouldn’t all of these things be grand? The thing is that the index that SQL Server asks for, which is just about this one. If you remember the green text above, here’s our where clause columns. And here’s our oh, they just happen to be there columns.

So let’s create this one. And I’m going to name this index whatever, because I’m allowed to do whatever I want. It’s my computer. And in the words of John Crook, I’m taking my ball and going home. So this index takes a minute to complete because, well, you know, there has been some chatter recently. I had a video about why some indexes are slow.

And of course, Paul White saw that video and absolutely dumped on me with the technical details for why the actual technical details for why it’s slow. Not just me being like, yeah, because they ended up on like three threads and it stunk. He like broke down the entire algorithm of parallel index creation. I will find the link to that and put it in the show notes because, gosh, gosh, is it impressive.

All right. So now we have the index that SQL Server asked for. Both of these queries will use that index. And both of these queries will, well, sort of benefit from it. This query does spectacularly with it. Does great. Four milliseconds in a seek, six milliseconds in a sort.

This one gets about, this one does better too, right? It’s about twice as fast as it used to be, right? It went from 1.1 seconds to about 600 milliseconds. So now this query with the index is about as fast as this query was with no index, just the clustered index on the post table.

So, and this is a, this is the sort of mixed bag that you get from most of Microsoft’s advice. Some things, yeah, they get better. Other things, not so much. It’s not great. It’s not fantastic. It’s just okay. So let’s, let’s try an index that I would, I see a lot of people go for when, when they’re dealing with this stuff.

Because a lot of people have this strange misconception about how indexes work. And a lot of people will think that if they have to sort by something, that that sort column has to be the leading column in the index. Right? So because we’re ordering by score descending, they think, whoa, well, we’re ordering by score descending.

That better be first so that we have that index sorted by that. So let’s, let’s recreate the index with score descending first in the index. And we’re going to see what happens in this case, which is, well, I should probably not hover down too low.

Because that’s going to completely spoil the final index that we’re going to create. And, I don’t know, ruin the whole video. Way to go, me. All right. So let’s run these two now.

And with score very first in the index. Well, it kind of flip-flops here, doesn’t it? Ugh. This one slowed down a lot.

This one is slower than it’s ever been. Now, both of these, both of these queries are single-threaded now. And the second query, well, this one, holy cow, zoom it. What are you, what is on your mind?

This one, really fast. Right? This one’s crazy fast now. This one, crazy slow now. The reason for that is because, well, to talk about post-type IDs a little bit, post-type ID 4s are pretty rare in the data.

There are not a lot of post-type IDs 4. Most of the post-types in the Stack Overflow database are, one, where there are about 6 million, two, where there are about 11 million, and then 3 through, like, 8 or something, which there are, like, a few hundred thousand.

So post-type ID 4 is rare. So once we sort by score, it still takes a whole lot of, like, it still takes a whole lot of, like, scanning through to find all the, find post-type ID 4 columns in that index.

So that’s not a very good index choice either. So we still have very mixed results. We have, we’ve created two indexes now. The first index helped the first query and sort of helped the second query.

The second index completely just neutered the first query and really helped the second query, right? The second query is now as fast as the first query was with the first index.

Promise that makes sense. So let’s go with the third index here because one thing that a lot of people overlook about B-tree indexes is that when you navigate them correctly with equality predicates, order is preserved throughout the, throughout the, throughout the, the second, the column after the equality predicate.

So where, you know, columns for, like, post-type ID where there are a lot of duplicates, you’re going to have, like, a whole lot of the number 1 and then all the scores and, like, for that range of number 1 are going to be ordered descending in this case, because we said descending in the, in the index definition right there.

Look at us, look at us go. And so that, that helps, right? Because we seek to the number 1 first in the index and then we have score in the order we care about and then last activity date, I mean, that’s an inequality predicate over here.

So, you know, it’s, it’s maybe not the most ideal situation to have score gatekeeping last activity date. Maybe some situations where I wouldn’t want that.

But now, with that index in place, both of these queries are equally as deliciously fast. Now, the important thing is, let me bring these up, let me get it in there, squeeze in there, everyone hug.

Big hugs for everybody. So the two important things that this index did for the, for the query that we’re running here. One, it allowed us to seek to the post type ID that we care about.

Two, notice that there’s no sorting in this index, in this query, query plan rather. There’s no sort operator. There’s a top, but we don’t need to sort the data because after we seek to post type ID, we have score in the order that we want it to be.

Then we’re able to evaluate our last activity date predicate as the final part of the seek. But it is a residual predicate, right? It is a predicate way up here in the residual nosebleeds.

And if Zoomit would listen to me, we would have a much better video. We would have high quality videoing. So last activity date is a residual predicate up here.

Post type ID is a seek predicate in here. And yeah, we don’t do any sorting. So without needing to, without actually having to sort data, again, the stuff that we avoid, you know, if this were a properly parameterized query, we would avoid, like, you know, parameter sensitivity issues with memory grants.

We would avoid, you know, potentially spilling the disk, potentially asking for way too much memory for some executions. We would not avoid green screen fuzzies over there. Those are annoying.

Anyway, a little bit about index design there. A little bit about how to approach different index design choices. Remember that the missing index request feature has a lot of blind spots and does not aim to please every area of your query.

It only aims to please the where clause. Other relational areas that are great, relational areas, yuck, that are good to index for, like order by, group by, joins, things like that.

It just ignores them and sticks those columns in the includes, which isn’t cool at all. So careful, careful with those. Careful with the green text, as they say on 4chan.

Anyway, I’m going to go do something with my life. I don’t know what it is. Might be good, might be bad, might jump out a window again.

We’ll have to wait and see. Anyway, thank you for watching. Hope you enjoyed yourselves. Hope you learned something. If you like this video, compelling comments, thumbs ups, are great, great ways to tell me.

If you don’t like the video, and you feel so utterly driven to click the thumbs down button, I mean, you can tell me why if you want. I don’t care.

You can also just click thumbs down, and I’ll just say, oh, you’re that one person again. But you can also share your thoughts and feelings with me. I want to know. I want to know what love is.

Actually, for you, it’s probably hate or something, right? You’re giving me a thumbs down. If you like SQL Server content about performance tuning, indexing, query tuning, what other spam SEO words can I throw in this?

I think I’ve run out. Subscribe to my channel. A lot of other people do. We’re coming up on like 4,000 people. 4,000 people get notified whenever I say something, which is strange, to say the least.

Anyway, thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.