A Little About Choosing Index Key Column Order
Video Summary
In this video, I dive into the fascinating world of index key column order in SQL Server indexes. After a bit of an uncharacteristic “big arms” moment that seemed to take over my screen, we get right back on track as I explain how different types of predicates can significantly impact which index is chosen and why. We explore scenarios where equality predicates might not always be the best starting point for indexing, especially when dealing with skewed data like in our example table `post`. By comparing query plans using forced seek hints, I demonstrate the performance differences between seeking on a non-selective column first versus a selective one, highlighting how these choices can dramatically affect application response times. The video also touches on dynamic SQL as a potential solution for handling unpredictable query patterns, making it a comprehensive guide for optimizing your index design based on real-world querying behaviors.
Full Transcript
Erik Darling here with Darling Data. I don’t know why I did the big arms thing. It was just like that. Sometimes you just lose yourself in these moments. You get all worked up. You get all riled up and excited and next thing you know it’s big arms. Whatever. Stop looking at me like that. In today’s video, this is a video of the video. This is gonna be, I guess, I guess this is gonna go on the a little about playlist because we’re gonna talk a little bit about choosing index key column order. I’ve done a lot of videos sort of about how SQL Server stores data in B-Tree indexes, how you can take advantage of that for various things like seeking and sorting and all that other good stuff. But this one is a little bit different because in this one we’re gonna talk about how different kinds of predicates can end up being, like, how different kind of different kind of shut up until this thing all the time. Yeah, just scooted it out of the way with my elbow. Pretty good, right? Different, like, types of predicates can be more selective than others. And how the sort of general advice to index purely for equality predicates first isn’t always the best advice. And then finally, how sometimes you might even need to maintain multiple indexes for different types of predicates.
types of queries. And, you know, going even beyond that how you might need to use dynamic SQL, in order to get SQL Server, or rather to direct SQL Server to the correct index sometimes. Particularly because at least as of the 2020 SQL Server 2022, iteration of the parameter sensitive plan optimization, like it only works with the quality predicates, it does not work with inequality predicates. So if you’re dealing with a parameter sensitivity issue, because of like a greater than, equal to less than, less than, equal to situation with like dates or something, it doesn’t, it’s not going to help you at all there. So that’s pretty screwed up. I have another video that I have to do. This can be a follow up on the last parameter sensitive plan optimization video that I did. That’s going to be fun too. So yeah, we’re going to talk about that stuff. But before we get into all that, of course, it is time to talk about the health and wellness of this, this here YouTube channel. If you want to sign up for a membership to support my endeavors and bringing you this, well, let’s call it high quality SQL Server content. Because that’s that’s a nice way to that’s a nice way to frame it. You can do that right down in the video description. There’s a link to become a member, you can join the 51 other people who have chosen to hand over some small quantity of money to me every month. And be I guess thankful for that.
If you don’t have four bucks a month, I don’t know, maybe maybe you got robbed on the playground, you can like you can comment, you can subscribe. And if you want to ask me a question, if you have a like what a good SQL Server question about SQL Server performance, you can go to that link right there, which is also down in the video description. And you can click on that. And you can ask a question. And eventually, I will do a round of office hours questions and answer it. And won’t that won’t that won’t that just be grand? If you need help with SQL Server, if you are ready to ready and willing to bring in a young and handsome consultant with reasonable rates to solve all of your SQL Server performance maladies, you can hire me for any of these things. And once again, BeerGut Magazine rated me the best SQL Server consultant in the world outside of New Zealand.
If you would like to get access to my training, you can get all 24 hours of it for about 150 US dollars. Again, that URL, that discount code, coupon code, whatever you want to call it, right down there. Just click somewhere in this general area.
Upcoming events. Well, well, well, well, SQL Saturday, New York City 2025 will be taking place on May the 10th of 2025. You can go.
Times Square, Microsoft offices. I don’t know. Maybe you can make a friend in the Port Authority bathroom while you’re there. There’s all sorts of, you know, it’s a big, crazy, wild city.
Anything can happen. But with that out of the way, let’s party on, Wayne. So I’ve got my database set to compat level 140.
I forget why I did that. There’s probably a good reason, but it doesn’t matter too much now. And I’ve also got two indexes on the post table. If Zoomit will cooperate.
Aha, there we are. Finally, listen to me. One of them is very descriptively named PSC. It is on post type ID, score, and creation date.
Hence, PSC. The other one that I have is called… Oh, Zoomit’s going to do one of these things again.
All right. All right, Zoomit. The other one is called CSP. Mmm. Very descriptive again. Creation date, score, post type ID. I don’t know why I’ve decided to give them all funny little toupees.
Or maybe they’re eyebrows. I don’t know. Unibrows, I guess. Monterbrows, whatever they’re called. So I’ve got these two indexes. And we’re going to look at how different queries can use these indexes more effectively depending on the selectivity of the sets of predicates that we’re passing in.
Now, if both sets of predicates are equally selective, it almost doesn’t matter a lot which index you use. But in certain cases where post type ID is not selective, but the creation date filters are, or the reverse of that where creation date is selective, but post type ID is not selective, which index gets used can make a big difference.
And then we’re going to talk about one other funny little thing that can happen along the way as well. I’m just going to make sure that I have that set up. I’m going to make sure that these indexes are created, but they should be because I feel like I was pretty smart about this beforehand.
And thankfully I was. So this is our starting store procedure. And we have one instance of the query where we’re going to force SQL Server to use the, to seek into the index called PSC that leads with post type ID.
Kind of a funny set of stuff with the force seek hint. I don’t know if you ever read the documentation for it fully, but with the force seek hint, not only can you tell a SQL Server which to do a seek, you can tell it which index to seek into, and then which key columns, or which key columns you want it to seek with.
And it’s wonderful. You can, you can do all sorts of things. So if this syntax looks funny to you, I would highly recommend going and reading the documentation for the force seek hint, because you can do all sorts of neat stuff with it that you maybe didn’t know about, because you didn’t read the documentation for years like me.
Ta-da! But here we are. So one is going to fork, force seek, not fork, into the PSC index.
That’s this one here. The other one is going to force seek into the CSP index here. And we’re going to look at the differences in query plans when we have selective and non-selective predicates in play. So we’ve got query plans turned on up here.
Let’s run it for this one first. And what I want you to see is that post type ID 2 is very much not selective. There are 11 million rows that match the post type ID 2. And this date, but the date range we have here, the start and end dates are very selective.
It’s just a month, right? So 2008-12-01 through 2009-01-01. If you run this, we’re going to get both queries, but we’re going to see that the query plans are somewhat different in their speed and effectiveness.
So the SSMS will listen to me. And it finally did. We have our seek into the PSC index. So we sought to all 11 million post type ID rows.
And then we had to do a follow-up on that, a residual predicate after that, to apply the predicate on creation date. This all took 1.4 seconds.
And down here, where we did a seek into the CSP index, in other words, creation date first, and we just found that month of values first, that whole thing took about 19 milliseconds.
So we’ve already got a pretty big disparity here, where seeking to all 11 million post type ID 2s and then applying a residual predicate on creation date, that’s what this one is doing.
Oh, thanks for disappearing on me, tooltip. Very nice of you. So we do seek to the post type IDs that we care about. And then we have this residual predicate on creation date here. Now, this is not because score is in the middle.
Score is the middle key column, but range predicates like this, you know, often do end up as residual predicates anyway.
But notice that there is a difference between these two query plans. When we use the PSC index, and we did a seek to post type, all the post type ID 2s first, we didn’t need to sort the data afterwards.
Now, granted, this is 40,000 rows, so it’s not that big of a deal for us to sort in this case, right? 40,000 row sort is almost never going to be your biggest problem, unless like the next one is like 40 million rows or something, right?
You parameter sniff that initial memory grant and things go terribly. Because memory grant feedback, remember, that’s a post execution feedback loop. That doesn’t help you get more memory while a query is running.
That won’t happen. Memory grant is adjusted after the query finishes. So you might have a query run for like an hour spilling data everywhere. And then when it’s done, SQL Server is like, well, I can fix that for next time.
It makes a little note. So, but this happens because when we do this seek, we have, we do a seek in here, right? And we find the range predicate.
Then we have an equality predicate over here. But the way that the B tree is set up with creation date first, like even with an equality predicate on post type ID for two, like that doesn’t help us with the sorting of score, right?
Because score is a second key column. So it’s sorted by creation date. I did experiment with this with an index on creation date, post type ID, then score in descending order.
And there was still a sort because even with that setup, the, like you have like the creation date column and the way, and like I have other videos on how B tree index is stored.
You should watch those if you’re kind of not filled in on it, but like all the like individual creation date entries, like post type ID is sorted in there. So it doesn’t help to have the range and then the equality, but you still have to sort this column.
This column is still not going to be sorted the way you want it on the way out. So for this one, the, for the, the, the selective predicate on creation date, using the index that leads with creation date was the better choice, right?
By like, like just about a second and a half, right? So like looking at like a, like a fairly big difference there in like application response time. If we, if we do that backwards, right?
And we say where post type ID equals four, but then, right? Which is very selective, right? It’s about 700 rows, but then we have, oh, why is this red? This should stay pink.
For this, this is a, this date range is wide open. This is literally every date in the table, 2008 to 2014, right? So this is every single date. Now I could have made this a little bit smaller or something, but you know, for dramatic effect, I have left all of, I have left the entire range in.
And if we run this, things are going to end up just about backwards, right? So, and this one, we do a very fast seek into the index, into the index that leads with post type ID.
And in this one, well, I mean, look, look at the times, right? The last one was about like 20 milliseconds that was for the faster plan. This one is seven milliseconds. And this one is back up to like a second and a half.
So even though like, you know, even though like the top one is great for this set of predicates, it was not great for the, for the non-selective post type ID set of predicates.
So which index you use or like how you design the indexes really does depend a lot on how people query the data. If you have people who are always querying for a narrow range of dates, and like, you know, you have a quality predicates, especially on very skewed data, like the way the post type ID column is skewed in this table, you would probably want, you would probably want to do, have your index lead on creation date.
But if you don’t know, right? Like if we, if you look at the, the distribution of the post type IDs, like some of these are selective, like down here, like 25,000 rows down to two rows.
Those are all fairly selective. These two are not very selective. So like you probably don’t want to equal server, like you probably don’t want your queries going to find all this stuff and then applying the range predicate second.
At least, you know, the way the query is written here, there’s, there’s, you know, probably more, there are more of, there are more verbose ways you could write the query where you could probably, you know, have pretty, you know, reliable performance from either one, but it would just take, it would take way more typing and probably another index or, yeah, probably another, probably different index that has the ID column in it so that you could like, you know, kind of like correlate, you know, find the post type IDs, find the creation date, sort of join that stuff together or use exists or cross apply or something.
But anyway, like, so like really depending on how people query the data, you might have, there might be some other considerations as of like how you design your indexes.
Now, one thing that you could do if you’re just not sure how people are going to query the data is maintain both indexes and do some work with dynamic SQL in order to have SQL Server go to the right index, depending on some stuff.
Now, the way I’ve set this dynamic SQL up, like you don’t have to tell me, like I know, I just didn’t feel like making this the longest demo in the world that, you know, there are going to be cases where neither of these are true, right?
Like there’s going to, there’s probably going to be cases, there could be cases where, you know, post type ID is not in one or two and the start date, end date is still less than two.
And then, you know, and like the reverse of that for the bottom one. So like, I know these don’t catch every single possible outcome. This is just to show you that you can make it work with the set of variables, right, the set of predicates that I’m passing in.
So you could have like another outcome for this where like, maybe you just like say from posts and let SQL Server pick the table, right? Like if you, like, there’s some other outcome to this, you could just say, otherwise just, you know, hit the post table, do whatever you want.
Maybe you could still keep a foreseek hint on there and you could just leave it up to SQL Server, which index to foreseek into. But if we run this query, we will see, make sure that’s created, that for both of our dynamic outcomes here, right, for both of the sets of predicates we pass in, we can direct SQL Server to use the correct index based on how selective various predicates are, right?
So for one of them, like all we care about is, you know, if the post type ID is one or two, like those are not selective. So we want SQL Server to go with creation date.
And then if post type ID is not one or two, then we want SQL Server to use the post type ID indexes. That would be, that would be very selective for all those other ones.
And again, you know, you would just, you would just need like some like fallback for this, but like else, you know, just from post with foreseek, do whatever you want. So anyway, this is just a little bit about some of the index design stuff that you might have to think about.
Remember that, you know, not every scan touches the whole table. Not every seek is quick and direct to a few rows. If you look, think about the queries that ran before, we did have seeks, but we had seeks that span like almost the entire post table, right?
For the creation dates, they did span the entire post table. For the post type ID too, they did a seek, but they did a seek to 11 million out of the 17 or so million rows that are in the table.
That’s generally not what you want, right? A seek that seeks through most of the table or all of the table is not like much of a seek at all, right? Might as well be a scan, right?
Just a scan that they spelled wrong and pronounced wrong and labeled wrong and I don’t know, a bunch of other stuff. So anyway, these are just some of the things that I find it helpful to think about when I am helping people design and tune indexes specifically for queries.
You know, if you’re just looking at missing index hints, stop. Like, if you’re just looking at missing index suggestions either in query plans or in the DMVs, just stop.
It’s not going to help you do this sort of fine-tuned work. Like, the, it’s, like, SQL service is not good enough at telling you what indexes would be really appropriate very quickly, like, prior to optimization, right?
It just doesn’t know. Can’t do it. It’s not that smart. Even, like, the Azure auto, auto index tuning stuff is still crap. Like, don’t, it’s useless. I don’t know, I don’t know how many, I don’t know how many hours they buried in that, but, I don’t know.
You gotta keep the summer interns busy, I guess. Well, anyway, I hope you enjoyed yourselves. I hope you learned something. I hope that you, you have found this video to be enlightening, both from the, the point of view of index design and 4C Kent usage and dynamic SQL usage and all that other good stuff because we did, we did, we did cover some ground in here, didn’t we?
We did, we did get through a few things. Anyway, thank you for watching. I’m gonna go take a shower. Bye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.