Equality, Sort, Range Indexing In SQL Server – When It Doesn’t Work
Video Summary
In this video, I delve into the intricacies of equality sort range indexes, specifically when they might not be as beneficial as initially thought. Building on yesterday’s discussion, I demonstrate how these indexes can struggle when your range predicate is highly selective and your ordering elements are poorly aligned with query requirements. By running through several queries and their corresponding execution plans, I illustrate the challenges faced when trying to optimize performance in such scenarios. The video also includes tips for improving query performance, such as using index hints and considering alternative indexing strategies that better align with specific query patterns.
Full Transcript
Your best friend in the entire world, Erik Darling here, to talk to you further about a subject that we broached in yesterday’s video, which is the Equality Sort Range Index. In yesterday’s video, I did say that there are situations where it’s not going to be as great of an idea as it is in others. And we’re going to, we, I owe you at least something for that. So, to sort of generalize a little bit, the problem with this style of indexing comes when your, let’s see, an easy way to put this, when your range predicate is far more selective than your Equality predicate and your ordering element or elements are in a bad order for the way you want to search for things.
It’ll make a lot more sense when I actually show you the demo. So, for now, let’s just, let’s just keep in mind what the topic is going to be and be amazed and astounded and just wowed right out of the seat of our pants when, when I show you what’s going on. But before that, of course, I like doing this stuff.
I like doing this stuff even more when people sign up for memberships. So, if you would like to do that, there’s a link right down in the video description. And for as little as four bucks a month, you can say, hey Eric, thanks for doing all this.
It looks like a lot of work. You’re a real sport. If you do not have four bucks a month, perhaps you have significantly increased your Pez dispenser collection, used eBay for its original purpose, and you just don’t have four bucks, you can do other things that help me become a bigger, better person.
You can like, you can comment, you can subscribe. And if you want to ask questions, fancy little pinky out questions privately that I will answer publicly on this very YouTube channel, you can go to the link above then, which is also in the video description, and you can ask your questions there.
And I will answer them five at a time. Again, I’ve got a bit of a backlog on these, so we’re going to record some of those very soon. If you need help with SQL Server, if asking a question anonymously that gets answered non-anonymously online, you can hire me as a consultant to do all sorts of things with your SQL Server to improve performance.
I do all of these things and more. And as always, my rates are reasonable. If you would like to get some training content from me, carrying on my fine tradition of rate reasonability, you can, again, video description has the fully assembled link, but if you feel like typing, you can go to training.erikdarling.com and punch in the discount code SPRINGCLEANING and you can get 75% off any of my training videos.
Fun stuff there. SQL Saturday, New York City, 2025. That is this year. That is, at this point, let’s, uh, uh, I mean like, really, like, two months away at this point.
So, I, if you’re speaking at them, gosh, I hope you practiced. And, uh, as always, I will see you there. Uh, bright and, bright and oily.
But with all that out of the way, let’s, uh, let’s talk about when this equality sort range index thing doesn’t work out so well. So, uh, because I felt the format of yesterday’s video worked pretty well, um, I have pre-created some indexes and I’m gonna run some queries using index hints to show you what happens when these queries interact with those different indexes.
Uh, so let’s run these two queries. And I’ve got query plans enabled because, gosh darn it, I’ve been practicing. Don’t wanna, don’t wanna disappoint anyone with, uh, unpracticed SQL Server demos.
And you might notice that these, at least, well, one of these queries is taking quite a while to execute. And, uh, it’s gonna continue to take quite a while to execute. So, what these are doing, and the, the difference between these two queries is just in the ordering.
Uh, this one is, of course, ordering by the ID column in ascending order. And this one is ordering by the ID column in descending order. So, for this first query, uh, what it, what’s worth noting here are a couple of things.
Uh, one, we’re gonna pretend we don’t have any useful indexes at first. And I’m gonna show you what happens when either you don’t, you don’t have any useful indexes, or for whatever reason, SQL Server does not choose some useful index that you have.
Uh, let’s just put, like, you could, you could substitute this ID column for like a select star, or select a bunch of columns type thing. And SQL Server might cost using a narrower index out of, uh, out of the equation, because it doesn’t want to do a whole mess of lookups.
All right? So, just bear with me here for a moment. Um, we are forcing the clustered index via this index hint. And we are looking for, in the votes table, vote type ID equals two.
Vote type ID equals two has about 37 million rows associated with it. That is the almighty upvote in Stack Overflow land. And the other predicate that we have on this table is where creation date is between 2013-1201 and 2013-1231.
Now, since this is the Stack Overflow 2013 database, the world ended right here. There are no, there are no rows after this. So, this is the last month of data in the table.
The problem that we run into here is that SQL Server can, like, find 37 million rows for this pretty easily. But finding the top 10 rows, right? Because we’re doing fetch next 10 rows only, offset zero rows, ordered by ID ascending.
It, you have to go through a lot of the table before you get to creation dates that meet this predicate. So, finding the top 10 ordered by this is what’s really tough. If we look at the execution plans for these, this first one takes 20 seconds.
Two, zero, 20, right? If we look at the clustered index scan, the number of rows that got read is this many, right? 5, 1, 4, 0, 1, 7, 1, 5.
That is an 8-digit number of rows. That’s a lot of rows. The number of rows in the table is 5, 2, 9, 2, 8, 7, 0, 0. That is also an 8-digit number of rows.
But if you notice, they both start with the same number, 5. And only once you get over to the second number, there’s only like maybe like a million and a half or so fewer rows that get read than there are rows in the table. And of course, since this happens awful in the single-threaded, and we have this top just like, gimme rows, gimme rows, gimme rows, gimme rows, gimme rows, that takes a whole long time, right?
It’s not a good thing. You can find stuff, you can find partially reasons for stuff like this in your query plans. If you look at the properties of an index access operator in a query plan that has like top or offset fetch, and you’ll see this thing in here, estimated rows without row goal, that’s going to be a sign that SQL Server said, well, you know, I think I have a different estimate in mind.
If we didn’t have that top 10 in there, I would have to read all of this. But with the top 10, I don’t have to read nearly as many. SQL Server is just like, you know, I think I can get away with a lot less than that, right?
So, fine there, right? All good. But then down here, notice that we do far less work, right? This thing spits out quite a large number of rows there.
This one spits out, oh, whoa, what just happened there? Developer PowerShell. That was a weird button. Let’s never hit that again. Let’s never touch that button again. Come on, there we go.
Zoom it. All right. So, for this one, we only had to read 18 rows to find anything that we cared about. So, that is, of course, far fewer, right? That is far fewer rows than you would want to deal with there.
Now, of course, Paul White, who is the most useful human being on the planet. Let’s just take a moment here to acknowledge that Paul White is the single most useful human being, maybe, that has ever existed. And you can’t spell Paul without the U for the most useful human.
We’ll workshop that later. Let’s just move on from that. That didn’t go so well. You can’t spell useful without Paul. Nah, screw it. Anyway, you can somewhat improve these queries.
And I’ll try to remember because, gosh darn it, I know how much you care to put this link, a tale of two index hints over at the much-loved SQL.Kiwi website in the video description. So, if you use a weird index hint and a tab lock hint, you can improve the performance of these queries quite dramatically.
These will drop down to about 1.3 seconds a piece. But now we have this sort in the query plan, right? Because SQL Server, when you use index zero in the tab lock hint, Paul explains this much better at his site.
But basically, like, you no longer do an ordered scan of this data, and then you have to sort this data. So, it does look a little bit funny that, like, we’re selecting from the table, and now all of a sudden we have to sort data to put it in order by the column of the primary clustered key that’s already in order by it.
So, that’s a little weird. But anyway, these do get slightly worse if we take out the tab lock hint, which I find quite amusing. And again, which if you go to the Paul’s post, you will find quite a bit of detail on this.
But these do go from 1.3 seconds a piece to almost 1.8 seconds a piece. So, that tab lock hint is, let’s just say, strictly necessary for the full performance benefits here. So, slight digression.
Anyway, let’s talk about why that equality sort range index does not help a query like this. So, this is my index v0, and I’ve got vote type id, id, and creation date. These are our equality sort range predicates, just like we talked about yesterday.
And both of these queries are hinted to find, to use that index. Now, this will help performance somewhat generally, right? Like, that first query, the first time we ran this using the clustered index, it took us 20 seconds to locate rows in there, right?
It was not a good time. We, you know, we beefed it on that one. We did not do well.
With that index, and with that index hinted, we save about five seconds. So, this is, again, not great, right? We did not, like, see the benefit from using that index methodology for this query the way that we do, the way that we did for the query yesterday.
And this is because, again, we need to seek through quite a number of rows. Even, like, we can seek to these rows now, and we can have this thing in order. But to evaluate this predicate on creation date, we still have to do a lot of extra work, right?
So, like, with that query in place, like, yesterday, this worked great. We did a seek. We had our data in order.
There’s no sort in this query plan or this query plan, I promise. But then applying this much narrower predicate on the creation date column completely screws things up, right? Like, we get the 37 million rows just by seeking to vote type ID 2.
But then after we find those 37 million rows, we have to go apply this predicate to all of them, right? It’s no fun, right? It’s no good at all.
So, the second query with ID in descending order does just fine with this, right? So, like, well, it’s not totally, this is, like, that funny intersection of, like, how indexes and queries work together. If your query demands rows in the ID with ascending order and your predicate is, like, something that’s way deep into the table, you’re going to have a bad time finding that data with this type of index.
The type of index that would work better if you need things in ascending order here would be an index like this that leads with creation date. And I’m going to show you two forms of this index. This one is creation date, then vote type ID, then ID.
The second one is going to be creation date ID, vote type ID. But the point for me, the point of me showing you both of those is that this sort of breaks the pattern that the equality sort range thing fixes. Because when you put the range stuff at the beginning of the index, even if there’s an equality predicate in the middle, like, it breaks that.
Because creation date is still the primary sorting and vote type ID is still only sorted after that. So, like, you don’t have the ID column in a useful order for this one. So, if we look at the query plans for this, these are both hinted to use the V1 index.
These will both be fast enough to find the rows that we care about. We do have a sort in the query plan, which, you know, was part of the thing that we wanted to get rid of with the ESR, the quality sort range index methodology, was having a sort in the plan. But in this case, we get the number of rows down into a pretty small batch.
So, the sort isn’t very painful in either of these. Both of these take just a little under 100 milliseconds. This is the second index in there that I promised I’d show you.
Creation date ID and then vote type ID. So, now we have, like, you know, range, sort, equality. But this doesn’t really make that much of a difference because, like, it’s just for, like, this specific query, you just do much better to be able to locate that narrow range and then look at vote type ID no matter where, almost no matter where it is in the index.
You can probably have it as an included column and it wouldn’t make any difference. So, if you run this is so, both of these will, both of these queries perform just about the same as with the index in slightly different order. So, like, swapping ID and vote type ID doesn’t make a big difference here.
One thing that you might want to consider is, like, just in general, like, the ordering of your columns and some columns, like, kind of share ordering. So, like, in this case, if you really wanted to get rid of the sort, you might consider ordering by creation date. In this case, for the votes table, the creation date column, like, like, obviously that’s for a new row that gets inserted.
When a new row gets inserted, the ID column, which is an identity, also increments. So, every creation date and every ID in this table is going to be higher than the one before it. You might have columns that similarly share ordering and that you might be able to rely on for this.
Now, just because of the way votes get inserted here, there aren’t going to be any duplicates for creation date. Right? Because, like, like, you’re just going to not have two things show up at the exact same time.
So, if you really wanted to get rid of the sort, you could run your query and instead order by creation date and just have a much simpler query plan overall. So, you might not always be able to get away with that, but in certain cases where you can, this might be a good thing to do. If you needed to have a tiebreaker in there, you would want to use this v2 index on creation date and then ID.
So, you could say, like, order by v.creation date, v.oops, ha ha, typing in demos. Look at this silly man go. So, if you needed the tiebreaker, like, this would be the better indexing strategy for these two, right?
Because you would have order by creation date, order by ID, and this index would be able to fully support these things. So, the equality sort range indexing, again, it’s great for a lot of situations, but for situations where your range is much narrower and sort of like, let’s just, like, to make it easy to physically sort of understand it. Like, the range is much narrower and towards the end of the table, and you’re ordering by stuff in ascending order.
Like, you have to read through a lot more of the table to get down to the stuff that you care about down here in order to find those rows. So, it’s not always going to work out perfectly for every query, but for some other, for some types of queries, it works amazingly as is. For other types of queries, you might have to either change your ordering to better, like, locate data towards the end of the table that you care about, or to sort by a different column in the table with the ES, with, like, you know, with some sort of index in place that helps you put this stuff in order without having to sort.
So, sort of a complex situation there, but one that is worth examining and one worth talking about. I do hope you enjoyed yourselves, and I hope you learned something, and with that out of the way, I don’t know, I’ll talk about some other fun stuff. Well, we’ll see what it is when we get there, though, won’t we?
Alright. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.