Equality, Sort, Range Indexing In SQL Server - When It Works

Equality, Sort, Range Indexing In SQL Server – When It Works

Video Summary

In this video, I delve into a fascinating indexing methodology that I recently discovered and find incredibly useful for improving query performance in SQL Server databases. It’s not about creating new types of indexes but rather following an effective indexing strategy known as the Equality Sort Range (ESR) index. I explain how to structure your indexes by placing equality predicates first, followed by sorting elements, and ending with inequality predicates. This approach ensures that data is ordered efficiently for both seeking and ordering operations, leading to faster query execution times. Through practical examples, I demonstrate how this indexing method can significantly enhance performance compared to traditional index creation techniques.

Full Transcript

Erik Darling here. Your pal Erik Darling here with Darling Data. In today’s video, we’re going to discuss a term that I recently discovered, which is a wonderful way of phrasing an index methodology. Not a specific type of index, of course, not like a brand new style of index that you can create in your database, but just an indexing methodology that you can follow. generally used to improve query performance quite a bit. And that is the Equality Sort Range Index. It rolls right off the tongue, just like the POC index, the partition over covering index that is so very popular when people are talking about improving the performance of windowing functions. So we’re going to call this the ESR index. And we’re going to look at an example of how you can use this indexing methodology to improve your query performance. Yay!

We’re all excited. If you like this video content and you would like to be a supportive member of the audience, you can sign up for a membership using the link down below in the old video description. And you can, I don’t know, contribute to me being a happier, well, probably not healthier person, but you can contribute to me being a happier person that way. If you do not have a the dinero to contribute to that, to the happiness fund in that regard, you can always like comment and subscribe. And if you would like to ask questions of me privately that I will answer publicly, you can use that link, which is also in the video description. Coincidentally, many useful things down there to put your question into my magical Excel file that I will read from. If you need help with SQL Server, if you are struggling with performance, reliability, scalability, application, blah, blah, blah, blah, you can hire me as a consultant to do many useful things for you. Health checks, performance analysis, hands-on query and index tuning, all that other stuff that needs tuning, dealing with performance emergencies, and of course, training your developers so that performance emergencies are a thing of the past.

All worthy goals when it comes to SQL Server. If you would like some training, boy, do I have it. My pockets are full of it. You can get all about 24 hours worth of training from my pockets for 75% off. That’s the link, that’s the discount code. And like everything else useful in the world, the video description contains a fully assembled version for you to click on. Upcoming events, we have SQL Saturday, New York City, 2025. That’s this year. It’s hard to believe. Coming up May the 10th with a pre-con by Andreas Walter on May the 9th. And that’s about SQL Server performance monitoring and tuning and stuff. So we’re all looking forward to that. I’ll be there both days serving you lunch, bringing you, bringing you cookies and coffee and whatnot.

And so you can come hang out. Give me a hug. I don’t know, whatever you’re into. But with that out of the way, let’s party. Let’s do this thing. Let’s do this thing like we care about it. So I, for years, have been talking about this thing that I do when I’m tuning queries and indexes. But I had never, I always talked, it was always very clunky in my head. It was just like, but it’s like, you do the thing and then the other thing and then you care about this thing.

And then I saw a fellow named Frank Pachaud, I apologize, Frank, if I pronounce your name wrong, on LinkedIn talking, I guess he started a job with MongoDB recently. We’ll all forgive him for that. We’ll have a seance for Frank. We’ll talk to him in the database hell. I’m kidding. Where he posted a link to this thing in the MongoDB dots, where they talk about the equality sort range rule for indexing.

And I thought, wow, that’s a thing that I didn’t have a good name for. And so we are going to start referring to him. We’re going to meme this into existence for SQL Server, the ESR index, that is equality sort range. And we’re going to show some examples of that. Now, I’ve got a query down below and without any indexes, starting with no indexes on the POST table, SQL Server gives us a missing index request.

The missing index request that it gives us leads with POST type ID and then has last activity date as a second key column. And then in the includes, we have score and then view count. So the missing index request in SQL Server, as far as key columns go, are very WHERE clause centric.

If you look at the WHERE clause that we have here, we have WHERE POST type ID equals 4 and WHERE LAST activity date is greater than or equal to 2012-01-01. So SQL Server puts the equality predicate first and the inequality predicate second. And it doesn’t give a lot of thought to the fact that we are ordering by score over here, right?

Doesn’t really care about that. But we’re also selecting view count, right? So view count being in the includes, that’s fine.

But score being in the includes, that is not so fine. Because columns in the include of an index are not sorted or ordered in any useful way for, you know, either seeking to values or for helping with presentation or operator-dependent order buys. Like I’ve talked about in many other videos, things like merge joins and stream aggregates require sorted data.

And if you ask for data in an order that is not supported by an index, SQL Server has to put a sort operator into the query plan and break out its tiny little baby fingers and magically put your data in the presentation order that you have asked for. Other things like windowing functions also do pretty well with ordered data. So what we’re going to do is look at examples of these two queries using this index.

This is index P0. Both of these queries are hinted to use this index, mostly for a little bit of compactness here. We do have a few things to talk about.

And so I pre-created three or four indexes and I’ve hinted queries in each section to use those indexes so that there’s no weird overlap and oops, this query used this index this time. Sorry, whatever. So because that gets annoying.

So we’re going to run these two queries. And the first one is going to be pretty quick. And the second one is going to be a little bit less quick. And I’ll be honest with you, I am not fully used to SSMS 21 and where it puts the little grappling hooks for us to move things around on the screen with.

So you’ll have to bear with me while I acquaint myself with this wonderful 64-bit program. But for the first query, this is fast enough, right? This four milliseconds, six milliseconds total, great.

Like, no problem. Like, all easy peasy there. This one, not so great, right? We do an index seek. Okay, 284 milliseconds. But then we hit this sort and we spill a little bit.

Not so much that I’m, like, worried about it. And then we, of course, end up taking nearly a very devilish number of milliseconds to complete this query. So this isn’t really great.

This isn’t really an awesome scenario for the second query. But there is a way to make both of these fast. Now, a lot of people might be tempted to create an index that looks like this.

Where score is the leading key column. Because then we will always have score in descending order. The problem this presents us as query tuners is that with score is the leading column in the index.

It sort of acts as a gatekeeper to the other columns. So score is in order. But post type ID and last activity date are not in a useful order after this for searching.

So we sort of have this, like, blocker here that prevents us from seeking into the index for the values that we care about. So this is index P1. Both of these indexes, both of these queries are hinted to use index P1.

And if we run these, we’re going to get sort of not great performance out of either one. This query up above that took, well, like, a few milliseconds before is now what? Is that the right one?

Let me just go double check, make sure I didn’t mess anything up. No. All right. So, yeah. This one up here that used to be really fast took 1.5 seconds. This one down here, this one actually got a little bit better, right?

Even though we end up scanning this whole index, it got a little bit faster than last time. Mostly because we didn’t have the sort spill on this one, right? So this one did not take a devilish amount of time before.

This one just took a long time to find all the, scan through and find all the data that we care about. So what I’m going to show you now is the ESR style of index that I talked about in the video introduction.

We are going to lead with the equality predicate. We are going to follow that up with the sorting element in our query. And we are going to put the last activity date as the final key column. We’re still just going to include view count because view count, we don’t have a predicate.

We’re not in our where clauses, not in our order by. We’re not joining on it or anything. So there’s really not a lot of sense in there aside from just putting it in the included columns.

So let’s look at query performance with these two. And let’s see if this actually gets faster using our ESR index. And it does, right?

So this is about, this is what I want my query to look like. We have two very, very fast index seeks into our P2 index. This is our equality sort range index.

And we don’t need to sort data because the data is in order for us. Now, I just want to show you what the seek predicate looks like. This is going to be identical for both of these aside from the actual values in them.

This one’s four, this one’s 2012, the other one’s 2011 and one or something like that. But the important thing here is that we are able to seek in the index to where post type ID equals four.

And then we can take advantage of how B-trees store data, which is that like everything is primarily in this index ordered by post type ID. And then once we seek to any post type ID, like where post type ID equals something, all of the scores will be in descending order for that particular post type ID.

So we don’t, like, we’re not saying where post type ID is in four or five or greater than four, because that would mess things up because we would be crossing boundaries. We would be crossing post type ID boundaries to different post type IDs.

And the score column order would reset for each post type ID. So that wouldn’t do us any good. But for an equality predicate, where we just find all the rows for one thing, all of the data in the index is ordered within that one thing.

And that just gives us a quite opportune index to apply the C predicate, keep the data in order that we need for score, and then apply a residual predicate on last activity date to finish filtering out rows that don’t matter to us.

Right. And just because of the way indexes, B tree indexes work, we don’t even necessarily need view last activity date as a key column.

If for whatever reason you just wanted to have it as an included column, you would get the exact same execution plan. So just to show you what I mean, I’ve created a fourth index because we started our index numbering at zero because we’re nerds.

I put last activity date as a second include column here. Right. So it’s not even the first included column, but this is mostly just to get across the point that include column order doesn’t matter much.

If we run these two queries, we will get identical performance as the last query and our query plans will look just about exactly the same as well. Here is our index seek that goes directly to post type ID four.

And here is our residual predicate on the last activity date column. Now, the reason why this doesn’t matter so much here is because range predicate range or rather inequality predicates greater than greater than equal to less than less than equal to not equal to all that stuff is not no equality predicates and is no predicates are boom, you equal this.

like they often end up as residual predicates in queries anyway. Not all the time. Sometimes. Sometimes it happens. Sometimes it doesn’t.

Sometimes you have a seek and a residual predicate. But this is just to show you that the final key column in this case does not really have much impact on final query performance.

So when you’re looking at when you’re trying to tune queries that do something like this, they can, of course, be more complicated than this. But you can always get indexing for these to really help query performance when you embrace the equality sort range paradigm of index creation.

So you want to make sure if you have equality predicates, you want those to be the leading key or key columns in the index. You want your ordering elements to come after the equality predicates.

And you want any inequality predicates to come third in the index or come after the ordering elements in the index. Not necessarily third, of course.

Right? Could be tenth. I don’t know. At that point, you really start to see the benefit of included columns in indexes because you don’t have to have 13 key columns in order to support this sort of thing. So you could put your inequality predicates either as key columns after the sorting elements or in the includes because they will generally have less impact on things after that.

Now, there are always going to be some exceptions to the rule. If your inequality predicates are very, very selective, you might not want to have them as includes. If you’re seeking to like 10 billion rows and you have to apply a range predicate to like, you know, 9 billion rows afterwards, it’s maybe not as valuable.

But once you’re dealing with tables that large anyway, you should be using columnstore indexes. And I don’t know, I’m probably thinking about the meaning of deeper meanings in life because you might be a little, you might be dealing with many other things.

Also, you don’t see a lot of top 5,000 order by queries and that sort of situation. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope that you will embrace the equality sort range indexing methodology. I hope that you will join me in memeing this into widespread existence in SQL Server. And I will see you in another video where I will hopefully get back on track talking about store procedure stuff.

Really trying to work out some good demos for the temporary objects stuff that I want to talk about there. That might end up being more than one video because there’s a lot to say. But anyway, thank you for watching.

And I will see you in another video, I don’t know, shortly or longly or I’m really just not sure at this point. Well, the fates will decide. Anyway, thank you.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.