Introducing sp_IndexCleanup!


Video Summary

In this video, I’m Erik Darling from Darling Data, and today we’re unveiling a brand new stored procedure to help you streamline your SQL Server management. sp_IndexCleanup is designed to identify unused and duplicative indexes, which can be major culprits in performance issues like buffer pool contention and lock escalation. With this tool, you’ll get recommendations for tuning queries and indexes, fixing blocking and deadlocking problems, all while making your databases more efficient. The process involves removing unused indexes, deduplicating leftovers, and adding page compression to make your data smaller on disk and in memory. This not only helps reduce transaction log throughput issues but also eases the burden of managing too many indexes.

The script works through a series of well-defined rules to identify exact duplicates and superset/subset relationships between indexes, ensuring that you’re making informed decisions about which indexes to keep or remove. It then scripts out all necessary changes for you, including merging included columns into create statements, disabling unused indexes, and compressing any indexes without page compression. This iterative process helps ensure that your database is optimized for performance while minimizing the time spent on repetitive tasks. Whether you’re dealing with a single table or an entire database, sp_IndexCleanup provides detailed reports to guide your decisions, making it easier than ever to get better at SQL Server management.

Full Transcript

Erik Darling here with Darling Data and today we have an exciting announcement. We are welcoming a new member to the family and I don’t mean I’m hiring someone because who the hell would want to work for me. What I mean is that we are introducing a brand new stored procedure to help you with your SQL Server nonsense. It’s been a while since I’ve written a brand new one of these. I think sometime towards the end of 2024 I spit out sp_HealthParser and sp_LogHunter, which are valued members of the family, but as a wise woman once said, you’re getting too big to cuddle. So I’ve got a new one and this one is called sp_IndexCleanup. And the goal of this stored procedure is to help you identify unused and duplicative indexes.

performance, make recommendations, tune queries and indexes, fix blocking and deadlocking problems, and of course help you get better at SQL Server. And like most of my time gets spent in here. And as much as I love like tuning queries and indexes and like other words, like, you know, tuning queries so they go faster and creating indexes to help queries go faster, part of this analysis is looking at unused and duplicative indexes because unused and duplicative indexes cause a lot of problems. I recorded a few videos about the problems they cause around like buffer pool contention, lock escalation, and, like, transaction log stuff — like, the more indexes you have, the more transaction log stuff you have to do. So having like too many indexes, especially of the like unhealthy kind, it’s like, like bad cholesterol or something. Having too many indexes of the unhealthy kind impacts SQL Server in a number of areas. So most of the databases that I look at have way more data than memory, have only ever had indexes added, it seems to be like an append only operation, then have lots of bad deadlocking problems and cleaning up unused and duplicative indexes can help reduce these things, right? So like, like you have fewer indexes, you have fewer objects competing for buffer pool space; you have fewer indexes, so you have fewer lock escalations.
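The manual starting point for this kind of analysis is usually the index usage DMV. Here’s a minimal sketch of the idea — not sp_IndexCleanup’s actual logic — keeping in mind that sys.dm_db_index_usage_stats resets on instance restart, so low uptime makes the numbers unreliable:

```sql
/* A rough sketch of finding potentially unused nonclustered indexes
   in the current database. Not the procedure's actual logic.
   Note: sys.dm_db_index_usage_stats resets on instance restart. */
SELECT
    s.name AS schema_name,
    t.name AS table_name,
    i.name AS index_name,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates /* writes you pay for with zero reads to show for it */
FROM sys.indexes AS i
JOIN sys.tables AS t
  ON t.object_id = i.object_id
JOIN sys.schemas AS s
  ON s.schema_id = t.schema_id
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON  us.object_id = i.object_id
  AND us.index_id = i.index_id
  AND us.database_id = DB_ID()
WHERE i.type = 2 /* nonclustered only */
AND   i.is_primary_key = 0
AND   i.is_unique_constraint = 0
AND   COALESCE(us.user_seeks, 0)
    + COALESCE(us.user_scans, 0)
    + COALESCE(us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;
```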

And you know, if you have transaction log throughput trouble, or like you have an AG, where like, you know, sending all these index changes across to other nodes is a pain, getting rid of these sort of like unhealthy indexes is very useful. So the cleanup process typically consists of removing unused indexes, deduplicating leftovers and adding page compression.

Right? And adding page compression is great because it makes your data smaller on disk and in memory. So you make better use of the memory that you have. So the point of this script is to really clean up all the indexes you have, make them as small as possible. And then you have a very clear sense of like, you know, do you need more memory? Now you know, right? Like, like now you can figure it out.
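For reference, the page compression change being described is standard ALTER INDEX syntax; here’s a sketch with hypothetical object names:

```sql
/* Rebuild one index with page compression (hypothetical names).
   Page compression is available in all editions from SQL Server
   2016 SP1 onward; ONLINE = ON still requires Enterprise edition. */
ALTER INDEX ix_Users_Reputation
ON dbo.Users
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```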

Now, you don’t want to double work things. So whenever I’m going through this process, right? My job, like the first thing I do is find unused indexes, right? And I list out all the unused indexes. And then I grab all the duplicative indexes, either like exact key column matches or indexes that lead with the same key column. And then I have to go and look and see if any of the unused indexes are in there because I don’t want to double work anything. So I’m like, Oh, I can delete this one. I can do this one. And then after you finally do that, you can look at the list of indexes that you have left.

The problem is now that you’ve, now that you like have this list of duplicative indexes that you can, you can, you can analyze, you have to look at key columns, key column order. You have to look at included columns. You have to look at if they have filters on them. You have to look at if they’re unique or not. You have to compare like column sort orders — are they ascending, descending? And it’s just like a lot of just like, Oh, we have to move this down. Oh, this include list goes way off the SSMS screen. Let’s bring that down. Oh, this one looks like it has the same number. Oh, this one has 30 includes, this one has 31 includes, what’s different? It’s, it’s just, it’s not fun. And after all that, you still have to like edit all these create statements to merge the included columns, which means like you have the create index statement and then you’d like do the drop existing blah, blah, blah, make sure all the included columns are accounted for. Uh, for all the create statements for the indexes you’re going to get rid of, you have to like alter table, like alter index, blah, blah, disable. And then like, like, you know, fixing all of the other stuff that goes wrong with like index creation — you have like fill factors that are really low. Uh, can you use online — are you using a version of SQL Server where you can create indexes online or not?
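In T-SQL, the manual steps being described look roughly like this; the object names are hypothetical, and WITH (DROP_EXISTING = ON) is the standard way to redefine a keeper index in place:

```sql
/* 1. Redefine the keeper index, merging in the included columns
      from a duplicate (all names hypothetical). */
CREATE NONCLUSTERED INDEX ix_Users_Reputation
ON dbo.Users (Reputation)
INCLUDE (DisplayName, CreationDate, LastAccessDate)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);

/* 2. Disable the now-redundant duplicate. Disabling keeps the
      index metadata around, so it can be rebuilt later if
      anything regresses, unlike an outright drop. */
ALTER INDEX ix_Users_Reputation_Dupe
ON dbo.Users
DISABLE;
```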

Uh, do you need to like, you know, drop the, like do drop existing? Do you need to add compression? Like all this stuff that you have to think of every time you do this, it gets very, very annoying. And then because index tuning is a very iterative process and I realize I’m standing in the way of some of these words. So I’m just gonna, I’m gonna duck down for a moment and oh, the green screen effect. We’re having a good time with this, aren’t we? Uh, so then you get to do it all over again because index tuning is a highly iterative process, right?

Like you go through, you make this first set of changes, you see how things look with SQL Server, and then you, you check in and then you make sure that like, there’s no new crap. And then you can just do all these things until, until time, time immemorial or something. Um, but even after doing all that, you still don’t have a full list of indexes that are not compressed that you probably should compress because you have too much data and not enough memory, especially if you’re on standard edition, this is a big deal. Now, I absolutely adore sp_BlitzIndex. I, I still use it. Uh, I still, well, I mean, I haven’t had anything to contribute to it in a while. I feel like I’ve contributed all, maybe all that I can to that. Uh, but, and it’s great at analyzing things. It is fantastic at doing the analysis and flagging things that are like worth you looking at. So like unused indexes, what’s duplicative based on either all of the key columns or the leading key columns, and a lot more other stuff.

I’ve written some checks in there that are very, very useful to this day. Uh, sp_BlitzIndex, of course, originally written by Kendra Little, now, like, I guess, happily maintained by the, the, the greater open source community of SQL Server. But, uh, like adding all this code to that would have been a nightmare. Um, so there, there is also a mode with sp_BlitzIndex where it’ll inventory all the indexes in your database and give you scripted create and drop statements. But like, if you want to add compression, modifying those statements is not the most fun in the world. Uh, and it doesn’t do any of this work for you. Now, me being a consultant, um, I want to spend as little time on these sort of tedious, repetitive tasks as possible. So I wrote sp_IndexCleanup. Um, so it will identify unused indexes. It’ll warn you if your server uptime is not very long, because it’s a bet — you can’t make a very confident decision about unused indexes if your server hasn’t been up for a terribly long time. Uh, it will compare index and included columns. It will take into account like uniqueness and filters and all that other good stuff. And it will find compressible indexes.

So anything without page compression already — it will find those and mark those and say, Hey, you should probably compress this. Uh, and it works off a set of well-defined rules for unused, exact duplicate key columns, uh, superset/subset key columns, uh, unique versus non-unique keys, including just like unique constraints. Um, one thing this script does that I’m very happy with is if you have, uh, unique constraints on your table and you also have a nonclustered index that is on those same columns and has like some includes, then it will tell you to get rid of the constraint and make the nonclustered index unique. So you can kill two birds with one stone there. Uh, it will evaluate column sort directions. So if like just say you’re on the Users table and you have something on Reputation ascending and something on Reputation descending, it won’t call those dupes. And it will also merge include columns into a handy create script for you with all the options, like the way that I want them, right? When I do this stuff. Uh, so like I said, it scripts out all the actions for you. It’ll merge any missing included columns into keeper indexes. It’ll disable unused and exact dupes or subset indexes, and it’ll compress any indexes without compression. Uh, it’ll give you some things to review.
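The constraint-swap trick works because a unique nonclustered index enforces the same guarantee a unique constraint does. A sketch, with hypothetical names and assuming the nonclustered index already exists as a non-unique index:

```sql
/* Make the existing covering nonclustered index unique in place
   (hypothetical names; assumes ix_Users_AccountId already exists). */
CREATE UNIQUE NONCLUSTERED INDEX ix_Users_AccountId
ON dbo.Users (AccountId)
INCLUDE (DisplayName, Reputation)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);

/* With uniqueness now enforced by the index, the separate unique
   constraint (and its backing index) is redundant and can go. */
ALTER TABLE dbo.Users
DROP CONSTRAINT uq_Users_AccountId;
```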

I mean, all of this is for review. It doesn’t automatically do all this stuff. It just gives you the script to do it. Um, so like stuff that you might have to review, uh, like as part of like the overall results are indexes that are again, like have like the same key columns, just in different orders. Again, like columns, ABC column, the CBA or BAC or something. Uh, you can look at the usage patterns there and figure out if that’s like, like you want one or the other or both or whatever, uh, indexes that we’re keeping. I leave in the results so you can validate any changes there.

Like one thing it’ll flag is like this index is a subset of this index, or this index is a superset of this index. So you can like go and like grab those definitions and compare them. Uh, it’ll give you all the proposed scripted changes and it will give you a summary report about all of the actions that you can potentially take at like overall database and table level and like the potential gains from executing on each one of those actions. Uh, I don’t do any double working. Um, if indexes are unused, they’re not part of the dedupe analysis. So, you know, again, like server uptime is pretty important for this. You want to make sure that, like, you know, your server has been up for more than like a day or two or something like that — you want to make sure that you have at least a couple of weeks of uptime to make sure that like you have a good amount of like index usage information in there. Uh, any indexes that get merged will already have compression, uh, scripted into them. So you won’t get like a separate compress-this-index because you already compressed it when you did the merge thing. Uh, and compression, so compression recommendations don’t include indexes being merged or indexes being disabled for any reason.

Uh, and so let’s talk a little bit about usage. Um, there are things you can filter on. You can filter to a specific database if you want, and you can also filter to a specific table if you want to focus on one table.

Uh, there are also some things about your tables and indexes that you can filter on, like a minimum number of reads, writes, minimum number of rows in the table, and the minimum table size in gigs. Um, the min size GB parameter is a decimal, so you can put fractional values in there if you’re interested in something that’s, like, 500 megs or something.

Um, you can also choose to get all databases, and you can choose to include or exclude databases as part of that. So if you want to like get all databases, but you want DB1 and DB2 and not DB3 and DB4, you can tell the script that and it will do it. And so you get just the databases that you care about, because me, Erik Darling, I care about you. Uh, some sample results for this thing. I realize that my, my body is in the way of some of these, that’s okay, but this is sort of what the cleanup output looks like.
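Pulling those filters together, an invocation might look something like this. The parameter names here are inferred from the description above and may not match the script exactly — the procedure’s own @help output is the authoritative reference:

```sql
/* Hypothetical invocation based on the parameters described above;
   check the procedure's @help output for the real parameter names. */
EXECUTE dbo.sp_IndexCleanup
    @get_all_databases = 1,
    @exclude_databases = 'DB3, DB4',
    @min_rows = 100000,
    @min_size_gb = 0.5; /* decimal, so ~500 megs works */
```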

Uh, you get the type of script, so whether you’re going to merge, disable, or compress something. Uh, any additional info, note in this column we do have like the warning, there’s less than 14 days of uptime. Uh, then you get stuff like the database schema, the table name, the index name, uh, the consolidation rule that it got hit with. Uh, and then down here we have the target index name and the superseded stuff.

So this is, this is what I was talking about where like, uh, if you need to compare which indexes are, like which index won the merge war and like which index is getting merged into it and which one like is the mergee and the merger, then you can, you can sort of follow these names and do that.

Uh, you get the size of the indexes, the number of rows in the index. So filtered indexes, they’ll show some stuff different. Uh, you get like the read and write ratio of these. Of course, you know, this is just me like ginning up some test stuff. So the reads and writes aren’t very impressive looking here. And then there’s the original index definition. So like the index as it exists right now. And then there’s also the, uh, this final column, the script, this is what you would run to apply whatever changes the script is recommending. And then finally we have the reporting output, uh, where you will see, uh, stuff at the, like all the things that were analyzed.

This does require a little bit of explanation, which I’ll get to in a minute, but you have the number of objects analyzed. Uh, then you have like the database and table level numbers, uh, the database name, uh, like tables analyzed, total indexes, indexes you can remove, indexes you can merge, and then like some percentages. This first line up here is only the stuff that was analyzed. So like in my stack overflow database right now, the only table that I have a bunch of extra indexes on is the users table. So that’s why this says tables analyzed one and total index is 25. If you look at the, like, like the number of indexes in here, it says 30 and total for all this, that that’s, that’s by design.

Uh, the total index is 25. There are 25 nonclustered indexes on the users table. That’s what was analyzed. There are 26 total indexes on the users table, including the clustered index. Um, this script will not like try to tell you to get rid of nonclustered indexes if they match your clustered index or if they match a non-clustered primary key, because that would be stupid, right?

You don’t want to mess with those. Those are like, those are more specific things than just some non-clustered indexes that people have clicked, right clicked on the green text to create. Uh, you also get some size information. So the current size of, uh, again, this line here is only for the analyzed stuff.

So this top line is always going to look different from the rest of the results, but for the rest of it, you have like the, at the database level, the size of the database, the size of the tables, how much space you can save by cleaning up indexes. Um, there’s also some guesses at how much space you can save, uh, both like, uh, like min and max for, uh, the total for the database and everything.

And then you have the total rows and then like a breakdown of reads, writes, locks, latches, all this other good stuff. And then at the table level, it’ll show you like how many like things you can save a day. Let me move over to the side a little bit here so you can see better.

Uh, how many things you can save a day by cleaning up indexes. So this is the Users table right here. It’s had 499 latch waits and we could save 25 latch waits a day by cleaning up the nine removable indexes on here and performing our two merge statements. So, uh, that’s kind of how this whole thing works. Um, if you want to get it and try it out, you can go to code.erikdarling.com and you can do that there, but, uh, let’s, let’s do some live demo action on this thing. Uh, so you can kind of see how it works. And I’m going to show you just a couple of other things while we’re in here. So, um, what I’ve done is created, these are the 25 nonclustered indexes on this table. Uh, this is the script that I run to generate some reads against them. So that, against most of them, some of them I do want to be unused to test the unused rules.

And then this is the stored procedure itself in all its glory. Uh, like all of my stored procedures, uh, there is a help parameter. So if you run this, it’ll tell you, uh, what parameters are currently available. It’ll introduce itself like a proper, proper young gentleman. Uh, it will, uh, give you all of the parameters, uh, their data type, their description, their valid inputs, and their default values. Uh, and then if you run this just normal, and this runs pretty quick, at least on my machine, this is, these are the results that we just looked at, uh, where you can see all of this stuff that we just talked about where, um, you know, like you get the, all of the scripted output here. Uh, it’s lots of, lots of fun. And then down here you get the, uh, the reporting analysis on everything.
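Assuming the same help convention as the rest of the family of procedures, that self-description is one parameter away:

```sql
/* Print the parameter list, data types, descriptions,
   valid inputs, and default values, then exit. */
EXECUTE dbo.sp_IndexCleanup
    @help = 1;
```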

So there is lots of stuff in here. Um, it works. Uh, I’ve tested it. I’ve, I’ve actually done unit testing to like create like indexes that match rules and like extended rules and stuff. And, uh, everything seems to be working very well. Uh, so I’m excited to get out and try it on more real world servers. I’ve only had a chance to run it on two or three client servers to try and weed out some of the initial, like, I-just-wrote-this-script problems. Cause there always are some, and, uh, that was very helpful to be able to do that. But, uh, anyway, I look forward to you using it and giving me feedback on it. And, uh, I don’t know, I think, I think that’s just about it for this one. So, um, yeah, um, again, very excited over here in, in Darling Data land. Uh, so I hope you enjoyed yourselves. I hope you’ll use the script. I hope you’ll enjoy using the script. And, uh, again, if you, if you run into anything with it, uh, the, the place to go or rather the URL to go to is, uh, code.erikdarling.com. This redirects to my GitHub repo. You can, uh, you know, you can open up issues with the code. Uh, you can, you know, uh, propose contributions to the code, ask questions about it, whatever it is you need to do. That URL will lead you to GitHub where you can do all of those things and get support. Do not email me for support on this. I do not do email support at all, ever. I hate it. I will tell you to go to GitHub. Do not email me directly. Okay.

Okay. Anyway, uh, I think that’s about it here. Uh, all right. Yeah. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

All About SQL Server Stored Procedures: Temporary Object Performance


Video Summary

In this video, I delve into the nuances of choosing between temporary objects in SQL Server stored procedures, focusing specifically on global temporary tables and table variables. I explore how these objects behave differently from each other, particularly when it comes to performance and recompilation issues. By walking through practical examples using SQL Server 2019 with compatibility level 150, I demonstrate the limitations of table variables—how they lack statistics and can lead to suboptimal execution plans due to their inability to understand the data within them. This sets up a contrast with temporary tables, which, while still having some performance challenges, offer more flexibility in terms of optimizing queries through recompilation hints and proper indexing.

Full Transcript

Erik Darling here with Darling Data. In today’s video we are going to approach the final frontier when it comes to choosing between temporary objects in SQL Server stored procedures. Now, there’s very little practical use case for global temp tables, the double hash sign ones. We’ve talked through a lot of differences between temp tables and table variables. This video ties back well to kind of the video before last — so maybe I recorded these in not like an optimal order — where I talked about recompilations with temporary objects, because something that we’re going to look at in this video ties directly into like, like when I talked about table variables not causing recompiles because table variables don’t have statistics. And I talked about thresholds for temp tables recompiling. Well, there are some things that can happen that mess up my whole spiel that temp tables are generally better for performance when temp tables are small. So what I want to walk you through in this one is how table variables are sort of like using OPTIMIZE FOR UNKNOWN hints. In other words, I want to punch you when you do that.

And temp tables are sort of like parameter sniffing where I’m like, cool, we have a problem. We have an interesting problem to solve when we do hit problems with them. Okay. So I’m not saying that temp tables are these perfect angels that always work. Just most of the time when you care about performance and you’re joining your temp table off to other tables, that’s what you should be choosing because you at least have a fighting chance of fixing performance problems with temp tables. Whereas with table variables, you are just screwed. So without further ado, okay, I lied. There’s a little further ado. And the little further ado we have to get through is my usual song and dance prior to the informational portion of the video. You can sign up for a channel membership. There’s a link down in the video description. If you like this channel and you want to say good job, Erik, with money, you can, you can, of course, do that. If you want to say good job, but not with money, you can do the free stuff up there. And you can even still ask me questions, uh, for, for my office hours episodes, um, all of the relevant links down in the old video description. Uh, if you need help with SQL Server, uh, there are two important things for you to know about me. One, uh, I’m the best consultant in the world outside of New Zealand. And two, my rates are reasonable. Uh, uh, the other thing that you should probably know is that, like, this is all the stuff that I am best at. Uh, so if you need help with any of these things, this is the face that you want to see when, when your SQL Server is having problems. Uh, if you would like some content to learn more about SQL Server — training content at the beginner, intermediate, and even expert level — well, I’ve got, uh, about 24, 25 hours of it. And you can get all of it for about 150 USD.

When you use the 75% off code floating right there on the screen next to me, it’s quite, quite a deal. Uh, SQL Saturday, New York City, 2025 coming up May the 10th. Boy, oh boy. I just, it just keeps getting closer. Doesn’t it? Like, like the asteroid that’s going to kill us all. Uh, taking place at the Microsoft offices in lovely Times Square, Manhattan. Um, if you’re anywhere in the area, well, you should, you should probably set this Saturday aside to show up and learn more about the thing that you do, you do the other six days a week.

Just work with databases and whatnot and all that other good stuff. But anyway, uh, let’s talk a little bit about, uh, the performance of temporary objects. Now, uh, what I want to do is show you this stuff in the context of SQL Server 2019 operating at compat level 150, to sort of, like, split this branch off in the right place. Um, if you’re on SQL Server 2019 or higher and your database is in compatibility level 150 or higher, there is a new optimizer feature called table variable deferred compilation.

Uh, prior to that, prior to those numbers, like earlier than 2019 and compat levels lower than 150, uh, you don’t get that. Uh, SQL Server just like will default unless you throw a dumb trace flag at it. SQL Server will default to a one row estimate from table variables.

Uh, and you can also throw a recompile hint to queries in those cases in order to get a table level cardinality estimate, not a statistical estimate of like what data lives in the table variable, but just a table level cardinality. How many rows are in the table, not what values are in the table, uh, you can throw a recompile hint at it to get that prior to all this. With table variable deferred, uh, compilation, you get slightly different guesses, but still none of them are statistical.

There is still no statistics object. There is still no histogram. There is still just a table cardinality guess. How many rows are in the table, not what rows are in the table.
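The distinction being drawn — a table cardinality guess versus a statistics histogram — shows up in a small repro. With OPTION (RECOMPILE), or deferred compilation on compat 150, the optimizer learns how many rows the table variable holds, but still nothing about which values are in it. This assumes a Stack Overflow-style dbo.Posts table:

```sql
/* Table variables never get a statistics object or histogram.
   OPTION (RECOMPILE) and table variable deferred compilation only
   expose the row COUNT at optimization time, never the values. */
DECLARE @ids table (ParentId integer PRIMARY KEY);

INSERT @ids (ParentId)
VALUES (0), (4), (7), (9), (11);

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN @ids AS i
  ON i.ParentId = p.ParentId
OPTION (RECOMPILE); /* optimizer sees: 5 rows in @ids, values unknown */
```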

So, um, what I want to show you here is that I do have my database in compat level 150. Uh, and I do have, uh, that particular feature enabled and turned on. It is on by default. I have not undefaulted it.

Uh, so we’re going to look at everything in that context. Now I’ve already created this index here, so we don’t need to recreate that. And, um, like if we look at the results of this query and you can ignore some of this stuff over here, this is just stuff I used to build up, uh, some text that was annoying to type out.

Uh, the parent ID, uh, column has the top, uh, five plus, top six — five plus one is what, what they call it in math. Uh, not just six, five plus one. Uh, it’s also what they call it if you’re building RAID arrays.

Uh, the top five plus one in here where you’ll see zero has six million rows. And then after that, everything else has just something in the low hundreds of rows. So this is going to make for some fun times for us.
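That skew — one value with millions of rows, everything else tiny — is easy to eyeball with a quick aggregate, assuming the Stack Overflow dbo.Posts table:

```sql
/* Show the top values by row count. In the Stack Overflow database,
   ParentId = 0 (questions) dwarfs every answer thread. */
SELECT TOP (6)
    p.ParentId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.ParentId
ORDER BY records DESC;
```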

Now, what I want to show you is, outside of the stored procedure, sort of what happens. And this is not a, this is not a parameter sensitivity issue. This is a table-variables-do-not-get-statistics-and-have-no-idea-what’s-going-on issue.

So if we run this code and we run this query, even with an option recompile hint, uh, SQL Server the first time around will not have a problem because the five rows that were in the table variable just did not have a lot of matches in here. Even though the cardinality estimates are like wildly wrong, uh, they’re not so wrong that this plan suffered from it. Okay.

Uh, if we reverse this in the words of Missy Elliott, and we put in zero, which if you recall zero had like 6 million rows associated with it. And we rerun this query, even with a recompile hint, this will take 10 or 11 seconds. Okay.

Okay. Because SQL Server still chooses the same execution plan, still, still makes the same cardinality estimates — well, not really, it actually makes slightly different cardinality estimates, because only one row comes out of this. So the five rows that were in there influenced how many rows it thought would come out of here, which was 1755.

So this does change slightly, but it’s still the same exact plan in plan shape. Right. So this thing all takes just about 10, almost 11 seconds at the very end there.

So this is kind of the root problem with table variables is that SQL Server does not know what rows are in them. It only knows at best how many rows are in them. Right.

Again, not like how much data, not what data. Right. Okay. So with that out of the way, let’s talk about how this works in the context of stored procedures. For simplicity, I’m going to use a table-valued parameter to handle this rather than a table variable inside of the stored procedure.
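A table-valued parameter needs a user-defined table type, and for estimation purposes it behaves like an in-proc table variable. A minimal sketch, with hypothetical names and the same Stack Overflow dbo.Posts assumption:

```sql
/* A table type with a clustered primary key, and a procedure that
   accepts it as a read-only parameter (hypothetical names). */
CREATE TYPE dbo.ParentIds AS TABLE
(
    ParentId integer NOT NULL PRIMARY KEY CLUSTERED
);
GO

CREATE OR ALTER PROCEDURE dbo.CountPostsByParent
(
    @ParentIds dbo.ParentIds READONLY /* TVPs must be READONLY */
)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT_BIG(*) AS records
    FROM dbo.Posts AS p
    JOIN @ParentIds AS pi
      ON pi.ParentId = p.ParentId;
END;
```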

Uh, the behavior here is just about identical in every single way that you would possibly have behavior, uh, within the stored procedure, uh, and recompile hint or not, this is all going to go, you know, probably about as you’d expect. So let’s do this. Let’s run this procedure without a recompile hint and without this thing in there.

So if we run this query and we put, uh, the five rows that we care about into the table variable, again, we get that same execution plan. SQL Server makes that 1755. Oops.

Uh, go. No, you stay pink, but you clear out, makes that 1755 row guess. But because so few rows are actually matching for the IDs that we put in there, it’s not a big deal. So if we do the same thing with, uh, parent ID zero and we run this, see, we’re going to see the same behavior.

This is going to take just about 11 seconds again, right? So this thing is spinning and spinning and spinning. And like we get the same, not, we get the same query plan again, the same slightly different guess.

The, the guess is actually going to remain, um, actually, look at that. We got a slightly different table variable deferred cardinality estimate guess there, but, uh, it still remains a bad plan choice because SQL Server is not like guessing great anywhere, right? Getting 6 million rows in these loops is much worse than getting like, you know, 11,000 rows or whatever in the, in those loops.

So let’s come back and let’s throw an option recompile hint into the mix. Because people often credit option recompile with solving all sorts of problems. And you know what? I agree.

Anyone who says, uh, like, be careful with option recompile, watch out, you’re going to have higher CPU if you use option recompile all the time — uh, they just like to make things hard. Option recompile solves a lot of problems.

I’m not, I’m not going to lie to you about that. Uh, so let’s try this again. And with the option recompile hint again, you know, the, this five row query, uh, you know, just about all this, everything is the same in here. Nothing really changes everything fine.

But you know, the, the bigger deal is that, uh, you know, the, the, the zero plan, right? When we execute this again with zero option recompile still doesn’t do anything for us. And just to like, come back and show you like this table variable is backed by this table type, which has a clustered primary key on parent ID.

We're still not getting any statistics on this, even with the index in place. And you can validate that in the query plan: if you look at the properties and check optimizer stats usage, the only thing in there is for the Posts table.

Note that there is no optimizer stats usage entry for the table variable. And you can further validate that by trying to update statistics on the table variable. SQL Server will give you a red squiggle.

And this is not just a parser issue. SQL Server will be like, you can't do that on television. You get an error just trying to create the procedure. So there are no statistics on the table variable to update to try to improve things.
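
If you want to reproduce that error yourself, a minimal sketch looks like this, reusing the assumed type name from above:

```sql
DECLARE @p AS dbo.parent_ids; /* assumed table type name */

/* UPDATE STATISTICS only accepts tables and indexed views, so
   referencing a table variable here is a compile-time error. */
UPDATE STATISTICS @p;
```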

This is where the temp table has a direct comparison. All right, so let's do this.

Let's come down here and create or alter this procedure. This is our table variable, although I guess it's not really a one row table variable, more of a varying row table variable, replaced with a temp table. Rather than rely on the table-valued parameter directly, in this case we're going to create a temp table, again with a clustered primary key on it.

And we're going to insert the contents of the parent IDs table-valued parameter into it. Then we're going to do our work off the temp table.
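
The rewritten procedure described above would be shaped roughly like this. Every object name here (the procedure, the type, the Posts table) is a stand-in for whatever is actually in the demo:

```sql
CREATE OR ALTER PROCEDURE dbo.get_posts_temp
    @parent_ids dbo.parent_ids READONLY /* assumed table type */
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    /* Copy the table-valued parameter into a temp table
       with the same clustered primary key. */
    CREATE TABLE #parent_ids
    (
        parent_id integer NOT NULL,
        PRIMARY KEY CLUSTERED (parent_id)
    );

    INSERT #parent_ids (parent_id)
    SELECT pi.parent_id
    FROM @parent_ids AS pi;

    /* Do the work off the temp table, which does get statistics. */
    SELECT p.*
    FROM dbo.Posts AS p
    WHERE EXISTS
    (
        SELECT 1/0
        FROM #parent_ids AS pi
        WHERE pi.parent_id = p.ParentId
    );
END;
```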

All right. This should be fairly obvious. Note that I have the same update stats and OPTION (RECOMPILE) options available in here. So let's create or alter this.

And this is the sort of issue that I wanted to show you with very small temp tables. If you recall the video prior to this one, where I talked about recompiles with temp tables versus table variables, one of the things I showed you was an extended event that fired when things recompiled because statistics changed on the temp table.

When the temp table was small, relatively few changes triggered a statistics update; when it was bigger, you needed more rows going in to trip the modification threshold for a statistics update. So if your temp tables always hold fairly large amounts of data, anything over a thousand or ten thousand rows or so,

you're going to see a lot of recompilations because of statistics changes on the temp tables. That's just going to happen way more often, because you have way more data going in and out to count toward the number of modifications that triggers the statistics update. What we have here is the opposite situation: a very small temp table that does not get a statistics update triggered on it

the way a larger one would. All right. So if we run this first with the five rows, we get the same very fast plan, right?

SQL Server chooses things in here, and we still get okay performance from this one. Now, I'm team temp table when it comes to performance being good, so what I want to show you is when that can go wrong and how you can fix it.

So let's run this and see what happens. We have this execution plan now, and this is going to give us the cached and reused execution plan for the query that just ran.

This is also going to take just about 11 seconds. But this, at least, is a solvable problem. With the table variable, there was no solvable problem: there are no statistics to update on a table variable,

and the OPTION (RECOMPILE) hint did not help us discover which rows were in the table variable, the thing that would actually change the execution plan. But we do have some hope here, because temp tables do get statistics. Notice that now, instead of just one entry opening up, we have two things under optimizer stats usage.

We are going to have one for the temp table and one for the actual table. So if we zoom in, and I move this out of the way of my giant head, here are the statistics on the temp table that did not exist for the table variable.

And here are the statistics for the base object, the Posts table. So there are statistics here that we can use. The problem is that because this temp table is so small, SQL Server cached and reused an execution plan that still thinks the five rows that were in there before are going to come out of there, which is not a very good guess. You have to do two things to address this problem.

You don't need the full scan on this. I'm just choosing to do a full scan because it's a very small temp table and who cares, right? Fully scanning five rows doesn't take any longer than fully scanning one row.

Guess what? Not a big deal. But we do also need this OPTION (RECOMPILE) hint, because that's what gives us a fresh plan for this query. Just the OPTION (RECOMPILE) hint, or just the statistics update, is not enough to get us across the finish line for this one.
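
Put together, the two-part fix described above is a sketch like this, with the same assumed names as before:

```sql
/* Part one: refresh statistics on the tiny temp table.
   The full scan is fine here; it's only a handful of rows. */
UPDATE STATISTICS #parent_ids WITH FULLSCAN;

/* Part two: recompile so the cached plan built for five rows
   doesn't get reused for a very different row count. */
SELECT p.*
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM #parent_ids AS pi
    WHERE pi.parent_id = p.ParentId
)
OPTION (RECOMPILE);
```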

So let's run this, and now let's revisit running things in the exact same order. We get this execution plan, and SQL Server still does an okay job with this one.

Now, when we run this for parent ID zero, SQL Server will do a different job. Notice that we get a completely different plan shape: we no longer have a nested loops join, and we no longer have the seek plus a key lookup.

This query goes parallel, takes just about 800 milliseconds, and instead we have a scan down here, and that solves our performance issue. Now, you may get very lucky with your temp table usage, and you may get something like this, where, let's say, it happens in reverse order.

Let's recompile all this stuff. I've taken out the update stats and the OPTION (RECOMPILE) hint here. If we just run this for parent ID zero first, SQL Server is not going to have any problems with it.

SQL Server chooses that same plan down here. And then if we rerun it for the five rows up here, SQL Server reuses that plan. So there are going to be cases, just like with parameter sensitivity, where you get a good plan that works fine for everyone.

And there are going to be cases where you get a plan that does not work so fine, depending on the distribution of data. The big difference is that with temp tables, there are at least things at your disposal that allow you to solve the problem. With table variables, there is absolutely nothing at your disposal aside from switching to temp tables, right?

You can't update stats on a table variable, because table variables don't have stats. And you can't use a recompile hint to get anything better, because SQL Server is still just guessing a row count without knowing the contents of those rows. The recompile hint with the table variable gets you how many, not which.

That doesn't help. So whenever you are dealing with performance issues around table variables, one of the first things that you should be thinking is: should I change this to a temp table? And you should be thinking: yes, I should.

Because even if you run into performance issues with the temp table, you have things at your disposal that you can use to fix them. Now, is this all too much? Is this something that Microsoft, or SQL Server, could be handling better?

Potentially. It's a very difficult thing to get exactly right, because you have to manage a lot of expectations and behaviors that people have relied on for a very long time. Sure, there are probably some scenarios where this could be better.

But figuring out a way to do this during query execution, or during planning and optimization, or to make this somehow more adaptive, would maybe be a little too difficult. One thing that is sort of interesting: SQL Server 2017 and up has adaptive joins, but an adaptive join won't make the call if there's not a covering index in place. At least as of this writing, I've never seen an adaptive join that gave you an option between a hash and a loop join where the loop join portion was a seek with a key lookup.

It was always just a seek or a scan; if there's going to be a lookup, the adaptive join is short-circuited. You don't get one. So is there stuff that could possibly be done to the optimizer that would make this an easier situation?

Probably, but there's just a lot of stuff that would make this really hard to get exactly right. This is where the skills of query tuners are still very valuable and very useful to people who have performance problems. And as always, my rates are reasonable.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video, where I might take a short break from the stored procedure stuff to talk about some of the open windows over here, because they are quite interesting to me. After that, we will be getting toward the very end of our stored procedure series.

So we'll see how things go over the next couple of days. We'll see how my motivation lines up with reality. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 8



Hello friend, Hello friend, are there techniques to compare data returned from a sproc when making perf changes, to make sure you are not messing up the results? I currently run the old sproc and the updated sproc and copy the results into a text diff app, but was wondering if there was a better way.
Hi Erik, What is the feature of mssql that you would like to see customers or dba use more ?
What are your top three books on performance tuning that you would enthusiastically recommend to someone about to embark on this journey?
Hi Erik, in AdventureWorks I have these two statements: select top 5 ProductID pid into #p from Production.Product; select * from Production.Product where ProductID in (select ProductID from #p); As you can see, the subquery is wrong, but SSMS doesn’t highlight the error, the query runs basically unfiltered. Do you know the reason?
Hi Erik, please do NOT consider my previous question. Thanks. I’m tired….

To ask your questions, head over here.

Video Summary

In this video, I dive into some SQL Server performance tuning tips and tricks during a lively Office Hours session. We tackle questions ranging from comparing data return between old and new stored procedures to improving code formatting for clarity and efficiency. Whether you’re looking to optimize your queries or just want to know more about the features of Microsoft SQL Server that could benefit your work, there’s plenty to learn here. I also share some of my top recommendations for books on performance tuning, providing a resource list for anyone embarking on this journey. So if you have any questions or need help with SQL Server, consider joining our next Office Hours session or exploring the resources available on my website, erikdarling.com.

Full Transcript

Erik Darling here with Darling Data, and I don’t know if you can get that pungent aroma in your nose. What is that delightful smell? That is the smell of Office Hours. Yeah. Alright. If you like this content, maybe you want to ask a question on this here Office Hours thing. I hesitate to call it a podcast because, God forbid, another person in tech with a podcast. Wow. No, not doing that. It’s not a thing. You can support this channel by signing up for a membership. You can like, you can comment, you can subscribe. All of the useful links to do these things are available in the video description. Gosh darn it. Look a little bit down below the handsome face. If you need help with SQL Server, maybe you watch these things and you think, gosh, that Erik Darling sure does know his SQL Server stuff. Well, you can hire me to do all sorts of SQL Server things for you. And as always, my rates are reasonable. If you want some training, I have it. I’ve got it. I would love for you to have it. I would love for you to have it so much that I give you this gigantic discount, 75% off. That brings it down to, what, $150 US dollars. You get that for the rest of your life.

Again, video description. What a nice place to be. We still have SQL Saturday, 2025. That is not the 2025th SQL Saturday. It is the year that it is taking place. The date that it is taking place is May the 10th with a performance tuning pre-con by Andreas Walter on May the 9th. And you can go to both. And you can see me at both. And I don’t know, maybe we can go smoke cigarettes outside together. With that out of the way, let’s go party here. Let’s have some office hours fun. And I’m going to have to shrink this down a little bit. It’s not really friendly to the… Oh, dear. What happened there? Oh, boy. All right. Well, you know, nothing’s perfect. Anyway, let’s get to this first question here. If Zoomit will decide to be my friend. Well, that word wrap cut off a little bit, but we’ll deal with it. Hello, friend. How many hello, friend? How many times can you type hello, friend?

Are there techniques to compare data returned from a sproc? Godly. Just call it a stored procedure. Why do you have "sprock"? Sounds ugly in the mouth. Sprockets. When making perf changes, to make sure you're not messing up the results. I currently run the old stored procedure and then the updated stored procedure and copy the results into a text diff app. There are easier ways of doing this, depending on the nature of the results.

Okay. First, if you have a stored procedure that returns multiple result sets, we should probably talk, because that's not really a great thing. My analysis procedures do it, but they're not being consumed by an application. So let's hope and pray that your stored procedures are just returning a single result set.

I understand: you've made the performance changes, and you've figured out that the new version is faster than the old version. Great. What I would use is either a global temp table or a real table in the database. Write the data out to it, and then use some variety of the lovely SQL standard INTERSECT and EXCEPT operators to see if the results are equivalent.

One thing to keep in mind is that depending on the nature of the query plan, the results may be correct, just in different orders. So if the order of the results matters to you, and you have an ID column or some other sorting element, you might need to make sure that's included when you're comparing the results one-to-one. But INTERSECT and EXCEPT are fantastic because they handle NULLs.

So if you've got NULLs in your results and you're wondering how on earth to compare all those NULLs, INTERSECT and EXCEPT are your friends for that. Let's go answer the next question.
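
A rough sketch of that comparison, assuming you have already created #old_results and #new_results with columns matching the procedure's output:

```sql
/* Capture each version's output. */
INSERT #old_results EXECUTE dbo.old_procedure @parameter = 1;
INSERT #new_results EXECUTE dbo.new_procedure @parameter = 1;

/* Rows the old version returns that the new one doesn't: */
SELECT * FROM #old_results
EXCEPT
SELECT * FROM #new_results;

/* And the reverse direction: */
SELECT * FROM #new_results
EXCEPT
SELECT * FROM #old_results;
```

If both queries come back empty, the result sets match, and EXCEPT treats NULLs as equal, which is exactly why it beats a text diff here. One caveat: EXCEPT works on distinct rows, so a difference only in duplicate row counts will not show up.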

Hi, Erik. Hi. Hello. How are you? What is the feature of MS SQL? Miss SQL. Wow. Hello, Miss SQL.

Pretty lady, you are. That you would like to see customers or DBAs use more? Well, there are a number of things that I have in mind here. It's funny, right?

Because there's stuff that I wish were closer to being the default in SQL Server than it is today. Just for example, I firmly believe that since 2005, the default for any new database should have been the read committed snapshot isolation level. Starting with SQL Server 2022, Query Store is on by default,

so that's a good step forward. But we'll see how adoption of 2022 and 2025 goes. As far as I can tell, 2019 is still leading the pack.

So those are two right off the bat. I also really wish... so, the system health extended event session does capture deadlocks, and it does capture a very limited amount of blocked process report stuff.

You can expose both of those using my stored procedure, sp_HealthParser, which will go in and get as much information out of those two things as you can get. But I wish there were more dedicated stuff for the blocked process report and the deadlock XML report. In fact, I wish the blocked process report and the deadlock XML report were wrapped up into Query Store, because they're pretty important things for query performance.

Right? It would just be nice if it were all centralized in one location where you could go and say: oh, this is where performance was awful. So those are a few things that I wish people would use more.

You know, it does help to create specific extended events to capture the blocked process report and XML deadlock report, because it is much easier to access them via events dedicated to those things than it is to go through all the system health extended event output. The system health session logs a ton of other stuff.
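
A dedicated session for those two reports might look like this. The session name and file name are placeholders, and the blocked process report also needs the server-level threshold set (here, 10 seconds; pick your own):

```sql
/* The blocked process report only fires once this is configured. */
EXECUTE sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE sys.sp_configure 'blocked process threshold', 10;
RECONFIGURE;

CREATE EVENT SESSION blocking_and_deadlocks
ON SERVER
    ADD EVENT sqlserver.blocked_process_report,
    ADD EVENT sqlserver.xml_deadlock_report
    ADD TARGET package0.event_file
        (SET filename = N'blocking_and_deadlocks');

ALTER EVENT SESSION blocking_and_deadlocks
ON SERVER
STATE = START;
```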

And sometimes you lose information, or it just takes forever to parse through everything and get to it all. So those are a few things there that I think would be useful for people. Another book question.

What are your top three books on performance tuning that you would enthusiastically recommend to someone about to embark on this journey? The good news for you is that I just recorded a video, and coinciding with recording that video on SQL Server books that I like, there is now a dedicated page on my website.

So if you go to erikdarling.com/books, that's Erik with a k, you will see a list of books that I enthusiastically recommend for SQL Server people. And this is sort of a twofer: I do five questions at a time, and these two just happened to make the five question cutoff. So here we have our beleaguered friend bemoaning their exhaustion.

You know what, friend, I hear you. I'm tired all the time. Worn out.

Please do not consider my previous question. Thanks, I'm tired. Well, we're going to answer this question anyway, because it speaks directly to a lot of things that I've talked about here, both in the past and the present, and probably will again in the future. And that is code formatting and proper aliasing and all that other good stuff.

They're saying that they have two statements where they select the top five product IDs. Notice that ProductID is aliased as pid here without an AS. Nuh, nuh, nuh, nuh, nuh.

Always put an AS. First off, table aliases should use AS. For column aliases, it's a lot clearer when you say pid = something than something AS pid, because you have to read too far over this way, or too far down, to get to the AS.

It helps to have the column names all lined up. Then the second thing they have is SELECT * FROM Production.Product WHERE ProductID IN (SELECT ProductID FROM #p). Well, there's no ProductID column in #p, because you named ProductID pid in #p.

So what you're not doing down here is aliasing your pound sign temp table. SQL Server resolves this ProductID column to the ProductID on the outer table, and that's where things are going awry.
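
Here is the question's repro with the resolution spelled out in comments, plus an aliased version that surfaces the mistake:

```sql
SELECT TOP (5) ProductID AS pid /* column is named pid in #p */
INTO #p
FROM Production.Product;

SELECT *
FROM Production.Product
WHERE ProductID IN
(
    /* There is no ProductID column in #p, so this resolves to the
       outer Production.Product.ProductID. That's legal, always true
       for non-NULL values, and the query runs unfiltered. */
    SELECT ProductID
    FROM #p
);

/* Aliasing everything makes the intent explicit, and referencing
   a bogus column through the alias would be an error instead: */
SELECT p.*
FROM Production.Product AS p
WHERE p.ProductID IN
(
    SELECT tp.pid
    FROM #p AS tp
);
```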

So please, for the love of God, write your queries clearly. Alias things properly. The better you get at writing your queries, the less confusing this stuff becomes. You can become a smarter person overnight just by taking the time to be a bit more verbose in your T-SQL.

Stop taking shortcuts. Stop leaving words out. Stop putting things in funny places.

Stop putting commas over here. Commas belong at the end. Column aliases belong as column name equals expression. Stuff like that.

And when we are sloppy with our code in these ways, we hit all sorts of strange, unexpected bugs. This does not just go for the mere writing of the query. It goes for all sorts of things.

I have encountered a number of strange performance oddities, bugs that had been silent for years, that cropped up just because people were lazy with things like this, or lazy about using the correct data types. So when you write your queries, make sure you and SQL Server are able to understand them.

And when you write your queries, think: if the dumbest person in the world came and looked at this query, would they be able to figure out what it's doing? If Erik Darling came and looked at this query, would he be able to understand what it's doing? What would Erik Darling say about this query if he saw it?

Well, these are questions that you should ask. Anyway, me and Bats are going to sign off. I do have some more Office Hours questions to get through, but since I do five at a time, because Bats loses patience with me, we're going to stick those in a new video. But thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope that you will format your queries properly. And I will see you in the next video. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

All About SQL Server Stored Procedures: Table Variables for Logging



Video Summary

In this video, I delve into the nuances of using table variables in SQL Server stored procedures, particularly focusing on their role in logging errors and handling rollbacks. I explore how table variables can be useful for tracking progress within transactions, especially when you need to maintain a record even if an error occurs or the transaction is rolled back. However, I also highlight that while they are handy for certain scenarios, table variables might not always be the best choice for logging data, especially in complex operations involving large datasets. The video includes practical examples and discussions on when it’s appropriate to use table variables versus other methods, ensuring you make informed decisions about their implementation.

Full Transcript

Erik Darling here with Darling Data. And in today's exciting, outstanding, completely AI-free video, we're going to talk about stored procedures, still. Carrying on talking about SQL Server stored procedures. But we're going to talk specifically about using table variables in the context of logging things about your stored procedure in the face of errors and rollbacks. Because, for some reason that I cannot explain, every time you are talking about the performance differences between temp tables and table variables, and you say, hey, if we change this table variable to a temp table, we can make this procedure go faster, someone will decide to chime in with: but table variables, they survive errors and rollbacks. And you're like, so what? Except with more words. Instead of a long "so," inject some colorful language into the "so what." Because it's completely irrelevant to the topic at hand. When we're talking about performance, we care about performance. We don't care about errors and rollbacks. We're looking at a reporting stored procedure that doesn't even have a transaction in it. There's not an error, there's not a rollback, there's not a commit. There's nothing. Why? Why do you bring this up? Why do you decide to inject the conversation with this meaningless knowledge? Do you just need to act like you learned something at some point? I don't understand your point of view on this. Anyway, before we do that, let's talk about you and me and the birds and the bees. If you like this channel's content just enough to spend four bucks a month on it, you can use the link in the video description to join the channel as a member. If you have not quite formed an opinion on the channel, you can do other stuff in the meantime: you can like, you can comment, you can subscribe.
And if you feel so inclined, you can even ask me questions at this link, which is also done in the video description, that I will answer on my office hours episodes.

When I answer five of your questions at a time, we have fun. If you need help with SQL Server, this unruly beast that needs taming, that needs performancing, you can, of course, pay me money to take care of it. Health checks, performance analysis, hands-on query, index, and server tuning, you name it: responding to performance emergencies and training your developers so that you do not run into as many performance emergencies is all the name of my game, and as always, my rates are reasonable. Promise. Take a look around. If you would like to get some training from me, some real high-value content, you can get all 24 hours of my training for 75% off.

That is around 150 US dollars and that comes to you for the rest of your life. Link to do all that stuff again down in the video description. Upcoming events, we still have SQL Saturday, New York City taking place in the Microsoft offices in lovely Times Square in Manhattan.

It would be a great time. You can come, and we can take selfies. I don't know, we can do whatever cool fun stuff people are still allowed to do at conferences these days. Barring, of course, any code of conduct breaches. We don't want to do that.

We want to have a nice family friendly time at SQL Saturday. With that out of the way though, let’s talk about table variables and logging stuff. So here is pseudocode.

And you know what? You just reminded me that I need to fix a small typo in here before I keep talking. So we're going to do that, pretend it didn't happen, and then look at the rest of this stuff. Let's say that this is our pseudocode, and like reasonably intelligent people we are going to, within the context of our stored procedure, SET NOCOUNT and XACT_ABORT ON.

Then we are going to BEGIN TRY. Every day I begin to try. Sometimes things happen along the way that interrupt that. And then we are going to END TRY, and, well, there are a lot of reasons to end trying, I'll tell you that.

But in between all that, we have a BEGIN TRANSACTION and a COMMIT TRANSACTION. Now, of course, you don't really need to do this for a single query. If you have one select or update or delete or insert, you don't really need this, right?

Because SQL Server will be working in autocommit mode, where that query happens within its own transaction anyway. But if you have a group, or a flock, or a murder of queries that you need to put in a single transaction because they all need to complete or not complete as one group, like lemmings, where they either need to make it to the top of the hill or fly off the cliff together,

then you would want to do this. If, within this transaction, you want to figure out where along the way you have done things, how long they took, how many rows were affected, and things like that, then you can log that stuff to a table variable. And if you hit an error and all this stuff rolls back, you can still put the data from that table variable into a logging table that you can review later.

You don't just have to return it out to whatever client is running. Down after the commit transaction, in the BEGIN CATCH block, we will, of course, do this: if @@TRANCOUNT is greater than zero, we're going to roll stuff back, and then we'll insert into our logging table whatever data we have logged in our table variable, which has survived the rollback. Because remember, things that got inserted, updated, or deleted in a table variable do not get rolled back.

That table variable will be alive until we're done with it. We can put its contents into our logging table in the catch block and then review that actual logging table later. You can put all sorts of good information in here: proc ID, error number, error line, error message, all that other stuff.
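
A pared-down sketch of the whole pattern, with placeholder object names throughout. The point is that @log keeps its rows through the ROLLBACK, so the catch block can persist them:

```sql
CREATE OR ALTER PROCEDURE dbo.do_stuff
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE @log table
    (
        step_name sysname NOT NULL,
        row_count bigint NULL,
        logged_at datetime2(7) NOT NULL DEFAULT SYSDATETIME()
    );

    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.some_table
        SET some_column = 1; /* placeholder work */

        INSERT @log (step_name, row_count)
        VALUES (N'update some_table', ROWCOUNT_BIG());

        /* ...more queries that ride or die together... */

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
        BEGIN
            ROLLBACK TRANSACTION; /* @log keeps its rows */
        END;

        /* Persist what we tracked, plus the error details. */
        INSERT dbo.logging_table
            (step_name, row_count, logged_at, error_message)
        SELECT l.step_name, l.row_count, l.logged_at, ERROR_MESSAGE()
        FROM @log AS l;

        THROW;
    END CATCH;
END;
```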

There are lots of things that you can put in there that will make life somewhat easier for you when troubleshooting a problem with a procedure. One thing that I've not really come up with a good way to manage is logging the data itself. If it's a small amount of data, it's not a big deal.

But if you're doing this for something like ETL and you need to move millions of rows around, table variables are just going to hurt more than they help. So probably don't do that. The second way that you could do this is if you have a stored procedure that does something like this, where, again, wrapping single queries in this stuff is not really useful.

But if you have groups of queries, just like the last one, where every query in the procedure had to ride or die together, or maybe not the whole list of queries, but groups of queries where all of the queries in each of these "do stuff" blocks need to ride or die together. Let's say there are two or more queries in here that need to complete together and do that stuff.

And if you want to log progress for each query in here, then a table variable would still be your friend, because if anything goes wrong within any of these, down here in the catch block we can still do this. If, and this is a big if, you are only logging stuff around each transaction, meaning before each begin transaction and after each commit transaction you set, say, current step

equal to transaction one, and after the commit you set current step equal to step two, then the table variable becomes less useful, because it's not logging anything within the begin transaction and commit.

So there’s like the way of thinking about this is anything that’s happening within a transaction could be useful to log to a table variable. If you’re going to do something with it, otherwise it’s probably not. Otherwise you could probably either don’t need a table variable or you just don’t need to be logging stuff, I guess.

That’s the easy way of saying it. But the key thing here is that if you’re not doing stuff within each of those transactions, then you can still do the logging down here if you want. Right.

Like, that’s easy enough. You just pull out whatever data came along with the error. But if you are logging stuff about each step between the begin transaction and commit, then you would want to put the rows from the logging table variable into the logging table. And I am not sad to say that this is the very end of where table variables are useful for logging errors, rollbacks, and data about things.

At least in my experience, practically speaking. You could probably over-engineer other solutions to do things like this, but the table variable will not be necessary or even required for doing that. That would be you choosing to use a table variable when you just didn’t need to.

You could have just used regular variables and logged very simple, easy information about the error you hit, the procedure you hit the error in, and all that other stuff, without putting it in a table variable. You just chose to do that for some weird reason, because you are a white knight for table variables and you just need to make them shine. Right? You need to force them to shine like a boy band: just manufacture them being good at anything.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will not manufacture use cases for table variables. I hope that you will only use them when they are necessary, required and pertinent to the solution at hand.

And yeah, in the next video, we will talk about the thing that really matters when you are choosing temporary objects. And that is, of course, performance. We’re going to go over some material that I covered in a video somewhat recently that makes sense in this context as well. So I will see you then. And until then, I hope you are smiling.

Hope you are a happy camper, just like me.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

All About SQL Server Stored Procedures: Plan Cache Pollution



Video Summary

In this video, I delve into the intricacies of temporary tables and their impact on SQL Server’s plan cache, specifically how they can lead to plan cache pollution or an abundance of query plans. I walk through creating and executing stored procedures that use temp tables in various configurations, demonstrating how these operations can result in multiple entries in the plan cache. By understanding this behavior, you’ll gain valuable knowledge for managing your SQL Server environment, even if it doesn’t directly dictate your choice between temporary tables and table variables. This is part of our ongoing series on SQL Server performance optimization, with future videos focusing on practical performance considerations to help you make informed decisions about these critical components.

Full Transcript

Erik Darling here with Darling Data, and in this video we are going to continue our joyous journey into SQL Server stored procedures. We’re going to talk about how temp tables can cause plan cache pollution, or lots of query plans. Two things I want to say about this up front: one, this doesn’t affect query store, because query store doesn’t care about these things, and two, I don’t care about the plan cache. I used to love the plan cache. I used to do a lot of work on sp_BlitzCache to, like, get in there and find stuff and dissect that XML and really analyze plans. And then the more I did that and the more I used the plan cache, the more I was like, wow, this plan cache is three hours old. What am I going to talk to you about? What, you want to know why things were bad yesterday, or a week ago? Get a monitoring tool, bucko. I just have a very hard time finding much utility in the plan cache at all, aside from when, perhaps, query store settings are not capturing certain query plans. Like, with query store, you can set it to capture all, which means everything, which is too much for most people, or you can set it to auto, so that some internal mechanisms figure out which plans belong in query store. And then 2019, I think, introduced all sorts of query store capture policy options, so you can set specific thresholds on things like execution count, CPU, and duration to figure out which queries you’re going to allow to end up in there. So I want you to file this under SQL jeopardy: good knowledge to have, but maybe not knowledge that’s going to be important for you understanding when to use a different type of temporary object. This is not an excuse to use table variables.
Please don’t take it as such, because I will come to your home and smack you until you cry, and no one in your family respects you anymore.

So, with that out of the way, if you would like to avoid that situation, you can get a membership to the channel. I don’t know if I’m allowed to do that. I think that might be extortion, or racketeering, or one of those RICO predicates. Anyway, there’s a link in the video description for you to do that. If perhaps someone else has extorted you, or racketeered you, or whatever, and you just have no more money, your pockets are turned inside out, you can feel free to like and comment and subscribe, or else. And you can ask me questions privately that I will answer publicly on my office hours episodes of the Darling Data Dandy Hour, or whatever. I don’t know.

If you need help with SQL Server, and you want someone to threaten SQL Server into subservience and performance yeses, I am a consultant, and I consult on all matters related to SQL Server performance, and more. Health checks, hands-on tuning, responding to performance emergencies, and tuning your developers, actually. I will tune your developers, so that you don’t have performance emergencies anymore. You can avoid those in the future. You can finally sleep through the night. No more pagers going off, or whatever happens.

I don’t know. Maybe it’s too soon for that one. Anyway, if you would like to get some SQL Server training from me to you for the rest of your life, for about $150, you can go to training.erikdarling.com, where you will see the full expanse of my hand. And you can use that discount code. And again, there’s a link down in the video description, so you can get all that stuff. SQL Saturday, New York City 2025.

You can come to me. You can come see me in person. You can see this Adidas shirt in person. Maybe it’s not this specific one, but an Adidas shirt. I’ll be there serving lunch, smoking cigarettes, maybe getting drunk out back. Who knows? But anyway, come to the event. It’ll be a great time. With that out of the way, let’s talk about these plan cache shenanigans with temporary objects.

Now, what I’m going to do is set up a couple of stored procedures, run them in a few different windows, and then run a query that looks at the plan cache. The first one, and there’s an alternate version of this one down here that we’ll talk about in a minute, is called a_spid. And what this thing does is create a temp table called #a_spid and insert a value into it.

And then we have another procedure down here called no_spid, which creates a temp table with that same name, inserts a value into it, and then executes the stored procedure above it, a_spid. So this stored procedure executes the stored procedure above it. There’s an alternate version of this where I rename the temp table to match the name of the procedure.
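As a rough reconstruction (the names follow the transcript; the exact demo code in the video may differ), the setup looks something like this: two procedures that each create a temp table with the same name, with the second one calling the first:

```sql
CREATE OR ALTER PROCEDURE dbo.a_spid
AS
BEGIN
    /* First procedure: creates and fills a temp table */
    CREATE TABLE #a_spid (id int NOT NULL);
    INSERT #a_spid (id) VALUES (1);
END;
GO

CREATE OR ALTER PROCEDURE dbo.no_spid
AS
BEGIN
    /* Same temp table name as the procedure above,
       in a different scope */
    CREATE TABLE #a_spid (id int NOT NULL);
    INSERT #a_spid (id) VALUES (1);

    EXEC dbo.a_spid; /* inner call: the name collision happens here */
END;
GO
```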

One thing that I find is very, very useful for avoiding these types of problems is to give your temp tables very unique names. Do not just name them all #t or #p or #a or #c or #d or #a1 or #t1 or something. I have, of course, been guilty of that in the past, so I’m not busting you down about it, because I’ve done it too.

But the longer you live, the longer you learn, the more you realize that unique names for temporary objects, ones that are descriptive of their task, are often a good thing. I’ll show you that in a second, though. The second way I want to show you this is with a slightly different setup, where we have this not_internal stored procedure, which sort of is internal, I guess. Kind of weird.

But this is just going to insert values into a temp table, except the temp table is going to be created down here. I’ll show you that in a second. And I have this 1 = (select 1) here just to prevent simple parameterization.

I forget why I stuck that on there, to be honest with you. I don’t think it’s necessary for the purposes of this demo. But then we have this stored procedure up here called internal, which creates a temp table called #internal, selects a count from it, and then executes the stored procedure not_internal that we just looked at above. Quite frankly, I do believe I named those backwards.

But anyway, let’s make sure that we have all these in place as, well, I mean, I was going to say as God intended, but it’s pretty much how I intend. But for all intents and purposes here, I guess I am God. We’re also going to clear out the plan cache.

I know I just did that, but I like to make extra sure. And then what I’m going to do is run both of these stored procedures in this window, and run both of these stored procedures in this window.

And I’m going to run both of these stored procedures in this window. So I run these stored procedures three times across three different spids. Now, we’re going to look in the plan cache, and we’re going to use a very specific query that does a little bit of an extra thing, where we cross apply to sys.dm_exec_plan_attributes.

And we are going to look for the attribute optional_spid. Okay? And we’re going to see the values for that up here.
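A query along these lines will surface that attribute. sys.dm_exec_plan_attributes and the optional_spid attribute are real, documented parts of SQL Server; the exact query used in the video may be shaped differently:

```sql
SELECT
    object_name = OBJECT_NAME(st.objectid, st.dbid),
    pa.attribute,
    pa.value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
/* Plan attributes include optional_spid: nonzero means the
   plan is cached per-session rather than shared */
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = N'optional_spid'
AND   cp.objtype = N'Proc';
```

A value of 0 means the plan is shared; a nonzero value is the session ID the plan was compiled for, which is how you end up with one entry per calling spid.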

Okay? Attribute and value for optional_spid there. And looking at the results, what we’re going to see is two references to internal and one reference to no_spid, each with a value of zero. So if you just have a temp table in a stored procedure and you call that stored procedure, SQL Server doesn’t have to do anything interesting with it.

But as soon as you reference that temp table in a stored procedure that gets called by the main stored procedure, or you call another stored procedure that creates a temp table with the same name, SQL Server has to figure out some way of differentiating things. Because it’s the plan cache, and the plan cache is full of goblins.

And what we’re going to see down here is three different plan cache entries for each of the inner stored procedures. Okay? And each one is going to have the session ID that called it as the value for optional_spid.

So, we have three from session 69. I didn’t do that on purpose, I swear to you. I couldn’t possibly have.

That’s going to be that first window that we executed stuff from. Then we have three with 74. That’s going to be the second window. And three with 75, and that is the third window.

So, again, file this under things that are good to know, not things that should dictate how you choose between temp tables and table variables. Before we go, I want to show you one additional piece of good news. And that is that if we run that first demo, right?

But with a different version of no_spid, where we give the temp table that gets created in here a more unique name. So, rather than the temp table being named a_spid like it is in this one, we’re going to name it no_spid, which matches the name of the procedure. Then we’re going to run this.

We’re going to clear out the plan cache and run this again. And we’re going to run just no_spid here. Oh, wait. You know what? I have to do that down here first, don’t I? I do. We’re going to run no_spid here first.

We’re going to run no_spid here second, and no_spid here third. And now, when we look at the plan cache, both of these are going to have a value of zero, right? So, unique temp table names do help reduce this problem.
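The fixed version of the procedure would look something like this (again, a hypothetical reconstruction): the temp table name now matches the procedure name and no longer collides with the one created inside a_spid:

```sql
CREATE OR ALTER PROCEDURE dbo.no_spid
AS
BEGIN
    /* Renamed from #a_spid: no collision with the table in
       a_spid, so each plan caches once with optional_spid = 0 */
    CREATE TABLE #no_spid (id int NOT NULL);
    INSERT #no_spid (id) VALUES (1);

    EXEC dbo.a_spid;
END;
```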

But it doesn’t matter how unique the name is if you have a stored procedure that references a temp table from another stored procedure. Okay? So, say you have an outer procedure that creates a temp table, and then inside your other stored procedures you do stuff with that temp table, because that’s perfectly valid.

It’s all in scope. But then you will end up with the optional_spid thing anyway. If you end up with the optional_spid thing because you created temp tables with duplicate names across stored procedures, an easy way of fixing it is to give your temp tables more unique names.
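A sketch of that cross-procedure situation, with hypothetical names: the outer procedure owns the temp table, and the inner one just references it. The reference is perfectly valid because the table is still in scope, but it forces per-session plan cache entries (nonzero optional_spid) for the inner procedure no matter how unique the table name is:

```sql
CREATE OR ALTER PROCEDURE dbo.inner_touches_outer_table
AS
BEGIN
    /* References a temp table it does not create; deferred name
       resolution lets this compile, and the table exists at run
       time because the caller created it */
    INSERT #outer_scoped_table (id) VALUES (2);
END;
GO

CREATE OR ALTER PROCEDURE dbo.outer_owns_table
AS
BEGIN
    CREATE TABLE #outer_scoped_table (id int NOT NULL);
    INSERT #outer_scoped_table (id) VALUES (1);

    EXEC dbo.inner_touches_outer_table;
END;
GO
```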

Again, this is not a good reason to pick table variables, which don’t cause this outcome. I’m not going to say cause this problem or this issue, because I don’t consider it much of a problem or an issue, given how crappy a situation most plan caches are in generally.

I suppose you could somewhat improve the situation of your plan cache by following my advice. But, you know, whatever. It’s the plan cache.

Use query store anyway. But this is something that is good to know about, because you might be one of those people who still has that awful folder full of scripts, with dates going back to, like, 2002, that mined the plan cache for certain things.

And you might run them and you might see lots of query plans for procedures that have temp tables in them. And you might wonder why and say, dear God, I thought every query just got one plan. What is happening?

Microsoft has betrayed me. How could I possibly overcome this betrayal? Well, there’s ways. There’s ways.

Here’s your ways. Give your temp tables unique names. That’s the easy way to do it. But that doesn’t help if you are referencing a temp table from one stored procedure in another stored procedure. It doesn’t matter how unique the name is, because you’re just calling back to that first one anyway.

Maybe multiple procedures and queries will do that. I don’t know. It’s all wild. Anyway, thank you for watching. I hope you enjoyed yourselves.

I know you learned something. But you learned something that is just good knowledge. This is not knowledge that you will use to dictate your use of temporary objects. In the next video, we’re going to talk about performance.

And that is going to be what you will use to dictate your choice between temporary tables and table variables. This video is not what will do it, and the previous video on recompiles probably isn’t what’s going to do it either.

The next video on performance, that’s what’s going to do it. All right. Cool.

Thank you for watching. Thank you for watching and fully comprehending everything that I say. I know that reading comprehension is somewhat difficult. But hopefully listening comprehension is much easier because I speak in a clear, precise, and authoritative tone.

All right. The dad you never had. All right.

Well, anyway. Thank you for watching.


All About SQL Server Stored Procedures: Nested and Autonomous Transactions



Video Summary

In this video, I delve into the intricacies of nested transactions in SQL Server, addressing a request from viewers who wanted to understand how these work and when they might be useful. I explain that while SQL Server behaves differently compared to other database engines like Oracle, it still doesn’t support partial commits within nested transactions as one might expect. Instead, all changes are rolled back if any part of the transaction is rolled back. To illustrate this concept, I create a simple stored procedure and demonstrate how transactions work in practice, showing that even though you can begin and commit multiple transactions, rolling back an outer transaction will undo everything inside it. This video aims to clarify common misconceptions about nested transactions and provide practical insights for SQL Server users.

Full Transcript

Erik Darling here with my second take recording this video. I hit a new professional low where I actually recorded about half of it, then somehow ended up hitting the stop record button without realizing it and just kept talking. And then when I was done, I went to hit stop record and I was like, oh, yeah, that’s fun. So anyway, in this video, we are going to get back into talking about stored procedures. Now, I’ve talked about transactions a couple of times, a couple of ways. One of them was in this series; another one was just sort of a funny short video about someone saying while @@TRANCOUNT is greater than zero, roll back, which is just amusing to me. I have gotten some feedback that people would want to know exactly how nested transactions work, and how you might be able to actually get a nested transaction to nest, because SQL Server behaves a lot differently than a lot of other database engines when it comes to nesting transactions.

I think probably the prime example is Oracle, which does allow for partial commits in nested transactions, whereas with SQL Server, everything gets rolled back when you roll back the outer transaction. We are not going to talk about save points, because screw that. And yeah, that’s about it there. Anyway, if you would like to support this channel: if you sign up for a million dollars, I’ll talk about save points. Barring that, I’m not getting into it. You can use the link in the video description to become a paying member of the channel for as few as $4 a month. You too can support a starving SQL Server consultant.

Then maybe I can stop answering questions about why my face looks skinny lately. If you like the channel, but maybe not enough to put a ring on it, you can like, you can comment, you can subscribe, and you can ask me questions privately that I will answer publicly here on my Office Hours episodes. It is not a podcast. Most vociferously, not a podcast.

If you need help with SQL Server: they don’t call me Erik Reasonable Rates Darling for nothing. My rates are reasonable, and I have, again, been editorially presented by Beer Gut Magazine with being the best SQL Server consultant in the world outside of New Zealand. So, I don’t really see what you have to lose there.

If you need SQL Server training, and you don’t want to pay, like, two grand, you can get online for about $150. It’s about 24 hours of content, beginner, intermediate, and expert, maybe even some beyond expert stuff. And you get that for life. You do not have to subscribe to that.

You just sign up, and you’re in. It’s just, that’s it. You’re officially a dues-paying member, and you’re allowed in the clubhouse whenever you’d like. SQL Saturday, New York City 2025, is taking place on May the 10th, with a performance pre-con by Andreas Volter on May the 9th. You are cordially invited to both, and you are also cordially invited to bring me gifts.

You can bring me presents, preferably in the form of low-value currency. So, no hundreds and fifties, just stuff that’s easy to spend at the bodega. So, keep that in mind.

But with that out of the way, let’s talk about nested transactions. Now, SQL Server does have the concept of autonomous transactions. Now, I had that window open already. Good for me.

Now, you might look at the date on this blog post, and you might think, golly and gosh, that’s old. There must be a better way. Guess what? There’s not. You still have to do all this stuff.

Now, this is from back when Microsoft used to have good SQL Server blogs. If you’ve read a SQL Server blog post lately, you might notice that they’re not good. But they used to be, way back when. Look at this: back to 2006.

What were you doing in 2006? What was I doing in 2006? Being young, having fun, life worth living and all that stuff.

But Microsoft used to have good SQL Server bloggers. So, I would suggest reading this content before Microsoft makes it disappear. Because one thing Microsoft is famous for is making good stuff disappear and replacing it with crap.

So, please, you know, support your local internet archive or something. Because this stuff ain’t forever anymore. But this post walks through the concept of autonomous transactions, how to make them work by using a loopback linked server.

If you come down here, you’ll see some of this stuff. That is a really aggressive use statement up there. I don’t know if I agree with that.

But you’ll see where they create a linked server called loopback that just connects to your own server, right, with the data source set to @@SERVERNAME. And then you’ve got to do some stuff, and then you can get transactions that do partially commit doing that.
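For reference, the core of the loopback setup looks roughly like this. sp_addlinkedserver and sp_serveroption are real procedures; the server name "loopback" and the SQLNCLI provider are just what that era of post used (SQLNCLI is deprecated on newer versions, where MSOLEDBSQL would be the usual substitute). The important part is turning off remote proc transaction promotion, so calls over the linked server are not enlisted in the local transaction and can commit on their own:

```sql
/* Procedure parameters must be constants or variables,
   so capture the server name first */
DECLARE @self sysname = @@SERVERNAME;

EXEC sys.sp_addlinkedserver
    @server = N'loopback',
    @srvproduct = N'',
    @provider = N'SQLNCLI',   /* deprecated; MSOLEDBSQL on newer builds */
    @datasrc = @self;

/* Allow stored procedure calls over the linked server */
EXEC sys.sp_serveroption
    @server = N'loopback',
    @optname = N'rpc out',
    @optvalue = N'true';

/* The key setting: do NOT promote remote proc calls into
   the local (distributed) transaction */
EXEC sys.sp_serveroption
    @server = N'loopback',
    @optname = N'remote proc transaction promotion',
    @optvalue = N'false';
```

With that in place, something like EXEC loopback.your_database.dbo.some_logging_proc runs in its own transaction and commits even if the caller later rolls back, which is the autonomous-transaction behavior the post is after.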

You can’t do it really any other way. Unless you want to write absolutely bonkers stuff using table variables and save points and other things. Where, like, it’s so obtuse and edge casey that I don’t want to write that code because I feel like I would get something wrong with it.

And one of, like, three people in the world who would know when that code is wrong would make fun of me. So, we’re not going to do that. So, I will hopefully remember to put that in the video description or get yelled at either way.

But the way that a lot of people think nested transactions work because, like, when you look at nested transactions, it makes sense for them to work like this. But they don’t. And this does work in other databases like Oracle.

I think there’s even a mention of that in the Microsoft post. At least I saw the word Oracle and there weren’t, like, devil horns on it. So, I assume they said something okay about it. But the way that you would expect it to work is, like, begin a transaction called T1.

Do some stuff. Begin a transaction called T2. Do some more stuff. Commit transaction T2 and just have this be, like, out of the picture, right? Like, you saved your changes from this.

You’re done. And then roll back. And then, like, if you wanted to roll back T1, T2 would be left alone. But that doesn’t happen. SQL Server does not do this. SQL Server rolls back everything.

There are no partial commits, right? At least none that stick around and survive a rollback. So, the way that I want to show you that concept in SQL Server is I’m going to create a simple table and a simple stored procedure. And this stored procedure is going to do, well, essentially three things.

It’s going to begin a transaction and commit a transaction, and in between, it’s going to do two inserts into this table, just based off whatever values I pass in. And then, just to not get a primary key violation, I’m going to add one to the value there.

Outside of the stored procedure, I’m going to hopefully remember to begin a transaction called T2, run the stored procedure, run a select to show you what values ended up in there, then roll back T2 and show you what values are in the table afterwards. So if we run this, we get a query demo proving exactly what I just told you.
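A hypothetical reconstruction of that demo (the actual object names in the video may differ) looks like this. The inner COMMIT only decrements @@TRANCOUNT; nothing is durable until the outermost transaction commits, so the outer ROLLBACK undoes everything:

```sql
CREATE TABLE dbo.tx_demo (id int NOT NULL PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.tx_insert
    @id int
AS
BEGIN
    BEGIN TRANSACTION t1;    /* @@TRANCOUNT: 1 -> 2 inside T2 */
        INSERT dbo.tx_demo (id) VALUES (@id);
        INSERT dbo.tx_demo (id) VALUES (@id + 1); /* avoid the PK violation */
    COMMIT TRANSACTION t1;   /* @@TRANCOUNT: 2 -> 1; nothing durable yet */
END;
GO

BEGIN TRANSACTION t2;
    EXEC dbo.tx_insert @id = 1;
    SELECT id FROM dbo.tx_demo; /* both rows visible here */
ROLLBACK TRANSACTION t2;        /* undoes the "committed" inner work too */

SELECT id FROM dbo.tx_demo;     /* empty afterwards */
```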

While T2 is an open transaction, we can see these two rows in the transaction table. We have both rows that the stored procedure inserted in there and committed, right? Like, this thing did begin transaction, commit.

We didn’t stop at all. The entire procedure executed. And then later, when we rolled back transaction T2, that undid the T1 transaction inside of the procedure. So, you can’t do that.

Like I said, you can get some of the aspects of an autonomous transaction if you use table variables and write really complex code. I don’t recommend it. You will spend more time dealing with weird issues than you will be happy that you wrote it.

So, maybe don’t do that. Like, save yourself some pain. I mean, you know, like, if you want to get autonomous transaction behavior in SQL Server with as little pain as possible, and I say as little pain as possible because you’re still dealing with linked servers and that’s absolutely no fun, then the instructions in the post that will be, remember, in the video description will walk you through that and how to do that.

So, that is the least painful way, at least, that I’ve come across. I’ve seen various people try to get the autonomous transaction thing working with table variables and save points and all this other stuff, but I’ve never seen a happy person try to do that.

So, I want you to be happy out there. I want you to be happy, bright, sunshiny people who have great weekends and don’t try to do overly ambitious, borderline stupid things with their databases because we all know how that ends up.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video, another time, another place, another you, another me. It’ll be beautiful, though. Well, hopefully we’ll remember each other. Anyway, thank you for watching.


T-SQL Tuesday 185 Wrap Up: Video Star Edition #tsql2sday



Video Summary

In this video, I recap the engaging T-SQL Tuesday 185 event where community members were encouraged to share their thoughts through videos rather than blog posts. The response was quite diverse and entertaining—ranging from humorous technical mishaps to insightful demonstrations and heartfelt reflections. Highlights included Rob’s creative green-screen trick in SQL Server Management Studio, Andy Levy’s exploration of the Object Explorer Details feature, and Andy Yoon’s practical stored procedure for expanding view references. Other submissions covered topics like flat file wizard capabilities, content consumption preferences, empathy as a technical skill, and personal experiences with presenting and blogging. It was fascinating to see how these videos provided not just information but also a glimpse into the personalities behind each contribution.

Full Transcript

Erik Darling here with Darling Data, and this is going to be my T-SQL Tuesday 185 wrap-up video, in which I asked the nice folks out there in the SQL Server community, rather than write a blog post about a specific topic, to just record a video about anything. And I got, I don’t know, seven or so pingbacks on that. If anyone out there recorded something and didn’t ping me back, sorry. If you don’t tell me, I don’t know. So, anyway, first, in true he whom the gods would destroy they first make mad fashion, the video that I recorded to invite people had a weird little audio glitch going on. And I thought that I fixed it, and then I didn’t fix it, and then I didn’t get a chance to re-record it. I was fully expecting at least one of the video submissions to make fun of my bad robot voice in the video, but everyone was kind enough to not make fun of my slight technical difficulty. I’ve managed to avoid a lot of those in recent videos, so, I don’t know, I guess maybe I’ve earned it. Anyway, our first video came in. Of course, I’m going to say this video came in first, but only because Rob is cheating with time zone magic.

Usually he’s cheating with normal magic tricks, but this is just time zone magic. And Rob’s video, he talks about the cool mappy scroll bar on the side of SQL Server Management Studio. Of course, that’s available back, I remember when that first came out, but, you know, at first I didn’t like it because it was, like, too big on the side, and, like, when you hover, like, this, I know it’s an option, but, like, you hover over it, and you get, like, this, like, giant, like, preview of the text in there. And, like, it just, like, got in the way a lot with stuff, you know, it’s, like, there’s a certain amount of tooling where it’s just, like, sometimes these, like, pop-up things are helpful, and then sometimes it’s just, like, obtrusive. But with SSMS 21, I think I’ve been liking it a little bit more. I don’t know why. Maybe it’s the dark mode, who knows.

But Rob actually did a very cool trick where he green-screened himself into SQL Server Management Studio. I might steal that from you someday, Rob. I don’t know when or why or how, but I’ll figure it out and do that. Anyway, thank you, Rob, for this lovely video. Next up, we had Sir Andy Levy talking about one of my favorite things in SSMS, one of those things that actually kind of wows clients when I’m on the phone with them: the Object Explorer Details. You can either right-click and go to Object Explorer Details, or, in SQL Server Management Studio, when you highlight a database or a server or something and press F7, that’s F like Frank 7, you get this new thing that pops up that gives you all sorts of neat details.

And one of my favorite things about it is, like, just to give an example, if you have a table with a bunch of indexes on it, and you want to script out all the indexes and just see what’s in there, if you hit F7 and you go to Object Explorer Details, you can actually multi-click stuff and right-click on everything that you’ve just multi-clicked, and you can hit Script, and you can get all the indexes rather than clicking on, like, one index at a time and scripting it.

It’s very convenient for many things. So, good job there, Andy Levy. You read my mind, or something. Next up is Andy Yoon with an actually very helpful stored procedure.

It’s one I would be terrified to run on some of my client environments. It is called sp_helpExpandView. And what sp_helpExpandView does is, if you have a view with a bunch of nested views in it, it’ll go through and find all the view references.

And there’s an optional mode where it will give you a count of, like, how many times things are referenced in the view. So, that’s a very, very cool thing to have. Like I said, I’d be a little afraid to run it and get, like, 90 columns of nested views back.

But if you’re feeling brave and bold out there or just particularly pioneering in one of the environments you work in, I would highly recommend using this to help yourself untangle the nastiness of nested views. So, well done, Andy.

Well done. All right. The next submission I had was Steve Jones pushing the boundaries of cutting-edge technology with testing the flat file wizard, in which Steve discovers that the flat file wizard can handle multiple delimiters, not just commas and fixed width, but also pipes and some other stuff.

And then at the very end of the video, Steve submits a pull request to the Microsoft Docs site where he makes some improvements to that. So, Steve really, like, pushed the envelope of not only data engineering, but also DevOps and, I don’t know, something else probably. But good job, Steve.

Steve, we now can use the flat file wizard with a bit more confidence, I hope, when we’re dealing with our flat files. All right. The next submission I got was from Deb, whose background, for some reason, looks nothing like Andy’s background.

At least last I heard, they’re still married. I hope I’m not messing anything up there. But she is asking some questions about stuff.

She wants to know how you consume content. She wants to know the type of content you prefer. She brings up some good points about videos where, like, it’s harder to watch videos. It’s harder to sit down and dedicate the time to, like, sit there and watch one thing and just, like, absorb that.

If you have something written, you can, like, go to it and come back to it and sort of, like, follow stuff around and, like, you know, take bites as you can. With videos, it’s a little bit harder to do that. I often find myself with, I mean, anything audio related, you know, it’s not just videos where there’s, like, a visual component.

But, like, I’ll find myself, like, I’ll put something, like, on a podcast. I’m like, oh, I really want to hear about this. And, of course, I’m sitting at my computer.

So I’m nip, dip, dip, dip, dip. And, like, you know, like, 45 minutes goes by and I realize I’ve absorbed nothing from over here. So valid points. I get it.

But, you know, also talks a bit about how the bots out there are stealing the words and not giving credit for the words. And, I don’t know, I think probably Sam Altman and, by extension, Microsoft owes all of us a very big royalty check for all the training material they’ve gotten from us. I’ve got a copyright on my blog posts.

I don’t know about you. I’m waiting for my royalty check in the mail. All right. Next up, our dear friend Mala, who is, oh, look out. Look at tiny Mala in the corner.

Why is Mala hiding? There’s Mala. Wonder why. Wonder why Mala is so small over there. But she has a great video about empathy being a technical skill.

Hopefully one that I learn someday. I would love to someday empath myself, empower myself with empathy or something like that. But good job there, Mala.

We could all stand to be a bit more empathetic in the world. Especially to people who have audio technical issues when writing a recording to invite people, or rather, recording a video to invite people to record videos. So, I guess the SQL community has a bunch of empathy in it.

All right. All right. And the final one was from our dear friend, Making Plans for Nigel, who got all dressed up, wore a Darling Data t-shirt. One of the few people who, I guess, didn’t donate the one they got for free from me at a conference to a homeless shelter.

So, thanks for that, Nigel. He recorded a lovely video from his backyard talking about his experiences with sort of like presenting and blogging and getting turned down from conferences. It was an expansive video.

There are big feelings in this video. A lot of big feelings. But he is curious because he does put a question out there. Would you like to see more of these?

And, of course, Nigel, we would always like to see more of you and your fabulous backyard. All right. So, that’s the wrap-up. I hope that everyone out there who recorded a video enjoyed recording the video.

Maybe it will spark some more recording magic from you. I do like watching videos and getting a sense of the person behind all the content. But maybe I might be somewhat alone in that.

I don’t know. Anyway, thank you for watching. And I will, I don’t know. I don’t know when the next time I’m going to host a T-SQL Tuesday is. I think we’re rate limited to like once a year. So, you might not see me again until, what is it, 2026 is next year?

Good God. Someone stop this thing. Anyway, thank you for watching. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Index Sort Order And The Order By Clause



Video Summary

In this video, I delve into the fascinating world of SQL Server indexes and query plans, specifically focusing on a phenomenon known as “surprise sorts.” You’ll see how different indexing strategies can lead to unexpected sorting operations in your queries. I explore this concept by running a series of queries against the Stack Overflow database, demonstrating how the order of columns in an index impacts whether or not SQL Server needs to sort data during execution. This video is perfect for anyone looking to deepen their understanding of query optimization and indexing techniques, as it provides practical insights into avoiding unnecessary sorts that can impact performance. Whether you’re a seasoned DBA or just starting out, this content offers valuable lessons on how to craft more efficient queries and indexes.

Full Transcript

Look at you. You look great. You look great, smell great, everything about you. Great, great, great. Erik Darling here with Darling Data. Me and my pal Bats have been chatting. We had an interesting client call where, you know, when I talk with people about SQL Server and SQL Server performance, we talk about a lot of stuff: stuff to do with queries and indexes and query plans, and the way the different things you do in your queries and the way you design your indexes have an effect on the query plan that you ultimately end up getting. Which should all be at least fairly clear to my distinguished audience. But, you know, some people need a little bit more help and guidance than others. That is what I’m here for. So in this video, we’re going to talk about a sort prize. Now, keep in mind, this is not like a sort sort prize, like, hey, you won. Congratulations. You get a thing. Here is your honorarium of some variety. This is a surprise sort. And this isn’t going to cause a big performance issue today. We’re just going to look at the intersection of indexes and querying, and how you can sometimes end up with a sort that you might not expect in your query plan. But before we do that, Bats would like to remind you that you can become a loyal paying member of the channel to say thank you for all of the hard, diligent work that I do producing this content. There is a link down in the video description there, where Bats is pecking away, where you can, for as few as $4 a month, support your local SQL Server enthusiast. If you have perhaps engorged your Bats with Pez candies and spent all your money there, you can do other things to support the channel. You can like, you can subscribe, you can comment. And if you would like to ask me questions that I will answer during my officially branded Bats Maru office hours, you can go to that link, which is also in the video description.

You can ask me questions that I will answer. If you need help with SQL Server beyond what you can get from mere YouTube videos, you can hire me to do all sorts of things that people find useful. Bats is being a little cranky today. I can do health checks, performance analysis, hands-on tuning of your SQL Servers, helping you with SQL Server performance emergencies, and of course, training your developers so that you don’t have those performance emergencies anymore. These are all things that, according to BeerGut Magazine, I excel far beyond anyone else in the world at. So, you are free to hire me to do all of these things. And you can rest peaceably with the knowledge that you have hired a BeerGut Magazine certified SQL Server consultant.

And as always, my rates are reasonable. If you would like to get some high quality, low cost SQL Server training from me and your old pal Bats here, you can get all 24 hours of mine to fill your brain with for about $150 US dollars. And that will last for the rest of your life. All you have to do is use the coupon code right there at the link up there, which is also down in the video description. And gosh darn it, you can start being as good at SQL Server as Bats Maru.

SQL Saturday, New York City, 2025. That is this year. That is just a couple months away now. Coming up, May the 10th, with a performance tuning pre-con on May the 9th. I highly suggest you attend both. And you hang out with me and become my best friend. Maybe, maybe that’ll be the start of something beautiful. Who knows?

But with all that out of the way, let’s talk about surprise sorts, because they’re interesting things. Now, we’re going to run a query with a few different things going on in it. And we’re going to use this query and the index that we have available.

And then I’m going to show you what happens with a slightly different index. So we’ve got right now this index up here. ZoomIt, if you would be so kind as to zoom. There we go.

We’ve got this one here where reputation is in ascending order. Now, for those of you who are new to this whole thing: the Users table in the Stack Overflow database has a clustered (the important part here is clustered) primary key on the ID column.

The ID column is a hidden key column. It is a hidden key column because this is a non-unique index. You notice that there is a distinct lack of a word, the word unique, in here.

So because this is a non-unique index, the ID column (ZoomIt, if you would be so kind as to erase the squares instead of just having me click buttons mindlessly) is hanging over here as an additional key column. If the index, well, that D was a little aggressive, huh?

Let’s dot that I. If this were a unique index, then the ID column would be hiding somewhere over in this region as an included column. But since this is a non-unique nonclustered index, it is an additional key column, which means that this index is sorted by the reputation column first.

Right? So all of the values for reputation are sorted from one up to whatever Jon Skeet is at, one million and something. And for all of the duplicates, right?

Because we index all the data. So let’s say for the million or so people who have a reputation of one, the ID column is in order for that. But as soon as we go to reputation two, the ordering of ID resets and we start from whoever has the lowest ID to the highest ID within the next range of values.

So we have an index where reputation is in order and an index where ID is in order within all of those ranges of reputation. So we would expect to be able to order things, have an order by in the query that helps with all sorts of stuff, helps us avoid sorting data. Now, like I said before, this isn’t going to show a big performance issue.
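
To make the setup concrete, the index being described is roughly this (the index name is mine, not from the demo):

```sql
-- A non-unique nonclustered index on the Stack Overflow Users table.
-- Because it is not declared UNIQUE, SQL Server appends the clustered
-- primary key column (Id) as a hidden trailing key column, so the data
-- is effectively sorted by (Reputation ASC, Id ASC).
CREATE INDEX ix_reputation
ON dbo.Users (Reputation);
```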

This is just going to show you some behavioral stuff. So if I select the top 1,000 rows from the users table and I order by reputation descending, right? Just reputation descending on its own here.

Then we get a backwards scan of this index, right? Scan direction is backwards. But we don’t have to sort any data.

We have a top and we have a scan. We do not have a sort operator in our query plan. SQL Server did not have to acquire any additional memory grant in order to put this data in the order that we have asked for it. If we run these two queries, well, the reason why we might do something like this is because the reputation column is not unique.

Remember, we talked about that when we were talking about the index definition. And if we have any ties in the reputation column, we might want to add a tiebreaker in the form of the unique ID column to our query so that we have a way to uniquely identify the top 1,000. Otherwise, since we can get duplicate reputations, we could get sort of unexpected results ordering by a non-unique column.
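
Reconstructed from the discussion (so the exact column list may differ from the demo script), the two tiebreaker queries look something like this:

```sql
-- Tiebreaker ascending: this one ends up with a Top N Sort in the plan.
SELECT TOP (1000) u.Id, u.Reputation
FROM dbo.Users AS u
ORDER BY u.Reputation DESC, u.Id ASC;

-- Tiebreaker descending: a pure backwards scan, no Sort operator.
SELECT TOP (1000) u.Id, u.Reputation
FROM dbo.Users AS u
ORDER BY u.Reputation DESC, u.Id DESC;
```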

So let’s run these two queries. And let’s look at what happens. Now, you’ll notice that for the execution plan where we order by reputation descending and ID ascending, we have a top-end sort.

In the query plan where we have reputation, so let’s put a little square around that here, make it obvious. This is reputation descending ID ascending. In this query, we have a top-end sort.

In this one, we just have a top. We are back to our original plan. Now, the reason for this is somewhat complicated or maybe not incredibly intuitive to folks out there.

And let’s try to explain it well. If you look at the properties of this index scan, you’ll notice that we do not have a direction on this one. If you look at what happens, there’s no, like, we didn’t have like a backwards scan.

If we look at this one, the backwards scan will be back. So this one has the backwards property. This one is an unordered scan of the table.

That’s why we have to sort this over here. And, of course, the sort operation is ordering by reputation descending and ID ascending. So the question is, why, when we sort by reputation descending and ID ascending, do we need to sort the data?

But when we sort by reputation descending and ID descending, we do not need to do that. Well, what it really comes down to, and if we look at the results over here, it might be a little bit more obvious. I do have to do a little bit of surgery on this to get both of these query panes to the right place.

And, of course, we need to get to around row 276 for this to be incredibly obvious or for this to start to become obvious. So let’s look at both of those. Oh, come on, SSMS.

You’re really making a fool out of me here. So 276 or so has the first duplicate that I could find in the results. Yes, I did scroll through and look for them.

So we have the reputation 160303 here. Right? And that’s the same for both of these. But you’ll notice that the order of the IDs is slightly different between them. In the top query, where we’re ordering by ID ascending, we have 19679 first and then 206403 second.

In the bottom query, for reputation 160303, we have ID in descending order. So we have 206403 first and 19679 second. Right?

So clearly those two values flipped. Now, the way to think about why we need to sort the data in the top plan but not in the second plan does come back to the execution plan. So, again, the properties of the scan with the sort does not have a direction on it.

Right? There is no ordered. When you look at the ordered attribute, it says false. So we just read through stuff and found it.

The reason for that is because imagine that you are the SQL Server engine. Right? And you are reading through the index over here backwards.

Right? So you’re reading. You’re doing a backwards scan. So let’s just pretend that, like, this is our B-tree. And over here is the, like, ascending end of our B-tree.

And over here is the descending end of our B-tree. So the backwards scan starts over here and starts reading things this way. Right?

So we’d be reading through this index in descending order. And then all of a sudden we would get to, like, we’d be, like, you know, we’re ordering by reputation descending. We’re reading through the index in descending order. And we’re trying to, like, you know, we want to order by ID ascending.

We would get to that, like, 160303 reputation. And then we would have to, like, do a U-turn and be, like, no, now we need to sort it this way. Right?

So SQL Server, like, it can’t really do that. Like, that’s not really how B-trees work. We can’t just, like, start sort, like, reading the index and then, like, backwards order this way and then be, like, ah, but then order the other column this way. Like, SQL Server is not, like, an advanced enough product or rather doesn’t have that advancement in the product in order to do that.

So in the second query where we are, we just, we can just straight up backwards scan the whole thing. ID is already going to be, like, we’re reading through the index backwards. So ID, the ID, the order of the IDs is also going to be backwards reading through that.

So as we’re reading through the descending order reputations, ID is also going to be in descending order. So SQL Server can just chug along that and have all the data straight in the order that it wants for both of those. So that’s why we don’t need to sort there.

Of course, you can write a query that would do that for you, if you were to break things up a little bit and do some manual phase separation, as the smart folks out there say. We get the top thousand reputations, ordered by reputation descending, in an inner query. And remember, we’re doing the group by here so we get our unique set of reputations back.

And then out here we join back to the users on that reputation. Then we can get the data that we want in the order that we want without sorting it. Right?

And the reason for that is because we hit that index twice. Right? Look at that. We scan into the index over here and we seek into the index here. When we scan the index here, we are doing that.

We are back to doing a backward scan. So we’re reading the data from over here this way rather than over here this way from largest to smallest rather than smallest to largest. And so we do the backward scan here.

We produce the top thousand rows. And then this index seek is going forward. So we’re reading through this index.

We’re joining on reputation. We start with the lowest reputation that came out of there. And we move to the highest reputation that came out of there. So we’re in forward order on this one. And we can avoid having to sort data at all in this query plan because we have two separate reads of the index that happen in two separate directions.
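
If the shape of that rewrite is hard to picture from the audio alone, here’s a sketch of the manual phase separation idea. This is my reconstruction, not the demo script, and the optimizer is free to pick a different plan shape:

```sql
-- Inner query: a backwards scan grabs the top distinct reputations in order.
-- Outer query: a forward seek per reputation brings back the matching users,
-- with Id already ascending within each reputation, so no Sort is needed.
SELECT u.Id, u.Reputation
FROM
(
    SELECT TOP (1000) u2.Reputation
    FROM dbo.Users AS u2
    GROUP BY u2.Reputation
    ORDER BY u2.Reputation DESC
) AS r
JOIN dbo.Users AS u
    ON u.Reputation = r.Reputation
ORDER BY u.Reputation DESC, u.Id ASC;
```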

SQL Server could do this for us if it felt like doing a bunch of extra work in the query plan. But I understand a bit why it doesn’t. Now, of course, if we really wanted an index to do that for us, we could create this index, because this would be reputation descending and ID ascending.

So what we would have here is an index that you don’t need to read backwards in order to get data from this end to this end, because we have the highest stuff over here, and ID is going to be in ascending order from there. So we can get this plan over here without a sort now.
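
The “index that does it for you” would be something like this (name mine):

```sql
-- Key columns stored in the same directions the query asks for, so
-- ORDER BY Reputation DESC, Id ASC needs no backwards scan and no Sort.
CREATE INDEX ix_reputation_desc
ON dbo.Users (Reputation DESC, Id ASC);
```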

So depending on how you need to return data with your queries, you might need to think about how you create your indexes and which direction you store your data in. Because sometimes, if you need to mix sort orders like this, like reputation descending and ID ascending, you might need to change the way that your indexes are set up, or write much more complicated queries, in order to avoid having to sort data. Now, of course, the client issue that I was dealing with was a big sort, and it was taking up a lot of memory and it wasn’t using a lot of that memory.

It was getting a very big memory grant, not really using it. And so we had to think of some clever ways around that problem. And that is what brings us here today is solving problems in clever ways.

We could do the query rewrite and leave the indexing alone or we could change the index. So we have choices as far as making things work. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something about, well, either indexes or forward scans or backward scans or some kind of scans. I don’t know.

There’s a lot of scanning going on in here. Anyway, maybe that’s my problem. Anyway, thank you for watching. I will see you in another video another time. Toodle-


Equality, Sort, Range Indexing In SQL Server – When It Doesn’t Work



Video Summary

In this video, I delve into the intricacies of equality sort range indexes, specifically when they might not be as beneficial as initially thought. Building on yesterday’s discussion, I demonstrate how these indexes can struggle when your range predicate is highly selective and your ordering elements are poorly aligned with query requirements. By running through several queries and their corresponding execution plans, I illustrate the challenges faced when trying to optimize performance in such scenarios. The video also includes tips for improving query performance, such as using index hints and considering alternative indexing strategies that better align with specific query patterns.

Full Transcript

Your best friend in the entire world, Erik Darling here, to talk to you further about a subject that we broached in yesterday’s video, which is the Equality Sort Range Index. In yesterday’s video, I did say that there are situations where it’s not going to be as great an idea as it is in others. And I owe you at least something for that. So, to generalize a little bit, the problem with this style of indexing comes when, an easy way to put this, your range predicate is far more selective than your equality predicate, and your ordering element or elements are in a bad order for the way you want to search for things.

It’ll make a lot more sense when I actually show you the demo. So, for now, let’s just keep in mind what the topic is going to be, and be amazed and astounded and just wowed right out of the seat of our pants when I show you what’s going on. But before that, of course, I like doing this stuff.

I like doing this stuff even more when people sign up for memberships. So, if you would like to do that, there’s a link right down in the video description. And for as little as four bucks a month, you can say, hey Erik, thanks for doing all this.

It looks like a lot of work. You’re a real sport. If you do not have four bucks a month, perhaps you have significantly increased your Pez dispenser collection, used eBay for its original purpose, and you just don’t have four bucks, you can do other things that help me become a bigger, better person.

You can like, you can comment, you can subscribe. And if you want to ask questions, fancy little pinky out questions privately that I will answer publicly on this very YouTube channel, you can go to the link above then, which is also in the video description, and you can ask your questions there.

And I will answer them five at a time. Again, I’ve got a bit of a backlog on these, so we’re going to record some of those very soon. If you need help with SQL Server beyond asking a question anonymously that gets answered non-anonymously online, you can hire me as a consultant to do all sorts of things with your SQL Server to improve performance.

I do all of these things and more. And as always, my rates are reasonable. If you would like to get some training content from me, carrying on my fine tradition of rate reasonability, you can, again, video description has the fully assembled link, but if you feel like typing, you can go to training.erikdarling.com and punch in the discount code SPRINGCLEANING and you can get 75% off any of my training videos.

Fun stuff there. SQL Saturday, New York City, 2025. That is this year. That is, at this point, really, like, two months away.

So, if you’re speaking at them, gosh, I hope you practiced. And, as always, I will see you there. Bright and oily.

But with all that out of the way, let’s talk about when this equality sort range index thing doesn’t work out so well. So, because I felt the format of yesterday’s video worked pretty well, I have pre-created some indexes, and I’m going to run some queries using index hints to show you what happens when these queries interact with those different indexes.

So let’s run these two queries. And I’ve got query plans enabled because, gosh darn it, I’ve been practicing. Don’t want to disappoint anyone with unpracticed SQL Server demos.

And you might notice that at least one of these queries is taking quite a while to execute. And it’s going to continue to take quite a while to execute. So, what these are doing, the difference between these two queries, is just in the ordering.

This one is, of course, ordering by the ID column in ascending order. And this one is ordering by the ID column in descending order. So, for this first query, what’s worth noting here are a couple of things.

One, we’re going to pretend we don’t have any useful indexes at first. And I’m going to show you what happens when either you don’t have any useful indexes, or, for whatever reason, SQL Server does not choose some useful index that you have.

Like, you could substitute this ID column for a select star, or a select-a-bunch-of-columns type thing, and SQL Server might cost a narrower index out of the equation, because it doesn’t want to do a whole mess of lookups.

All right? So, just bear with me here for a moment. Um, we are forcing the clustered index via this index hint. And we are looking for, in the votes table, vote type ID equals two.

Vote type ID equals two has about 37 million rows associated with it. That is the almighty upvote in Stack Overflow land. And the other predicate that we have on this table is where creation date is between 2013-12-01 and 2013-12-31.

Now, since this is the Stack Overflow 2013 database, the world ended right here. There are no rows after this. So, this is the last month of data in the table.

The problem that we run into here is that SQL Server can, like, find 37 million rows for this pretty easily. But finding the top 10 rows, right? Because we’re doing fetch next 10 rows only, offset zero rows, ordered by ID ascending.
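
Reconstructed from the description (so treat it as a sketch rather than the exact demo script), the pair of queries looks roughly like this; INDEX(1) forces the clustered index, and only the sort direction differs between the two:

```sql
SELECT v.Id
FROM dbo.Votes AS v WITH (INDEX(1)) /* force the clustered index */
WHERE v.VoteTypeId = 2 /* upvotes: roughly 37 million rows */
AND   v.CreationDate BETWEEN '2013-12-01' AND '2013-12-31'
ORDER BY v.Id ASC /* the slow one; DESC finds its 10 rows almost immediately */
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
```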

You have to go through a lot of the table before you get to creation dates that meet this predicate. So, finding the top 10 ordered by this is what’s really tough. If we look at the execution plans for these, this first one takes 20 seconds.

Two, zero, 20, right? If we look at the clustered index scan, the number of rows that got read is 51,401,715.

That is an 8-digit number of rows. That’s a lot of rows. The number of rows in the table is 52,928,700. That is also an 8-digit number of rows.

But if you notice, they both start with the same number, 5. And only once you get to the second digit do you see that there are only maybe a million and a half or so fewer rows that get read than there are rows in the table. And of course, since this all happens single-threaded, and we have this top just like, gimme rows, gimme rows, gimme rows, gimme rows, gimme rows, that takes a whole long time, right?

It’s not a good thing. You can find partial reasons for stuff like this in your query plans. If you look at the properties of an index access operator in a query plan that has a top or an offset-fetch, you’ll see this thing in here, Estimated Rows Without Row Goal. That’s going to be a sign that SQL Server said, well, you know, I think I have a different estimate in mind.

If we didn’t have that top 10 in there, I would have to read all of this. But with the top 10, I don’t have to read nearly as many. SQL Server is just like, you know, I think I can get away with a lot less than that, right?

So, fine there, right? All good. But then down here, notice that we do far less work, right? This thing spits out quite a large number of rows there.

This one spits out, oh, whoa, what just happened there? Developer PowerShell. That was a weird button. Let’s never hit that again. Let’s never touch that button again. Come on, there we go.

Zoom it. All right. So, for this one, we only had to read 18 rows to find anything that we cared about. So, that is, of course, far fewer, right? That is far fewer rows than you would want to deal with there.

Now, of course, Paul White, who is the most useful human being on the planet. Let’s just take a moment here to acknowledge that Paul White is the single most useful human being, maybe, that has ever existed. And you can’t spell Paul without the U for the most useful human.

We’ll workshop that later. Let’s just move on from that. That didn’t go so well. You can’t spell useful without Paul. Nah, screw it. Anyway, you can somewhat improve these queries.

And I’ll try to remember, because, gosh darn it, I know how much you care, to put this link, A Tale of Two Index Hints, over at the much-loved SQL.Kiwi website, in the video description. So, if you use a weird index hint and a TABLOCK hint, you can improve the performance of these queries quite dramatically.

These will drop down to about 1.3 seconds apiece. But now we have this sort in the query plan, right? Because when you use the INDEX(0) and TABLOCK hints, well, Paul explains this much better at his site.

But basically, you no longer do an ordered scan of this data, and then you have to sort this data. So, it does look a little bit funny that we’re selecting from the table, and now all of a sudden we have to sort data to put it in order by the clustered primary key column that it’s already in order by.

So, that’s a little weird. But anyway, these do get slightly worse if we take out the TABLOCK hint, which I find quite amusing. And again, if you go to Paul’s post, you will find quite a bit of detail on this.

But these do go from 1.3 seconds apiece to almost 1.8 seconds apiece. So, that TABLOCK hint is, let’s just say, strictly necessary for the full performance benefits here. So, slight digression.
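
As described here, the hint combination from Paul’s post looks roughly like this; go read A Tale of Two Index Hints for the real explanation:

```sql
-- INDEX(0) forces a scan; adding TABLOCK lets the storage engine use an
-- allocation-order scan, which was faster here even though the plan now
-- needs a Sort to put rows back in Id order.
SELECT v.Id
FROM dbo.Votes AS v WITH (INDEX(0), TABLOCK)
WHERE v.VoteTypeId = 2
AND   v.CreationDate BETWEEN '2013-12-01' AND '2013-12-31'
ORDER BY v.Id
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
```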

Anyway, let’s talk about why that equality sort range index does not help a query like this. So, this is my index v0, and I’ve got vote type id, id, and creation date. These are our equality sort range predicates, just like we talked about yesterday.
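
Sketched out, that index is (name from the video, definition reconstructed):

```sql
-- Equality (VoteTypeId), Sort (Id), Range (CreationDate).
CREATE INDEX v0
ON dbo.Votes (VoteTypeId, Id, CreationDate);
```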

And both of these queries are hinted to find, to use that index. Now, this will help performance somewhat generally, right? Like, that first query, the first time we ran this using the clustered index, it took us 20 seconds to locate rows in there, right?

It was not a good time. We, you know, we beefed it on that one. We did not do well.

With that index, and with that index hinted, we save about five seconds. So, this is, again, not great, right? We did not see the benefit from using that index methodology for this query the way that we did for the query yesterday.

And this is because, again, we need to seek through quite a number of rows. Even, like, we can seek to these rows now, and we can have this thing in order. But to evaluate this predicate on creation date, we still have to do a lot of extra work, right?

So, like, with that query in place, like, yesterday, this worked great. We did a seek. We had our data in order.

There’s no sort in this query plan or this query plan, I promise. But then applying this much narrower predicate on the creation date column completely screws things up, right? Like, we get the 37 million rows just by seeking to vote type ID 2.

But then after we find those 37 million rows, we have to go apply this predicate to all of them, right? It’s no fun, right? It’s no good at all.

So, the second query, with ID in descending order, does just fine with this, right? Well, it’s not total magic; this is that funny intersection of how indexes and queries work together. If your query demands rows in ascending ID order and your predicate is something way deep into the table, you’re going to have a bad time finding that data with this type of index.

The type of index that would work better if you need things in ascending order here would be an index like this that leads with creation date. And I’m going to show you two forms of this index. This one is creation date, then vote type ID, then ID.

The second one is going to be creation date, ID, vote type ID. But the point of me showing you both of those is that this sort of breaks the pattern that the equality sort range thing fixes. Because when you put the range stuff at the beginning of the index, even if there’s an equality predicate in the middle, it breaks that.

Because creation date is still the primary sorting and vote type ID is still only sorted after that. So, like, you don’t have the ID column in a useful order for this one. So, if we look at the query plans for this, these are both hinted to use the V1 index.
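The two creation-date-leading index forms would look something like this (the names and definitions are my reconstruction):

```sql
/* Range column leading: CreationDate first, then the equality
   and ordering columns. Reconstructed definition. */
CREATE INDEX v1
    ON dbo.Votes (CreationDate, VoteTypeId, Id);

/* Same idea, with Id promoted ahead of VoteTypeId. */
CREATE INDEX v2
    ON dbo.Votes (CreationDate, Id, VoteTypeId);
```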

These will both be fast enough to find the rows that we care about. We do have a sort in the query plan, which, you know, was part of the thing we wanted to get rid of with the ESR, the equality sort range index methodology: having a sort in the plan. But in this case, we get the number of rows down into a pretty small batch.

So, the sort isn’t very painful in either of these. Both of these take just a little under 100 milliseconds. This is the second index in there that I promised I’d show you.

Creation date, ID, and then vote type ID. So, now we have, like, range, sort, equality. But this doesn’t really make that much of a difference, because for this specific query, you just do much better to be able to locate that narrow range and then look at vote type ID almost no matter where it is in the index.

You can probably have it as an included column and it wouldn’t make any difference. So, if you run these, both of these queries perform just about the same with the index columns in slightly different order. So, like, swapping ID and vote type ID doesn’t make a big difference here.

One thing that you might want to consider, just in general, is the ordering of your columns, because some columns kind of share ordering. In this case, if you really wanted to get rid of the sort, you might consider ordering by creation date. For the votes table, the creation date column obviously gets set for each new row that gets inserted.

When a new row gets inserted, the ID column, which is an identity, also increments. So, every creation date and every ID in this table is going to be higher than the one before it. You might have columns that similarly share ordering and that you might be able to rely on for this.

Now, just because of the way votes get inserted here, there aren’t going to be any duplicates for creation date. Right? Because you’re just not going to have two things show up at the exact same time.

So, if you really wanted to get rid of the sort, you could run your query and instead order by creation date and just have a much simpler query plan overall. So, you might not always be able to get away with that, but in certain cases where you can, this might be a good thing to do. If you needed to have a tiebreaker in there, you would want to use this v2 index on creation date and then ID.

So, you could say, like, order by v.CreationDate, v. — oops, ha ha, typing in demos. Look at this silly man go. So, if you needed the tiebreaker, this would be the better indexing strategy for these two, right?
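A sketch of that rewrite, assuming creation date and the identity-based ID always increase together (again, my reconstruction, not the demo script):

```sql
/* If CreationDate and Id increment together, ordering by CreationDate
   with Id as a tiebreaker lets the v2-style index on
   (CreationDate, Id, VoteTypeId) return rows pre-sorted,
   with no sort operator in the plan. */
SELECT TOP (5000) v.Id, v.CreationDate
FROM dbo.Votes AS v
WHERE v.CreationDate >= '20130101'
AND   v.VoteTypeId = 2
ORDER BY
    v.CreationDate,
    v.Id; /* tiebreaker, fully supported by the key order */
```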

Because you would have order by creation date, order by ID, and this index would be able to fully support these things. So, the equality sort range indexing, again, is great for a lot of situations. But, just to make it easy to physically understand, think of situations where your range is much narrower and towards the end of the table, and you’re ordering by stuff in ascending order.

Like, you have to read through a lot more of the table to get down to the stuff that you care about in order to find those rows. So, it’s not always going to work out perfectly for every query. For some types of queries, it works amazingly as is. For other types of queries, you might have to either change your ordering to better locate data towards the end of the table that you care about, or sort by a different column in the table, with some sort of index in place that helps you put this stuff in order without having to sort.

So, sort of a complex situation there, but one that is worth examining and one worth talking about. I do hope you enjoyed yourselves, and I hope you learned something, and with that out of the way, I don’t know, I’ll talk about some other fun stuff. Well, we’ll see what it is when we get there, though, won’t we?

Alright. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Equality, Sort, Range Indexing In SQL Server – When It Works


Video Summary

In this video, I delve into a fascinating indexing methodology that I recently discovered and find incredibly useful for improving query performance in SQL Server databases. It’s not about creating new types of indexes but rather following an effective indexing strategy known as the Equality Sort Range (ESR) index. I explain how to structure your indexes by placing equality predicates first, followed by sorting elements, and ending with inequality predicates. This approach ensures that data is ordered efficiently for both seeking and ordering operations, leading to faster query execution times. Through practical examples, I demonstrate how this indexing method can significantly enhance performance compared to traditional index creation techniques.

Full Transcript

Your pal Erik Darling here with Darling Data. In today’s video, we’re going to discuss a term that I recently discovered, which is a wonderful way of phrasing an index methodology. Not a specific type of index, of course, not like a brand new style of index that you can create in your database, but just an indexing methodology that you can follow, generally used to improve query performance quite a bit. And that is the Equality Sort Range index. It rolls right off the tongue, just like the POC index, the partition over covering index that is so very popular when people are talking about improving the performance of windowing functions. So we’re going to call this the ESR index. And we’re going to look at an example of how you can use this indexing methodology to improve your query performance. Yay!

We’re all excited. If you like this video content and you would like to be a supportive member of the audience, you can sign up for a membership using the link down below in the old video description. And you can, I don’t know, contribute to me being a happier, well, probably not healthier person, but you can contribute to me being a happier person that way. If you do not have the dinero to contribute to the happiness fund in that regard, you can always like, comment, and subscribe. And if you would like to ask questions of me privately that I will answer publicly, you can use that link, which is also in the video description. Coincidentally, there are many useful things down there, including a way to put your question into my magical Excel file that I will read from. If you need help with SQL Server, if you are struggling with performance, reliability, scalability, application, blah, blah, blah, blah, you can hire me as a consultant to do many useful things for you. Health checks, performance analysis, hands-on query and index tuning, all that other stuff that needs tuning, dealing with performance emergencies, and of course, training your developers so that performance emergencies are a thing of the past.

All worthy goals when it comes to SQL Server. If you would like some training, boy, do I have it. My pockets are full of it. You can get about 24 hours’ worth of training from my pockets for 75% off. That’s the link, that’s the discount code. And like everything else useful in the world, the video description contains a fully assembled version for you to click on. Upcoming events: we have SQL Saturday New York City 2025. That’s this year. It’s hard to believe. Coming up May the 10th, with a pre-con by Andreas Walter on May the 9th. And that’s about SQL Server performance monitoring and tuning and stuff. So we’re all looking forward to that. I’ll be there both days, serving you lunch, bringing you cookies and coffee and whatnot.

And so you can come hang out. Give me a hug. I don’t know, whatever you’re into. But with that out of the way, let’s party. Let’s do this thing. Let’s do this thing like we care about it. So I, for years, have been talking about this thing that I do when I’m tuning queries and indexes. But I never had a good way to say it; it was always very clunky in my head. It was just like, you do the thing, and then the other thing, and then you care about this thing.

And then I saw a fellow named Franck Pachot, I apologize, Franck, if I pronounce your name wrong, on LinkedIn. I guess he started a job with MongoDB recently. We’ll all forgive him for that. We’ll have a seance for Franck. We’ll talk to him in the database hell. I’m kidding. He posted a link to this thing in the MongoDB docs, where they talk about the equality sort range rule for indexing.

And I thought, wow, that’s a thing that I didn’t have a good name for. And so we are going to start referring to it, we’re going to meme this into existence for SQL Server, as the ESR index, that is, equality sort range. And we’re going to show some examples of that. Now, I’ve got a query down below, and starting with no indexes on the Posts table, SQL Server gives us a missing index request.

The missing index request that it gives us leads with post type ID and then has last activity date as a second key column. And then in the includes, we have score and then view count. So the key columns in SQL Server’s missing index requests are very WHERE clause centric.

If you look at the WHERE clause that we have here, we have where post type ID equals 4 and last activity date is greater than or equal to 2012-01-01. So SQL Server puts the equality predicate first and the inequality predicate second. And it doesn’t give a lot of thought to the fact that we are ordering by score over here, right?
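My reconstruction of the demo query, for reference. The TOP count is an assumption; the predicates follow the transcript, and the second variant in the demo uses slightly different literal values:

```sql
/* Reconstructed demo query against the Stack Overflow Posts table:
   equality on PostTypeId, range on LastActivityDate, ordering on Score. */
SELECT TOP (5000) p.Score, p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 4
AND   p.LastActivityDate >= '20120101'
ORDER BY p.Score DESC;
```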

Doesn’t really care about that. But we’re also selecting view count, right? So view count being in the includes, that’s fine.

But score being in the includes, that is not so fine. Because columns in the include of an index are not sorted or ordered in any useful way, either for seeking to values or for helping with presentation or operator-dependent order bys. Like I’ve talked about in many other videos, things like merge joins and stream aggregates require sorted data.

And if you ask for data in an order that is not supported by an index, SQL Server has to put a sort operator into the query plan and break out its tiny little baby fingers and magically put your data in the presentation order that you have asked for. Other things like windowing functions also do pretty well with ordered data. So what we’re going to do is look at examples of these two queries using this index.

This is index P0. Both of these queries are hinted to use this index, mostly for a little bit of compactness here. We do have a few things to talk about.
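Index P0 would look something like this, following the shape of the missing index request (definition reconstructed, name assumed):

```sql
/* The missing-index-style index: WHERE clause columns as keys,
   everything else as includes. Score, which drives the ORDER BY,
   is stuck in the includes, where its order is useless. */
CREATE INDEX p0
    ON dbo.Posts (PostTypeId, LastActivityDate)
    INCLUDE (Score, ViewCount);
```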

And so I pre-created three or four indexes, and I’ve hinted the queries in each section to use those indexes, so that there’s no weird overlap, no oops, this query used this index this time. Because that gets annoying.

So we’re going to run these two queries. And the first one is going to be pretty quick. And the second one is going to be a little bit less quick. And I’ll be honest with you, I am not fully used to SSMS 21 and where it puts the little grappling hooks for us to move things around on the screen with.

So you’ll have to bear with me while I acquaint myself with this wonderful 64-bit program. But for the first query, this is fast enough, right? This four milliseconds, six milliseconds total, great.

Like, no problem. Like, all easy peasy there. This one, not so great, right? We do an index seek. Okay, 284 milliseconds. But then we hit this sort and we spill a little bit.

Not so much that I’m, like, worried about it. And then we, of course, end up taking nearly a very devilish number of milliseconds to complete this query. So this isn’t really great.

This isn’t really an awesome scenario for the second query. But there is a way to make both of these fast. Now, a lot of people might be tempted to create an index that looks like this.

Where score is the leading key column. Because then we will always have score in descending order. The problem this presents us as query tuners is that with score as the leading column in the index,

It sort of acts as a gatekeeper to the other columns. So score is in order. But post type ID and last activity date are not in a useful order after this for searching.

So we sort of have this, like, blocker here that prevents us from seeking into the index for the values that we care about. So this is index P1. Both of these queries are hinted to use index P1.
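The tempting-but-wrong index, as I’d sketch it (definition reconstructed):

```sql
/* Score leads, so rows are in score order, but PostTypeId and
   LastActivityDate are only ordered within each distinct score.
   The leading column acts as a gatekeeper: no seek on the predicates. */
CREATE INDEX p1
    ON dbo.Posts (Score DESC, PostTypeId, LastActivityDate)
    INCLUDE (ViewCount);
```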

And if we run these, we’re going to get sort of not great performance out of either one. This query up above that took, well, like, a few milliseconds before is now what? Is that the right one?

Let me just go double check, make sure I didn’t mess anything up. No. All right. So, yeah. This one up here that used to be really fast took 1.5 seconds. This one down here, this one actually got a little bit better, right?

Even though we end up scanning this whole index, it got a little bit faster than last time. Mostly because we didn’t have the sort spill on this one, right? So this one did not take a devilish amount of time before.

This one just took a long time to find all the, scan through and find all the data that we care about. So what I’m going to show you now is the ESR style of index that I talked about in the video introduction.

We are going to lead with the equality predicate. We are going to follow that up with the sorting element in our query. And we are going to put last activity date as the final key column. We’re still just going to include view count, because we don’t have a predicate on view count.

It’s not in our where clause, it’s not in our order by, and we’re not joining on it or anything. So there’s really not a lot of sense in doing anything aside from just putting it in the included columns.
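Put together, the ESR index looks something like this (my reconstruction of index P2):

```sql
/* The ESR index: Equality (PostTypeId), Sort (Score DESC),
   Range (LastActivityDate), with the select-only column included. */
CREATE INDEX p2
    ON dbo.Posts (PostTypeId, Score DESC, LastActivityDate)
    INCLUDE (ViewCount);
```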

So let’s look at query performance with these two. And let’s see if this actually gets faster using our ESR index. And it does, right?

So this is about, this is what I want my query to look like. We have two very, very fast index seeks into our P2 index. This is our equality sort range index.

And we don’t need to sort data because the data is in order for us. Now, I just want to show you what the seek predicate looks like. This is going to be identical for both of these aside from the actual values in them.

This one’s four, this one’s 2012, the other one’s 2011 and one or something like that. But the important thing here is that we are able to seek in the index to where post type ID equals four.

And then we can take advantage of how B-trees store data: everything in this index is primarily ordered by post type ID. And then once we seek to any post type ID, like where post type ID equals something, all of the scores will be in descending order for that particular post type ID.

So we’re not saying where post type ID is in four or five, or greater than four, because that would mess things up: we would be crossing post type ID boundaries to different post type IDs.

And the score column order would reset for each post type ID. So that wouldn’t do us any good. But for an equality predicate, where we just find all the rows for one thing, all of the data in the index is ordered within that one thing.

And that just gives us a quite opportune index to apply the seek predicate, keep the data in the order that we need for score, and then apply a residual predicate on last activity date to finish filtering out rows that don’t matter to us.

Right. And just because of the way B-tree indexes work, we don’t even necessarily need last activity date as a key column.

If for whatever reason you just wanted to have it as an included column, you would get the exact same execution plan. So just to show you what I mean, I’ve created a fourth index because we started our index numbering at zero because we’re nerds.

I put last activity date as a second include column here. Right. So it’s not even the first included column, but this is mostly just to get across the point that include column order doesn’t matter much.
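That fourth index would look something like this (reconstructed; I’ve called it p3 here, since the numbering starts at zero):

```sql
/* Same plan shape as p2: the range column is demoted to the includes,
   and not even first in the include list, to show that include column
   order doesn't matter. LastActivityDate becomes a residual predicate. */
CREATE INDEX p3
    ON dbo.Posts (PostTypeId, Score DESC)
    INCLUDE (ViewCount, LastActivityDate);
```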

If we run these two queries, we will get identical performance as the last query and our query plans will look just about exactly the same as well. Here is our index seek that goes directly to post type ID four.

And here is our residual predicate on the last activity date column. Now, the reason why this doesn’t matter so much here is because range predicate range or rather inequality predicates greater than greater than equal to less than less than equal to not equal to all that stuff is not no equality predicates and is no predicates are boom, you equal this.

like they often end up as residual predicates in queries anyway. Not all the time. Sometimes. Sometimes it happens. Sometimes it doesn’t.

Sometimes you have a seek and a residual predicate. But this is just to show you that the final key column in this case does not really have much impact on final query performance.

So when you’re trying to tune queries that do something like this (and they can, of course, be more complicated than this), you can always get indexing that really helps query performance when you embrace the equality sort range paradigm of index creation.

So if you have equality predicates, you want those to be the leading key column or columns in the index. You want your ordering elements to come after the equality predicates.

And you want any inequality predicates to come third in the index or come after the ordering elements in the index. Not necessarily third, of course.

Right? Could be tenth. I don’t know. At that point, you really start to see the benefit of included columns in indexes because you don’t have to have 13 key columns in order to support this sort of thing. So you could put your inequality predicates either as key columns after the sorting elements or in the includes because they will generally have less impact on things after that.

Now, there are always going to be some exceptions to the rule. If your inequality predicates are very, very selective, you might not want to have them as includes. If you’re seeking to like 10 billion rows and you have to apply a range predicate to like, you know, 9 billion rows afterwards, it’s maybe not as valuable.

But once you’re dealing with tables that large anyway, you should be using columnstore indexes. And, I don’t know, you’re probably pondering the deeper meanings of life at that point, because you’re dealing with many other problems.

Also, you don’t see a lot of top 5,000 order by queries in that sort of situation. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope that you will embrace the equality sort range indexing methodology. I hope that you will join me in memeing this into widespread existence in SQL Server. And I will see you in another video where I will hopefully get back on track talking about store procedure stuff.

Really trying to work out some good demos for the temporary objects stuff that I want to talk about there. That might end up being more than one video because there’s a lot to say. But anyway, thank you for watching.

And I will see you in another video, I don’t know, shortly or longly or I’m really just not sure at this point. Well, the fates will decide. Anyway, thank you.
