Indexing SQL Server Queries For Performance: Blocking and Deadlocking




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of SQL Server indexing for performance optimization, specifically addressing how to index tables effectively while avoiding common pitfalls like over-indexing. I also explore the ins and outs of blocking and deadlocking issues under the default isolation level “read committed,” explaining why enabling read committed snapshot isolation can significantly reduce these problems. Throughout the video, I share practical examples and tips on how to tune both select and modification queries for better performance, emphasizing the importance of considering your isolation level when dealing with indexing challenges.

Full Transcript

Erik Darling here with Darling Data. Feeling ultra professional today for various reasons. Did some cool stuff, some very cool SQL Server stuff, but of course it’s nothing I can talk about here because, you know, it wouldn’t make any sense. In today’s video we are going to continue talking about how to index SQL Server for performance, and in this video we’re going to address problems with blocking and deadlocking. Now, what’s funny about this is that it’s a situation that really rears its ugly head, at least between read queries and write queries, under the unfortunate default isolation level read committed. If you need some help understanding how terrible an isolation level read committed is, you should watch the playlist of videos that I have on my channel called Everything You Know About Isolation Levels Is Wrong, where I go into great detail about how awful read committed is as a default choice for SQL Server. And I really hope that more people start understanding how lousy an isolation level it is, especially as Microsoft takes some steps towards saying, hey, maybe you should do something else. For example, in Azure SQL DB, read committed snapshot isolation is the default. You’re allowed to turn it off if you really need to, but if Microsoft is saying, hey, this is the default here, maybe it’s not as bad as those stupid blog posts you may have read made it out to be. And another thing to consider is that there’s a newish feature that’s rolled out to Azure SQL DB, and seems to be full steam ahead with being present in SQL Server 2025, called optimized locking.
And if you read the docs page or the learn page, or whatever they call it these days, for optimized locking, there are two suggestions on that page. One of them is to enable accelerated database recovery (actually, I think that one’s a requirement, I forget). And the other one, which is definitely a suggestion, but a strong suggestion, is to enable read committed snapshot isolation, because it makes optimized locking work better. So if you’re thinking, wow, I’d like to optimize some locking in my SQL Server database, enabling read committed snapshot isolation is a pretty good way to do that. Anyway, right, you have less blocking between read queries and write queries. That’s great. You have less deadlocking between read queries and write queries. All you have to do is turn on this one thing, and your life will be instantly better. But before we get into all that, let’s talk about what a bunch of generous souls you are. We’re up to nearly 40 people who have become members of my channel and donate four or more dollars a month to support me making this content. So that’s very sweet of all of you. I appreciate every last one of you. I do not appreciate the 30% that YouTube withholds in taxes, but that’s not your fault.
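For anyone who wants to try those two settings, here’s a minimal sketch. The database name is a placeholder, and WITH ROLLBACK IMMEDIATE kicks active sessions out of their open transactions so the change can complete, so don’t run this casually against a busy production database.

```sql
/* Hypothetical database name; run from a window
   that isn't connected to the target database. */
ALTER DATABASE YourDatabase
    SET ACCELERATED_DATABASE_RECOVERY = ON;

ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE; /* boots open transactions out of the way */
```

Accelerated database recovery also needs some extra room in the database for the persisted version store, so keep an eye on file sizes after you turn it on.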

If you don’t have the $4, liking, commenting, and subscribing are all, I mean, very humanitarian efforts on your part. I would appreciate that. If you need help with SQL Server, locking and blocking included, or if you would like to hear more about our Lord and Savior read committed snapshot isolation, or even snapshot isolation, you can hire me as a consultant to do these things. And as always, my rates are reasonable. Training: I have stuff that I can teach you, including about locking and blocking. Wow, what a low price, about $150 USD. And you can get all of that stuff at that link up there with that discount code there. There’s also a link in the video description, in case there weren’t enough reasons to look at the video description. Now you have this one. It’s wonderful. Again, upcoming events for 2025 will be announced as soon as I hear about them. All sorts of stuff I’ve got to deal with, isn’t there? All sorts of things, and hoops I have to jump through, before I start announcing things.

But let’s talk about locking and blocking in SQL Server. And what was that? Okay, that was where I rolled that thing back. We’re good there. All right. All right. Let’s make sure we have nothing going on. Now, what I’m going to show you in this video is how, well, you know, it’s the default read committed isolation level, which most of you are suffering under. And before I show you stuff about how adding an index can help fix some blocking and deadlocking problems, I do need to warn you a little bit about how indexes can cause problems.

There are three main areas where over-indexing can hurt you. And by over-indexing, I don’t necessarily mean the number of indexes you have. I mean how those indexes are used and how those indexes are defined. So when I think about over-indexing, I think about tables with indexes on them (not even lots of them, just indexes) that have zero reads and a high number of writes, or a really low number of reads and a high number of writes. Or I’m thinking about indexes that have very similar key column orders, maybe different included columns and stuff, but indexes that you could theoretically merge together.

So you have one index that does the work of two or more indexes. To me, that’s really over-indexing. If you have a bunch of indexes on a table that all have very different definitions, or even different enough definitions, but they’re all doing hard work, queries are reading from them a lot and they’re helping queries go faster, then you’re not necessarily over-indexed.
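To make that concrete, here’s a hypothetical example of the kind of mergeable pair I mean. The table and column names are made up for illustration, not taken from the demo:

```sql
/* Two indexes with the same leading key column... */
CREATE INDEX ix_posts_owner_score
    ON dbo.Posts (OwnerUserId, Score);

CREATE INDEX ix_posts_owner
    ON dbo.Posts (OwnerUserId)
    INCLUDE (CreationDate);

/* ...could theoretically be merged into one index that serves both: */
CREATE INDEX ix_posts_owner_merged
    ON dbo.Posts (OwnerUserId, Score)
    INCLUDE (CreationDate);
```

The merged index answers every query the first two could, with one fewer structure for SQL Server to maintain on writes.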

But the three main areas where over-indexing, as I just described it, can hurt you are: one, the buffer pool. Most SQL Servers I look at just don’t have enough memory compared to the amount of data they have, so you have all these different indexes competing for space in the buffer pool.

Two, the transaction log. Every time you modify data in the base table, you have to modify data in any indexes that have the changed columns in them. For inserts and deletes, unless you have filtered indexes, that’s going to be all of them.

For updates, it’s just going to be the indexes whose definitions contain the columns that changed. And the third area is locking. The more indexes you have that have to change when data gets put into a table, the more you increase the likelihood of locking, blocking, and deadlocking situations.

And, of course, you run the chance of SQL Server trying to escalate locks from the row or page level up to the object or table level. So there are ways that over-indexing (having lots of unused indexes, duplicative indexes, stuff like that) can hurt you. But you do need indexes to make queries go faster, because fast queries run less of a risk of having those problems, or of being slow and interfering with other queries.
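One way to spot the low-read, high-write indexes I’m talking about is sys.dm_db_index_usage_stats. This is a rough sketch, not a finished script; the stats reset on instance restart, so treat the numbers with appropriate suspicion:

```sql
SELECT
    object_name = OBJECT_NAME(us.object_id),
    index_name  = i.name,
    reads       = us.user_seeks + us.user_scans + us.user_lookups,
    writes      = us.user_updates
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON  i.object_id = us.object_id
    AND i.index_id  = us.index_id
WHERE us.database_id = DB_ID()
AND   i.index_id > 1 /* skip heaps and clustered indexes */
ORDER BY writes DESC;
```

Indexes near the top with lots of writes and few or no reads are candidates for dropping or merging.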

In my experience, the transaction log usually isn’t a huge problem unless you’re in Microsoft’s cloud, because Microsoft plays this really nasty trick on its customers. And it doesn’t matter what you use. It doesn’t matter if it’s Azure SQL DB.

It doesn’t matter if it’s Managed Instance. It doesn’t matter if it’s Hyperscale. Microsoft throttles your transaction log. So writing data modifications out to the transaction log can be incredibly slow and painful.

So if that’s your problem, that’s not necessarily the fault of the indexes. That is the fault of your cloud provider. You should maybe have a talk with them about how underhanded of a maneuver that is.

But what was I going to say? Anyway, you can end up with locking and deadlocking problems when you sort of have the opposite problem, right? When you have too few indexes.

Because whenever I talk to people about index tuning, without fail, the majority of the queries that they’re worried about are going to be select queries. Very few people understand that modification queries need indexes too. The faster modification queries can find the data they need to change, the better off you are.

In a general sense, I mean, aside from query speed: if your modification query locates the data that it needs to change via a seek, you will most likely, unless you’re hitting a lot of rows, start with row locks. If your data modification query locates the data that it needs to change via a scan, it is most likely going to start with page locks.

Again, unless you’re hitting a lot of rows right off the bat. But let’s just take a quick look at this query, right? I’m not going to run this here, of course, because here is not where we want to run this.

We want to look over here. So I have query plans turned on and I’m going to run this query with no indexes. If we do this, this runs for a little over half a second.

And it does a lot of work over here and over here. And, you know, it’s acquiring data down here from the Posts table, which it’s also updating, right? So we need to find the data that we want to update in the Posts table by joining it to the Users table.

If we wanted to reduce the risk of this query colliding with itself or with any other queries that are touching the Posts table, we would want to add some indexes that help this query go faster. Now, we don’t strictly need the index that I’m making here on the Users table.

It’s just to illustrate a point: the more you do to help your modification queries go faster, the simpler those queries and their query plans are, the faster they go, and the better off you are when you need to deal with these things.

So these indexes are created here. And now let’s just rerun this. Remember, this was a little north of half a second. But now, with those two indexes in place, this thing runs for about 19 milliseconds, right? Just coming over here and looking, we no longer need a parallel plan.

SQL Server is just able to seek to the data that it cares about on both sides of this. Everything is much better, right? So if you’re serious about query tuning, don’t just stop at the select queries in your workload.

Make sure you’re tuning any modification query that, you know, has like a query underneath it. If you’re inserting values, what are you going to do, right? There’s not a lot you can do there.

But like, you know, updates, deletes, even if it’s like an insert with a select, you know, don’t be afraid to tune those queries as well. Like modification queries need love too. So with that out of the way, let’s talk a little bit about how indexes can help you under read committed.

Now, I’m going to go off on something a little bit here: if you are truly worried about over-indexing and you are truly worried about locking and blocking, the answer is not NOLOCK. If you are using an optimistic isolation level, like read committed snapshot isolation, not only would you avoid the many, many pitfalls of NOLOCK, but you would also avoid having to over-index tables, because read queries and write queries would no longer be blocking each other or deadlocking with each other. So if you’re truly worried about it, optimistic isolation levels can help you get out of lots of blocking and deadlocking scenarios without having to tune queries or over-index.

Oh my gosh. I can only have five indexes on this table. If I create a sixth index, my workload will go to hell.

So like just, you know, one thing to consider is, you know, what isolation level are you using? If you’re afraid of indexing too much, use an optimistic isolation level. You won’t have to index as much.

Your blocking and deadlocking problems will largely go away. You’ll still have write queries fighting with each other, but that’s okay, right? Because, like I said, who’s worried about their write queries anyway? So let’s begin a transaction and let’s run this update query.

And, of course, this thing switched over to that mode for me. So what we have here is the results of my handy little helper function, WhatsUpLocks. This is available in my GitHub repo.

If you go to code.erikdarling.com, you can get a nice little shortcut there. But if you look at the results here, we have an intent exclusive lock on the object. That’s not what we’re worried about.

We’re worried about this exclusive lock on 11 pages in the Badges table. Now, let’s come over to this window. This query is open; we’ve begun the transaction.

We just got the results of WhatsUpLocks. We didn’t commit or roll anything back down here. If we look at this query, this query will be able to finish, because we’re seeking into the primary key right here, where this ID equals this.

And this ID is just outside the range of locked pages. Now, remember, pages contain many rows. So this query is able to finish, but this query, which is just one ID higher, gets stuck.

And if we come over here and look at sp_WhoIsActive, we’ll see the locks that this thing has taken, which is exactly what I showed you before. Right? It’s the 11 page locks on the Badges table.

Right? There we are. My results line up with sp_WhoIsActive. I consider that a real feat. It’s a pretty good deal.

And then this query is trying to take some locks: the object-level shared lock got granted, but the page-level shared locks are what we’re waiting on. So this query is blocked because it could not get to the data that it cared about. The row it needed was on a page that is currently locked by this query over here.
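If you don’t want to grab my helper function, a bare-bones version of the same check against sys.dm_tran_locks looks roughly like this; the session id filter is a placeholder for whichever spid you’re curious about:

```sql
SELECT
    dtl.request_session_id,
    dtl.resource_type,  /* OBJECT, PAGE, KEY, etc. */
    dtl.request_mode,   /* IX, X, S, etc. */
    dtl.request_status, /* GRANT vs. WAIT is the interesting part */
    locks = COUNT_BIG(*)
FROM sys.dm_tran_locks AS dtl
WHERE dtl.request_session_id = 55 /* hypothetical session id */
GROUP BY
    dtl.request_session_id,
    dtl.resource_type,
    dtl.request_mode,
    dtl.request_status;
```

Rows with a request_status of WAIT are the locks a blocked session is stuck on.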

Right? So let’s roll this back. Let’s make sure that’s double rolled back.

And let’s create this index down here on the Date column of the Badges table. Now, remember, this update query is hitting the Badges table. I’m just going to get the estimated plan for this.

We’re able to seek into that index now. Before, all we had was the clustered primary key on the ID column. So we could seek into the ID column, but if rows with dates in this range, or even dates a little before or a little after, happened to be on locked pages, those queries got blocked.
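The exact index definition isn’t visible in the transcript, but against the Stack Overflow sample database used in these demos it would be something along these lines (the index name is made up):

```sql
/* Assumed definition: the demo only says "an index on the Date column". */
CREATE INDEX ix_badges_date
    ON dbo.Badges (Date)
    WITH (SORT_IN_TEMPDB = ON);
```

The point is only that the update can now seek to the date range it cares about instead of locking pages across the clustered index.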

All right? So now let’s run this with that index in place. And what we’re going to see is that the pages have intent exclusive locks, so not actual page locks.

And the object still has intent exclusive. Now, though, we have 856 granted row locks. All right?

Oops. Sorry. It makes a little more sense over here. Here are our row locks: the 856 row locks that have been taken on the Badges table, now with that index in place. And if we come over here, now we can run this query.

And now we can run this query too. All right? These are both able to finish now, because SQL Server is only locking rows, and so data that sits on pages unrelated to those rows is reachable again.

We can access that data. We have a different access path to get to it. Now, really, no demo about locking and blocking would be complete if we didn’t make fun of NOLOCK just a little bit.

Right? And so if we run this query, keep in mind that this where clause here matches the where clause of the update that we ran over here. Right?

This is 2010-12-25 through 2010-12-26. If we run this query with the NOLOCK hint, we will, of course, see all of these user IDs displaying what we changed them to in this query that still has the open transaction. Right?

This query is open. We were changing every user ID in that range to this value. But let’s say this query failed, or someone changed their mind, or we decided to go do something else, and we roll this stuff back. Now this query is going to show us what those user IDs were before, which has nothing to do with the number that we just had in there.
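The two-session shape of that demo, sketched out; the UserId value is a stand-in for whatever was on screen:

```sql
/* Session 1: modify, but don't commit. */
BEGIN TRANSACTION;

UPDATE b
    SET b.UserId = 138 /* hypothetical value */
FROM dbo.Badges AS b
WHERE b.Date >= '20101225'
AND   b.Date <  '20101226';

/* Session 2: NOLOCK happily reads the uncommitted values. */
SELECT b.UserId, b.Date
FROM dbo.Badges AS b WITH (NOLOCK)
WHERE b.Date >= '20101225'
AND   b.Date <  '20101226';

/* Session 1: change our mind. Session 2's results were never real. */
ROLLBACK TRANSACTION;
```

Under read committed snapshot isolation, session 2 would instead see the last committed values, with no blocking and no dirty reads.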

So, this is something that I talked about quite a bit in the video series that I mentioned before. Everything you know about isolation levels is wrong. That playlist is available again on this channel.

But, you know, what I talked about before is that these are the kinds of problems that a lot of developers hit under the default read committed isolation level. And they have decided that the situation with read committed is so bad that they are willing to have data problems, like I just showed you with the NOLOCK hint, rather than deal with the problems that read committed exposes in their database and their workload. So, everyone likes to make fun of NOLOCK, me included.

But really, NOLOCK is what people choose because read committed is so crappy. So, if you’re using read committed and you’ve got this idea in your head that NOLOCK hints are a good solution (they’re really not), you should be looking at optimistic isolation levels, which avoid dirty reads like the one I just showed you and return a far more stable and correct set of data without the locking and blocking problems. So, I think we’re about done here.

Indexing your modification queries can absolutely help them not only go faster, but it can also help reduce the number of locking and blocking problems that you have in your database. Of course, the ultimate solution to locking and blocking and deadlocking problems between read queries and write queries is an optimistic isolation level. I can’t say that enough.

So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that if you haven’t seen my series on isolation levels already, you will go find that playlist and watch the whole thing. It is well worth your time.

And thank you for watching. Now, we’ll see you in the next video, which is about more indexing stuff. So, we’ll see you there. All right.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Missing Index Requests Are Bad And Dumb




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked topic of missing index requests in SQL Server. While many people might jump straight to creating indexes based on these suggestions, I argue that it’s crucial to take a step back and actually analyze the query plan. Missing index requests can be misleading; they don’t always point to the most impactful changes you could make. In this video, I demonstrate how simply following missing index scripts without deeper analysis can lead to significant performance degradation. By walking through an example where creating the suggested indexes resulted in a query that took over two hours instead of 39 minutes, I highlight the importance of understanding the broader context and potential alternatives, such as using batch mode or leveraging early aggregation techniques. The key takeaway is that missing index requests should not be your primary focus; they are just one tool in your performance tuning arsenal, and it’s essential to critically evaluate their impact before implementing them.

Full Transcript

Erik Darling here with Darling Data, feeling my Darling Data dandiest. And I have a special video for you. I’m recording it a little bit later than I thought I would, and I think the reason why will become apparent in just a moment. But we’re going to talk about missing index requests in this video. Because, I don’t know what to say, I still, to this day, see a lot of people talking, almost bragging, about implementing them. Like, cool, you followed the instructions on the bottle. Alright. We’re all proud of you. Ding dong. And we’ll talk about how missing index requests do not really always expose the performance problems that queries are having. And how sometimes you’re going to actually have to get up off your lazy keister and actually look at the query plan and figure things out. You can’t just use whatever bargain basement script you got off the internet that scripts out every missing index request, spend some time staring at them, create them, and act like you did your job. Because that’s crappy. So we’ll talk about that. But before we do, we need to talk about four dollars. Four of them. If you would like to support the work that I do on this channel, you can give me four dollars a month by clicking on the little link in the video description right down there, right about where my desk chair is, which you can’t see via the magic of green screens. And that’s a nice thing to do.

Pretty spiffy of you if you do that. If four dollars a month is beyond your means, if you just don’t have four bucks, well, you can like, you can comment, you can subscribe, and all that stuff is free. At least it is for now. Who knows what YouTube will start charging for. That’d be weird. If you are in need of SQL Server consulting, because you are the type of lazy person who just creates missing indexes based on what, you know, SQL Server is telling you might help, and you actually want someone who will do some more better work than that, I am available to do all these things. And it’s been requested by the editors at Beer Gut Magazine that I clarify their position on me being the best SQL Server consultant in the world. Apparently, that title is not valid in New Zealand. So, everywhere else, I win. Some places you can’t beat, though.

If you would like some fantastic SQL Server training, you can get all of mine for about $150, and you don’t have to subscribe to anything. That is just a one-time payment. And then you can watch it for the rest of your life, over and over again, until something finally sinks in, which hopefully won’t take the rest of your life, because I hope you live a long life, and I would be sad if it took that long for these lessons to be learned.

Once again, I will have news about upcoming events later in 2025. Right now, I’m just enjoying living life in the moment. With that out of the way, let’s do this thing. Let’s talk about missing index requests. So, I’ve run this query, and this is why this is getting recorded later than usual.

This thing took 39 minutes and 5 seconds to run. All right. 39 minutes and 5 seconds. Now, SQL Server is telling us that it wants an index on the comments table.

It wants that index to have Score as a key column and UserId as an included column, somewhat obviously. I guess that’s a little bit of a spoiler there. But let’s use our heads for once.

Let’s not just be missing index idiots. Let’s actually look at the query plan. And let’s see, how long did SQL Server spend touching the comments table over here? 998 milliseconds.

What happens if we create that index? Can SQL Server spend 0 milliseconds touching the comments table? Maybe.

Maybe. But if you take 998 milliseconds away from 39 minutes and 5 seconds, you have 39 minutes and 4 seconds. It doesn’t seem like an index that is going to be very helpful, does it?

Does it seem like an index that is going to really do much for us? Now, this particular missing index request is interesting, because if you grab the XML from this thing, there are actually two missing index requests.

SSMS only shows you one in green, but there are two that get logged in the XML of the query plan. If you are the type of person who is smart and likes to go digging into query plans, and you hit the properties of things and expand the missing indexes node, you will see both of them in here. They show up.

It’s not very friendly for scripting, of course. If you look at these things, you kind of have to keep expanding these little plus signs until you have some idea of what’s going on. But they’re both in here, right?
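If you do want to see missing index requests without digging through plan XML, the missing index DMVs are the friendlier route. A rough sketch, and the same skepticism about the impact numbers applies here too:

```sql
SELECT
    mid.statement,       /* the table the request is against */
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact /* the same optimistic percentage */
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact DESC;
```

Treat this as a list of leads to investigate in actual query plans, not a to-do list of indexes to create.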

Now, let me ask you a question. Between you and me, I know none of us in here are born and bred lifelong mathematicians, but is 998 milliseconds, is that really 22% of 36 minutes? I don’t think so.

I don’t think that’s the impact that they have in mind here. I think SQL Server thinks it can reduce the effort of running this query by 22%, but I don’t think that we’re going to see those results if we’re looking at whatever time-telling device you care about. Maybe you have an hourglass or a stopwatch or a grandfather clock or, I don’t know, maybe you’re just real good at counting exact seconds.

But I don’t think that 22% of anything is going to disappear if we create that missing index. Do you? I don’t.

I’ll be honest with you. I don’t see that happening for us. The problems in this query are very much because of spills, right? This is where things start to get real hairy. We spend three minutes and 59 seconds up to this point.

Well, sort of. This looks a little weird. The operator times, God bless, this is 13 minutes and then we go to 36 minutes. So something weird is happening in here, right?

This is not a well-performing query. We’re having some problems. So what I did ahead of time and part of the reason why we’re up so late recording is because I actually created both of the missing indexes that SQL Server asked for. There were the two on the comments table.

And then if you keep getting the estimated plan after you create indexes, SQL Server will keep suggesting indexes. And so I kept doing that. And eventually SQL Server suggested one on votes and then one on posts.

And so after I created the one on posts, I ran the query. And you can’t see it because my armpit’s in the way a little bit. But we’re going to make this magic work.

We’re going to do a little Hollywood. We’re going to do some Michael Bay style special effects. And we’re going to scroll over this way. And you’re going to see you’re not reading that wrong. That’s not two minutes and seven seconds.

There was not a 37-minute improvement. That is two hours and seven minutes. If we look at that query plan, I’m going to have to be real careful clicking around in here. That is not two minutes.

That is two hours, seven minutes and three seconds. This query has all sorts of problems. Many problems.

Gigantic problems. Two hours and seven minutes’ worth of problems, really. And SQL Server still thinks that a missing index on the Users table is going to be what solves that problem. But if we go all the way over here and look at where we touch the Users table, that was 36 milliseconds out of two hours.

What does SQL Server say? We could have an impact of nearly 35% on this query. Wow.

Wow. Would that really fix this? No. No. We would still be screwed. We would still have a query that runs for, well, I guess two hours, seven minutes, minus 36 milliseconds, probably. That sounds about right anyway.

Just sounds about right anyway. So this thing just really gets screwed up in all sorts of places. You know, there’s 44 minutes in here and 36 minutes in here. And, well, we spent 46 minutes up until here.

So this thing is really screwy. The funny thing is that, really, no amount of missing index is going to help this thing. But if all you do is look at missing indexes and think, well, if I create that index, something will get better, right?

I created, like, five indexes. And it got worse. It went from 39 minutes to two hours and seven minutes.

That’s not an improvement. That’s a worsement. It’s not a good showing. What missing indexes tell you is a very brief, opportunistic, passing thought, really, by the optimizer.

Well, I’m creating this query plan. You know what? It might be better if we had this thing, right?

Oh, it might be okay. Maybe things would get better if we had this. But it doesn’t really know. It doesn’t really understand what’s going to happen after you create that index. After you create an index, you could get an entirely different query plan.

All of a sudden, the costs are all different. Maybe it wouldn’t be 35%. Maybe it wouldn’t have a 35% lower cost. Maybe it would come up with a query plan that cost more. What can we do?

This is quite a conundrum. What missing indexes don’t tell you are ways that you can tune a query that have nothing to do with indexes. It has everything to do with how you write the query and maybe even how you get this query to run.

Right? So what a missing index request won’t tell you is that you could run this query using batch mode. And, for the sake of making sure you don’t think I have any weird things up my sleeve, I just allowed this query to run in compat level 160.

Now, cardinality estimation under compat level 160, using the new cardinality estimator, is very hit or miss. So I’m letting this one go. Sometimes when I use the compatibility level hints so that I can get intelligent query processing features, like batch mode on rowstore, I pair them with the legacy cardinality estimator, because it’s generally better.

I like it better anyway. So this one, I just let it fly. And this all finishes in about eight seconds.
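The demo query itself isn’t shown, so the body below is a hypothetical stand-in; the point is the hint pattern. In this demo I left the legacy cardinality estimator hint off, but here’s how the pairing I described would look:

```sql
SELECT
    c.UserId,
    score = SUM(c.Score)
FROM dbo.Comments AS c /* hypothetical stand-in query */
GROUP BY c.UserId
OPTION
(
    USE HINT
    (
        'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160', /* unlocks batch mode on rowstore, etc. */
        'FORCE_LEGACY_CARDINALITY_ESTIMATION'      /* the pairing I usually prefer */
    )
);
```

The nice thing about USE HINT is that it scopes the behavior to one query, rather than changing the whole database’s compatibility level.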

All right. So we made this query better, either compared to the original one, by about 39 minutes, or compared to the second one, after I created all the indexes, by about two hours and seven minutes.

And if you notice, this query is not using any nonclustered indexes. This query is using just all clustered indexes. It’s just it’s scanning all of them.

Right. We’re not doing anything better here. We’re not doing anything more efficient when we touch these indexes, except for the fact that all of them, or probably the majority of them, are happening in batch mode. All right.

So batch mode works a lot of magic on queries that do stuff like that and have lots of problems running in row mode. All right. This query looks incredibly dissimilar to what we had before, but it’s the exact same query. This is what I ran here.

This is what I ran here. All right. It’s the exact same query over and over again. This is what I ran here. Let me scroll back up to where this thing is. It’s the exact same query, just in a different execution mode. All right.

And another thing that missing index requests won’t tell you is that sometimes you can make things better by using a temp table. All right. We create a little temp table here called #precheck, and we dump a partial aggregation of the results we care about into it, and then we join that temp table off to just one other query.

We can get this thing down to three seconds, right? I mean, close enough. There’s 1.7 seconds there.

There’s 1.8 seconds there. So, okay. So maybe a little closer to four seconds. That’s fine. But we’re still benefiting from batch mode. We’re still getting some results there.
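The shape of that temp table rewrite is roughly this. The #precheck name is from the video, but the columns and the outer query here are stand-ins, not the exact demo query:

```sql
CREATE TABLE #precheck
(
    OwnerUserId integer NOT NULL,
    TotalScore  bigint  NOT NULL
);

/* Dump a partial aggregation of the rows we care about */
INSERT
    #precheck WITH (TABLOCK)
    (OwnerUserId, TotalScore)
SELECT
    p.OwnerUserId,
    SUM(CONVERT(bigint, p.Score))
FROM dbo.Posts AS p
GROUP BY
    p.OwnerUserId;

/* Then join the small, pre-aggregated set to just one other query */
SELECT
    u.DisplayName,
    pc.TotalScore
FROM #precheck AS pc
JOIN dbo.Users AS u
    ON u.Id = pc.OwnerUserId;
```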

So we could run this query just as it is and see a quite significant improvement from, you know, 40 minutes to over two hours to eight seconds. That might be good enough for most people, right? We could also, you know, if we’re allowed to rewrite the query, we could use a temp table.

We can get that down to three seconds. Probably three seconds, probably preferable to eight seconds, depending on the activity, right? Depending on your level of impatience.

And one thing that missing index requests are also not going to tell you is when the optimizer just misses serious opportunities, like early aggregation. Early aggregation is one of the most powerful things you have up your sleeve as a query tuner because the optimizer often misses this. The optimizer often doesn’t think, oh, I can group by some stuff first.

I could group by these columns that I’m joining and doing other things with, and I could join fewer of them because I’ve aggregated early. So we can rewrite this query to do one early aggregation as part of one derived table join, and another aggregation as part of another derived table join, join those two derived tables together, even use batch mode while doing all of these things, and get this query to run in about a second and a half.
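Sketched out, early aggregation inside derived tables looks something like this. The tables are from the Stack Overflow schema, but this is the shape of the technique, not the exact demo query:

```sql
SELECT
    u.Id,
    c.CommentCount,
    p.PostCount
FROM dbo.Users AS u
JOIN
(
    /* Aggregate comments down to one row per user before the join */
    SELECT
        c.UserId,
        CommentCount = COUNT_BIG(*)
    FROM dbo.Comments AS c
    GROUP BY c.UserId
) AS c
    ON c.UserId = u.Id
JOIN
(
    /* Same trick for posts: far fewer rows reach the joins */
    SELECT
        p.OwnerUserId,
        PostCount = COUNT_BIG(*)
    FROM dbo.Posts AS p
    GROUP BY p.OwnerUserId
) AS p
    ON p.OwnerUserId = u.Id;
```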

And, of course, SQL Server still thinks that we’re missing an index on the Users table. Granted, in the other query, where we would have saved maybe 35 milliseconds on a query that took 40 minutes or two hours, that’s not too significant. But maybe 40 milliseconds out of 1.425 seconds is.

Maybe if we were really just itching to get this thing down to 1.38 seconds or something, which would be, I don’t know, I actually don’t know anyone who would do that. But if you’re out there, I want you to leave now.

I would appreciate if you just step away from the computers. You can buy a book on rock collecting. It might be better for everyone if you would just do that.

So if there’s a message here, it’s that missing index requests should not be a big part of your query tuning efforts. Every once in a while, missing index requests do line up with something that you should do or an index that you should have in some way. But for me, when I’m dealing with this sort of thing, the last thing that I generally care about is missing index requests.

Even when you go and run the query and get the actual execution plan, you might see the missing index request line up with what Adam Machanic calls data acquisition operators. There are two kinds of operators.

There’s data acquisition, and there’s data processing. Data acquisition is, of course, whenever you read from a thing, right? Whenever you read from an index, you are acquiring data. Whenever you join or aggregate data, compute scalars, or filter, you’re processing data.

So there’s acquisition and processing. If the missing index request does at least line up with where acquiring data is taking a long time, it’s probably worth thinking about creating an index similar to the missing index request.

But another thing that’s a very big deal is that the missing index requests often don’t give you the best possible index. There are a lot of things missing from how it decides what columns go in the key and what columns go in the includes or even what columns show up in there at all. So be very, very careful out there with them.
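If you do want to eyeball what the missing index DMVs have collected outside of a query plan, the usual join between them looks like this. The DMV and column names are real:

```sql
SELECT
    d.[statement],
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    gs.user_seeks,
    gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
    ON gs.group_handle = g.index_group_handle
ORDER BY
    gs.avg_user_impact DESC;
```

Just remember that everything said above applies: the column choices and ordering in here are rough guesses, not finished index designs.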

Don’t just think that because there’s a missing index request, you could make anything meaningfully better or faster. You got to run the query. You got to get the actual execution plan. You got to see where time is spent.

And then you have to figure out how to reduce that time. You don’t have to figure out how to reduce logical reads. You might create indexes and reduce logical reads like in this query. I don’t know.

Did logical reads go down? Maybe, probably. Who knows? Who cares? I don’t look at them anymore. They’re stupid. You might reduce logical reads, but you might have a query that went from 40 minutes to 2 hours and 7 minutes. I don’t know.

It’s not a good thing to judge things by. What you care about is making this final number go down. This is the metric that makes you a query tuner.

Going from 40 minutes or 2 hours to 1.4 seconds, that’s query tuning. Reducing logical reads, if it happens as a byproduct of this, cool. If not, whatever.

This over there in the corner, that’s the metric that you care about. And there’s one final go around on this. Adding missing indexes is not what got us there.

All right. So with that out of the way, I hope you enjoyed yourselves. I hope you learned something. I hope you will not care so much about missing index requests in the future because they make you lazy and dumb. They make you a bad performance tuner.

What you should care about is figuring out why that query is actually slow because it’s probably not going to be all those missing indexes. Anyway, I’m going to go finish my drink and go to bed. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Eager Index Spools



Thanks for watching!

Video Summary

In this video, I delve into the intricacies of eager index spools in SQL Server query plans, particularly focusing on their appearance and impact. I explore why these spools often arise, especially when nested loops joins are involved, and how they can significantly degrade performance without any accompanying missing index requests or optimizer hints. By examining specific examples, including a detailed walkthrough of a problematic query, I highlight the inefficiencies and potential pitfalls of eager index spools, offering practical advice on how to identify and mitigate them for better query performance.

Full Transcript

Erik Darling, right here in the flesh with Darling Data. Darling Data, of course, has no flesh except me. I am the only flesh. In today’s video, we are going to talk more about how to fix query performance issues with indexes. And in this one, we’re going to talk about when an eager index spool arises in your query plan. Now, I’ve talked about eager index spools many times. You’ll find other videos on this channel talking about them. But the things that I want to outline about them, that I find interesting about them, are: One, they are not accompanied by a missing index request of any variety. There’s not going to be any green text in your query plan. And there’s not going to be anything in the DMVs saying, hey, I created an index for you while your query ran, you should create this index. That doesn’t happen. Two, they are very, very likely to show up on the inner side of nested loops joins when you don’t have a good index, but sometimes even when you do. Three, even if they are built in a parallel plan, reading from the table that feeds the index, and building the index itself, only happen on a single thread. So there’s a lot of wasted CPU threads in that whole thing.

And that can also make the table read and the index build take a very long time. Four, three, five and a half? B, Q, I don’t know, whatever. In parallel plans, eager index spool builds are accompanied by a wait called EXECSYNC, E-X-E-C-S-Y-N-C. But in a serial plan, a single threaded plan, they emit no wait, even though we would be waiting on that spool to build. And that was something else. Oh yeah, building spools is highly inefficient. It is essentially a row by row build of the object. And Microsoft has made absolutely no progress in improving that scenario over the years. So there’s a lot of bad stuff about eager index spools, especially when they are built off very large queries.
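To check whether a server has been burning time on those parallel spool builds, you can look for the wait directly. sys.dm_os_wait_stats and the EXECSYNC wait type are real; the numbers are cumulative since startup, and EXECSYNC can come from a few other parallel-plan scenarios too, so treat it as a clue rather than proof:

```sql
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    wait_time_seconds = ws.wait_time_ms / 1000.
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type = N'EXECSYNC';
```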

So if you’re working with a query and you have big tables in your database and you see a nested loops join, and on the inner side of that nested loops join, you see an eager index spool, not a lazy table spool, an eager index spool, you can be pretty sure that either you don’t have a good index to support whatever predicate search you’re doing, whether it’s via join or where clause or a combination of both, or the optimizer is once again taking its nose beers and not using the index that you have. And I’m going to show you an example of that.

So with that out of the way, let’s talk about how you can buy me some regular beers, because I have enough nose problems without anything else. I don’t need your help. If you would like to support my videos, you can sign up for a membership. It is as low as $4 a month. That’s not even enough to get another knuckle tattoo.

You can join the 30 or so other people who have done so and who care enough about my endeavors and my mental well-being to pay me $4. There’s a link right there in the video description for you to do that. How convenient. Make it very convenient for you to hand over money. If you have spent your $4 on beers of any variety, you may like, you may comment, you may subscribe.

I would suggest waiting until you’re sober to do so, though, because the only thing worse than comments are intoxicated comments. We don’t want those, do we? Because comments are forever. If you need help with SQL Server, if you feel like SQL Server has indulged in too much of any substance and it is just not working out well for you, I am the best SQL Server consultant in the world, according to BeerGut Magazine, of course.

And if you need any of this stuff done, I am just the nose to do it. And as always, my rates are reasonable. If you would like some very high quality, very sober training, sobering training, maybe.

It’s like these videos. It’s fun. About SQL Server performance, you can get all of mine for about $150. That is for the rest of your life.

You don’t need to resubscribe or rebuy. That’s about it. That’s the URL. That’s the discount code. Of course, there is a fully assembled URL for you in the show video description. The show description.

Is this a show? I don’t know. I am showing you things, but… now, I show up on your screen. It’s not a podcast, because I’m not just, you know, vocal frying into a microphone about dumb stuff. Anyway, more about upcoming events in 2025.

For now, I don’t know. Enjoy your alone time. With that out of the way, let us talk about eager index spools. Now, probably the most fascinating thing to me about eager index spools is how often the optimizer will put them into query plans, often to the detriment of performance.

In a sane world, and by sane, I mean like in a world in which the optimizer kept up with the times, and the optimizer was more aware of current storage trends. I realize that storage in the cloud is probably as awful as the old spinning disk storage that you used to have on your, you know, on-prem SQL servers.

But, you know, at the same time, it would be nice if the optimizer were a little bit more aware of these things and didn’t feel the way it feels about IO patterns sometimes. When you think about old hard drives, there was a big difference between sequential and random reads. There’s very little difference between doing a sequential read on an SSD and doing a random read on an SSD. Right?

Because SSDs don’t seek around like old hard drives do. Everything is kind of randomly scattered about; they’re memory sticks, anyway. There’s very little logical order in there. So the optimizer, not really up to date on those things.

And the optimizer will frequently pick nested loops join plans and put an eager index spool on the inner side of the plan. The outer side is up here. Then there’s the loop join.

And then you come down here. And this is where the spool goes. It’s not a very good time for your queries. In a sane, rational world, like the one that I was trying to describe there, the optimizer would just choose a merge join. Maybe not a merge join.

A hash join. Let’s say the optimizer would opt for a hash join and just throw a missing index request and move on. But no, SQL Server has to do something totally different. Now, there have been a couple weird occasions in my life where I have written queries in a way specifically to get an eager index spool.

And not just for the sake of a demo. There have been times in my life when having an eager index spool built was actually a better thing for the query than anything else. And the way that I ended up doing stuff like that was: you can see this join up here.

It’s just a regular old equality predicate joining Posts to Votes. But down here, I have written this as two inequality predicates: a greater-than-or-equal-to, and a less-than-or-equal-to.

And since merge and hash joins require at least one equality predicate in order to work, this gets us a change in the execution plan. Now, I don’t recommend it for this particular query.

For this particular query, it is an absolute disaster. But this is the much better plan where we just have the single equality predicate. But if we rewrite that single equality predicate as two inequality predicates, all of a sudden SQL Server says the only thing I can do is use a nested loops join and have this on the inner side of the join.
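As a sketch, the two join shapes look like this. Hash and merge joins need at least one equality predicate, so the second form boxes the optimizer into nested loops. The queries are stand-ins on the Stack Overflow schema, not the exact demo:

```sql
/* Normal equality join: hash and merge joins are available */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id;

/* Logically identical, but written as two inequalities:
   nested loops is the only join the optimizer can use */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON  v.PostId >= p.Id
    AND v.PostId <= p.Id;
```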

So this is what I was talking about. Like in a sane and rational world, SQL Server would encounter a situation where it may have considered this in the past, and it would just do this instead.

Just do a hash join. Say there’s a missing index. Fine. Anyone can get on board with that.

What’s hard to get on board with is SQL Server building a 52 million row index in the middle of your query execution. Again, there were some very special edge cases in which I did that on purpose because it suited the workload better. In this case, not good.

Not a good time. So if I were to let that run and get an actual plan, that would run for about almost two minutes. So we don’t want to do that.

But what’s very funny, and I’m going to explain this query a little bit. Oops, what did I do? Oh, there’s already an object called P1. All right, well, we’re just going to call you P11. We’re not going to stop and mess with that. I guess I was practicing this one too much.

But we have this query here. And what I want to tell you about this query is that this is not a natural join in the Stack Overflow database. The users table, the ID column in the users table is a clustered primary key.

The ID column in the post table is also the clustered primary key. That is not how those tables relate. These tables relate by a column in the post table called owner user ID. The funny thing is that the ID column on the post table, like I said, is the clustered primary key.

So there is an index on it. So when I’m saying, like, joining one clustered primary key to another clustered primary key, you would think SQL Server would say, oh, I can do this in, like, a number of different ways. I can seek.

I can do a loop join and seek to everything that I care about. I could just scan it once and do a hash join. It could also scan it once and do a merge join, but we don’t talk about merge joins. But it doesn’t choose to do that.

It chooses to do this instead. And it chooses to build an eager index spool off of the clustered primary key here. This takes about, well, there’s four and a half seconds here. And this is where I’m going to show you a couple of the things that I mentioned about eager index spools.

One of them is that they read from this table and build the index spool on a single thread. So even though this is a DOP 8 query, we ain’t doing that at DOP 8, are we? We are doing that at DOP 1, 17 million rows.

Two is that, like, this is in a parallel zone, and that happens. So if we take the four and a half seconds it took to read from this table single threaded and subtract it from this, we end up with, what, like 15 seconds or so of time building the eager index spool.

There is no missing index request up here saying, hey, we could use an index. And I forget if EXECSYNC shows up in the query waits. It does.

There it is. This is the EXECSYNC wait that I was talking about that shows up in parallel execution plans. If this query ran at DOP 1, there would be no wait indicating that an eager index spool was built. We only get this in the parallel version of the plan.

So this is all highly inefficient, highly ugly. And the worst part is that we already have an index here. So sometimes when I’m tuning queries and I see an eager index spool, I’m like, dummies didn’t make an index.

And then I go look and I’m like, this is a perfectly good index. SQL Server. Toot toot. So sometimes what you have to do is you have to tell SQL Server, no, no, no. Trust me, buddy.

Put the spoon down. We should be seeking instead. So before I move on, I just want to say this query runs for 22.171 seconds in total. But if we run this query with the FORCESEEK hint, this should only take a couple seconds.

Oh, yeah, there’s already an object called P2. We’re going to make it P2.2 then. Again, I practice these too well. Sometimes I’m just too good at my job.

And this takes 1.5 seconds. And it turns out without adding an index or doing anything, we had a perfectly seekable index for SQL Server to use. Look at that.

We have a clustered index seek. Before we had a scan and we had a spool. SQL Server, your trip to the powder room was ill-advised.
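The hint itself is just a table hint. This join is a stand-in, but FORCESEEK is the real syntax:

```sql
SELECT
    records = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p WITH (FORCESEEK) /* insist on a seek into Posts */
    ON p.Id = u.Id;
```

The usual caveat with FORCESEEK applies: if no seekable access path exists for the predicates, the query errors out rather than falling back to a scan.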

So in a lot of cases, eager index spools are really just irritating missing index requests. Sometimes they are optimizer deficiencies and shortcomings. But in the grand scheme of things, I don’t know.

What do I prefer? A FORCESEEK hint, or having to sit there and wait while an index creates? I’ll take that FORCESEEK hint any day. So if you’ve already done your job and you’ve already indexed your SQL Server queries for performance and you’re not getting performance, take a look at those query plans.

You might see something awful like that. Now, let’s get this thing running. And we’re going to talk a little bit about this.

So a lot of the queries where I show off bad things that happen, I use either cross apply or outer apply, because the inner side of nested loops is an interesting place for SQL Server’s optimizer. It does all sorts of goofy things in there. Parallel nested loops are especially strange birds.

I thought for a bit about doing some videos about parallel nested loops, but I don’t think I could… I don’t think that even with all the time in the world and, like, Paul White’s notes and SQL Server’s source code, I could do adequate justice to how weird parallel nested loops are. There is simply too much going on there.

But we have this query, and this is another situation where you might see an eager index spool arise. Like, just have a little subquery like this with an aggregate in it. And if we look at this query plan, and let’s just drag this way over here, something quite similar happened, right?

Here is our eager index spool. Here is our Posts table. So essentially the same thing happened.

Now, this is the index that we currently have on the Posts table, right? It’s on post type ID, then parent ID, then owner user ID. The part of the query where we got an eager index spool was on this.

So SQL Server is essentially complaining that we don’t have an index that helps with this. And again, for whatever reason, SQL Server could have just chosen a hash join or a merge join. But it didn’t.

It chose a nested loops join. It told us nothing about an index that might be missing. It just created that index down here. Now, the normal way of solving this problem, or, you know, mostly what I do, is I right-click on the eager index spool. And I look at two sections.

I look at the seek predicates. And this is what I would create as the key of an index to fix this problem. In this case, SQL Server wants an index on parent ID, right? We can see the parent ID column reference here, which makes sense because we are correlating the outer query with the inner query on parent ID, right?

Look at that. How nice. And the other thing you want to look at is, let’s go back to the query plan, and let’s get those things back up.

The other thing you want to look at is the output list, because the output list will tell you what columns you will probably need to include in your index. So look at the seek predicate. That’s the key of your index.

Look at the output columns. Those are the include columns in your index. 99% of the time, this will be good enough. You may have to think about things a little bit differently if you have, like, a top one with an order by or something, because the order by column might need to go in the key of the index, too. But for most cases, this will be good enough for SQL Server, at least to use the index.

I don’t know if that’s going to make your query as fast as you want it to be, but that’s what you could do. Now, there are ways that we could go about fixing this, of course. Let’s say we wanted to create this index to fix it.

We could say create this index on parent ID and then score, right? And that’s essentially what SQL Server was asking for in the eager index spool plan. It wanted a key column on parent ID.

Fine. It wanted us to include score and ID. Well, we have to explicitly reference score, because score is just a regular column in the table. We don’t need to explicitly reference ID in here, because ID is the clustered primary key on the table.

So that automatically gets built into any nonclustered indexes that we create. And now if we run this query, let’s just look at the estimated plan real quick. Make sure that we are spool free.
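Put together, the index being described comes out like this; the index name here is made up:

```sql
/* Key column from the spool's seek predicate,
   include column from its output list;
   Id rides along for free as the clustered PK */
CREATE INDEX despooler
ON dbo.Posts
    (ParentId)
INCLUDE
    (Score);
```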

Look at that. We have no more eager index spool. And now if we run this query for real, it will not take 20 or 30 odd seconds to run. Oh, it should be a lot faster.

I don’t know what this thing is doing now. Ah, there we go. For some reason, that took 10 seconds. I’m going to have to have a talk with SQL Server. Did some…

There was some… Oh, see? This is the problem. This is the problem. This is why I hate… This is why I hate parallel merge joins. Look at that. Isn’t that ugly?

Look at this. I bet this repartition streams has an order by, and we have introduced intra-thread dependencies into our parallel plan. This is a terrible idea in SQL Server.

If you see situations like this, where you have order-preserving or order-requiring operators in the parallel execution plans of your SQL Server queries, run screaming. So let’s look at…

This one has an order by on it. Does this one? It does. We have two order-preserving exchanges in a row. Under worse circumstances, this could have been really ugly. We could have had exchange spills.

We could have seen spills here and here and maybe… Actually, no, that’s about it. I doubt that one would have, but you could have seen these spill and have real big problems. This one probably came very close to it.

So be very mindful of that sort of thing. I’m going to go a little off script here. And what I’m going to do is just to see if my hypothesis is correct. I’m going to throw a hash join hint on this query because…

Actually, hash join loop join because I have a feeling SQL Server would tell me it couldn’t create a query plan just doing hash joins here. But let’s just see if this goes any better.

Maybe it will. Maybe it won’t. We’re going to find out together. All right. So yeah, this one took five seconds. All right. Notice that we don’t have terribly painful things happening in here anymore.

And now we get a very helpful bitmap up here. And this, rather than taking almost 10 seconds, this is down to five seconds. So a little off script, but, you know, if I don’t do that sometimes, I don’t know, I might appear a bit stodgy here.
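For reference, plan-wide join hints go in an OPTION clause. This query is a stand-in, but the hint syntax is real:

```sql
SELECT
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION (HASH JOIN, LOOP JOIN); /* optimizer may only consider hash or loop joins */
```

Listing multiple join types in OPTION gives the optimizer a restricted menu rather than forcing one physical join everywhere, which is usually gentler than a per-join INNER HASH JOIN hint.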

But anyway, that’s one way of dealing with it. Another way of thinking about this is let’s actually let’s drop this index because what I want to do is show you something a little bit different than just indexing.

So let’s think about what this query is doing. We’re looking for this, right? The thing is we’re looking for answers.

But in the query logic, we care about, let me show you back up here, we care about where the answer score is greater than the max score on this. The thing is, we know something SQL Server doesn’t.

We know something about our data. And what we know about our data is that only posts with a post type ID of two can be an answer.

And we already have an index where post type ID is the leading key column. So if we were to rewrite our query to do this instead, we wouldn’t need that other index, right?
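A sketch of that rewrite, using the Stack Overflow schema; this is the shape of the idea, not the exact demo query:

```sql
/* Answers are PostTypeId = 2; stating that inside the subquery
   lets SQL Server seek the existing index that leads on
   PostTypeId, then ParentId */
SELECT
    pa.ParentId,
    pa.Score
FROM dbo.Posts AS pa
WHERE pa.PostTypeId = 2
AND   pa.Score >
(
    SELECT
        MAX(p2.Score)
    FROM dbo.Posts AS p2
    WHERE p2.PostTypeId = 2 /* the domain knowledge SQL Server doesn't have */
    AND   p2.ParentId = pa.ParentId
);
```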

So if we actually take a minute to consider our query a little bit, this looks like a very similar plan to the one we had before just without the spool down in here. Now we have another index seek in there. So if we run this query and we wait, I can’t promise that this is going to be as good and fast as our hash join query.

It’s probably about a second slower. But we look at the query plan. So this finished in about 6.2 seconds. The hash join plan was about five or so seconds. But we really do improve this area in here, right?

There’s no more eager index pool. And this time we took advantage of our index on post type ID and parent ID. And we have sort of a double hop seek where we have parent ID and post type ID being seeked into.

So sometimes when you’re trying to fix eager index spools, if you don’t have a good index in place at all for the query, you may need to create one. If you already have a good index in place, you probably need to use a FORCESEEK hint.

And if you have a nonclustered index that’s just a little bit off from what you’re trying to do, pay attention to the way that your query is written, because you might be missing some valuable information to give SQL Server in order for it to use that index more effectively. In this case, we used our domain knowledge about the data in the Stack Overflow database to say, hey, we only care about comparing answer scores to other answer scores. We need to just find answers on the inner side of that aggregate that we’re comparing to.

So with that out of the way, I guess that’s the whole video. Cool. It only took 22 minutes and some odd seconds.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope to see you. Well, I hope you see me in the next video. If you don’t, well, it was nice. You might be dead.

That’s real sad. Yeesh. I don’t like the sound of that. Anyway, thank you for watching. Goodbye.


Indexing SQL Server Queries For Performance: Fixing non-SARGable Predicates



Thanks for watching!

Video Summary

In this video, I discuss the challenges of dealing with non-sargable predicates in SQL Server queries and how rearranging indexes can help improve performance. I share a practical example where a computed column is wrapped in a function within a WHERE clause, leading to suboptimal query execution. By creating an index that allows for better seeking, we demonstrate how to mitigate these issues without altering the underlying stored procedure, which might be immutable due to vendor restrictions or other constraints. This approach showcases a clever workaround using SQL Server’s indexing capabilities to optimize queries even when direct code changes are not possible.

Full Transcript

Erik Darling here. Lo and behold, with Darling Data. No acquisition letters yet. I applied to Y Combinator and I said, hey, anyone want to buy this thing? So far, someone sent me a bottle of lube and a can of pork beans. Those are the only offers I’ve gotten so far. I don’t really know what to do with that. I’ll probably just keep making these videos in the meantime. So anyway, we’re going to sally forth with our series on indexing to improve query performance in your SQL Server queries. And in today’s video, we’re going to talk about how you can rearrange indexes to fix non-sargable predicates. I know that there’s, I don’t know, maybe depending on what your reading habits are or what your interests are in general, you may have seen blog posts about using computed columns to do this. We’re going to talk about computed columns a little bit later in this series because I had a comment about computed columns that now has set me ablaze.

So this is not that, this is not about using computed columns. So if you are expecting that, well, stick around because we’re going to do something different. Maybe you’ve never seen it before. Again, depends on your educational background a little bit for SQL Server things. But before we do that, we of course need to talk about your financial background and your financial backing of Darling Data Enterprises. If you would like to support the content that I create on this channel, there is a link down in the video description that says become a member. And by clicking on that link, you can give me four bucks a month to support this channel, which after taxes is a little over $3. So who is the real winner?

I don’t know. If you, you know, I realize that it’s Christmas, and while some of you might feel giving in the realm of $4, you may have already bought your Nana or your aunt or, I don’t know, a step parent some dollar scratch offs, or maybe a couple of $2 scratch offs. And that $4 has evaporated from your wallet, from your financial world. In that case, you can always like or comment or subscribe.

I’m happy to see numbers of any sort go up, except blood pressure and cholesterol. Those are the only ones we like to see either go down or stay even. Those are the only things we care about.

If you need help with SQL Server, I am famous for performing Christmas miracles. So if you need any of this stuff done, if your SQL Server is slow, I don’t know. Do you sell a lot of things on Christmas?

Probably not. But you know, it is the holiday season, when sales do tend to spike. So if you are having SQL Server problems, I am a consultant that fixes them, and my rates are reasonable. So there’s really no reason for you to go anywhere else or talk to anyone else, because they won’t do as good of a job as me.

If you want some great SQL Server training all about performance tuning and other, well, really just performance tuning. I was going to say other things, but I pretty much stick to the performance tuning stuff because I don’t want to spread myself too thin. If I start pretending that I know about availability groups, I feel like you would see through that charade pretty quickly.

But you can get all of mine. It’s about 24, 25 hours of content for the rest of your life for about 150 USD when you use Yonder coupon code. The fully assembled link to perform all of these Christmas miracles is also in the video description.

So you don’t have to do too much work or typing, because I’ve seen your typing. Not impressive. Mavis Beacon would be incredibly depressed if she saw you typing.

Rolling in her CD-ROM drive. Because she is.

More news on events as 2025 progresses; for now, you must live in darkness. Sorry about that. But with that out of the way, let’s talk about fixing non-sargable predicates with indexes. Now, I think I dropped this index.

Yes. Well, if I didn’t before, I did now. So I’m going to start creating this index, and then I’m going to start talking, so you don’t have to sit there and wait for an index to create. That’s just rude and crude and vile.

And for some reason, ZoomIt is taking its sweet time. All right, let’s move some of this stuff around a little bit so it’s all on the same screen. We don’t need you struggling to read. Again, much like your typing, I’ve seen your reading, and you are not very good at either one.

So, as a consultant who gets called in to deal with performance issues to this very day, despite the bevy of knowledge distributed widely and freely across the internet: when you start wrapping columns in your WHERE clauses and JOIN clauses in functions, even the built-in ones in SQL Server, you are bound at some point to have that cause some performance issue, big or small. It will creep up on you.

And the bigger your data gets, the worse these performance issues will get. I fix these things constantly. It is like an unwritten rule of my consulting engagements that I will be pulling ISNULL and COALESCE and SUBSTRING and REPLACE and all the other things out of joins and where clauses, where people have done just plain dumb things with them.

But what sucks is, gosh, I think the number of actually qualified third party vendors who make products that interface with SQL Server is very, very small. Even Microsoft’s own software products that touch SQL Server have some of the worst code and database design and indexing I have ever seen in my life. It’s like the people who make software at Microsoft are here.

And the database team at Microsoft is over here. And there’s just this silo bunker wall between them. There’s just no way for them to meet.

Even the people who work on Entity Framework just do things that, if anyone with a reasonable sense of databases saw what they were doing, they would just hit them with a two-by-four. They would Hacksaw Jim Duggan them, just whack them.

I don’t know where they get this stuff from. It never ceases to amaze me. I think they just want to keep me in business. That’s what it feels like.

I mean, if that’s the case, thank you. I appreciate your contributions, but man, it’s astounding to watch. Now, this is not a giant fireworks demo. This is just to prove a point: you can make a difference even if you can’t change the code. Which, I think, was the point I was going to make when I started talking about vendors and Microsoft and Entity Framework. Sometimes you can’t change the query.

Sometimes there are things that you are not allowed to change. Vendors might disallow you from changing stored procedures; they might say that puts you out of support.

You might not be able to figure out how to fix whatever Entity Framework query is being generated. Queries might come from binaries or DLLs or something built into your software, and you might not be able to do anything to change them, right?

What are you going to do? You’re hosed on that. You can’t change that. The vendor has to change that. And the vendor won’t do anything. Won’t lift a finger to help you.

Who are you? Anyway, let’s say we have this index, right? And we’re going to pretend that the stored procedure below is completely immutable. We are not allowed to change it.

We’re going to straddle a fine line here and change an index. All right. There’s a secret that you and I are going to have to keep. We might be violating a EULA somewhere.

I don’t know. But this is the index that we currently have on CommunityOwnedDate and Score. This is on the Posts table. In case... again, I know how your reading is. That’s the Posts table.

And let’s pretend that this query... I’m going to handicap this query a bit. Rather, I’m going to hobble this query a bit. I like Misery as a movie, and I like the term hobbling, so we’re going to say we’re going to hobble this query. I like Misery much better than golf.

In golf, you get a handicap; in Misery, you get your ankles broken with a sledgehammer. But we’re going to hobble this query a little bit to make sure that SQL Server can’t use batch mode to do anything better.

And we’re going to limit this to MAXDOP 1, so that I can sort of show you just how profound a difference you can get with these changes. Because, you know, who knows, maybe you hired a crappy consultant who told you to set cost threshold for parallelism to like 350 or something for no reason.

Just a wild guess. They were like, oh, look at all those CXPACKET waits. Let’s fix that. Because this is the kind of help you get at unreasonable rates. When you pay Erik Darling, with his reasonable rates, you get good advice.

When you pay other people, I don’t know what you get. I’ve seen the results, though. They’re not good. So we have this stored procedure, and this stored procedure is going to break a cardinal rule of querying.

It is going to wrap CommunityOwnedDate, the leading column in our index, in the COALESCE function. I don’t know why people do things like this. What do you think, nulls are going to throw an error?

You think SQL Server can’t handle nulls? It’s your stupid programming languages that throw those null exceptions. It’s not SQL Server. But people do dumb stuff like this all the time.
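To make the shape of the problem concrete, here’s a hedged sketch of the kind of stored procedure being described. The table and column names follow the Stack Overflow sample database; the procedure name, parameter, and literal dates are invented for illustration, not the exact demo code.

```sql
/* A sketch of the anti-pattern, not the exact demo script.
   dbo.Posts, CommunityOwnedDate, and Score follow the Stack Overflow
   sample database; the procedure name and dates are made up. */
CREATE OR ALTER PROCEDURE dbo.CommunityMinded
    @Score integer
AS
BEGIN
    SELECT TOP (1000)
        p.Id,
        p.Score,
        p.CommunityOwnedDate
    FROM dbo.Posts AS p
    /* COALESCE wraps the leading key column of the index,
       so SQL Server can't seek on CommunityOwnedDate */
    WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= '20130101'
    AND   p.Score >= @Score
    ORDER BY p.Score DESC
    OPTION (MAXDOP 1); /* the "hobble" mentioned above */
END;
```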

Don’t ask me why. I didn’t make them do it. But when we run this query and look at the execution plan, we’re going to be unhappy, aren’t we?

We’re not going to be thrilled with the performance of this thing. It takes 1.5 seconds and it scans our index. We have a perfectly good index for seeking into the CommunityOwnedDate column, and yet we don’t.

We do not seek into this index. We scan the entire thing. It takes a second and a half, and we don’t like that. So what we’re going to do, because we’re allowed... we’re going to pretend that we’re allowed to do this.

We can’t change the stored procedure. We can’t just take out that COALESCE. We could make a copy of the stored procedure with the COALESCE removed and run it side by side with the other stored procedure, and we could send the results to a software vendor, and they would say, no, no, we’re not helping you. Which is what happens all the time, because software vendors suck.

But let’s just say we were going to change this index, and we’re going to use one of my favorite index options in the world, DROP_EXISTING = ON. Just slide this index in place, drop it on in, which is good for us.
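If the original index were keyed on (CommunityOwnedDate, Score), the swap might look something like this sketch. The index name is hypothetical; note that DROP_EXISTING = ON requires reusing the existing index name.

```sql
/* Hypothetical index name. DROP_EXISTING = ON replaces the index
   in place, but only if you keep the same name. The key order is
   flipped so Score leads and can support a seek. */
CREATE NONCLUSTERED INDEX community
ON dbo.Posts (Score, CommunityOwnedDate)
WITH (DROP_EXISTING = ON);
```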

So now when we run this query, or sorry, when we run this stored procedure, it does not take a second and a half. It takes about a hundred milliseconds, because now we can seek into the index on the Score column.

And we just have that stupid residual predicate, right? We now have a seek predicate on Score, and we just have the awfulness that is the COALESCE function running its CASE expression on the CommunityOwnedDate column. That’s a residual predicate, but we can at least seek to the scores we care about.

And that’s the good part for us, right? We no longer have to worry about all that scanning of the index because of the COALESCE function. Now we can seek to the scores that we want, and then evaluate CommunityOwnedDate second.

And there’s a lot less overhead in that. There’s a lot less suffering in that, because we have already done a seek to the limited number of values we care about, and applying that residual predicate is not a big deal. If the Score predicate were less selective, you might end up with a seek across most of the table plus a residual predicate applied to most of the table.

That would potentially be equally as unpleasant as just a scan of the whole table. But we’re going to pretend that we’re a little bit smarter than that. So anyway, this is how you can rearrange your indexes to beat non-sargable predicates.

Often SQL Server will do the smart thing and choose better indexes when it can seek, because it thinks, ooh, look at this nice cheap seek I can do. SQL Server, being the cheapskate that it is, will often choose the smarter index in this scenario.

Even if you had another index hanging around. In this case, we changed an index, but if you run the SQL Server that your vendor software runs on, you might just create a new index.

And then, when you have to do a software upgrade, you just quietly drop or disable the custom indexes you’ve created, so the installer doesn’t explode and say, how dare you try to do better than us? Because, you know, you’re all idiots.

You know, I don’t know. I don’t understand how you get paid to do what you do. I don’t understand how your software company makes hundreds of millions of dollars a year. The product is garbage.

Ah, well, what can you do? SQL Server is $7,000 a core, and it still can’t even handle a column wrapped in a function. So one wonders how deep the rot goes sometimes, I suppose. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope that you are happy and healthy and feeling loved and appreciated by all those around you. I don’t know.

Maybe you have a cute dog or something. Hmm. Licks your hand, and you take it for a walk. Sounds nice, right? Smoke a cigar.

Anyway, I’m gonna go record something else now. I think I have another video. Apparently I have 50,000 videos to record, so I’m gonna try to get that done. Anyway, thank you for watching.

Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Predicate Key Lookups




Thanks for watching!

Video Summary

In this video, I dive into the world of indexing for performance optimization, specifically focusing on fixing predicates in key lookups. I explore how these lookups can become problematic when queries suffer from parameter sensitivity issues—those pesky “sensitive Sally parameters” that make query plans unpredictable and inefficient. The goal is to reduce the time it takes for a user’s query to return results, ensuring every second counts. Throughout the video, I also highlight ways to support my channel through membership, offering training and consulting services for SQL Server performance tuning, making sure you have all the tools needed to tackle these challenges yourself.

Full Transcript

Erik Darling here, with Darling Data. I’m a little fuzzy here. Let’s unfuzz myself. There we go. Now you can see everything in crisp, clear, high definition detail. In today’s video, we’re going to talk about indexing for performance, specifically around fixing predicates in key lookups. Now, I’m going to talk about exactly what all that means when we get to the demos. But it is generally a sensible approach to take when index tuning, because the last thing you want to do every time you make a trip back to the clustered index is evaluate whether a predicate passes one of your filters. You don’t want to do that. It’s not good for performance at all. It’s a bad time, especially when you have queries that suffer from parameter sensitivity issues. Which, oh boy, don’t we all have those. Sensitive Sally parameters. But before we get into that, let’s talk about things that I am sensitive to. Like money. I do need it for things. If you would like to become an appreciative member of this channel and support my endeavors in making this content for you, you can go to the video description and click the link to become a member. And you can become a member for as little as $4 a month, of which I will see $3.10 post taxes. If you have spent all your money on dirty women and booze, or whatever else you spend money on... grass seeds, lawn mowing equipment, the great outdoors, I don’t know, hiking gear... you can like, you can comment, you can subscribe. All of those keep me from bashing my head against the wall. It’s a great time.

If you need help with SQL Server because you are having problems like the ones that I describe in these videos, or maybe you don’t know what kind of performance problems you’re having and you’re just looking for someone to untangle all that for you: oh, gosh darn it, I am the best consultant in the world. At least for SQL Server. For other things, probably not so much. I can do all of this stuff. And as always, my rates? They’re reasonable. You can get all of my training for just about $150 for the rest of your life, if you use that discount code at that training site up there. It’s a very good deal. You should get it. It’s not just for Black Friday, it’s for every day. So aren’t you lucky? No upcoming events until later in 2025. I’ve said this a million times; we’ll talk about this stuff later. And with that out of the way, let’s get on to talking about these here predicate key lookups. So our goal as query tuners, as people who attempt to make things faster in exchange for money, is to reduce the amount of time it takes from a user pushing a button until that user gets a result. I don’t care what other metric you reduce or increase in that endeavor. You could make logical reads go up or down. You can make CPU go up or down. You can make memory go up or down.

But as long as the duration, the wall clock time, the tick, tick, tick, tick that it takes before that query returns results to someone gets shorter, I think you’ve done a pretty good job. Sometimes we’ll have to make queries go parallel to get faster. Sometimes we’ll have to use more memory to avoid spilling to disk or something. Sometimes you might have to use more space in tempdb by putting some result in a temp table first. Sometimes we need to add indexes to our database in order to make queries faster. These are all things that represent tradeoffs when you are query tuning.

Yes, sometimes you have to give up a little of this stuff in order to make the query faster. It’s okay. This is what databases were built to do. Deal with these tradeoffs. The only real exclusion to any of this stuff is, of course, a query rewrite. If you can just change the way that a query is written fundamentally in order to make it faster, great. You have sacrificed very little except your time and knowledge and fingers and typing and probably some portion of your humanity.

You know, I’m not going to tell you to add a hundred indexes to a table to make some weird queries faster, but sometimes you do have to add some. Sometimes, if a table has just a clustered index on it and you’re like, well, gosh, we’re not just searching on that... okay, you have some freedom to add other indexes in. Other times, you have to use some judgment.

You may have to consider what indexes are already there first. Sometimes altering an existing index is a better choice. Sometimes there are a lot of unused or duplicative indexes you could clean up before adding another one to the pile. There are many things that you could consider, things you could use the old noggin on and figure out first.

Now, like I said, there are various things in queries that force SQL Server to do dumb things. Some of those things are local variables, table variables, non-sargable predicates, overly complicated joins and where clauses, and stuff like that. There are all sorts of things that you can untangle for free that don’t involve adding another thing to the database, whether it’s an index or a temp table or something like that.

There are all sorts of things you can do that don’t have much of a tradeoff. But changing those things might change other stuff. You might get a parallel plan. You might use more memory, something like that. So, you know, I like rewriting a query, but sometimes all the query rewriting in the world isn’t going to change the state of the database in a way that makes that query any faster.

Now, key lookups represent a choice made by the optimizer between scanning a clustered index, where all of the columns in the table are available, and using a nonclustered index. Sometimes it helps, instead of thinking of things as a clustered index, to think of a table as a clustered table, if a clustered index exists on that table. But the clustered index has all of the columns available in it.

And sometimes SQL Server thinks that it is cheaper to do one big scan of the table, have all the columns available, and be able to touch all of the data that it needs from one single place, than it is to seek or scan into a nonclustered index on that table and then go back to the clustered index to retrieve columns that are not part of that nonclustered index, in order to satisfy the remainder of the query. Typically, lookups get chosen when a relatively small number of rows is expected, but all sorts of things play into this.

And sometimes the number of rows that are expected are not the number of rows that end up making that transversal in reality. That traversal in reality. I guess I should have said. I don’t know what a transversal is. All right. Ah, screw it. Bad estimates in general.

And, of course, self-inflicted bad estimates. Again: local variables, table variables, non-sargable predicates, things like that, are all things that can contribute to you having those problems.

But we’re going to look at an issue with parameter sensitivity around lookups. I think a lot of the time, fixing lookups that exist just for columns you’re selecting is almost a last resort for me. There’s a lot of other things I would rather do than that.

But there’s only so much time in this video that I can spend talking about those other things. So, we already have this index created on the Posts table. And let’s pretend that this index was created by some long gone developer in the past.

And this query was great... sorry, this index was great, either for the query below before we added more stuff to it, or for a different query altogether.

This index made perfect sense. And since this index made perfect sense for some other query, this query comes along and says, hey, you make perfect sense to me too. Now, there are a couple of things in this query that you might notice.

If you look up at the index definition, it is on Score, then CreationDate, then PostTypeId. And that does take care of most of what we’re doing in here.

Now, just make sure that everyone peeps this in their head. If you have a clustered table or a clustered index on a table, right, it’s the same deal. The table is clustered.

The clustered index key column or columns will be part of every nonclustered index you create. In a non-unique nonclustered index, they are an additional key column. In a unique nonclustered index, they are an included column.

So the Id column on the Posts table is the clustered primary key, which means it is part of this index up here. You can just pretend it’s right at the end of the key.
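A quick sketch of that point, with an illustrative index name:

```sql
/* Because Id is the clustered primary key of dbo.Posts, SQL Server
   silently appends it to every nonclustered index you create.
   For a non-unique nonclustered index, this definition: */
CREATE NONCLUSTERED INDEX ix_demo
ON dbo.Posts (Score, CreationDate, PostTypeId);

/* ...behaves as if you had written:
   CREATE NONCLUSTERED INDEX ix_demo
   ON dbo.Posts (Score, CreationDate, PostTypeId, Id); */
```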

So this does take care of most of what we care about. Now, the other thing you might notice is that I am doing select star. The fun thing is, it doesn’t matter how many columns are involved in the select or how many columns are involved in the lookup that you might do.

Every lookup has the exact same optimizer cost. It doesn’t matter if you’re selecting one column or a thousand columns. Every optimizer estimate for that lookup will have the same cost.

It doesn’t matter how many columns you select. So if we run this query... rather, run the stored procedure with a set of parameters that will touch a small number of rows, this will run relatively quickly.

Right? 326 milliseconds. But we have this lookup over here, and you might look at these numbers under the lookup and think, gosh darn, that looks funny. 7 of 2,084,446.7.

That’s 2 million something. And the reason why that estimate looks so funny is because this lookup has a predicate. Now, every lookup will have at least a couple things in it.

Well, actually, that does depend a little bit. Every lookup will have at least this one thing in it: the seek predicate, which you can kind of see is sort of like a self-join, because we’re joining the Posts table to itself on the Id column.

That is why the clustered index key column or columns end up in your nonclustered indexes: so SQL Server can do neat stuff like this. It can take that Id and look up rows in the clustered index from the nonclustered index.

Now, the two things in a lookup that may or may not be there. One is an output list. If you are just applying a predicate, actually, that might show up in the output list as well.

This might always be there. But the important thing is that this one has a predicate up at the top. So you have the output list, which are columns that are being selected.

And you have the predicate, which is additional filters that are applied when we make a round trip back to the clustered index. So for every row that comes out of this nonclustered index where we seek into it... since this is a loop join, it’s one row at a time. Boop, boop, boop, boop, boop.

One row comes out, and then we evaluate another predicate there. So we’re able to do our initial set of filtering: we’re able to seek to some of the filters we care about here, and apply a predicate to some other filters here. I guess there’s a little bit of overlap in there just because of the state of things, but that’s sort of beyond the point of this.

But the seek here does allow us to get to some of the rows that we care about. But then we have to filter out additional rows that we care about. Look how much this reduces things by.

Well, we get 327,567 rows out of here. That means we make that many round trips down to the clustered index here to do those lookups. So we made 327,567 round trips.

And at the end of all of those round trips, we ended up with only seven matching rows came out of that. That’s why for the rest of the query, we only have those seven rows here. So we were able to do some of the filtering with the nonclustered index, but not all of the filtering.

So, you know, one thing that’s sort of important to figure out is what’s going to happen when we pass in a different set of parameters. The set of parameters we passed in up here was a score of zero, meaning we’re going to find every score greater than zero, which is most all of them. And we’re looking for this creation date.

So just the last couple months of 2013, which for the StackOverflow2013 database is actually pretty tiny, too. And a post type ID of one. Well, there’s about 6 million questions in the Posts table.

So, you know, this thing doesn’t really filter stuff out. This thing does a pretty good job of filtering stuff out. And this is what we evaluate in the lookup, because this OwnerUserId column is not in our index. So if we run this for a set of parameters where more rows come out of that initial nonclustered index seek, this is where lookups start to get painful.

Because what was a pretty sensible plan that ran in like 300 something milliseconds for that initial set of parameters does not do well with this one. We end up with a lot more rows. Now, keep in mind, we have the same cardinality estimate here that we did in the last plan.

In the last plan, this was high. In this plan, the number of rows that we get out is 1884% higher than our estimate. So we make a lot more trips back to the... where is it... actual number of rows.

We make a lot more round trips back to the clustered index in order to get what we want from this thing. Right. And remember, that’s still one row down here and then into here. Right. So one row at a time.

We do that, and we end up with twenty-seven thousand eight hundred and sixty-nine matching rows from here. And you can see where the majority of the time goes: we have about a second there, about a second and a half there. So that’s three and a half seconds total.

And then a little additional time in the nested loops join. And then, for some reason, the gather streams has a lower number than this. And then we go into the sort, and we spend about twenty-five more milliseconds there. One thing that’s important when you’re looking at parallel plans: try not to focus too much on the execution time of exchanges.

They are a weird black hole of mystery, and all the operator timing stuff is very strange around them. So try not to look at those too much.

Anyway, knowing what we know about indexing, we could do one of two things. We could change our index and add OwnerUserId as a last key column here. This would at least allow us to do all our filtering from one single index.

If you’re able to do this and get good enough performance, don’t sweat it. That’s totally fine. Adding one more column to the end of the key of an index is not going to change anything for any other queries that touch this index. But if we apply a little bit of domain knowledge, we might want to think about adding a new index that better suits this query overall.
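The first option might be sketched like this; the index name and the assumed existing key order (Score, CreationDate, PostTypeId) are taken from the discussion above, not the exact demo script:

```sql
/* Widen the existing index: adding OwnerUserId as the last key column
   lets all four predicates be evaluated in one index, eliminating the
   predicate on the key lookup. Hypothetical index name. */
CREATE NONCLUSTERED INDEX lookerupper
ON dbo.Posts (Score, CreationDate, PostTypeId, OwnerUserId)
WITH (DROP_EXISTING = ON);
```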

Or, if we have really good domain knowledge and we know this index sucks, that it never helps anything, or rather makes everything slow, blah, blah, blah, every query that touches this index is awful... we might just replace the index entirely with one that better satisfies what we’re searching for.

So, in our query, OwnerUserId and PostTypeId are equality predicates, and Score and CreationDate are inequality, or range, predicates; we’re doing greater than or equal to on both of those. And, of course, if we do equality searches on OwnerUserId and PostTypeId, then Score is in order for us here, which means that when we run these two queries, they will be a lot faster in general.
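The second option, equality columns leading the key, might look like this sketch; again, the index name is made up:

```sql
/* Equality predicates (OwnerUserId, PostTypeId) lead the key; the
   range predicates (Score, CreationDate) follow. With equality seeks
   on the leading columns, rows come back ordered by Score and
   CreationDate, so the ORDER BY needs no Sort operator. */
CREATE NONCLUSTERED INDEX domain_knowledge
ON dbo.Posts (OwnerUserId, PostTypeId, Score, CreationDate);
```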

And we don’t have the same parameter sensitivity issues when these things run. One other thing that’s important here, which might matter in queries that you’re tuning: since we have our index set up correctly, we also don’t have to deal with sorting this data. Remember, our query from up here a little bit, if you’ll permit me to scroll, is asking.

Remember, we have our equality predicates on these two columns, and we have our inequality, or range, predicates on these two columns. And then we’re ordering by these three columns. Now, Id is still going to be that last key column in the clustered index and the nonclustered index that we have, because it’s the clustered primary key.

And then Score and CreationDate are going to be in order, because we have equality predicates across the leading two columns in the index. So we avoid having to sort data altogether. We have a Top in here, but it is not a Top N Sort.

And we do not have an explicit Sort to put our data in the order that we care about, because the index does that for us, which is pretty cool. So when you’re troubleshooting query performance, whether it’s parameter sniffing or just bad estimates or whatever it is, and you have a key lookup in your plan that is evaluating a predicate like I showed you in the earlier query, and you get the actual execution plan and see a lot of time spent in that loop...

It is usually worth fixing the predicate part of that lookup by adding whatever column or columns are being evaluated in the predicate to whatever index is already being used by the query. You might also choose to add a new index that better satisfies the query as a whole; that’s also an option for you. Or you might want to replace the current index, because maybe it’s just not good for any searches that touch it.

But that’s a much, much more difficult thing to ascertain just from tuning a single query. You’d have to know the workload pretty well to make a call like that. But anyway, this is a good thing to fix.

This is honestly something that I have to do quite a bit: fixing up indexes a little bit to make sure that they adequately service the queries that are touching them. Because a lot of the time, I’m dealing with indexes created by developers from all walks of life and all levels of experience with SQL Server. Some of them maybe knew what they were doing better than others.

But this is something that I do have to fix quite a bit. So hopefully you can use this knowledge to start fixing things for yourselves. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something, and I will see you in the next video, which will be about... something. I forget what; for some reason, tab number five is hiding from me. So maybe that’ll just be a nice surprise for all of us when I get to that.

So anyway, thank you and goodbye. Bye. Bye.


Indexing SQL Server Queries For Performance: Unpredictable Searches




Thanks for watching!

Video Summary

In this video, I delve into indexing strategies for handling unpredictable searches in SQL Server. Unpredictable searches are essentially any queries where the WHERE clause, JOIN conditions, or selected columns can vary widely—think dynamic SQL within stored procedures or ad hoc queries from front-end dashboards. I emphasize a common pitfall: using double wildcard searches on every column with a parameterized search string, which can lead to significant performance issues due to full table scans and implicit conversions. To address this, I introduce the `RECOMPILE` hint as an effective solution for many scenarios, explaining how it allows SQL Server to use actual parameter values in the query plan, thus avoiding caching plans that might not be optimal. Additionally, I discuss the challenges developers face when dealing with such queries, including the lazy approach of relying on `RECOMPILE`, and highlight upcoming features like Microsoft’s Optional Parameter Plan Optimization (“OPPO”) for handling optional parameters more efficiently in future SQL Server versions.

Full Transcript

Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Erik Darling here with Darling Data. As you’d well expect at this point, if you’re surprised by any of this, I don’t know what’s wrong with you. Maybe you should start taking notes about your day. You might have some sort of head knock that’s plaguing you in some way. I would suggest getting that checked out by a medical professional as well. At this point, I feel like this is a well-established set of facts. I am Erik Darling, and my company is Darling Data, and we talk about SQL Server. That is our goal. That is our whole goal. In today’s video, we are going to talk about indexing for performance with unpredictable searches. You can consider an unpredictable search as anything that generates a query. (Intel. Shut up, Intel. God damn it.) Anything that generates a query where you don’t know what the WHERE clause is going to be, or maybe you don’t know what the JOINs are sometimes going to be, or what columns you're going to select, or something like that.

It could be dynamic SQL in a stored procedure, or it could be some ad hoc query generated by a front-end dashboard type thing that you have created, hopefully in a sober and thoughtful way. But that is what an unpredictable search is for me. Now, you have no idea what users are going to ask for, or maybe even what quantity they are going to ask for it in. There are all sorts of crazy things that you have to account for. But of course, before we talk about how you can account for that, we must talk about other accounting matters. More like my accounting matters. We are being honest here.

If you would like to support the content that I create on this channel, somewhere in or around this channel, for the low, low price of $4 a month, you can click the link in the video description. There we go, that’s the right chain of words. And you can become a member. $4 a month. That’s about $3.10 for me after YouTube takes out taxes.

If you are just too encumbered by other debts and ransoms and your money is already allocated fully to other endeavors, you can like, you can comment, you can subscribe. There are all sorts of things you can do to let me know that you love me and would die for me. If you look at the things that I talk about on this channel and the first thing you think is, gosh darn it, that Erik Darling sure can make SQL Server faster in exchange for money.

Perhaps I could do one of these things for you because I’m very helpful when it comes to these things. And not only am I helpful, but my rates are reasonable. And while we’re on the subject of reasonable, gosh darn it, have you ever wanted to get 24 hours of SQL Server Performance Tuning training for $150 and not have to pay another $150 every year for the rest of your natural life?

You can do that with my training. It’s amazing. You click on that link, you enter that discount code, and kaboom, you have what I just described. It’s pretty wonderful.

Mouse. No upcoming events 2025, blah, blah, blah. We’ll talk about that later. But now let’s talk about this indexing conundrum that we have. First, I’m going to close event viewer.

I was troubleshooting a blue screen earlier. I know. Do my talents end anywhere? Yes. Pretty much where the computer ends.

The limitation right there. Where the computer stops. No idea what I’m doing outside of that. It’s funny how that works.

So we’re going to talk about one of my favorite things. I’m going to move some of this text down a little bit there so it fits on the screen better. One of my favorite things that I help clients with are big ugly queries. I know. Weird that a performance tuner likes big ugly queries.

Suppose it’s a lot easier to take a big ugly query and chop out all the easy stuff and make it go faster than to take what looks like a perfect query and tune that better. But, you know, some people just don’t know what they’re doing in either case. So that’s why people like me exist.

So one of the most common things that I see people start with looks something like this. What you see my head is mildly infringing upon. Where someone will have a parameter or whatever called search string.

And that search string will be used to double wildcard search every column in a table. This is one of the worst possible things you can do if you need to handle unpredictable searches in SQL Server. I beg you not to do this.

This will end up terribly. You may look at some of the columns in this list, like owner user ID and creation date and last activity date, and think, hmm, Erik, I don’t think those are strings. And I’d say, well, you are right, but you’re searching them like strings and all hell’s going to break loose.
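As a rough sketch of what this anti-pattern looks like (a hypothetical reconstruction; the parameter name and exact column list are my assumptions, not the demo’s actual code):

```sql
/*
    The kitchen-sink anti-pattern: one @SearchString,
    double-wildcard searched against every column, including
    non-string columns like integers and dates.
*/
SELECT
    p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE p.Title LIKE N'%' + @SearchString + N'%'
OR    p.Body  LIKE N'%' + @SearchString + N'%'
OR    p.OwnerUserId      LIKE N'%' + @SearchString + N'%' /* integer: implicit conversion */
OR    p.CreationDate     LIKE N'%' + @SearchString + N'%' /* datetime: implicit conversion */
OR    p.LastActivityDate LIKE N'%' + @SearchString + N'%'; /* datetime: implicit conversion */
```

The leading wildcard alone rules out any useful seek, and the type mismatches pile implicit conversions on top of the scans.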

It’s not going to be a good time for you or your SQL Server. So, like, you really shouldn’t be doing anything that even looks like this. The problem is somewhat obvious if you’ve been using SQL Server for long enough.

Double wildcard searches and searching columns with mismatching types will lead to not only big scans because of the double wildcard, but also you’re going to be dealing with implicit conversions on everything. The Title column actually is an nvarchar(250) or something column.

So that would at least not result in an implicit conversion right there, but the rest of them you are hosed on. It’s going to be a bad time. What you might think is that using something like some of the built-in functions in SQL Server might be more highly performative, not performant, we don’t say that word in this channel.

I might beep that out later if I can figure out how to beep things out. You might think that it might be more performative. There might be more performativeness if you use CHARINDEX or PATINDEX, but generally there is not. There are some edge cases where I’ve seen them do somewhat better, but it’s not really worth talking about.

But the problem is, the way that you deal with null parameters, if you use ISNULL or COALESCE or anything like that, is not a more clever scenario than doing something like this. Both of these end up with the exact same problems unless you use a statement level recompile on your query, right?

Something like that. If you do this, you can get around most of the issues that come out with these unpredictable searches. Most of the time, this will be Mr. Fix-It for you, right?
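A minimal sketch of that catch-all pattern with a statement-level recompile (the parameter names and columns here are illustrative assumptions, not the demo’s exact query):

```sql
/* Catch-all search: every parameter is optional. */
SELECT
    p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
AND   (p.LastActivityDate < @LastActivityDate OR @LastActivityDate IS NULL)
AND   (p.PostTypeId = @PostTypeId OR @PostTypeId IS NULL)
ORDER BY p.Score DESC
OPTION (RECOMPILE); /* lets the optimizer embed the runtime values as literals */
```

Without the hint, the optimizer has to build one plan that is safe for every combination of NULL and non-NULL parameters; with it, the plan is compiled fresh for the values actually passed in.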

This will do okay. And for a lot of people, this is the path of least resistance, and this is the simplest thing to do, because the alternative is writing a lot of dynamic SQL to cope with which parameters actually need to be part of the where clause. I’m going to show you an example of just using recompile, it being nice and easy, and as long as you’re not allergic to compiling a query plan every time a thing runs, whether it’s because, you know, you are burned out on CPU anyway, if you have CPU slack, and you’re able to compile queries whenever you want, this is a perfectly fine thing, as long as that, like, the compile time for that query is not awful.

If every time you run this query with a recompile it runs for a long time, but the time isn’t, like, in the query plan, you might want to check the compilation time to make sure SQL Server isn’t going off on one of its little thought cloud staring things. So there is that.

Now, the problem with developers is that they are often quite lazy in the database. They often think that they are far more clever than they are when it comes to doing things in the database. You know, SQL Server is an expensive piece of equipment, but people treat it kind of like a garbage dump for their worst code.

The recompile advice is generally good enough, but when you use it, again, just pay attention to compile times. There’s like a certain, like, you know, if it’s like, you know, a few hundred milliseconds, screw it. But like, if it’s, you know, getting up into like five, 10 seconds of compile time, it might be, you might have to think about alternate ways of doing this.
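One simple way to eyeball compile time is SET STATISTICS TIME, which reports parse and compile time separately from execution time (the procedure name below is made up for illustration):

```sql
SET STATISTICS TIME ON;

EXEC dbo.SearchPosts /* hypothetical procedure */
    @OwnerUserId = 22656;

SET STATISTICS TIME OFF;

/*
    The Messages tab reports something like:

    SQL Server parse and compile time:
       CPU time = ... ms, elapsed time = ... ms.

    If that elapsed compile time creeps from a few hundred
    milliseconds into whole seconds, recompiling on every
    execution starts to hurt.
*/
```

Compile time also shows up in the actual execution plan properties, but the statistics output is a quick sanity check when you are testing a single procedure.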

SQL Server right now doesn’t, right now doesn’t offer any great programmability or optimizer support for the types of queries I’m talking about. However, it was, it was a thing that came up in the SQL Server 2025 release notes that there is a new optimizer feature, feature called Oppo, which is the optional parameter problem orifice, or I forget what the other PO is for. The optional parameter problem or something.

So it looks like Microsoft is taking some steps to try to address this in SQL Server 2025. So, you know, sooner or later, you will probably see that in Azure SQL DB. And if Microsoft ever decides that it cares about managed instance again, you might even see it come to managed instance.

Microsoft’s track record with getting things cloud first has not been awesome lately. There are signs that there may be a return to that ethos, but who knows? It’s a bit weird.

But anyway, sometimes writing good queries does require extra typing and thinking, and that’s often not what developers are famous for. You want to start throwing weird features at it like Hekaton and in-memory stuff. You feel like you need to partition your table when you don’t.

You want to start, you want to move to Postgres because you swear Postgres will just do better at everything. You’ve heard it’s this magical unicorn and everything’s better there. Or you want to start using NoSQL, you know, stick it all in Elasticsearch or MongoDB or whatever.

Or you decide that maybe you just need to build your own ORM. Start from scratch. It’s 18 to 24 months of work.

Far more interesting than five to 10 minutes of typing some extra characters into SSMS. So, let’s say we have this query in a stored procedure. It’s a little more convenient to show this here than, like, a dumb front end that I would make badly anyway. And we stick a recompile hint at the end of this thing.

This will do okay. Without the recompile hint, this will go really, really poorly. You might notice that the time over here at the end is six minutes and 17 seconds. That is a very long query.

A lot of that is because of a bad memory estimate where the sort does take an additional like six minutes and three seconds. Because when we get up to this filter operator, we’re at 13 seconds, which still isn’t great. Right?

We can see very clearly that we spent a lot of time in other places in the query leading up to that six minute ordeal. But the main problem is that without the recompile hint, the predicates in your search (and this doesn’t matter, again, if you use ISNULL, COALESCE, or whatever other clever arrangement you think you’ve found that makes the optimizer do smarter stuff) are always going to look something like this.

This is because SQL Server has to cache a plan that’s safe for any outcome of these parameters being null or not. The recompile hint gets around this by allowing for something called the parameter embedding optimization, which allows SQL Server to take the parameter values and use them in the query as literals. So it doesn’t have to cache a plan that’s safe for anything.

It can just say, hey, here’s a plan for these values. This is as good as I can do, which may not always be great either. But that depends on a lot of other things, like indexes and whatnot. So you should really avoid this sort of thing without the benefit of the OPTION (RECOMPILE) hint, because it will go quite poorly.
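Conceptually, the difference in the filter predicate looks something like this (paraphrased shapes, not copied from an actual plan):

```sql
/* Cached plan (no recompile): the predicate must stay NULL-safe
   at runtime, so the branch survives into the plan. */
WHERE (@OwnerUserId IS NULL OR p.OwnerUserId = @OwnerUserId)

/* With OPTION (RECOMPILE): parameter embedding turns the value
   into a literal, and the NULL branch gets simplified away. */
WHERE p.OwnerUserId = 22656
```

Once the branches are gone, the optimizer can pick index access paths for only the predicates that actually apply to this execution.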

If we do use OPTION (RECOMPILE) with this stored procedure, things will go generally okay. Now, the thing that I run into a lot is that with stored procedures like this, there’s usually a value that people think users will always search on. And they’ll design an index or two that makes sense for those types of queries.

But then as soon as you depart from that, as soon as that one value isn’t involved, things get really, really bad. That’s especially true without the recompile hint, because you cache a plan and SQL Server reuses it and you might have used the entirely wrong index for that query. So for our purposes and for the intent of the query, let’s pretend that we think we’re always going to be searching on owner user ID.

And we know that we’re always going to be ordering by score descending. That’s exactly what our query does up here. We have an equality predicate on owner user ID and we have an order by score descending.

This does get a little bit more complicated if you have dynamic sorting allowed as well. But we’ll talk a little more about that when we get it somewhat further down. So we can design an index that looks like this, that takes care of our immediate equality search and our ordering.

Right. So we can search to whatever user IDs we care about and we have the score descending column in order based on that equality search. And then we can put any secondary search columns over here. Now I’ve done that for the date columns.

These are going to be residual predicates. You cannot seek to these because the score column is in the way. Again, we’ll talk about that a little more in a moment. But since post type ID is an incredibly unselective, very dense column, I have stuck that in the includes, because it doesn’t matter so much in this case.
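Putting that description together, the index might look something like this (the index name is mine; the shape follows what is described above):

```sql
/* Equality column first, then the ORDER BY column,
   then secondary search columns that will be residual predicates,
   with the very dense PostTypeId pushed into the includes. */
CREATE INDEX owner_score
ON dbo.Posts
    (OwnerUserId, Score DESC, CreationDate, LastActivityDate)
INCLUDE
    (PostTypeId);
```

The equality predicate on OwnerUserId lets the Score DESC key satisfy the ORDER BY for free; the date columns after Score can only be filtered residually, not seeked to.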

So I already have this index created, I believe. I mean, it should anyway. Yes, I do.

Good. We got an error there. I’m smart. So if I go and run these two queries where I do use owner user ID, the plans that I get from these are perfectly fine. Right.

We get two seeks. SQL Server does okay enough cardinality estimation from these. No one is upset or hurting from this. Now, about the parameters that I used for the very slow plan up here.

These were actually the creation date and last activity date parameters that you see in here. These things. So that’s what I passed in to get this slow query without the OPTION (RECOMPILE) hint. With OPTION (RECOMPILE),

SQL Server makes a much better choice overall. Just chooses to scan the clustered index. Doesn’t choose to reuse a plan.

Gets a close enough, okay cardinality estimate. And this thing finishing in under a second without a good supporting index? Totally reasonable.

Now, where things get tricky, of course, is that when you start writing queries like this that are unpredictable, you have no idea what set of indexes to roll out, because you have no idea what the most common sets of search criteria that people pass in are going to be. And it takes kind of a lot of logging and analytics to figure out what the most common searches are, whether they’re slow or not, and then how to index for them. And you can end up with a lot of different permutations of a similar index definition when you go down that route.

Now, this is a pretty simple thing because we’re just hitting one table, right? We’re just we’re only hitting the post table and we have all sorts of search parameters against that. So, but, you know, if we were to think about this for a query that might touch more than one table, you might be dealing with more than one search element and things might get really tricky.

Because now you have to think about indexing multiple queries and taking into account join keys and stuff like that. There’s all sorts of things that get really, really tough and complicated with this. So, let’s say, much like the $4 a month that you could use to become a member of this channel, you can’t afford to recompile this query every time.

The way to get around that is to write dynamic SQL, the safe parameterized kind that avoids SQL injection and has equivalent plan reuse functionality to any other stored procedure that you would run. Because we execute the query using sp_executesql, which is a stored procedure, you get the type of parameterization and plan reuse and all that other good stuff that you would find in other cases. But you would use that to build a sort of custom where clause based on what parameters apply to the particular search that you’re running.
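A sketch of that shape (a hypothetical reconstruction; the parameter names, comment token, and column list are my assumptions, not the demo’s actual code):

```sql
/* Pretend these are the procedure's parameters. */
DECLARE
    @OwnerUserId int = 22656,
    @CreationDate datetime = NULL;

/* Build only the predicates that apply to this search. */
DECLARE @sql nvarchar(MAX) = N'
/* dbo.SearchPosts */ /* comment token for finding this query later */
SELECT p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE 1 = 1';

IF @OwnerUserId IS NOT NULL
BEGIN
    SET @sql += N'
AND   p.OwnerUserId = @OwnerUserId';
END;

IF @CreationDate IS NOT NULL
BEGIN
    SET @sql += N'
AND   p.CreationDate >= @CreationDate';
END;

SET @sql += N'
ORDER BY p.Score DESC;';

/* Parameterized execution: each distinct WHERE clause shape gets
   its own cached, reusable plan, with no injection risk. */
EXEC sys.sp_executesql
    @sql,
    N'@OwnerUserId int, @CreationDate datetime',
    @OwnerUserId,
    @CreationDate;
```

Always pass the full parameter list to sp_executesql, even for parameters that didn’t make it into this execution’s WHERE clause; it keeps the call site simple and the statement text stable per shape.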

So, if we recreate this with all that in mind, Oh, jumped around on me a little bit there. These things will all still be okay.

All right. We’re going to get, you know, pretty much equivalent performance to what we saw with the recompile. We get the two seeks here with the, you know, I mean, we do reuse this plan. So, the cardinality estimate is reused, right?

We can see that there is plan reuse with this. So, this guessed one of 82 and this is 5,000 of 82. So, the guess of 82, rather that cardinality estimate of 82 rows persisted there. And that also lives on with this.

If we executed more than one variation of this, we would reuse the estimates for that plan, right? So, no big deal here. But again, coming back to what’s difficult with these things: knowing how often they run. Query Store is pretty helpful for this sort of thing.

Because, well, if you’re smart and when you write dynamic SQL, you put in a comment with the procedure name that something comes from. You can use my free store procedure, spquickiestore, to search query text for this type of token. And you can find all the queries that run and see how many executions they have, which is a very useful thing.

So, the problem with rowstore indexes is that the order that the keys are defined in defines how queries can access stuff. Now, there are clients who I’ve been working with for years and I’ve been talking about indexing for years. And they’re still unclear on the fact that if you have an index like this that leads with owner user ID, that is not the same as if you have an index where owner user ID is like the second or third or fourth or fifth key column in the index.

Like, by that point, you have lost any useful sorting for searching stuff. Multi-column indexes really are only useful for either things that search for the leading key column or things that search for the leading key column and then other stuff. Right?

Or, you know, ordering and other things. But, you know, if we’re thinking about just being where clause centric here, this is really when, like, when rowstore indexes, multi-key rowstore indexes are very useful because you can seek to owner user ID and then you can seek to score or order by score. And then you can seek to creation date or last activity date or post type ID or ID.

You can seek across all the keys, but you have to start seeking with that first key column. Otherwise, it’s just a scan of everything, because the index isn’t ordered in a helpful way after that. So, for example, with our other query where we need to search on these two columns, we’re not accessing these two columns first via where clauses or anything.

So everything is all out of sorts for those searches. That’s why we ended up scanning the clustered index.

So what you generally want to do, if you can (and there are lots of considerations for this, and we’ve already gotten to the 20 minute mark here, so some of you may feel dissatisfied that I have not gone into all of the potential things that might come up by using columnstore), is to create a non-clustered columnstore index, right? And just because it doesn’t say non-clustered in here doesn’t mean it’s not.

I can’t create a clustered index on the post table. It already has a, I can’t, rather, I can’t create a clustered columnstore index on the post table, because it already has a clustered rowstore index. So it is creating a non-clustered columnstore index.

So in general, if you create a non-clustered columnstore index that, you know, spans the columns you care about searching in your query, things will end up a lot better, and the unpredictability doesn’t matter as much. The reason it doesn’t matter as much is because columnstore indexes don’t have that column-to-column dependency that rowstore indexes do.

Each one of these columns is stored in its own little vertical, up and down, columnar index within the columnstore index. And there are lots of little mini indexes, sort of, in all the different segments and row groups. So you can pick and choose which columns you want to hit.

So if you want to hit owner user ID and creation date, or you want to hit score and last activity date, it doesn’t make any difference to the columnstore index what order you search for things in or anything like that, because each one of these columns has independence from any other column that might precede it in the key of the index, unlike rowstore indexes.

So you can give all of your search queries a single, good, highly compressed, very nice place to search for data from, without all the problems of creating multiple non-clustered rowstore indexes to try to satisfy every different permutation of a search. The way that I love to handle this sort of thing for clients, as long as the shoe fits with using columnstore in their database, comes with one big consideration: if we’re relying on query parallelism to make things fast and you’re on Standard Edition, we need to really carefully test the columnstore thing, because Microsoft hates people on Standard Edition, and it limits all of the columnstore parallelism to a DOP of 2.

You cannot exceed a DOP of 2, so if you have a query that runs really fast at DOP 8 using rowstore indexes in standard edition, and you start using a columnstore index and you’re like, wait a minute, my queries got slower, even though I’m using batch mode in the columnstore index.

Well, it’s because your query is limited to a DOP of 2, and you can’t do nothing about that. So the way that me, Erik Darling, with Darling Data, likes to handle unpredictable searches by using columnstore is typically a lot easier and more efficient than creating, like, 17 different rowstore indexes to account for everything that someone might search for.

So we’re going to wrap this one up here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I hope that you will continue watching.

I forget what the next video is about, but it probably, hopefully it won’t be as long as this one, because, you know, the tongue does get tired. And, you know, there’s a wise man who once said, the reason dogs have so many friends is because they wag their tails, not their tongues.

I guess that explains why I spend so much time alone. Anyway, thank you for watching.


Indexing SQL Server Queries For Performance: Equality and Inequality Predicates




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of indexing queries for better performance in SQL Server, focusing specifically on equality versus inequality predicates. As we explore common scenarios and problems encountered when designing indexes, you’ll learn how to optimize your queries so they can take full advantage of those indexes. The discussion covers a range of topics, from understanding why certain index designs might not perform as expected due to non-selective columns required in every query, to strategies for rewriting both the index and the query itself to improve performance. By the end of this video, you’ll gain insights that can help you avoid common pitfalls and make more informed decisions when designing indexes for your databases.

Full Transcript

Erik Darling here with Darling Data. We’re going to embark on an adventure together, you and I. We’re going to delve deep into the caverns of how to index queries for better performance. I obviously can’t cover every single scenario, but I’m going to cover many of the most common ones, and some of the most common problems that I see when I’m doing my consulting work. And hopefully you will get at least enough knowledge to be dangerous in your own work. In this first video, we’re going to talk about equality versus inequality predicates, because the way that you design your queries and the way that you design your indexes really has to go hand in hand, so that you make sure your queries actually use your indexes well. And if you have queries that are not performing well, perhaps the way they’re written is the reason why you’re not taking full advantage of those wonderful indexes that you have. Perhaps you can stop worrying about fragmentation and start worrying about real problems. That would be a grand thing. Now, I promise we’re not going to go off on any soapbox tangents here about the world at large. But before we start talking about equality and inequality, if you would like to become a member of this channel and support the channel and all the work that I do to bring you this wonderful content, there is a link at the very top of the video that says become a member, and you can click on that very link.

And you can give me a minimum of $4 a month towards continuing to create this content. It’s a nice thing to do. If you’re in the market for nice things to do that are a little bit less expensive, you may like, you may comment, you may subscribe. You can do all of the above, none of the above. But of course, then you’d be mean. If you just need to be a mean, cruel person. Well, maybe this isn’t the channel for you. We’re all nice people here. If you are in the market for SQL Server consulting help, well, by golly and gosh, I do quite a bit of it. All these things over here to my right. I’m in the market to do them. And if you need something aside from maybe one of these things, well, you can always talk to me and tell me what it is.

And as always, my rates are reasonable. Training. I got some. You need some. It’s about $150 for the rest of your God-given life. Again, you can use that link with the discount code Spring Cleaning, or visit the video description for a link that puts all of these things together for you.

So that you can just go right on ahead and not have to deal with copying and pasting anything. I think by the time these videos start airing, it will be 2025. It will be the beginning of the year. So I guess news about upcoming events will be forthcoming.

Won’t that be lovely for you and me? Because that means I’m getting out of the house. But with that out of the way, let’s do a little disco dance and talk about these inequality and equality predicates. Now, a little preamble on this stuff.

Pretty much since I started reading about indexes, when I started caring about SQL Server performance, the general wisdom has always seemed to be that you should design the key of your indexes to support the most restrictive search predicates first.

That could either be, you know, a unique column of values, or just one that is highly selective in some way. And that’s not terrible starting advice.

If you’re a beginner out there in the world and you see that advice, that is not advice that you need to be terribly suspicious of. You can follow that advice up to a point and probably have pretty well good enough performance for a lot of your queries.

You know, I realize that there’s going to be a lot of questions about, you know, whether you should index for your where clause first or your join clause first or whatever. And the crazy thing is that there are going to be different scenarios where indexing for, like prioritizing different parts of your query for indexing can make a big difference.

But if I were to give you some just basic starter level advice to go along with indexing the most selective columns first, I would say that you should index for your where clause first and then prioritize other parts of the query. That’s not always going to work and that’s not always going to be true because sometimes you’re not going to have a where clause, you’re only going to have a join clause.

And sometimes other parts of the query might come into play as to how you want to design your indexes. Things like, you know, windowing functions, group by, order by, things, you know, columns you’re selecting, very specific restrictive filters on things.

All sorts of things come into play when you’re designing indexes, but we’re going to talk through a bunch of those during the course of the series, so I’m not going to talk too much about those here. One problem that I see pretty consistently is that a lot of databases end up designed with some very non-selective columns that are required for just about every query.

Think about something like soft deletes where most rows are not deleted. Most rows are in play. You might have a status column that has a variety of statuses in it that is still not very selective, but, you know, has outcomes in it that you need to, that most of your queries are going to be filtering specifically for.

Now, leaving filtered indexes specifically out for the moment, a lot of the indexes that I see have those types of columns in them.

I don’t want to say included in them because I don’t want you to think that we’re just talking about included columns because those columns end up in various places in an index depending on how the index is laid out overall.

If it is a filtered index, then they usually end up in the include columns, because they’re in the where clause of the index. And having them in the includes gives you a little bit more gusto than leaving them out entirely.

Of course, putting them in the key is fine too, but usually includes is good enough. But for all of these required type columns (whether it’s a multi-tenant type database where you have your organization ID, or the soft delete thing, or the status thing that you need to look up), there are all sorts of columns along those lines which are not very selective, but which you do need in the index to make your queries fast, because you need your data organized by these things.

Some of the index design patterns that stem from that are rather unhealthy. Now, in this post, what I want to look at is both an index change and a query change that can help you out when you have, like, let’s say you have an index that maybe wasn’t designed great, but your query can be rewritten to take better advantage of that index or how to rewrite the index so that whatever way your query is written, you can do pretty well with it.

So I’ve already created two indexes and a constraint on the post table. Sorry. I’ve created two indexes, one on the badges table on name and then user ID, and this index is plenty good enough, and then one on the post table on post type ID and then owner user ID.

Post type ID is taking the place of sort of a status column, because there are about eight different post types in the table, and some of them are much more selective than others. The problem is that for some types of posts, an index that leads with post type ID would be very selective, and for other types of posts, an index that leads with post type ID is going to be not very selective at all.

And then I also have a constraint on the post table telling SQL Server that every post type ID that exists in that column is greater than zero and less than nine. All right.

These are hard limits that we have set on the post table. So, in this case, the post type ID column, like I said, it’s going to play the role of our non-selective column that every query is required to have filtered in there.

Granted, that’s not going to be the case with a lot of Stack Overflow queries, but, you know, just go along with me here. Let’s, like, just pretend that all the queries need to find a particular type of thing, but what else you might be searching for depends on what else the module that your users are executing is going to run.
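To make the setup concrete, here’s a sketch of the indexes and constraint described above. The object names are my own placeholders, not the ones from the demo script:

```sql
/* Badges: Name first, then UserId */
CREATE INDEX b
ON dbo.Badges (Name, UserId);

/* Posts: the non-selective "required" column leads */
CREATE INDEX p
ON dbo.Posts (PostTypeId, OwnerUserId)
INCLUDE (Score);

/* Hard limits on PostTypeId: always between 1 and 8 */
ALTER TABLE dbo.Posts
ADD CONSTRAINT chk_post_type
CHECK (PostTypeId > 0 AND PostTypeId < 9);
```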

Now, I have this query, and this query is not going to be very fast. I’m just going to walk you through what it does real quick. Let’s start down here a little bit.

So, we are selecting from the badges table, and the reason why we have the index that we have on the badges table is to grab, well, it partially helps with this, but really this part of the query is mostly assisted by the ID column on the users table being the clustered primary key. So, seeking to various user IDs is totally fine and easy.

The reason why we have the index that we have on the badges table is because we start filtering the badges table with, I know it says IN with Popular Question. I guess before there were some more things in there, and I narrowed it down to this to make the demo work in a reasonable amount of time, but this is essentially an equality predicate on Popular Question.

And then for the post table, this is what we’re doing. And if you remember the index that I created, it was on post type ID and then owner user ID, and then it had score in the includes, which is obviously not going to be a completely ideal index for this situation, but the thing to pay attention to really is what’s going on in the where clause here.

We can just leave this out of the equation for the time being. Now, if we look at the plan that we get for this query, you’ll notice it doesn’t return any results, and I purposely have this not return any results because I want you to see the raw execution time of the query without returning any rows to SQL Server Management Studio.

That’s why I filtered to where that row number column equals zero down here. I want nothing to come back from this. Now, looking, like, more closely at the query plan, like I said before, getting to the user’s table is trivial.

It takes about eight milliseconds of time, but you’ll see that we spend about 32 and a half seconds executing the query. The majority of the time, if you’ll allow me to drag this over, is in this section of the query plan. And in this section of the query plan, we start with a scan on the post table.

And the reason why we start with a scan on the posts table is that, with the inequality predicate written the way it is, SQL Server doesn’t seek into that index that leads with post type ID. Instead, it ends up building an eager index spool based on this.

Or rather, we could seek into it. SQL Server just doesn’t. SQL Server just says, like, I don’t know. And this is even with that constraint on the table, showing where SQL Server knows that any value less than three is going to be one or two. SQL Server could have converted that to equals one or equals two, but it doesn’t.

So we end up building an index at runtime, and that takes about 21 seconds, with about four more seconds spent scanning the index. So the majority of the execution time really is in here.

Okay. Now, it would be nice if SQL Server did things a little bit more naturally here. And, like I said, it could choose to do this, but it doesn’t.

If we look at the estimated plan for this with the FORCESEEK hint on there, this looks a lot friendlier, right? This looks like a lot better of a situation, but it’s not. If I start this running, we are going to be waiting here for a very, very long time.

I have, in the past, attempted to let this thing run to completion, but it runs for about an hour before I give up on it. So this thing will just run and run and run and run and run. This actually doesn’t do any better with the FORCESEEK hint.

Sometimes it is useful to apply various hints to your queries to see why SQL Server doesn’t choose certain plans. Two very interesting ones to try are FORCESEEK and OPTION (LOOP JOIN) at the end of your query, because they will really show you indexing issues with whatever query you’re running. Now, this thing has been running for about 45 seconds, and we haven’t gotten anything back from it.
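The diagnostic trick just described, forcing a seek and a loop join so that indexing problems surface in the plan, might look like this on a hypothetical query. The table and column names are just for illustration:

```sql
SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p WITH (FORCESEEK) /* refuse scans on Posts */
JOIN dbo.Users AS u
  ON u.Id = p.OwnerUserId
WHERE p.PostTypeId < 3
OPTION (LOOP JOIN); /* refuse hash and merge joins */
```

If the optimizer can only obey these hints by producing an obviously terrible plan, that’s your indexing problem talking.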

The main problem with this query is in the index seek. Let me expand this out as well just so it’s a little bit more clear in here: the range columns.

So every time we go into the posts table, we scan for everything that’s less than 3. And the trouble with that in our case is that everything less than 3 is the majority of the table. So every time we go and seek into that index, we end up with a big, big problem.

We have to read through about 17 million rows and then apply a residual predicate on the owner user ID. If you look at this, we actually do an index seek. Oh, wait, that’s the wrong one.

That’s the users table. I’m sorry, I highlighted the wrong thing. I was like, wait a minute, that doesn’t make sense. So we have a seek predicate on post type ID less than 3 and then a residual predicate on owner user ID. This is a bad time, because every single time we go in and seek, we seek through 17 million rows and apply that residual predicate on owner user ID.

That is not a very good scenario. If we were searching on a very limited set of POST type IDs, that would be maybe okay. But the way that this thing is going, it’s very much not okay.

So the FORCESEEK hint does show us that this is a bad plan idea. Like I was saying, let me start this running, and then we’re going to talk about the distribution of records for post type ID in the posts table.

I actually have to put that in a new window probably because I’m building an index there. So like I was saying, we have to go through about 11 million records to find anything with a POST type ID of less than 3. 3 is down here with 167 records.

But the post type IDs that we care about, the ones less than 3, are 1 and 2, and SQL Server would have to scan through 17 million of those.

Now, what I was saying before, which I think I mumbled up a little bit, was that SQL Server could convert this logically because there’s a constraint on the table that says, like, you know, all of the values in here are greater than 0 and less than 9. SQL Server could logically translate less than 3 with everything that it knows about that data to equals 1 or equals 2.

But it doesn’t do that on its own. Right? It just doesn’t.

It can’t unfurl that logic on us. So I’m creating an alternate index here on owner user ID and then post type ID. And what I want to show you first is that if we get the estimated plan for this, this thing should start using the new index that we created, which is not badges underscore x.
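The alternate index would be something along these lines. The index name is a guess, and I’m assuming the same Score include as the original:

```sql
/* Selective column first, required-but-dense column second */
CREATE INDEX owner_user_first
ON dbo.Posts (OwnerUserId, PostTypeId)
INCLUDE (Score);
```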

And if we do that, this will actually come back pretty quickly. At least it did at one point. Who knows how long it’ll take now.

But there we go. So that was about 8 seconds, and we can see that this is much improved. All right? This is no longer building an index at runtime; the eager index spool is no longer taking 30-something seconds to execute this whole part of the plan.

And, you know, having the reversed index, where we lead with the other column, actually does better. Now, if we wanted to write this query in a way that would turn out well with our original index, what we could do is, actually, let me tweak this a little bit to say index equals not badges on this.

And if we look at the query plan for this, it is pretty okay. There is still no eager index spool being built in the estimated plan. And because we have separated out our search logic a bit, and we’ve said select this stuff where post type ID equals 1, and select this stuff where post type ID equals 2, we’ve done the optimizer’s job of unrolling less than 3. Again, with the constraint being greater than 0 and less than 9, less than 3 can only mean 1 or 2, or 1 and 2, depending on how you look at it.
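The shape of that rewrite, doing the optimizer’s unrolling for it, is roughly this, with the column list simplified:

```sql
SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1 /* first branch: one seekable equality */

UNION ALL

SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 2; /* second branch: the other seekable equality */
```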

But if we run the query and we tell SQL Server to use our original index, not badges, we still do fine with this because SQL Server does two pretty quick seeks, and this ends up taking roundabout the same amount of time as the query with the unchanged less than 3 predicate and the new index that we created that leads with owner user ID.

So with that index, we’re able to seek to each owner user ID and then filter out the post type IDs. Owner user ID is a far less dense column. It’s very, very unique amongst the 17 plus million rows in the post table.

So the highest count for a single user in this version of the database is around 28,000 rows, and seeking to span a range of 28,000 rows for a single user ID and applying a predicate after that is pretty trivial as far as seeks go.

And then another thing that we could do is we could convert the where clause to say where post type ID is in 1, 2, and we could run this, and this should also take roundabout eight seconds. The query plan is slightly different for this one.
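The IN rewrite is the same idea expressed in a single predicate. Again, a simplified sketch:

```sql
SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2); /* unpacked into two seeks on the leading key */
```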

6, 7, 8, there we go. Okay, it was a little under 8 seconds, 7.8. Maybe the cache was a little bit better or something on that one. But rather than having two distinct seeks like we had with the query 1, union all, query 2 thing, we just have one seek into this.

And if we look at the seek predicate, we will have actually two seeks in here. One of them will be for post type ID 1, and the other will be for post type ID 2. It’s a little hard to get this with ZoomIt.

Let me see if I can… Yeah. So there’s the first seek for post type ID 1. And then if we expand this one out, we will see the second seek for post type ID 2.

So, depending on how your indexes are arranged, you can sometimes change the way that where clauses are written to take better advantage of indexes as they exist now. And if you’re allowed to change indexes but not queries, like if you’re working with a third-party vendor app where they say no, you can’t change the queries, you can still do stuff with indexes if you’re smart.

One thing that you could do is change the order of some key columns, depending on how the queries are written, in order to have those indexes get used more intelligently by those queries. So there is that.

Anyway, that’s a little foray into inequality and equality predicates in queries, how those tend to align with your index key column order, and the way that you design your indexes so those queries are able to get seeked into.

Sometimes it does work. Sometimes you can write queries so that having a less selective column first works out okay. Other times you have to either rewrite the query or reshape the index so that the intent of your query is better serviced by the key columns in your index.

So with that out of the way, I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, which will be about, oh, what’s number three over here? Unpredictable search queries.

So don’t read this yet, because that’s what I’m talking about next. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked `OUTPUT` clause in SQL Server queries, explaining its basic functionality and more advanced uses. While the `OUTPUT` clause might seem mundane at first glance, it offers powerful capabilities for simplifying archival processes and maintaining data integrity during merges. I demonstrate how to use `OUTPUT` with both regular DML statements and merge operations, showcasing its ability to track changes in inserted and deleted rows, as well as access values from other tables involved in the modification query. By the end of this video, you’ll understand why it’s a valuable tool for optimizing your SQL Server queries and managing large datasets efficiently.

Full Transcript

Erik Darling here with Darling Data. Actually recording my final video for the evening, which I’m excited about because I can’t wait to take a shower. I know it seems weird, but once you get on these video kicks, you just got to keep going until you’ve exhausted either yourself or your content. This is going to be the final video about how to write SQL Server queries correctly. In this video, we’re going to talk about the output clause. Just like CTEs, there is very little that you need to actually know about output. If anyone ever tries to say that they have information worth selling about output, they are full of it and, once again, you should throw something heavy at them. They’re not good people. In fact, they’re terrible people. They’re conning you out of money. This is free information that you should be able to find anywhere because there’s not a whole lot that is very in-depth or very technical about it. If you like this content and you want to support this channel, you can actually spend money on it. I guess I’m kind of reversing course there a little bit. I actually don’t want money for this video. Watch the other videos and if you decide you’re like, wow, this is a great channel, then you can give me four bucks a month by clicking the link in the video description that says, become a member of the channel.

And then you’re a member, and I get $3.10. If you don’t have the four bucks, if you spent it all on the ponies or a pony, you can like, you can comment, you can subscribe, and that won’t buy either of us a pony, but it’ll feel good. If you need help with SQL Server, if it’s slow, if it’s annoying, if it’s upsetting you in some way, I am available for hire. You can pay me money. In exchange for that money, I will fix your SQL Server problems. And as always, my rates are reasonable. Speaking of reasonable, reasonable man, the face of a reasonable man here. You can get all of my training for 150 US dollars for the rest of your life. That’s the URL, that’s the coupon code.

These are also helpfully glued together for you in the video description so that you don’t have to do much thinking because you’re probably tired of thinking. I will be out and about in 2025. I don’t know where yet, but when I get there, I’ll tell you. With that out of the way, let’s talk about the laughably mundane output.

Now, one of the most interesting things you can do with the output clause is use it to simplify archival processes. That’s going to be the first demo that I show you. And that’s really, if you stop watching there, I wouldn’t blame you. The rest of it, yeah.

Output isn’t very interesting, but maybe it will be to someone. So, the first thing is you can access the inserted and deleted tables to see modified values. Wow. So, you write an insert or an update or a delete. You can throw the output clause on there.

And you can see what changed. Sort of a helpful thing. If it’s a lot of rows, be careful.
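A minimal sketch of that, assuming a Users table with a Reputation column:

```sql
UPDATE u
SET u.Reputation = u.Reputation + 1
OUTPUT
    deleted.Reputation AS reputation_before, /* pre-update value */
    inserted.Reputation AS reputation_after  /* post-update value */
FROM dbo.Users AS u
WHERE u.Id = 22656;
```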

But, you know, when there’s a lot of rows, you should be careful anyway. You don’t need to select star from that 24 billion row table. You’ll probably have a bad time.

Output gets a lot more interesting when you use it with merge. But, of course, then you have to use merge. And if you never write a merge statement, your life will be complete. I’m telling you. You’re not going to miss out on much.

And the third thing about output is you need to be careful what you output data to. If you output to a client, whether it’s SQL Server Management Studio or whatever client you write that accesses SQL Server, you will force the query.

Or rather, the more clear way of putting it is you will disallow a parallel execution plan. The same thing will happen if you use output into a table variable. For very small inserts, updates, and deletes, you probably won’t notice this.

But if you’re using output for something like I’m showing you, you may notice it, because for archival processes, you are usually modifying a lot more rows. Outputting to a temp table or a normal user table does not have that problem.

Only outputting to a client and to table variables. So this is primarily how I use output. And I use it to make archival processes all one statement.

Usually with archival processes, you’re like, Oh, I don’t know, I’m going to update these rows to mark them as deleted. Or I’m going to delete these rows.

And then I’m going to insert these rows into the archive table and then go back and delete them. With output, you can do that all in one go. Because you can nest a delete statement and put that into a derived table. Note I’m not using a CTE here.

Just despite you, CTE lover, 11357. So you can put, you can nest the data modification like delete. You can give it the output clause.

Look at these wonderful rectangles I’m drawing for you. There’s some symmetry in there, baby. And then you can select from that derived table. And you can insert, you can use that to drive an insert into an archive table.
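The nested-delete pattern looks roughly like this. The column list and the date cutoff are stand-ins, not the demo’s actual code:

```sql
INSERT dbo.VotesArchive
    (Id, PostId, UserId, VoteTypeId, CreationDate)
SELECT
    d.Id, d.PostId, d.UserId, d.VoteTypeId, d.CreationDate
FROM
(
    /* the delete runs here, and its OUTPUT feeds the insert */
    DELETE v
    OUTPUT
        deleted.Id, deleted.PostId, deleted.UserId,
        deleted.VoteTypeId, deleted.CreationDate
    FROM dbo.Votes AS v
    WHERE v.CreationDate < '20110101'
) AS d (Id, PostId, UserId, VoteTypeId, CreationDate);
```

One statement, so there’s no window where the rows exist in both tables or in neither.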

So if let’s say we do this, we’re already in the right database. And I’m going to put this into a transaction so that I can roll it back immediately. And I’m also going to turn on query plans because they’re going to be of some interest to us.

And we do this. Notice in our execution plan, we have a clustered index delete and then a clustered index insert. This is going to the votes table and this is going to the votes archive table.

And if we had, of course, if we had supporting indexes, this would be a lot faster. But you know, whatever. Not really the biggest deal here. And then I have a couple of validation queries where we make sure that the everything that we cared about was deleted from votes and everything that we cared about was deleted into the votes archive table.

And that is the case when we finish the query. These are all gone from the votes table and they only exist in the votes archive table. So a very, very handy way of simplifying archival processes when you’re finally ready to purge some data out of those massively oversized tables of yours.

Now, this is originally from, partially anyway, from a blog post that Aaron Bertrand wrote that I’m stealing the table definition from because it was easy. But thanks, Aaron. You’re a sport. You’re a curling pro. And so what I’m going to do is just for the sake of making it easy to do stuff, I’m going to use a transaction here.

I’m going to show you what’s in the table called my table and then I’m going to do a merge into my table. And the primary thing that I want to show you is that when you use output with merge, you have access to two things that you don’t have access to when you use output with a regular insert or update or delete. One is this special dollar-sign column called $action. There’s probably a fancier word for that that I don’t know.

And the second thing is that you can access data from other tables involved in the modification query to see what values are in those. This can be very handy if you want to do like a delete and then an insert and you want to maintain certain values from the table you’re deleting from in the other table. It’s pretty handy for some stuff like that.
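Based on the before and after states described below (rows 1 and 4 to start, rows 1 through 3 to finish), the merge probably looks something like this. I’m guessing at the exact column names, though the transcript does mention a Word column:

```sql
MERGE dbo.MyTable AS t
USING
(
    VALUES (1, N'one'), (2, N'two'), (3, N'three')
) AS s (Id, Word)
  ON t.Id = s.Id
WHEN MATCHED
    THEN UPDATE SET t.Word = s.Word
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Id, Word) VALUES (s.Id, s.Word)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT
    $action,          /* INSERT, UPDATE, or DELETE */
    deleted.Id  AS old_id,
    inserted.Id AS new_id,
    s.Word;           /* a source column: a merge-only trick */
```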

So what I’m going to do is run this whole thing. And really the results of this are what we care about. We don’t care so much about performance or anything. But when we start off, when we started off, our table, my table had the IDs one and four in it.

And when we ended, we had one, two and three in it. And then this was the result of the output clause. And so we have an update where we updated ID one.

We have an insert for ID two and insert for ID three because we only had one and four before. We didn’t have a match for four. So we deleted row four.

Now what’s neat about this is this, that word column. Why did you change colors on me? Let’s correct that. Let’s make sure that’s the right color.

We only use pink in this house. This source dot word is from this. This was not anything that was in the table that we merged into. Right?

These values were not in there. They were only in here, but we could reference those with merge. So there are neat things you can do with output and merge that you can’t do with regular DML statements. So that’s the second thing that you should know about output clauses.

The third, and this is not, I don’t actually have to run any of these. I just have to show you the estimated plans for them. But if I make the target of my output query a temp table like this, we will be able to use at least a partially parallel execution plan here.

Right? There’s nothing stopping us from being able to read from a table in parallel. Of course, modifications like inserts and updates and deletes are always going to be part of a single threaded portion of the query plan.

Right? Those are not allowed to go parallel. Inserts are sometimes allowed to go parallel depending on the right conditions. But in this case, they are not.

And then over here, what I’m going to show you is the same thing, except now going to a table variable. And we’re going to output into that table variable. And this is going to show you the estimated plan again.

And this time we do not have a parallel read portion over here. And if we go to the properties, we’re going to see the non-parallel plan reason: table variable transactions do not support parallel nested transaction. I don’t know why transactions is plural here and singular there.

You will have to ask the summer intern at Microsoft who does all of the SQL Server development these days. The same thing will happen if you use delete and you output to no target except the client. That’s, in our case, SSMS.

If we get the estimated plan for this, there will also be no parallel read portion here. And if we go to the properties, we will see a different non-parallel plan reason. This time, DML query returns output to client.
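Side by side, the three OUTPUT targets behave like this. The WHERE clause is a placeholder:

```sql
/* 1. Temp table target: the read portion can still go parallel */
CREATE TABLE #out (Id bigint NOT NULL);

DELETE v
OUTPUT deleted.Id INTO #out
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;

/* 2. Table variable target: non-parallel plan reason
      "table variable transactions do not support parallel nested transaction" */
DECLARE @out table (Id bigint NOT NULL);

DELETE v
OUTPUT deleted.Id INTO @out
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;

/* 3. No target, results go to the client: non-parallel plan reason
      "DML query returns output to client" */
DELETE v
OUTPUT deleted.Id
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;
```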

So that one is very clear about what’s, about why it cannot go parallel. And with that, that is literally everything you need to know about the output clause. Can simplify archival processes.

It’s cooler with merge because you can access a pseudo column called $action, which is probably what it’s actually called, that will tell you if it was an insert, an update, or a delete. And you can reference columns from the source of the merge, not just the target of the merge.

That’s something you can’t do with regular insert or update or delete queries. The third is that you should be very careful what target you use with output. Because depending on how much data you are modifying, you might really want some portion, the read portion of your query plan to happen with a parallel plan.

It could slow down quite a bit if you lose that. So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I am now going to go bathe and enjoy the company of my family until I go to sleep.

So with that, I’m also wrapping up the how to write query correctly series. It’s a playlist. You can watch them all. You should watch them all, especially if you’re having trouble writing queries correctly. You just might learn a thing or two or like 16, depending on where you’re at in life.

All right. Cool. We did it. Thanks for watching.


How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About CTEs




Thanks for watching!

Video Summary

In this video, I dive into the world of Common Table Expressions (CTEs) in SQL Server, providing a comprehensive yet critical analysis of their use and limitations. Erik Darling from Darling Data shares his extensive experience and expertise to show that while CTEs are useful for certain scenarios, they often come with significant drawbacks, such as redundant query execution and performance issues due to single-threaded recursion. He emphasizes the importance of good formatting and comments over the supposed readability benefits of CTEs, encouraging viewers to focus on more effective query writing techniques like the APPLY operator for simpler tasks.

Full Transcript

Erik Darling here with Darling Data, and in this video we’re going to continue on with our How to Write Queries Correctly series. In this one I’m going to teach you everything you need to know, literally everything you need to know, about CTEs for free. If anyone ever tries to convince you that they have something worth selling you to learn about CTEs, laugh in their face, throw something heavy at them. This is it. This is all you need to know about CTEs.

Alright? So, before we do that, if you would like to spend $4 a month on this channel, you can. There’s a link in the video description where you can click that says, become a member, and then you become a member. And I get $3.10 a month after taxes. If you’re just like, no way, $4? You can like, you can comment, you can subscribe.

All very valid ways to show your undying love for me. Consulting! I do it. A lot of it. And according to my wife I need to do more of it, because it keeps me in my office. So, if you want to make my wife happy, you can hire me to do any one of these things, and more, with your SQL Server or servers.

And as always, my rates are reasonable. If you would like to get some very high quality, very low cost SQL Server training, none of it about CTE, at least, you know, not in a very elementary way, because I would never charge you for that. Save the good stuff for training.

You can get all of mine for $150 US dollars for the rest of your life. It’s not quite free, but if you live a long time, it’s close to it. There is also a link for that stuff in the video description.

You know, 2025, I’ll be on the road again for now. Give me a break. So, with that out of the way, let’s talk about the laughability of CTE. So, one, they’re just inline views, and you need them for some types of queries.

Unlike derived tables, you can reference them multiple times in a query. But this is where they start to cause problems. You can write modifications against them, which is mostly useful for doing ordered deletes.

For some reason, you can’t put a TOP with an ORDER BY directly in a DELETE. It’s very strange. You can write recursive ones, but it’s really annoying.

And if you find yourself having to write recursive CTE a lot, you should probably consider a different data model. You can often do things a lot more cleanly with apply than you can with CTE. CTE require you to stack things one above the other and keep doing things.

With apply, you can just have a nice, neat query where you just have stuff very nice and streamlined and inlined without having multiple steps that someone has to read through and remember the column names and the CTE names. It’s all very aggravating.

So, like I said, the big gap that CTEs were meant to fill in SQL Server is with derived tables. You might notice some red squiggles here. That’s because we can’t talk to the derived expression x more than once.

Right? We just can’t do that. It just doesn’t work for us. Even if I name that correctly.

There we go. Even if I name that correctly, we still get red squiggles. And if I try to run this, it’ll say invalid object name. Next one, who are you? I don’t know.

But we can do that with CTEs. The problem, as always, with CTEs is that every time you re-reference them, SQL Server runs the query in them again. If you look at this, we touch the users table not once, but twice.

And if we quote this in and we add a third join to our correctly named CTE, we will now see that we touch the user’s table three times. A one, a two, a three.
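The multiple-reference behavior looks like this; each reference to the CTE re-runs its inner query:

```sql
WITH u AS
(
    SELECT us.Id, us.Reputation
    FROM dbo.Users AS us
)
SELECT u1.Id
FROM u AS u1
JOIN u AS u2      /* second touch of dbo.Users */
  ON u2.Id = u1.Id
JOIN u AS u3      /* third touch of dbo.Users */
  ON u3.Id = u1.Id;
```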

So, like I said, the thing that, like, Microsoft was like, cool, we have CTE now. Now you can reference them multiple times. No way to materialize them.

Other database engines give you that. Microsoft, as of this recording, does not. Maybe that’s a secret thing in SQL Server 2025. I don’t know. Couldn’t tell you.

I don’t know how much work goes into CTE. Microsoft is clearly busy with very important things involving fabric and AI that will go absolutely nowhere. So, we have that to look forward to.

Thanks. So, this is what I was talking about with deletes. You cannot do this, right? You can’t have that in there. But we can have that in here.

If we add an order by, u.id, this is completely valid where this is not, right? This will say, bleh, no. No ordering.

No ordering for u. This thing, though, this will give us a nice ordered delete, and we don’t have to worry about anything in there. There’s no sort in this because I’m ordering by the id column. I think when I was messing with this before, I think I deleted the order by by accident.

But if we were to order by a column that is not supported by an index, you would see it gets sorted there. When we use id, that is the clustered primary key of the users table. So, it’s already ordered for us.
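The ordered-delete pattern being described is roughly this. The TOP count is arbitrary:

```sql
WITH d AS
(
    SELECT TOP (1000)
        u.Id
    FROM dbo.Users AS u
    ORDER BY u.Id   /* clustered PK, so no Sort operator appears */
)
DELETE d;           /* DELETE ... TOP ... ORDER BY isn't allowed directly */
```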

So, we can delete things nice and orderly this way. Sometimes, you need a CTE. You could also do this with a derived table. My style guide says it doesn’t matter because for a query this simple, it’s just not that big of a deal, right?

You generate a row number, and you want to filter out to where that row number equals something. There’s no way in SQL Server to do that all in one step, right? You can’t do that all in one go for some reason.

And SQL Server, Microsoft makes you write a two-step query and generate everything. And then at the very end of that query, filter out everything in there, right? So, that’s a lot of fun, right?
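The forced two-step looks like this: generate the row number inside the CTE, then filter on it in the outer query:

```sql
WITH x AS
(
    SELECT
        u.Id,
        n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
    FROM dbo.Users AS u
)
SELECT x.Id
FROM x
WHERE x.n = 1;  /* window functions can't go in a WHERE clause directly */
```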

Sometimes, you need to use them because Microsoft won’t implement useful things that make your life easier. You get big data clusters, synapse, managed instance. Why would anyone want these things?

They’re garbage. They’re stupid things. And, of course, you can write recursive CTE using CTE. Of course, you couldn’t write a recursive CTE without a CTE because there’s no just plain recursive.

You need that. But, you know, I did go a little bit more into performance details on this in the other video in this series about CTE, which you can find if you look at the playlist.

It’s all in there. But you start off with this part. And this is where you write the anchor portion of your CTE. And this is where you write the recursive portion of your CTE. And if you run this query, well, actually, let’s look at the query plan.
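For reference, a generic recursive CTE has the two parts mentioned here. This sketch uses a hypothetical self-referencing Employees table, not the demo’s actual tables:

```sql
WITH chain AS
(
    /* anchor portion: the starting rows */
    SELECT e.Id, e.ManagerId, 0 AS depth
    FROM dbo.Employees AS e
    WHERE e.ManagerId IS NULL

    UNION ALL

    /* recursive portion: runs single-threaded */
    SELECT e.Id, e.ManagerId, c.depth + 1
    FROM dbo.Employees AS e
    JOIN chain AS c
      ON c.Id = e.ManagerId
)
SELECT ch.Id, ch.depth
FROM chain AS ch;
```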

I forget. I don’t think I have indexes that make this good, so this would run for a long time. One of the big problems with recursive CTEs is that the recursive part of the query is forced to run single-threaded.

And it can be very, very slow if you don’t have good indexes in place to support it. And another thing is that usually you’re going to see a lot of nested loops inside of your recursive CTE. And if you don’t have good indexes, you’re going to see eager index spools inside of here.

And that is a surefire sign that they are going to be terribly, terribly slow. So not a lot to say about the recursive CTE. Again, if this is something you need to do frequently, you probably need a better data model.
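For reference, the anchor/recursive shape being pointed at looks like this (a sketch against a hypothetical dbo.Employees hierarchy, not the demo’s actual table):

```sql
/* Hypothetical table: dbo.Employees (EmployeeId, ManagerId, DisplayName). */
WITH org AS
(
    /* Anchor portion: the starting rows. */
    SELECT
        e.EmployeeId,
        e.ManagerId,
        Depth = 0
    FROM dbo.Employees AS e
    WHERE e.ManagerId IS NULL

    UNION ALL

    /* Recursive portion: references the CTE itself.
       This part runs single-threaded, and without an index
       on ManagerId you'll likely see eager index spools. */
    SELECT
        e.EmployeeId,
        e.ManagerId,
        org.Depth + 1
    FROM dbo.Employees AS e
    JOIN org
      ON org.EmployeeId = e.ManagerId
)
SELECT *
FROM org;
/* The default recursion limit is 100 levels;
   OPTION (MAXRECURSION 0) lifts it if the data is acyclic. */
```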

Now, for anyone who says they make queries more readable, again, for like the 10 billionth time, they don’t. They don’t make them more readable. They don’t make them more understandable.

What makes queries readable is good formatting. And what makes queries understandable is good commenting. Writing a CTE does absolutely nothing to make a query either one of those things. It does not help.

There is still no explanation for what you did. There is still just wild formatting where, for some reason, it’s like select, and then the columns, and then from is on the same line, and then join is on the same line, but on is down below somewhere.

Just format your queries and people will find them readable; comment your queries and people will find them understandable. It’s a fantastic thing. It’s a wonderful thing to do.

Otherwise, you’re just writing gobbledygook. So one thing that some people will do with CTEs is stuff that they could do with apply very easily. This is something that Itzik talks about sometimes in his training: people who just want to do simple things.

Like just get the year from a date or datetime or whatever column. Date, datetime, datetime2, whatever. Yeah, fire.

Big city living folks, big city living. And then let’s say you wanted to get the following year, you could write another CTE. And then you could get all this stuff.

And then, I mean, granted, this is just sort of a goofy example, but it gets to the mentality of people who use CTEs for just everything, because they have it in their goofy heads that it’s going to make the query readable and understandable for everyone. Where you could just do something very simple like this and cross apply some values to get that year.

And then you have that expression available to you in the select list, and your query becomes a lot more compact. You don’t need 17 different steps to write your logic out. You can do things more easily and more compactly when you use better forms of query writing. So this is literally everything you need to know about CTEs.
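The compact version being shown looks something like this (a sketch; the exact demo columns are assumptions):

```sql
/* CROSS APPLY (VALUES ...) names the expression once,
   and it's then reusable in the select list and elsewhere,
   with no stack of CTEs required. */
SELECT
    u.Id,
    u.DisplayName,
    y.CreationYear,
    NextYear = y.CreationYear + 1
FROM dbo.Users AS u
CROSS APPLY
(
    VALUES (YEAR(u.CreationDate))
) AS y (CreationYear)
WHERE y.CreationYear >= 2013;
```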

There is nothing magical or mystical about them. There is nothing fantastic about them. Perhaps someday materialization will come to SQL Server, but we don’t have that now. Fabric did just get Azure SQL database.

Wow. Hold your applause. No, I mean it. Hold your applause.

Anyway, thank you for watching. I hope you enjoyed yourselves. Honestly, I hope you learned nothing. Because this is all very generic information about CTE, but it is all the information you need about CTE. Alright.

I will see you in the next video about the output clause, which will be literally everything you need to know about the output clause. Alright. Have a great day. It’s pretty salty.

It’s clear!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Write SQL Server Queries Correctly: Case Expressions




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of case expressions in SQL Server queries, explaining why they can be problematic for performance tuning and optimization. I share insights from my experience as a database professional and highlight how seemingly innocent use of case expressions can lead to complex query plans and significant execution times. By walking through examples and referencing Aaron Bertrand’s blog post on the “Dirty Secrets of the Case Expression,” I aim to help you understand why it’s crucial to be mindful of where and when you use these constructs in your queries, especially in joins and where clauses.

Full Transcript

Erik Darling here with the old Darling Data. Having a lot of fun. A lot of fun. I don’t know how anyone doesn’t have fun with databases. They’re just the most amusing things one could possibly conceive of. Barrel of laughs. Today’s video, we’re going to get ever closer to the close of our series about how to write queries correctly. And in this one, we’re going to cover case expressions. Not case statements. Case is not a statement. Case is an expression. You’re just going to have to deal with that fact. Or else, I don’t know, I’ll drop a bag of hammers on you. If you’re that gung-ho about calling case a statement, it might have no effect on your skull. It’s probably rock solid anyway. If you like this content, and you’re like, wow, where would I be without this content? Well, for four bucks a month, you can support this content. And I can, you know, carry the two. I’ll make about $3.10 after taxes on that.

Give or take. So, that’d be nice. If you don’t have $4 to turn into the alchemy of taxation to hand over to me, you can like, you can comment, you can subscribe. And there’s no tax on that. That is free love. If you are in need of SQL Server consulting, if the things that I say in these videos resonate with you, and you think that perhaps my wit and wisdom would be of some benefit to the SQL Servers that you care for and feed, well, you can hire me to do all sorts of things. And as always, my rates are reasonable. I also have training. And you could have that training for about $150 for the rest of your life. Not a subscription. You can go to that link and plug in that coupon code. And through, again, magical transformations. Actually, reverse taxation. You actually get money off of things. I guess that’s a tax on me.

Boy, this is a rough mental awakening for me. Me giving you discounts is a tax on me. That’s messed up. But you can get all of my training for, yeah, that money, that URL, that code. It’s amazing. No upcoming events because end of year, family, things like that. I’ll be around in 2025. With all that out of the way, let’s talk about these here case expressions. Now, one thing that you should know off the bat is that green screens later in the day are hard. That’s why I have strange things happening behind this armpit. Do not look behind the armpit.

What you should know is that some things internally, even though they appear to be built-in functions that SQL Server has (the things that turn pink when you type them), are just wrappers for case expressions. Translate is not one of them, though. When I was first looking into this, I half expected translate to be a case expression, but it is not. If we look at the query plan and we look at the compute scalar, the Defined Values property in here will not say anything about a case expression.

But if we dig into some of these other things, like IIF, and we look at the execution plan here, and we look at the compute scalar, we will see that internally, IIF is just doing a case expression. You can see where that starts right here. So IIF is just a case expression. This is one of the reasons why, when I start having to warn people about using functions in where clauses and joins, I also need to warn them about using case in where clauses and joins, because guess what?

They both have the same performance problems. It’s amazing. It doesn’t matter if you explicitly write case. Some of the functions that you might write will use case anyway. Behind the scenes, they’ll just magically crop up there for you. Another one of them is coalesce. If we do this, and we look at the execution plan, and we look at the predicate that gets applied to the table, that is a case expression too.

Case when the column is not null, then, blah, blah, blah, end. See? Coalesce, inside, is just a case expression. Choose, similarly, is a case expression. If we look at the query plan here, and we look at the compute scalar, we will once again see the defined values show us a case expression.

So, if you can, because you’re the one writing queries, you should probably not put things like IIF, and COALESCE, and CHOOSE in your where clause, because they are case expressions, and they will have performance problems. Other built-in functions, like ISNULL, and REPLACE, and stuff like that, are not case expressions inside, but they have very similar performance problems.
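A quick sketch of the equivalences described above (column names assume the Stack Overflow demo database):

```sql
/* IIF is sugar for a CASE expression... */
SELECT IIF(u.Reputation > 1000, 'high', 'low')
FROM dbo.Users AS u;

/* ...which the optimizer expands to roughly this: */
SELECT CASE WHEN u.Reputation > 1000 THEN 'high' ELSE 'low' END
FROM dbo.Users AS u;

/* COALESCE is also sugar for a CASE expression... */
SELECT COALESCE(u.Age, 0)
FROM dbo.Users AS u;

/* ...expanded roughly like this: */
SELECT CASE WHEN u.Age IS NOT NULL THEN u.Age ELSE 0 END
FROM dbo.Users AS u;
```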

All of a sudden, using indexes becomes difficult for SQL Server. Aaron Bertrand, who, I don’t know, if you ask me, I don’t talk to him enough. If you ask him, I talk to him too much.

So, there’s two sides to every coin. Either, you know, I don’t know what Canadian coins are called. But, these, he has a blog post anyway, called, The Dirty Secrets of the Case Expression.

Thank you, Aaron, for not calling it a case statement. And this is an example that he has in his blog post. There are more if you use the search engine of your choosing. I don’t care what it is.

And you search for Aaron Bertrand, Dirty Secrets of the Case Expression, you will find his post. This is the most amusing example that I think exists on there: you declare a variable called @i, and you say, case when @i equals one, then one, else the minimum of one divided by zero.

And you get a divide by zero error, even though this part never should have been explored, right? Because @i equals one. If you rewrite that a little bit, and you say else, in parentheses, select min, that will work.

Case figures that one out; it doesn’t explore that far in. But it’s an interesting thing: case does not always short-circuit where and when you think it might. All sorts of things can happen in there.
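Transcribed from the example being walked through (originally from Aaron Bertrand’s post), this behaves as described:

```sql
DECLARE @i int = 1;

/* Fails with a divide by zero error (Msg 8134), even though
   the ELSE branch should never be reached: the aggregate
   gets evaluated regardless of the CASE logic. */
SELECT CASE WHEN @i = 1 THEN 1 ELSE MIN(1/0) END;

/* Wrapping the aggregate in a subquery avoids the error: */
SELECT CASE WHEN @i = 1 THEN 1 ELSE (SELECT MIN(1/0)) END;
```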

Now, let’s talk about performance, though. Because that’s what we care the most about. We are, for the most part, performance tuners. We’re pretty good at it at this point, I think.

At least I hope, anyway. That’s what we get paid for. I see a lot of queries with a lot of problems. And a lot of them look something like this.

And notice that there’s a join with a case expression here. All right, so there’s a, we’re going to join depending on if this, right? Remember, so like if one equals, ah, gosh darn it.

If one equals case when this, then one. And when this condition, then one. But if we don’t meet those conditions, then zero. And then another thing where one equals case when these conditions are true, then one.

Else zero end. And I’m not actually going to run this query now. Because we’re already around the eight minute mark.

And this query takes a full minute and one second to execute. The query plan is not terribly interesting. Just note that everything is fine until we try to apply that expression right here. All right, a whole bunch of bad things get involved at the join.

And this portion of the plan eats up the majority of the time. It would be nice if I could draw different shapes with ZoomIt, but I guess making two rectangles isn’t, you know, that much of a challenge for me.

It’s not that difficult. It would just be nice if I could like draw better in general. If I could draw better, I would probably have a different career, honestly.

But this query plan is, of course, a disaster. And you should not emulate this kind of query. This query is no good.

For the sake of those who may, I don’t know, be watching and not listening, maybe you take visual cues better than you take verbal cues. We are going to say no.

No. No. If I was better at art, I would make this a skull. But I’m not very good at art.

So, er, er, er. And we’ll get some things here that hopefully don’t look too much like other things. Eh.

We’re getting good at this. We’re getting better at this. Every day. Every day. We want to avoid this. This is bad. Okay. This is not what you want to do with your queries. This is frowny face.

Bad for performance. But, but. Okay. If all this is not enough clue, I don’t know what to do for you. We are, we are, I’m out of ideas.

So, one way that you could rewrite this query would be something like this. You don’t have to do this. But this, to me, this just looks kind of nice. We’re going to put the query, the predicates that we care about for the users table in here.

Right? Because that’s good. We care about that. And we’re going to cross apply to a union all. And if you’ve been following this series, you’ve seen me separating out queries like this.

Very common. Very useful query pattern. Even if it doesn’t solve the precise performance problem that you’re after, it can certainly illuminate you as to why you are having a performance problem with one or both of these sets of predicates in here.

Perhaps it’s an indexing issue or something else that’s just driving things amok. But when we run this query, this thing will let me highlight things. This returns instantly.

Right? We no longer have to wait a very long time. We no longer have a giant parallel plan with a spool in it and predicates figured out at the join clause.

We have an apply nested loops plan. Again, we can tell that the apply nested loops kicked in because we have this outer references thing in the nested loops join, which means that our join predicates were sent out as seek predicates to here and here.

So, somewhat obviously, because it’s the same index name, we hit the same index twice and seek into it, and all is well. And we get back the result that we care about immediately. So, when you’re writing queries, putting things like case expressions in joins and where clauses may seem like a convenient vehicle for you to express the logic that you care about, but it is often not a very convenient vehicle for SQL Server to optimize.
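The before and after shapes look roughly like this (a sketch with assumed predicates, not the exact demo query):

```sql
/* Slow pattern: join logic buried in CASE expressions,
   which the optimizer can't turn into index seeks. */
SELECT c.Id
FROM dbo.Users AS u
JOIN dbo.Comments AS c
  ON 1 = CASE
             WHEN c.UserId = u.Id THEN 1
             WHEN c.UserId = u.AccountId THEN 1 /* illustrative only */
             ELSE 0
         END
WHERE u.Reputation >= 100000;

/* Rewrite: each predicate gets its own branch of a UNION ALL
   under CROSS APPLY, so each branch can seek into an index. */
SELECT ca.Id
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT c.Id
    FROM dbo.Comments AS c
    WHERE c.UserId = u.Id

    UNION ALL

    SELECT c.Id
    FROM dbo.Comments AS c
    WHERE c.UserId = u.AccountId /* illustrative only */
) AS ca
WHERE u.Reputation >= 100000;
```

One caveat: if both branches can match the same row, UNION ALL returns it twice where the original OR-style logic returned it once, so the branches need to be mutually exclusive (or deduplicated) to preserve semantics.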

Again, you should write every query like performance matters, because even if it doesn’t today, someday performance might be of critical importance to this query, or to whoever’s running it.

And you don’t want it to be slow. You don’t want one of those comments up at the top with your full name and a date and all that stuff in there, where you make notes about things that you swear work, but don’t actually work.

So, these may seem very convenient for you, but these are very bad for SQL Server. Much better ways exist to write queries that SQL Server can interpret into a much better, much faster execution plan for you.

One thing that this ties in with, since this post is the culmination of several things, is handling or. This video on case expressions really ties in with the two videos on where clauses with or in them and join clauses with or in them.

They have very similar performance implications, and sometimes even query plan implications. One thing that I haven’t covered in these, that I’m going to cover in a longer format about dynamic SQL, is how to handle cases where you want to have optional parameters passed in to a stored procedure.

Sometimes they are null. Sometimes they are used, and there are different ways to deal with that. But please, at least for the time being, keep case expressions out of your joins and your where clauses, and be very diligent in how you write case expressions, even in your select list, because they may not short-circuit when you expect them to.

SQL Server may do some additional exploring in those case expressions that you are not aware of until something weird happens. So, be careful there.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video on CTEs. I know that I already did one in this series, but the next two videos on common table expressions and output are going to be literally everything you need to know about CTEs and output.

Do not buy anything from anyone purporting to teach you about common table expressions and output. There is nothing worth paying for to learn about them. Okay?

Spend no money on these things. They’re not worth it. Anyway, I will see you in those videos. Au revoir.
