Indexing SQL Server Queries For Performance: Indexed Views And Filtered Indexes




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of using indexed views in SQL Server as a workaround for limitations with computed columns and filtered indexes. I explore how these three features—computed columns, filtered indexes, and indexed views—should ideally work together but often don’t due to restrictions imposed by SQL Server. By creating an indexed view that filters on a computed column, mimicking the behavior of a filtered index, we can optimize queries without running into errors or suboptimal performance. I also discuss how this approach can be particularly useful on standard edition, where batch mode limitations make it challenging to achieve optimal query performance.

Full Transcript

Erik Darling here with Darling Data. I don’t know why my hair looks the way it does. It’s very strange to me. It’s clean, it just looks plastic or something. I don’t know what’s happening to me. Anyway, today’s video we have exciting things to talk about. Actually, we have frustrating things to talk about, as usual. In today’s video we’re going to talk about how you can use indexed views sort of like a filtered index. That seems sort of obvious, right? But stick with me. So Microsoft has these three things that they put in SQL Server that are really useful. The problem is these three things don’t get along very well. You have indexed views, you have computed columns, and you have filtered indexes. What’s awesome is if you create a computed column, you can save your queries all sorts of time calculating crappy expressions, especially if those crappy expressions are in like where clauses or joins or something. But you can’t create a filtered index on a computed column. Then you have indexed views, which are wonderful, because you can make this whole complicated query presented as a table that you don’t have to aggregate every single time, which is wonderful, but you can’t create a filtered index on an indexed view. So it’s like you have these things that should be in this lovely throuple relationship, just getting along, everyone’s laundry is clean, everyone’s got dinner, everyone’s happy, we all get to watch all the TV shows and movies we like. But no, no, we don’t get that. We have everyone in separate bedrooms, angry at each other, right? It’s no good. It’s not a good situation.
So if you like the content that I create and would like to support my endeavors, or would like to hear more about non-traditional database relationships, you can become a member of this channel for as little as $4 a month and make me feel great about myself. If you don’t have four bucks a month because the school bully stole your lunch money, you know, it’s gone out of your pocket, then liking my videos, commenting on my videos, and subscribing to my channel makes me feel all comfy cozy too. If you are in need of SQL Server consulting, you might watch these videos, and the topics that I cover in them may seem like things that you could use help with in your SQL Server. I am available to do all of these things at a reasonable rate. If you would like some fantastic SQL Server training that is far less expensive than what you will find elsewhere on the internet, you can get all of mine for about 150 USD for the rest of your long happy life.

And I mean, honestly, these videos make great heirloom gifts too. You can leave them to your kids and loved ones, maybe even other people’s kids. You can give them away all over the place. They’re great stocking stuffers. Let’s get started.

All right, let’s get the demo on screen. Come on, zoom in.

Kick in, buddy. We’re going to get rid of these two things first, because if we don’t, we might hit some issues. Now I’m going to create a table. Didn’t I just drop you?

What’s happening in the world? Drop table if exists, and then it’s just like, oh, yeah, that exists all right. Thanks, database.

Selling me down the river here. So I’m going to create a table called indexed view, and that table is going to have a computed column in it, where I’m just going to multiply the ID column by two. Now, we don’t even have to put any data into this table to see what happens here.

It’s just very annoying. But I cannot create a filtered index on my computed column, filtering it to where not fizzbuzz equals zero, without SQL Server throwing up on me.
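A minimal sketch of what I believe is happening in the demo; the table, column, and index names are my approximations of what’s on screen, not the exact script from the video:

```sql
/* Table with a computed column that doubles the id. */
CREATE TABLE dbo.indexed_view
(
    id integer NOT NULL PRIMARY KEY,
    not_fizzbuzz AS (id * 2) /* computed column */
);

/* This fails: filter expressions can't reference computed columns. */
CREATE INDEX nfb
    ON dbo.indexed_view (id)
WHERE not_fizzbuzz = 0;
```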

All right. So this is the error we get: filtered index nfb cannot be created on table dbo.indexed view because the column, not fizzbuzz, in the filter expression is a computed column.

Rewrite the filter expression so that it does not include this computed column. Well, then I wouldn’t have a filter expression, would I? Would I, SQL Server?

There would be no filter expression. You need to get over yourself. Allow the nice people who use you to create filtered indexes on computed columns. I mean, fine.

If you need some rules in place for that, I get it. But it should be a thing that you can do. Now, we could, of course, create an indexed view, right?

And let’s say that we do that, right? We get real cool. We say, all right, we’re creating this view called computed column, and we’re going to create it with schema binding so that we can index it.

And then we add a unique clustered index to that indexed view. So now we actually have an indexed view, right? We did this. If I try to create this again, it’ll say it already exists, right? So we indexed this view.
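Roughly what that looks like; again, the object names are my guesses at the demo script:

```sql
/* The view has to be schemabound before it can be indexed. */
CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id,
    iv.not_fizzbuzz
FROM dbo.indexed_view AS iv;
GO

/* The unique clustered index is what materializes the view. */
CREATE UNIQUE CLUSTERED INDEX ucix
    ON dbo.computed_column (id);
```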

It is alive and existing as an indexed view. But I can’t create a filtered index on that indexed view, right? Now, if I go look at what this whole error message says, I’m going to make it a little bit nicer for all of us, because I like to zoom in, and why Microsoft doesn’t pretty print error messages instead of sending them sprawling across the screen is beyond me. You know, we all make choices.

But the filtered index, NFB, cannot be created on the object dbo.computed column because it is not a user table. Filtered indexes are only supported on tables.

Indexed views are not tables, even though they are sort of presented as such when you query them. If you are trying to create a filtered index on a view, consider creating an indexed view with a filter expression incorporated in the view definition.

Well, okay. We can do that, can’t we? All right. So one way you can get around this is, of course, by doing this, right?

So we’re going to change our view. And I’m going to tell you something very funny about indexed views. If you alter them, and I have CREATE OR ALTER here, it drops the indexes on them.

So if you have an indexed view with like a clustered index and maybe you’ve created other nonclustered indexes on that indexed view in order to make queries faster that touch the indexed view, if you alter that view, your indexes disappear.

You have to add your indexes back. So we’re going to create or alter this view. And we’re going to include where not fizzbuzz equals zero in our where clause here. And now we’re going to create our index.
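Something like this, with the filter baked into the view definition; names are still my approximations:

```sql
/* Altering the view drops any indexes on it, so the
   clustered index has to be recreated afterwards. */
CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id,
    iv.not_fizzbuzz
FROM dbo.indexed_view AS iv
WHERE iv.not_fizzbuzz = 0; /* the filter a filtered index couldn't have */
GO

CREATE UNIQUE CLUSTERED INDEX ucix
    ON dbo.computed_column (id);
```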

We get to re-index that table. Isn’t that fun? It’s great. The problem that we run into sometimes, though, is that with indexed views, for various reasons, especially if you’re on…

So for various reasons, your query, even though it matches the indexed view, might not match the indexed view, right? Like SQL Server’s optimizer might get hung up in one of a couple places.

One, if you’re on standard edition, you’re probably going to need the no expand hint so that your indexed views get touched. If you’re on any edition of SQL Server, you might have a trivial plan or…

Well, a trivial plan accompanied by simple parameterization of your query might mess things up. And I would say that on almost any edition of SQL Server, you should be using the no expand hint when you touch indexed views, because SQL Server does not automatically create statistics on your indexed views without the no expand hint.

So like you’ll have statistics on whatever column you created your clustered index on, right? Your unique clustered index because you created an index so you get a statistics object. But if you touch other columns in the index view, SQL Server does not auto-generate stats on those unless you use no expand.

So if we run this query and we say where not fizzbuzz equals zero, we actually… You can see in the query plan that there are a bunch of square brackets in here that I did not type.

And there are a bunch of spaces around periods that I did not type. And we have a little parameter looking thing over here which I certainly did not type. And we have touched our indexed view right here.

Or sorry, we have touched our table called indexed view here. Remember, our table is named indexed view and our indexed view is named computed column. I went to great lengths to make this very clear and straightforward.

And to see the reason why this happens, we use a couple of rocket-sciencey trace flags here: 8607 to look at compilation information and 3604 to print that information to the messages tab.
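For reference, the query probably looks something along these lines; these are undocumented trace flags, fine for demos but not something to run in production:

```sql
/* 8607 prints the optimizer's output tree; 3604 redirects
   trace output to the Messages tab in SSMS. */
SELECT
    iv.id
FROM dbo.indexed_view AS iv
WHERE iv.not_fizzbuzz = 0
OPTION
(
    QUERYTRACEON 8607,
    QUERYTRACEON 3604
);
```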

We will see over in the messages tab that we got a trivial plan. So it’s possible, since we only got a trivial plan, that SQL Server never went as far as expression matching our query to the indexed view definition.

SQL Server might have just been like, oh, whatever. I don’t know. This is stupid. But then if we scroll way down, we will have two little messages here. The query is marked as cacheable and the query is marked as safe for auto-parameterization.

Oh, sad. So there are two ways that we can get around this issue, right? We can say where not fizzbuzz equals zero and we can use the no expand hint, right?

And this will allow us to touch our indexed view named computed column. Even though we have all this stuff, now SQL Server is like, oh, right.

Got it. Got it. Or we could say where one equals select one and… Oh, there’s nothing in the messages tab there. But the one equals select one prevents… Actually, I should run these next to each other, right?

Do this. And let’s look at these execution plans. And these both hit the index view, right? These both hit compute… Oh, go away, tooltip.

No one asked for you. These both hit computed column. The one up here… Actually, the no expand hint does that? No, it doesn’t. Great. So the no expand hint does keep the optimization level trivial and does have this in there.

But SQL Server now trivializes our query to touch the index view. Down here where we use one equals select one, we’re going to have this additional compute scalar in there, but it’s like nothing.

And now we will have full optimization and full optimization will be like, hey, we found an indexed view. Lucky us. Right?
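Both workarounds, roughly as they’d look in T-SQL; object names are my approximations of what’s on screen:

```sql
/* Option 1: query the view directly and forbid expanding it.
   NOEXPAND also lets SQL Server auto-create statistics on the view. */
SELECT c.id
FROM dbo.computed_column AS c WITH (NOEXPAND)
WHERE c.not_fizzbuzz = 0;

/* Option 2: a predicate that blocks simple parameterization,
   so the query gets full optimization and can match the view. */
SELECT iv.id
FROM dbo.indexed_view AS iv
WHERE iv.not_fizzbuzz = 0
AND   1 = (SELECT 1);
```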

So good news there. So if we were to look at this information, the trace flag information for either of these, you would see that this is safe for auto-parameterization, but we have said to SQL Server, you will not expand our views.

So this was considered safe there. And then if we look at this one in the messages tab, we will not see the trivial plan information up here and we will not see…

Oops, zoom in too well there. We will not see the safe for auto-parameterization message here. So you can do things with indexed views that you cannot traditionally do with filtered indexes and computed columns.

You still cannot create a filtered index on an indexed view, which is annoying as hell, but it is nice at least that you can create an indexed view that has a computed column in the where clause and then use that in your queries to filter to where that computed column hits some values that you really care about. I’ve used this a few times over the years. It’s certainly not something that I need terribly often, but it is something that, when it comes up, is incredibly useful.

So I do hope that you can take something away from this that you will perhaps use in your endeavors, tuning queries and indexes. Stuff like this is especially useful if you’re on standard edition.

You know, a lot of stuff on enterprise edition, just using batch mode is like a good enough shortcut to getting good performance on big aggregate queries without having like this additional index structure to maintain.

So oftentimes, you know, I’ll be like, okay, let’s just see where batch mode gets us. If batch mode gets us like a 70, 80, 90% speedup, we’re probably not going to go this far unless we absolutely need to.

But if you’re on standard edition, where batch mode, because Microsoft hates you, is limited to a DOP of 2, you’re going to have a bad time. That query might, at best, break even at DOP 2 using batch mode.

But, you know, if you’re running at DOP 4, 6, 8, or higher on standard edition in your aggregation query, indexed views don’t have that limitation on them. Somehow Microsoft let that one slide.

I don’t know. The bean counters must have figured, hey, no one uses these things anyway. But yeah, if you’re on standard edition, there’s no DOP limitation when you query indexed views.

So if you have a big parallel plan or something, those might turn out better with an index view than using batch mode. So stuff worth thinking about, stuff worth considering. And as always, it’s worth protesting outside your local Microsoft office to get them to treat standard edition users with a little bit more respect.

You know, with all the 2025 announcements that have started bubbling around, I still haven’t seen anything about standard edition being able to use more than 128 gigs of RAM, which is just a laughable situation in the year that we’re in.

So I don’t know. Maybe, maybe we’ll get some love on that. But I kind of doubt it, because, you know, you need to pay that friendship tax.

If you don’t pay the friendship tax, Microsoft isn’t going to be your friend. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And, in the next video, we are going to talk one more time about index views, but in a slightly different way, because we have a little bit more to go over with them.

Don’t we? We do. Index views. Who knew? Who knew? Anyway. Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Indexed View Maintenance





Video Summary

In this video, I dive into the intricacies of indexed views in SQL Server, exploring their maintenance and limitations. Indexed views can significantly enhance query performance but come with a set of restrictions that make them less than ideal for certain scenarios. I discuss setting up optimal conditions for indexed view creation and maintenance, highlighting common issues like subquery limitations and the absence of basic aggregate functions such as `min` and `max`. Additionally, I share practical tips on how to mitigate these challenges by creating supporting indexes, ensuring efficient query performance. This video is part of a series where we’ll explore more ways to leverage indexed views for complex queries and address some of the limitations they present in SQL Server.

Full Transcript

Erik Darling here with Darling Data. And fortunately, I remembered to turn my microphone on at the beginning of this video, so I don’t have to do 15 more takes of this. That’s nice. In this video, and I think the next three videos, we’re going to be talking about, I mean, still indexing SQL Server queries for performance, but there are three things that I want to talk about when it comes to indexed views: indexed view maintenance, how you can use indexed views where you can’t normally create filtered indexes, and, what was the other one? Fixing what would normally be sort of a non-sargable predicate, or making aggregate queries faster, specifically when you have a big query that has a having clause on it. So, one thing that we’re going to have to talk about right at the outset is all the restrictions and things you can’t do with indexed views, because there are a lot of them in SQL Server, and they are depressing and painful. So, we already covered this one, so I forgot to click, or maybe, I don’t know, maybe I clicked and nothing happened. It’s hard to tell sometimes. We live in a world of mystery, don’t we?

But if you want to support the work that I do, creating content for this channel, you can click the little link in the video description right down in here somewhere, thereabouts. Sorry, my finger disappeared. If you know me, that happens a lot. And you can become a member of the channel for as little as $4 a month. These fingers don’t disappear, do they? If you don’t have $4 a month to spare, for whatever reason, maybe you spent it all on spare ribs, then liking, commenting, and subscribing are all fantastic endeavors that you can embark upon in your daily life to keep me company on the internet. If the stuff that I’m talking about gets into your head, and you’re like, man, I’m having these problems with my SQL Server, maybe hiring Erik Darling can fix these problems with my SQL Server in exchange for money. And yes, I can do that. Everything that you see me do here, I can do on your server. It’s not just this server that these things work on. It’s not some magical microcosm of SQL Server where I can solve these little problems and things get better. No, no, I can do this on your servers too.

Even if they’re in Microsoft’s cloud, which is just notoriously awful. So all sorts of things that can be done here. And as always, my rates are reasonable. If you would like some fantastic SQL Server training, I have it. No one else does. I’ll be honest. I’ve seen everyone else’s. Mediocre at best. You can get all 24 hours of my training for 150 USD. It’s a much better bargain than you’re going to find anywhere else on the internet. Unless you pirate these. And then there’s nothing I can do. But if you go to the link up there and you use the discount code there, you can get all of it. And there’s no subscription. It’s just yours forever.

It’s like when you used to buy a CD or a DVD. You just you just had it. You didn’t have to like pay 10 bucks a month to keep listening or watching. Funny, I guess. Upcoming events for 2025. Again, you will know as soon as I’m allowed to tell you. With that out of the way, let’s talk a little bit about these here. Indexed view things. So like I like I said, the first thing we have to talk about is some of the stuff that like preempts using index views. So really? See, I didn’t notice this when I first cracked this open, but apparently Microsoft is now advertising on their learn page.

That’s cheesy. Who would do that? All right. Kind of weenie. What kind of weenie advertises on their website? But if you’re going to create indexed views, and this actually goes for filtered indexes as well, and computed columns, and you expect your queries to use them without any sort of interference or without throwing errors, you do need to follow these set options. So ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, and QUOTED_IDENTIFIER all need to be set to on.

Right? This all says on. And then, of course, NUMERIC_ROUNDABORT, this weirdo. That has to be set to off. So that’s probably the easy part. I say probably because there was one time when I was like, I’m going to create a filtered index. It’s going to be dope.

And this ColdFusion application, which uses like the weirdest driver to talk to SQL Server, just started throwing errors everywhere. And I had to get rid of the filter on the index. And that was a funny day. But hey, you live and you learn sometimes, you know.

Mistakes from years past turn us into better people, right? If we don’t experience any pain, we learn nothing. You can tell by the number of tattoos that I have that I’ve learned quite a bit.
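For reference, the required settings from above can be scripted out like so:

```sql
/* SET options required to create and use indexed views,
   filtered indexes, and indexes on computed columns. */
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;

/* The one weirdo that has to be off. */
SET NUMERIC_ROUNDABORT OFF;
```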

So there are some other problems with indexed views as well. And there are things that you can’t do inside of them. Some of them are worse than others.

So, you know, CTEs. I don’t really care about CTEs anyway. So screw it. Subqueries.

This is an interesting one. Because, you know, some people do have, you know, very pathological issues with subqueries in general. But this actually doesn’t really fully explain just how many limitations this introduces.

You cannot use exists in an index view definition. You can only use joins. You can only use inner joins.

You might see, if you look down here, there are many other types of relational table joining syntaxes that you cannot use. But subqueries actually prevent the use of saying, like, where exists something in this table that correlates to this other table. Which is wild.

You cannot put an exists clause in a SQL Server Indexed View. Huh. I know. I know. Basic stuff. Right?

There’s some other things in here. So this one is particularly funny to me. You can’t use distinct, but you can group by every single column. What do we think that is?

Hmm. Hmm. What is that? Funny. Funny, funny, funny. So, derived table expressions. So, if you wanted to, like, put a derived join in your index view, you cannot do that.

So, if you want to say, like, from select from this thing join the result of this select to, like, some base table, you can’t do that. You cannot do self-joins, whatever. I don’t know.

I guess that makes sense. You cannot use table variables, inline table valued functions, or multi-statement table valued functions. This makes sense, and this makes sense. This does not make a lot of sense.

Inline table valued functions are typically very simple things. You can only put, like, a select in there anyway, so that doesn’t make a lot of sense. And then there’s some stuff down here that, like, you know, kind of, like, who cares a little bit? Pivot and unpivot.

Screw you. Table sample system time. Full text search. If you’re using full text search in SQL Server, you deserve every ounce of pain that you experience. That is nonsense. Cube rollup.

Okay, whatever. You cannot have having in there, which is actually one of the problems we’re going to solve later. And these are interesting ones, I think. Union, union all, accept and intersect.

Again, basic relational syntax. Microsoft, what are you doing? What’s happening? Where’s that $250 billion a year going? I mean, I know you’re burning a lot on AI lately, but index views have been around for a long time.

How about, like, this much of an improvement? There’s something, actually. It should be in this page, but I don’t see it immediately.

Yeah. This is what I was looking for. So you have to use COUNT_BIG.

You can’t use COUNT. That’s fine. You can’t use AVG. You have to use SUM and COUNT_BIG as separate columns. But I think the big one in here is this.

Well, actually, probably this. You know, like, one of the single most common things that you can… And this is funny for a reason.

I’m going to tell you in a minute. Some of the most basic stuff that you would want to aggregate in having an index view are min and max. And a long time ago, a guy named…

Decided that it was not worth maintaining a nonclustered index in the index view to support min and max. So we don’t have that option available to us. And, you know, like 20-something years later, I don’t know if we’re ever going to get it.

You can use greatest and least in indexed views. Those are newer T-SQL syntax, from 2022. You can put those in indexed views all day and everyone’s like, fine.
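Putting those aggregate rules together, a legal aggregating indexed view might look like the sketch below. The Posts table and its columns are my assumption of the Stack Overflow demo database; AVG has to be rebuilt from SUM and COUNT_BIG, since AVG, MIN, and MAX aren’t allowed:

```sql
/* GROUP BY in an indexed view requires COUNT_BIG(*). */
CREATE OR ALTER VIEW dbo.score_stats
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    SUM(p.Score) AS total_score, /* divide this by ...          */
    COUNT_BIG(*) AS total_posts  /* ... this to get an average  */
FROM dbo.Posts AS p
GROUP BY
    p.OwnerUserId;
```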

Great. So there are a lot of restrictions on indexed views that are really annoying. And it would be nice if Microsoft invested a fraction of the money that they blow on OpenAI on improving SQL Server just a little bit before, you know, SQL Server slowly turns into COBOL and Fortran and whatever.

But there’s a lot of stuff with indexed views that’s missing. Things you can’t do there that you can do anywhere else. So in this video, we’re going to talk about indexed view maintenance. Because one thing that you do need to consider when you create indexed views is, like…

So, under optimal conditions, having an indexed view shouldn’t be any more maintenance for your query than having another nonclustered index on the table or whatever. But you do have to consider how that indexed view is assembled.

So let’s look at this query, right? And I’m going to run this update in a transaction so that we don’t actually change all these scores to a higher score and have weird results in the future.
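Something along these lines; the table, column, and predicate are my guesses at the on-screen demo, and the rollback keeps the data unchanged:

```sql
BEGIN TRANSACTION;

/* Bump scores up; the exact predicate is an assumption. */
UPDATE p
    SET p.Score += 1
FROM dbo.Posts AS p
WHERE p.Score < 0;

/* Undo the change so the demo data stays put. */
ROLLBACK TRANSACTION;
```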

But if you look at the query plan for this, this runs pretty quickly, right? Just a little over half a second. Like, I know, this could be better with an index. We just recorded a view on improving…

Sorry, we just recorded a video. Not a view. We just recorded a video on using indexes to improve the performance of modification queries. So, yes, I know, we could make this better.

But just stick with me for now. Now we’re going to create this view. And this view, I mean, we’re updating the post table here. And this view has the post table in it.
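A sketch of what that view and its materializing index might look like, assuming the Stack Overflow demo database schema (Posts, Votes) and a made-up view name:

```sql
/* A schemabound view joining the table we update (Posts)
   to Votes; only inner joins are allowed in indexed views. */
CREATE OR ALTER VIEW dbo.posts_votes
WITH SCHEMABINDING
AS
SELECT
    p.Id,
    COUNT_BIG(*) AS vote_count /* required with GROUP BY */
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY
    p.Id;
GO

/* The unique clustered index makes it an indexed view. */
CREATE UNIQUE CLUSTERED INDEX ucix
    ON dbo.posts_votes (Id);
```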

Now, there are plenty of warnings that go along with views that reference more than one table, especially around locking and blocking. Serializable isolation levels will kick in when these views need to get maintained.

So it really does behoove you to make indexed view maintenance as quick as possible. So we’ve got that indexed view… Sorry, we’ve got that view in place. Now we have to create a unique clustered index on the view in order to get it to be an indexed view, a materialized view, if you prefer that parlance.

It’s fine if you do. So this takes 20 seconds or so to complete. I don’t know. Hey, 20 seconds on the nose.

Look it. Right under my armpit. There it is. 20 seconds. Wonderful. It’s almost like I’ve done this before. But now when we go to run this update query, I just got the estimated plan real quick to show you that it gets a bit more complicated, doesn’t it?

And then when we actually run the query, this thing that went from finishing in 600 and something milliseconds, well, this is going to drag on a bit.

This is going to end up being, I don’t know, probably around like 12 seconds. 12. Look at that. Oh, sorry.

Look at that. 12. 1, 2. Right there. Look at that. 12 seconds. And the query plan explains a bit why. We have some problems in this query plan, don’t we? Indexed view maintenance, if you’re not paying close attention, can be quite painful.

We spend about seven seconds scanning the clustered index of the votes table. SQL Server misunderstands this hash join and this hash aggregate. This adds a fairly significant chunk of time to the query plan.

We go from seven seconds there to almost 11 seconds there. And then, well, another second or so there. And then another sorty, spilly thing here.

So the indexed view maintenance in this case is kind of gnarly. But if we pay attention to the query plan that SQL Server has there, there’s actually a missing index request that is helpful for, like, once ever.

SQL Server is like, hey, psst. If we had an index on this post ID column in the votes table, I think the indexed view maintenance would suck less. And SQL Server, for all its faults, is actually right on this one.

If we had an index on the post ID column, that indexed view maintenance would be a lot less terrible. So we created that. This index is there now.
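The supporting index would look something like this; the index name is made up, and the column follows from the missing index request described above:

```sql
/* Lets view maintenance seek to matching Votes rows
   instead of scanning the whole clustered index. */
CREATE INDEX v_postid
    ON dbo.Votes (PostId);
```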

And now if we go look at this update, it should only take a couple seconds. So this went from being 600-something milliseconds… I mean, the time did double, which maybe isn’t great.

But this indexed view is quite helpful. And look, there’s even another, hey, if you create this other index, boy howdy, we can speed this up. And of course, this one is sort of right too, because that is half the time we spent there.

But this we knew about. This we knew about before, right? We were aware of the situation when we first ran that update. We knew because we watched my other video on improving modification performance with indexes that if we had an index there, we could make this faster.

So if we were to create that index, this thing would be pretty quick, all in all. Now, indexed views do need to be created for a good reason, because you do introduce this overhead, particularly when the indexed view references more than one table. You know, it can, again, really put a weird emphasis on locking with the serializable stuff.

So you have to be careful there. But indexed views, when properly used and properly indexed for, can be very powerful things. So in the next two videos, we’re going to talk about different ways you can use indexed views to get around some tricky stuff in SQL Server queries and query plans.

And some limitations with other fancy indexing features in SQL Server. So we’re going to do that. I’m going to end this one here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will petition Microsoft to make indexed views not suck, finally. I don’t know.

Something along those lines. Anyway, thank you for watching.


Indexing SQL Server Queries For Performance: Blocking and Deadlocking





Video Summary

In this video, I delve into the intricacies of SQL Server indexing for performance optimization, specifically addressing how to index tables effectively while avoiding common pitfalls like over-indexing. I also explore the ins and outs of blocking and deadlocking issues under the default isolation level “read committed,” explaining why enabling read committed snapshot isolation can significantly reduce these problems. Throughout the video, I share practical examples and tips on how to tune both select and modification queries for better performance, emphasizing the importance of considering your isolation level when dealing with indexing challenges.

Full Transcript

Erik Darling here with Darling Data. Feeling ultra professional today for various reasons. Did some cool stuff, some very cool SQL Server stuff, but of course it’s nothing I can talk about here because, you know, it wouldn’t make any sense. In today’s video we are going to continue talking about how to index SQL Server for performance, and in this video we’re going to address problems with blocking and deadlocking. Now, what’s funny about this is that it’s a situation that really rears its ugly head, at least between read queries and write queries, under the unfortunate default isolation level, read committed. If you need some help understanding how terrible an isolation level read committed is, you should watch the playlist of videos that I have on my channel called Everything You Know About Isolation Levels Is Wrong, where I go into great detail about how awful read committed is as a default choice for SQL Server. And I really hope that more people start understanding how lousy an isolation level it is, especially as Microsoft takes some steps towards saying, hey, maybe you should do something else. For example, in Azure SQL DB, read committed snapshot isolation is the default. You’re allowed to turn it off if you really need to, but, you know, if Microsoft is like, hey, this is the default here, maybe it’s not as bad as those stupid blog posts you may have read made it out to be. And another thing to consider is that there’s a newish feature that’s rolled out to Azure SQL DB, and seems to be full steam ahead with being present in SQL Server 2025, called optimized locking.
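Enabling read committed snapshot isolation, which comes up throughout this intro, is a one-liner; the database name here is a placeholder, and the ROLLBACK IMMEDIATE part kicks out other sessions so the change can take effect:

```sql
ALTER DATABASE StackOverflow
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```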
And if you read the docs page or the Learn page, whatever they call it these days, for optimized locking, there are two suggestions on that page. One of them is to enable accelerated database recovery (actually, I think that one’s a requirement, I forget). And the other one, which is definitely a suggestion, but a strong suggestion, is to enable read committed snapshot isolation, because it makes optimized locking work better. So if you’re thinking, wow, I’d like to optimize some locking in my SQL Server database, enabling read committed snapshot isolation is a pretty good way to do that. Anyway, right, you have less blocking between read queries and write queries. That’s great. You have less deadlocking between read queries and write queries, and all you have to do is turn on this one thing. And your life will be instantly better. But anyway, before we get into all that, let’s talk about what a bunch of generous souls you are. We’re up to nearly 40 people who have become members of my channel and donate four or more dollars a month to support me making this content. So that’s very sweet of all of you. I appreciate every last one of you. I do not appreciate the 30% that YouTube withholds in taxes, but that’s not your fault.
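The two settings mentioned can be flipped with a couple of ALTER DATABASE statements. A minimal sketch, assuming a database named `YourDatabase` (a placeholder); run it during a quiet period, since the RCSI change needs exclusive access unless you force it:

```sql
-- Enable accelerated database recovery (required or strongly recommended
-- for optimized locking, depending on which docs revision you read).
ALTER DATABASE YourDatabase
    SET ACCELERATED_DATABASE_RECOVERY = ON;

-- Enable read committed snapshot isolation; ROLLBACK IMMEDIATE kicks out
-- open transactions so the ALTER doesn't sit blocked forever.
ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT = ON
    WITH ROLLBACK IMMEDIATE;
```

Note that RCSI adds row versioning overhead in tempdb (or in-row, with ADR), so test on a workload that resembles production before flipping it everywhere.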

If you don’t have the $4, liking, commenting, subscribing, all just, I mean, very humanitarian efforts on your part. I would appreciate that. If you need help with SQL Server, locking and blocking included, or if you would like to hear more about our Lord and Savior read committed snapshot isolation, or even snapshot isolation, you can hire me as a consultant to do these things. And as always, my rates are reasonable. Training: I have stuff that I can teach you, including about locking and blocking. Wow, what a low price, about $150 USD. And you can get all of that at that link up there with that discount code. There’s also a link in the video description, in case there weren’t enough reasons to look at the video description. Now you have this one. It’s wonderful. Again, upcoming events for 2025 will be announced as soon as I hear about them. All sorts of stuff I’ve got to deal with, isn’t there? All sorts of things, also the hoops I have to jump through before I start announcing things.

But let’s talk about locking and blocking in SQL Server. And what was that? Okay, that was where I rolled that thing back. We’re good there. All right. Let’s make sure we have nothing going on. Now, what I’m going to show you in this video assumes, well, you know, the default read committed isolation level, which most of you are suffering under. And before I show you stuff about how adding an index can help fix some blocking and deadlocking problems, I do need to warn you a little bit about how indexes can cause problems.

There are three main areas where over-indexing can hurt you. And by over-indexing, I don’t necessarily mean the number of indexes you have. I mean how those indexes are used and how those indexes are defined. So when I think about over-indexing, I think about tables with indexes on them (not even lots, just indexes) that have zero reads and a high number of writes, or a really low number of reads and a high number of writes. Or I’m thinking about indexes that have very similar key column orders, maybe different included columns and stuff, but indexes that you could theoretically merge together.
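One way to spot those high-write, low-read indexes is the index usage stats DMV. A rough sketch (the thresholds are yours to pick, and the DMV resets on restart, so judge the numbers against server uptime):

```sql
-- Nonclustered indexes in the current database, ordered by write volume,
-- so the many-writes-few-reads candidates float to the top.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name AS index_name,
        us.user_seeks + us.user_scans + us.user_lookups AS total_reads,
        us.user_updates AS total_writes
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON  us.object_id = i.object_id
        AND us.index_id = i.index_id
        AND us.database_id = DB_ID()
WHERE   i.index_id > 1  -- skip heaps and clustered indexes
        AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY total_writes DESC;
```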

So you have one index that does the work of two or more indexes. To me, that’s really over-indexing. If you have a bunch of indexes on a table that all have very different definitions, or even different enough definitions, but they’re all doing hard work (queries are reading from them a lot, they’re helping queries go faster), you’re not necessarily over-indexed.
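As a sketch of what merging looks like, assuming two hypothetical overlapping indexes on a Stack Overflow-style Posts table (names and columns are mine, not from the video):

```sql
-- Two overlapping indexes: same key column, different includes.
CREATE INDEX ix_posts_owner_score ON dbo.Posts (OwnerUserId) INCLUDE (Score);
CREATE INDEX ix_posts_owner_date  ON dbo.Posts (OwnerUserId) INCLUDE (CreationDate);

-- One merged index that can satisfy the queries both of them served:
CREATE INDEX ix_posts_owner
    ON dbo.Posts (OwnerUserId)
    INCLUDE (Score, CreationDate);

-- Then retire the duplicates.
DROP INDEX ix_posts_owner_score ON dbo.Posts;
DROP INDEX ix_posts_owner_date  ON dbo.Posts;
```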

But the three main areas where over-indexing, as I just described it, can hurt you are: one, the buffer pool, right? Because most SQL Servers I look at just don’t have enough memory compared to the amount of data they have. So you have all these different indexes competing for space in the buffer pool.

Two, the transaction log, because every time you modify data in the base table, you have to modify any indexes that contain the columns being changed. So for inserts and deletes, unless you have filtered indexes, that’s going to be all of them.

For updates, it’s just going to be the indexes whose columns are in the index definition, if those columns changed. And the third area is going to be around locking. The more indexes you have that have to change when data gets put into a table, the more you increase the likelihood of locking, blocking, and deadlocking situations.

And, of course, you run the chance of SQL Server trying to escalate locks from the row or page level up to the object or table level. So there are ways that over-indexing (having lots of unused indexes, duplicative indexes, stuff like that) can hurt you. But you do need indexes to make queries go faster, because fast queries run less of a risk of having those problems, or of being slow and causing problems for other queries, interfering with other queries, stuff like that.

In my experience, the transaction log isn’t usually a problem unless you’re in Microsoft’s cloud, because Microsoft plays this really nasty trick on its customers. And it doesn’t matter what you use. It doesn’t matter if it’s Azure SQL DB.

It doesn’t matter if it’s Managed Instance. It doesn’t matter if it’s Hyperscale. Microsoft throttles your transaction log. So writing data modifications out to the transaction log can be incredibly slow and painful.

So if that’s your problem, that’s not necessarily the fault of the indexes. That is the fault of your cloud provider. You should maybe have a talk with them about how underhanded of a maneuver that is.

But what was I going to say? Anyway, you can end up with locking and deadlocking problems when you sort of have the opposite problem, right? When you have too few indexes.

Because whenever I talk to people about index tuning, without fail, the majority of the queries that they’re worried about are select queries. Very few people mentally understand that modification queries need indexes too. The faster modification queries can find the data they need to change, the better off you are.

Sort of in a general sense, I mean, aside from query speed: if your modification query locates the data that it needs to change via a seek, it will most likely (unless you’re hitting a lot of rows) start with row locks. If your data modification query acquires the data that it needs to change via a scan, it is most likely going to start with page locks.
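You can watch this happen yourself with sys.dm_tran_locks. A hedged sketch against a Stack Overflow-style Badges table (column names assumed, not the demo’s exact code):

```sql
BEGIN TRANSACTION;

-- A modification that can seek (if an index on Date exists) or scan (if not).
UPDATE b
SET    b.Name = N'Edited'
FROM   dbo.Badges AS b
WHERE  b.Date >= '20101225'
AND    b.Date <  '20101227';

-- With a seek you'll typically see KEY (row) locks here;
-- with a scan, PAGE locks instead.
SELECT  resource_type, request_mode, COUNT_BIG(*) AS lock_count
FROM    sys.dm_tran_locks
WHERE   request_session_id = @@SPID
GROUP BY resource_type, request_mode;

ROLLBACK;
```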

Again, unless you’re hitting a lot of rows right off the bat. But let’s just take a quick look at this query, right? I’m not going to run this here, of course, because here is not where we want to run this.

We want to look over here. So I have query plans turned on and I’m going to run this query with no indexes. If we do this, this runs for about a little over half a second.

And it does a lot of work over here and it does a lot of work over here. And, you know, it’s acquiring data down here from the post table where it’s also updating, right? So we need to find data that we want to update in the post table by like joining it to the users table.

If we wanted to reduce the risk of this query colliding with itself, or with any other queries that are touching the Posts table, we would want to add some indexes that help this query go faster. I mean, we don’t strictly need the index that I’m making here on the Users table.

It’s just to sort of illustrate that the more you do to help your modification queries go faster (the simpler your modification queries and their query plans are, the faster they go), the better off you are when you need to deal with these things.

So these indexes are created here. And now let’s just rerun this. And remember, this was a little north of half a second. But now with those two indexes in place, this thing runs for about 19 milliseconds, right? And just coming over here and looking, we no longer need a parallel plan.

SQL Server is just able to seek to the data that it cares about on both sides of this. Everything is much better, right? So if you’re serious about query tuning, don’t just stop at the select queries in your workload.

Make sure you’re tuning any modification query that, you know, has like a query underneath it. If you’re inserting values, what are you going to do, right? There’s not a lot you can do there.

But like, you know, updates, deletes, even if it’s like an insert with a select, you know, don’t be afraid to tune those queries as well. Like modification queries need love too. So with that out of the way, let’s talk a little bit about how indexes can help you under read committed.

Now, I’m going to go off on something a little bit here: if you are truly worried about over-indexing and you are truly worried about locking and blocking, the answer is not NOLOCK. If you are using an optimistic isolation level, like read committed snapshot isolation, not only would you avoid the many, many pitfalls of NOLOCK, but you would also avoid having to sort of over-index tables, because read queries and write queries would no longer be blocking each other or deadlocking with each other. So if you’re truly worried about it, optimistic isolation levels can help you get out of lots of blocking and deadlocking scenarios without having to tune queries or over-index.

Oh my gosh. I can only have five indexes on this table. If I create a sixth index, my workload will go to hell.

So like just, you know, one thing to consider is, you know, what isolation level are you using? If you’re afraid of indexing too much, use an optimistic isolation level. You won’t have to index as much.

Your blocking and deadlocking problems will largely go away. You’ll still have write queries fighting with each other, but that’s okay, right? Because, like I said, who’s worried about their write queries anyway? So let’s begin a transaction and let’s run this update query.

And, of course, this thing switched over to that mode for me. So what we have here is the results of my handy little helper function, WhatsUpLocks. This is available in my GitHub repo.

If you go to code.erikdarling.com, you can get a nice little shortcut there. But if you look at the results here, we have an intent exclusive lock on the object. That’s not what we’re worried about.

We’re worried about this exclusive lock on 11 pages in the badges table. Now, if we come over to this window with, like, this query is open. We’ve done the transaction.

We just got the results of WhatsUpLocks. We didn’t commit or roll anything back down here. If we look at this query, this query will be able to finish, because we’re seeking into the primary key right here, where this ID equals this.

And this ID is just outside the range of locked pages. Now, remember, pages contain many rows. So this query is able to finish, but this query that is just one ID higher, this one gets stuck.

And if we come over here and look at sp_WhoIsActive, we’ll see the locks that this thing has taken, which is exactly what I showed you before. Right? It’s the 11 page locks on the Badges table.

Right? There we are. My results line up with sp_WhoIsActive. I consider that a real feat. It’s a pretty good deal.

And then this query is trying to take some locks: the object-level shared lock got granted, but the page-level shared locks are what we’re waiting on. So this query is blocked because it could not get to the data that it cared about. The row it wants is on a page that is currently locked by this query over here.

Right? So let’s roll this back. Let’s make sure that’s double rolled back.

And let’s create this index down here on the date column on the badges table. Now, remember, this update query is hitting the badges table. I’m just going to get the estimated plan for this.

We’re able to seek into that index now. Before, all we had was the clustered primary key on the ID column. So we could seek into the ID column, but if dates in this range (or even dates a little before or a little after) happened to be on locked pages, those queries got blocked.
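The index being created isn’t shown in the transcript; based on the description, it’s presumably something like this (the name is my own):

```sql
-- Nonclustered index on the Date column of the Badges table, so the update
-- can seek to its date range and start with row locks instead of page locks.
CREATE INDEX ix_badges_date
    ON dbo.Badges ([Date]);
```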

All right? So now let’s run this with that index in place. And what we’re going to see is that the pages have intent exclusive locks, not actual exclusive locks.

And the object still has intent exclusive. Now, though, we have granted 856 row locks. All right?

Oops. Sorry. Makes a little more sense over here. Here are our row locks. Here are the 856 row locks that have been taken on the badges table now with that index in place. And if we come over here, now we can run this query.

And now we can run this query. All right? These are both able to finish now, because SQL Server is only locking rows. So data that was on pages unrelated to those rows, we can now get to.

We can access that data. We have a different access path to see that. Now, really, no demo about locking and blocking would be complete if we didn’t make fun of no lock just a little bit.

Right? And so if we run this query, and keep in mind, this where clause here matches the where clause of the update that we ran over here. Right?

This is 2010-12-25 through 2010-12-26. If we run this query with the NOLOCK hint, we will, of course, see all of these user IDs displaying what we changed them to in this query that still has the open transaction. Right?

Like, this query is open. We were changing every user ID in that range to this value. But, of course, let’s say this query failed, or someone changed their mind, or we decided to go do something else, and we roll this stuff back. Now this query is going to show us what those user IDs were before, which has nothing to do with the number that we just had in there.
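The dirty read demo boils down to this two-session pattern (the value and column names are placeholders, not the exact demo code):

```sql
/* Session 1: open transaction that updates a range, then rolls back. */
BEGIN TRANSACTION;

UPDATE dbo.Badges
SET    UserId = 138
WHERE  Date >= '20101225'
AND    Date <  '20101227';

/* Session 2: while session 1 is still open, NOLOCK happily reads the
   uncommitted UserId values... */
SELECT  UserId
FROM    dbo.Badges WITH (NOLOCK)
WHERE   Date >= '20101225'
AND     Date <  '20101227';

/* Session 1: ...which evaporate the moment this runs. Session 2 reported
   data that never logically existed. */
ROLLBACK;
```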

So, this is something that I talked about quite a bit in the video series that I mentioned before. Everything you know about isolation levels is wrong. That playlist is available again on this channel.

But, you know, what I talked about before is that these are the kinds of problems that a lot of developers hit under the default read committed isolation level. And they have decided that the situation with read committed is so bad that they are willing to have data problems like I just showed you with the NOLOCK hint, rather than deal with the problems that read committed exposes in your database and your workload. So, you know, everyone likes to make fun of NOLOCK, me included.

But really, NOLOCK is what people choose because read committed is so crappy. So, if you’re using read committed and you’ve got this idea in your head that NOLOCK hints are a good solution (they’re really not), you should be looking at optimistic isolation levels, which avoid dirty reads like the one I just showed you and return a far more stable and correct set of data without the locking and blocking problems. So, I think we’re about done here.

Indexing your modification queries can absolutely help them not only go faster, but it can also help reduce the number of locking and blocking problems that you have in your database. Of course, the ultimate solution to locking and blocking and deadlocking problems between read queries and write queries is an optimistic isolation level. I can’t say that enough.

So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that if you haven’t seen my series on isolation levels already, you will go find that playlist and watch the whole thing. It is well worth your time.

And thank you for watching. Now, we’ll see you in the next video, which is about more indexing stuff. So, we’ll see you there. All right.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Missing Index Requests Are Bad And Dumb




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked topic of missing index requests in SQL Server. While many people might jump straight to creating indexes based on these suggestions, I argue that it’s crucial to take a step back and actually analyze the query plan. Missing index requests can be misleading; they don’t always point to the most impactful changes you could make. In this video, I demonstrate how simply following missing index scripts without deeper analysis can lead to significant performance degradation. By walking through an example where creating the suggested indexes resulted in a query that took over two hours instead of 39 minutes, I highlight the importance of understanding the broader context and potential alternatives, such as using batch mode or leveraging early aggregation techniques. The key takeaway is that missing index requests should not be your primary focus; they are just one tool in your performance tuning arsenal, and it’s essential to critically evaluate their impact before implementing them.

Full Transcript

Erik Darling here with Darling Data, failing my Darling Data dandiest. And I have a special video for you. I’m recording it a little bit later than I thought I would, and I think the reason why will become apparent in just a moment. But we’re going to talk about missing index requests in this video. Because, I don’t know what it is, I still, to this day, see a lot of people talking about, almost bragging about, implementing them. Like, cool, you followed the instructions on the bottle. Alright. We’re all proud of you. Ding dong. And how missing index requests do not really always expose the performance problems that queries are having. And how sometimes you’re going to actually have to get up off your lazy keister and actually look at the query plan and figure things out. You can’t just, I don’t know, use whatever bargain basement script you got off the internet that scripts out every missing index request, spend some time staring at them, create them, and act like you did your job. Because that’s crappy. So we’ll talk about that. But before we do, we need to talk about four dollars. Four of them. If you would like to support the work that I do on this channel, you can give me four dollars a month by clicking on the little link in the video description right down there, right about where my desk chair is, which you can’t see via the magic of green screens. And that’s a nice thing to do.

Pretty spiffy of you if you do that. If four dollars a month is beyond your means, if you just don’t have four bucks, well, you can like, you can comment, you can subscribe, and all that stuff is free. At least it is for now. Who knows what YouTube will start charging for. That’d be weird. If you are in need of SQL Server consulting, because you are the type of lazy person who just creates missing indexes based on what SQL Server tells you might help, and you actually want someone who will do some more better work than that, I am available to do all these things. And it’s been requested by the editors at Beer Gut Magazine that I clarify their position on me being the best SQL Server consultant in the world. Apparently, that title is not valid in New Zealand. So, everywhere else, I win. Some places you can’t beat, though.

If you would like some fantastic SQL Server training, you can get all of mine for about $150 and you don’t have to subscribe to that. That is just a one-time payment. And then you can watch it for the rest of your life over and over again until something finally sinks in, which hopefully won’t take you for the rest of your life because I hope you live a long life. And I would be sad if it took that long for these lessons to be learned.

Once again, I will have news about upcoming events later in 2025. Right now, I’m just enjoying living life in the moment. With that out of the way, let’s do this thing. Let’s talk about missing index requests. So, I’ve run this query, and this is why this is getting recorded later than usual.

This thing took 39 minutes and 5 seconds to run. All right. 39 minutes and 5 seconds. Now, SQL Server is telling us that it wants an index on the comments table.

It wants that index to have Score as a key column and UserId as an included column, somewhat obviously. I guess that’s a little bit of a spoiler there. But let’s use our heads for once.

Let’s not just be missing index idiots. Let’s actually look at the query plan. And let’s see, how long did SQL Server spend touching the comments table over here? 998 milliseconds.

What happens if we create that index? Can SQL Server spend 0 milliseconds touching the comments table? Maybe.

Maybe. But if you take 998 milliseconds away from 39 minutes and 5 seconds, you have 39 minutes and 4 seconds. It doesn’t seem like an index that is going to be very helpful, does it?

Does it seem like an index that is going to really do much for us? Now, missing index requests are, of course, sort of, well, this one is interesting. And this one is interesting because if you grab the XML from this thing, there are actually two missing index requests.

SSMS only shows you one in green, but there are two that get logged in the XML of the query plan. If you are the type of person who is smart and likes to go digging into query plans and you hit the properties of things and you expand the missing indexes thing, you will see both of them in here. They show up.

It’s not very friendly for scripting, of course. If you look at these things, you kind of have to keep expanding these little plus signs until you have some idea of what’s going on. But they’re both in here, right?
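If you’d rather not click through plus signs, you can also query the plan XML directly for every missing index request, since SSMS only renders the first one. A rough sketch against the plan cache (the showplan namespace is the real one; treat the rest as a starting point, not production code):

```sql
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  m.mi.value('(@Database)[1]', 'sysname') AS database_name,
        m.mi.value('(@Schema)[1]',   'sysname') AS schema_name,
        m.mi.value('(@Table)[1]',    'sysname') AS table_name,
        g.mig.value('(@Impact)[1]',  'float')   AS impact
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//MissingIndexGroup') AS g(mig)
CROSS APPLY g.mig.nodes('./MissingIndex') AS m(mi);
```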

Now, let me ask you a question. Between you and me, I know none of us in here are born and bred lifelong mathematicians, but is 998 milliseconds, is that really 22% of 36 minutes? I don’t think so.

I don’t think that’s the impact that they have in mind here. I think SQL Server thinks it can reduce the effort of running this query by 22%, but I don’t think that we’re going to see those results if we’re looking at whatever time-telling device you care about. Maybe you have an hourglass or a stopwatch or a grandfather clock or, I don’t know, maybe you’re just real good at counting exact seconds.

But I don’t think that 22% of anything is going to disappear if we create that missing index. Do you? I don’t.

I’ll be honest with you. I don’t see that happening for us. The problems in this query are very much because of spills, right? This is where things start to get real hairy. We spend three minutes and 59 seconds up to this point.

Well, sort of. This looks a little weird. The operator times, God bless, this is 13 minutes and then we go to 36 minutes. So something weird is happening in here, right?

This is not a well-performing query. We’re having some problems. So what I did ahead of time and part of the reason why we’re up so late recording is because I actually created both of the missing indexes that SQL Server asked for. There were the two on the comments table.

And then if you keep getting the estimated plan after you create indexes, SQL Server will keep suggesting indexes. And so I kept doing that. And eventually SQL Server suggested one on votes and then one on posts.

And so after I created the one on posts, I ran the query. And you can’t see it because my armpit’s in the way a little bit. But we’re going to make this magic work.

We’re going to do a little Hollywood. We’re going to do some Michael Bay style special effects. And we’re going to scroll over this way. And you’re going to see you’re not reading that wrong. That’s not two minutes and seven seconds.

There was not a 37-minute improvement. That is two hours and seven minutes. If we look at that query plan, I’m going to have to be real careful hitting Ctrl+R here. That is not two minutes.

That is two hours, seven minutes and three seconds. This query has all sorts of problems. Many problems.

Gigantic problems. Two hours and seven minutes worth of problems, really. And SQL Server still thinks that a missing index on the user’s table is going to be what solves that problem. But if we go all the way over here and we look to where we touch the user’s table, that was 36 milliseconds out of two hours.

What does SQL Server say? We could have an impact of nearly 35% on this query. Wow.

Wow. Would that really fix this? No. No. We would still be screwed. We would still have a query that runs for, well, I guess two hours seven minutes minus 36 milliseconds, probably. That just might do it.

Just sounds about right anyway. So this thing just really gets screwed up in all sorts of places. You know, there’s 44 minutes in here and 36 minutes in here. And, well, we spent 46 minutes up until here.

So this thing is really screwy. The funny thing is that, really, no amount of missing index is going to help this thing. But if all you do is look at missing indexes and think, well, if I create that index, something will get better, right?

I created, like, five indexes. And it got worse. It went from 39 minutes to two hours and seven minutes.

That’s not an improvement. That’s a worsement. It’s not a good showing. What missing index requests are is a very brief, opportunistic, passing thought by the optimizer.

Well, I’m creating this query plan. You know what? It might be better if we had this thing, right?

Oh, it might be okay. Maybe things would get better if we had this. But it doesn’t really know. It doesn’t really understand what’s going to happen after you create that index. After you create an index, you could get an entirely different query plan.

All of a sudden, the costs are all different. Maybe it wouldn’t be 35%. Maybe it wouldn’t have a 35% lower cost. Maybe it would come up with a query plan that cost more. What can we do?

This is quite a conundrum. What missing indexes don’t tell you are ways that you can tune a query that have nothing to do with indexes. It has everything to do with how you write the query and maybe even how you get this query to run.

Right? So what a missing index request won’t tell you is that you can run this query using batch mode. And, for the sake of making sure that you don’t think I have any weird things up my sleeve, I just allowed this query to run in compat level 160.

Now, cardinality estimation under compat level 160 using the new cardinality estimator is very hit or miss. So I’m letting this one go. Sometimes when I use the optimizer compatibility level hints so that I can get intelligent query processing features like batch mode on rowstore, I pair it with the legacy cardinality estimator because it’s generally better.

I like it better anyway. So this one, I just let it fly. And this all finishes in about eight seconds.

All right. So we made this query better either from the original one by about 36 minutes or 39 minutes or by the second one after I created all the indexes by two hours and seven minutes.

And if you notice, this query is not using any nonclustered indexes. This query is using just all clustered indexes. It’s just it’s scanning all of them.

Right. We’re not doing anything better here. We’re not doing anything more efficient when we touch these indexes, except for the fact that they are all happening or probably the majority of them are happening in batch mode. All right.

So batch mode works a lot of magic on queries that do stuff like that and have lots of problems running in row mode. All right. This query looks incredibly dissimilar to what we had before, but it’s the exact same query. This is what I ran here.

This is what I ran here. All right. It’s the exact same query over and over again. This is what I ran here. Let me scroll back up to where this thing is. It’s the exact same query, just in a different execution mode. All right.

And another thing that missing index requests won’t tell you is that sometimes you can get things better by using a temp table. All right. We create a little table here called precheck, and we dump a partial aggregation of the results we care about into that temp table, and then we join that temp table off to just one other query.

We can get this thing down to three seconds, right? I mean, close enough. There’s 1.7 seconds there.

There’s 1.8 seconds there. So, okay. So maybe a little closer to four seconds. That’s fine. But we’re still benefiting from batch mode. We’re still getting some results there.
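The temp table rewrite described follows this general shape (table and column names are stand-ins, not the demo’s exact query):

```sql
-- Step 1: pre-aggregate the expensive part once, into a small temp table.
SELECT  c.UserId,
        SUM(c.Score) AS total_score
INTO    #precheck
FROM    dbo.Comments AS c
GROUP BY c.UserId;

-- Step 2: join the small aggregate to the rest of the query.
SELECT  u.DisplayName,
        p.total_score
FROM    #precheck AS p
JOIN    dbo.Users AS u
    ON  u.Id = p.UserId
WHERE   p.total_score > 100;  -- placeholder threshold

DROP TABLE #precheck;
```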

So we could run this query just as it is and see a quite significant improvement from, you know, 40 minutes to over two hours to eight seconds. That might be good enough for most people, right? We could also, you know, if we’re allowed to rewrite the query, we could use a temp table.

We can get that down to three seconds. Probably three seconds, probably preferable to eight seconds, depending on the activity, right? Depending on your level of impatience.

And one thing that missing index requests are also not going to tell you is when the optimizer just misses serious opportunities, like early aggregation. Early aggregation is one of the most powerful things you have up your sleeve as a query tuner because the optimizer often misses this. The optimizer often doesn’t think, oh, I can group by some stuff first.

I could group by these columns that I’m joining and doing other things with, and I could join fewer rows because I’ve aggregated early. So we could rewrite this query to do one early aggregation inside a derived table, and another aggregation inside another derived table, join those two derived tables together, and even use batch mode while doing all of these things, and get this query to run in about a second and a half.
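Early aggregation in derived tables, as described, looks roughly like this (again, stand-in tables and columns, not the demo’s query):

```sql
SELECT  p.OwnerUserId,
        p.post_count,
        c.comment_count
FROM
(   -- aggregate posts first, so fewer rows reach the join
    SELECT  OwnerUserId, COUNT_BIG(*) AS post_count
    FROM    dbo.Posts
    GROUP BY OwnerUserId
) AS p
JOIN
(   -- aggregate comments first, too
    SELECT  UserId, COUNT_BIG(*) AS comment_count
    FROM    dbo.Comments
    GROUP BY UserId
) AS c
    ON  c.UserId = p.OwnerUserId
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));  -- batch mode, as in the demo
```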

And, of course, SQL Server still thinks that we’re missing an index on the Users table, which, granted, in the other query would have saved like 36 milliseconds on a query that took, you know, 40 minutes or two hours (not too significant), but here, maybe 40 milliseconds out of 1.425 seconds.

Maybe if we were really just itching to get this thing down to 1.38 seconds or something. I don’t know. I actually don’t know anyone who would do that. But if you’re out there, I want you to leave now.

I would appreciate if you just step away from the computers. You can buy a book on rock collecting. It might be better for everyone if you would just do that.

So if there’s a message here, it’s that missing index requests should not be a big part of your query tuning efforts. Every once in a while, missing index requests do line up with something that you should do or an index that you should have in some way. But for me, when I’m dealing with this sort of thing, the last thing that I generally care about is missing index requests.

Even when you go and run the query and get the actual execution plan, you might see the missing index request line up with what Adam Machanic calls data acquisition operators. There are two kinds of operators.

There’s data acquisition, and there’s data processing. Data acquisition is, of course, whenever you read from a thing, right? So whenever you read from an index, whenever you do one of these things, you are acquiring data. Whenever you join or aggregate data, compute scalars, or filter, you’re processing data.

So there’s acquisition and processing. So if the missing index request does at least line up with data acquisition that’s taking a long time, it’s probably worth thinking about creating an index similar to the missing index request.

But another thing that’s a very big deal is that the missing index requests often don’t give you the best possible index. There are a lot of things missing from how it decides what columns go in the key and what columns go in the includes or even what columns show up in there at all. So be very, very careful out there with them.
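A hedged illustration of that difference, with a hypothetical query that filters on OwnerUserId and PostTypeId and orders by CreationDate — the request-style index versus a hand-tuned one:

```sql
-- What a missing index request typically hands you: one equality
-- column in the key, everything else dumped into INCLUDE, with no
-- thought given to sort order or inequality predicates.
CREATE NONCLUSTERED INDEX ix_suggested
ON dbo.Posts (OwnerUserId)
INCLUDE (PostTypeId, CreationDate, Score);

-- A hand-tuned alternative for that hypothetical query: the filtering
-- and ordering columns go in the key, in an order that supports both
-- the seek and the sort, and only true payload columns are included.
CREATE NONCLUSTERED INDEX ix_hand_tuned
ON dbo.Posts (OwnerUserId, PostTypeId, CreationDate)
INCLUDE (Score);
```

Both indexes satisfy the request; only the second one can also avoid a sort for the hypothetical query.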

Don’t just think that because there’s a missing index request, you could make anything meaningfully better or faster. You got to run the query. You got to get the actual execution plan. You got to see where time is spent.

And then you have to figure out how to reduce that time. You don’t have to figure out how to reduce logical reads. You might create indexes and reduce logical reads like in this query. I don’t know.

Did logical reads go down? Maybe, probably. Who knows? Who cares? I don’t look at them anymore. They’re stupid. You might reduce logical reads, but you might have a query that went from 40 minutes to 2 hours and 7 minutes. I don’t know.

It’s not a good thing to judge things by. What you care about is making this final number go down. This is the metric that makes you a query tuner.

Going from 40 minutes or 2 hours to 1.4 seconds, that’s query tuning. Reducing logical reads, if it happens as a byproduct of this, cool. If not, whatever.

This over there in the corner, that’s the metric that you care about. And there’s one final go around on this. Adding missing indexes is not what got us there.

All right. So with that out of the way, I hope you enjoyed yourselves. I hope you learned something. I hope you will not care so much about missing index requests in the future because they make you lazy and dumb. They make you a bad performance tuner.

What you should care about is figuring out why that query is actually slow because it’s probably not going to be all those missing indexes. Anyway, I’m going to go finish my drink and go to bed. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Eager Index Spools




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of eager index spools in SQL Server query plans, particularly focusing on their appearance and impact. I explore why these spools often arise, especially when nested loops joins are involved, and how they can significantly degrade performance without any accompanying missing index requests or optimizer hints. By examining specific examples, including a detailed walkthrough of a problematic query, I highlight the inefficiencies and potential pitfalls of eager index spools, offering practical advice on how to identify and mitigate them for better query performance.

Full Transcript

Erik Darling, right here in the flesh with Darling Data. Darling Data, of course, has no flesh except me. I am the only flesh. In today’s video, we are going to talk more about how to fix query performance issues with indexes. And in this one, we’re going to talk about when an eager index spool arises in your query plan. Now, I’ve talked about eager index spools many times. You’ll find other videos on this channel talking about them. But the things that I want to outline about them, that I find interesting about them, are: one, they are not accompanied by a missing index request of any variety. There’s not going to be any green text in your query plan. And there’s not going to be anything in the DMVs saying, hey, I created an index for you while your query ran, or you should create this index. That doesn’t happen. Two, they are very, very likely to show up on the inner side of nested loops joins when you don’t have a good index, but sometimes even when you do. Three, even if they are built in a parallel plan, reading from the table and building the index only happens on a single thread. So there’s a lot of wasted CPU threads in that whole thing.

And that can also make the table read and the index build take a very long time. Four, three, five and a half? B, Q, I don’t know, whatever. In parallel plans, eager index spool builds are accompanied by a wait called EXECSYNC, E-X-E-C-S-Y-N-C. But in a serial plan, a single-threaded plan, they emit no wait, even though we would be waiting on that spool to build. And there was something else. Oh yeah, building spools is highly inefficient. It is essentially a row-by-row build of the object. And Microsoft has made absolutely no progress in improving that scenario over the years. So there’s a lot of bad stuff about eager index spools, especially when they are built off very large queries.
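If you want to see that EXECSYNC wait for yourself, one way — assuming SQL Server 2016 or later for the session-level DMV — is to check wait stats for your own session right after running the parallel plan:

```sql
/*
  Session-level wait stats (SQL Server 2016+). After running a parallel
  plan that builds an eager index spool, EXECSYNC time should show up
  here; in a serial plan it won't, even though the spool still costs you.
*/
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
  AND wait_type = N'EXECSYNC';
```

Actual execution plans in newer versions also surface top waits per query in the root operator's properties, which is where the video finds it later.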

So if you’re working with a query and you have big tables in your database and you see a nested loops join, and on the inner side of that nested loops join, you see an eager index spool, not a lazy table spool, an eager index spool, you can be pretty sure that either you don’t have a good index to support whatever predicate search you’re doing, whether it’s via join or where clause or a combination of both, or the optimizer is once again taking its nose beers and not using the index that you have. And I’m going to show you an example of that.

So with that out of the way, let’s talk about how you can buy me some regular beers, because I have enough nose problems without anything else. I don’t need your help. If you would like to support my videos, you can sign up for a membership. It is as low as $4 a month. That’s not even enough to get another knuckle tattoo.

You can join the 30 or so other people who have done so and who care enough about my endeavors and my mental well-being to pay me $4. There’s a link right there in the video description for you to do that. How convenient. Make it very convenient for you to hand over money. If you have spent your $4 on beers of any variety, you may like, you may comment, you may subscribe.

I would suggest waiting until you’re sober to do so, though, because the only thing worse than comments are intoxicated comments. We don’t want those, do we? Because comments are forever. If you need help with SQL Server, if you feel like SQL Server has indulged in too much of any substance and it is just not working out well for you, I am the best SQL Server consultant in the world, according to BeerGut Magazine, of course.

And if you need any of this stuff done, I am just the nose to do it. And as always, my rates are reasonable. If you would like some very high quality, very sober training, sobering training, maybe.

It’s like these videos. It’s fun. It’s about SQL Server performance. You can get all of mine for about $150. That is for the rest of your life.

You don’t need to resubscribe or rebuy. That’s about it. That’s the URL. That’s the discount code. Of course, there is a fully assembled URL for you in the show video description. The show description.

Is this a show? I don’t know. I am showing you things, but… now, I show up on your screen. It’s not a podcast, because I’m not just, you know, vocal frying into a microphone about dumb stuff. Anyway, more about upcoming events in 2025.

For now, I don’t know. Enjoy your alone time. With that out of the way, let us talk about eager index spools. Now, probably the most fascinating thing to me about eager index spools is how often the optimizer will put them into query plans, often to the detriment of performance.

In a sane world, and by sane, I mean like in a world in which the optimizer kept up with the times, and the optimizer was more aware of current storage trends. I realize that storage in the cloud is probably as awful as the old spinning disk storage that you used to have on your, you know, on-prem SQL servers.

But, you know, at the same time, it would be nice if the optimizer were a little bit more aware of these things and didn’t feel the way it feels about IO patterns sometimes. When you think about old hard drives, there was a big difference between sequential and random reads — but there’s very little difference between doing either one on an SSD. Right?

Because SSDs don’t seek around like old hard drives do. Everything is kind of randomly scattered about — they’re memory chips, anyway. There’s very little logical order in there. So the optimizer is not really up to date on those things.

And the optimizer will frequently pick nested loops join plans and put an eager index spool on the inner side of the plan. The outer side is up here. Then there’s the loop join.

And then you come down here. And this is where the spool goes. It’s not a very good time for your queries. In a sane, rational world, like the one that I was trying to describe there, the optimizer would just choose a merge join. Maybe not a merge join.

A hash join. Let’s say the optimizer would opt for a hash join and just throw a missing index request and move on. But no, SQL Server has to do something totally different. Now, there have been a couple weird occasions in my life where I have written queries in a way specifically to get an eager index spool.

And not just for the sake of a demo. There have been times in my life when having an eager index spool built was actually a better thing for the query than anything else. And the way that I ended up doing stuff like that was — you can see this join up here.

It’s just a regular old equality predicate joining Posts to Votes. But down here, I have written this as two inequality predicates: a greater-than-or-equal-to and a less-than-or-equal-to.

And since merge and hash joins require at least one equality predicate in order to work, this gets us a change in the execution plan. Now, I don’t recommend it for this particular query.

For this particular query, it is an absolute disaster. But this is the much better plan where we just have the single equality predicate. But if we rewrite that single equality predicate as two inequality predicates, all of a sudden SQL Server says the only thing I can do is use a nested loops join and have this on the inner side of the join.
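The trick described here looks something like this — a sketch, since the demo script isn’t shown, with join columns guessed from the Stack Overflow schema:

```sql
-- Regular equality join: hash and merge joins are on the menu.
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id;

-- The same logic written as two inequality predicates. With no
-- equality predicate available, hash and merge joins are disqualified,
-- so the optimizer is cornered into a nested loops join -- often with
-- an eager index spool on the inner side.
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON  v.PostId >= p.Id
    AND v.PostId <= p.Id;
```

Both queries return the same rows; only the plan shape changes, which is exactly why this is occasionally useful as a deliberate tuning trick and usually a disaster by accident.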

So this is what I was talking about. Like in a sane and rational world, SQL Server would encounter a situation where it may have considered this in the past, and it would just do this instead.

Just do a hash join. Say there’s a missing index. Fine. Anyone can get on board with that.

What’s hard to get on board with is SQL Server building a 52 million row index in the middle of your query execution. Again, there were some very special edge cases in which I did that on purpose because it suited the workload better. In this case, not good.

Not a good time. So if I were to let that run and get an actual plan, that would run for about almost two minutes. So we don’t want to do that.

But what’s very funny, and I’m going to explain this query a little bit. Oops, what did I do? Oh, there’s already an object called P1. All right, well, we’re just going to call you P11. We’re not going to stop and mess with that. I guess I was practicing this one too much.

But we have this query here. And what I want to tell you about this query is that this is not a natural join in the Stack Overflow database. The users table, the ID column in the users table is a clustered primary key.

The ID column in the post table is also the clustered primary key. That is not how those tables relate. These tables relate by a column in the post table called owner user ID. The funny thing is that the ID column on the post table, like I said, is the clustered primary key.

So there is an index on it. So when I’m saying, like, joining one clustered primary key to another clustered primary key, you would think SQL Server would say, oh, I can do this in, like, a number of different ways. I can seek.

I can do a loop join and seek to everything that I care about. I could just scan it once and do a hash join. It could also scan it once and do a merge join, but we don’t talk about merge joins. But it doesn’t choose to do that.

It chooses to do this instead. And it chooses to build an eager index spool off of the clustered primary key here. This takes about, well, there’s four and a half seconds here. And this is where I’m going to show you a couple of the things that I mentioned about eager index spools.

One of them is that they read from this table and build the index spool on a single thread. So even though this is a DOP 8 query, we ain’t doing that at DOP 8, are we? We are doing that at DOP 1, 17 million rows.

Two is that this is in a parallel zone, and that still happens. So if we take the four and a half seconds it took to read from this table single threaded and subtract it from this, we end up with, what, like 15 seconds or so of time building the eager index spool.

There is no missing index request up here saying, hey, we could use an index. And I forget if EXECSYNC shows up in the query waits. It does.

There it is. This is the EXECSYNC wait that I was talking about that shows up in parallel execution plans. If this query ran at DOP 1, there would be no wait indicating that an eager index spool was built. We only get this in the parallel version of the plan.

So this is all highly inefficient, highly ugly. And the worst part is that we already have an index here. So sometimes when I’m tuning queries and I see an eager index spool, I’m like, dummies didn’t make an index.

And then I go look and I’m like, this is a perfectly good index, SQL Server. Toot toot. So sometimes what you have to do is you have to tell SQL Server, no, no, no. Trust me, buddy.

Put the spool down. We should be seeking instead. So before I move on, I just want to say this query runs for 22.171 seconds in total. But if we run this query with a FORCESEEK hint, this should only take a couple of seconds.
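FORCESEEK is a table hint, and a sketch of it looks like this — hypothetical query shape, since the demo script isn’t shown:

```sql
-- FORCESEEK as a table hint: the optimizer must produce a plan that
-- seeks into dbo.Posts here. If no seekable plan exists, the query
-- fails with an error rather than silently falling back to a scan,
-- so test this before shipping it.
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p WITH (FORCESEEK)
    ON p.Id = u.Id;
```

Because it hard-fails when no seek is possible, FORCESEEK is a hint to deploy deliberately, not sprinkle around.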

Oh, yeah, there’s already an object called P2. We’re going to make it P2.2 then. Again, I practice these too well. Sometimes I’m just too good at my job.

And this takes 1.5 seconds. And it turns out without adding an index or doing anything, we had a perfectly seekable index for SQL Server to use. Look at that.

We have a clustered index seek. Before we had a scan and we had a spool. SQL Server. Your trip to the powder room was ill-advised.

So in a lot of cases, eager index spools are really just irritating missing index requests. Sometimes they are optimizer deficiencies and shortcomings. But in the grand scheme of things, I don’t know.

What do I prefer? A FORCESEEK hint, or having to sit there and wait while an index creates? I’ll take that FORCESEEK any day. So if you’ve already done your job and you’ve already indexed your SQL Server queries for performance and you’re not getting performance, take a look at those query plans.

You might see something awful like that. Now, let’s get this thing running. And we’re going to talk a little bit about this.

So a lot of the queries where I show off bad things that happen, I use either cross apply or outer apply, because the inner side of nested loops is an interesting place for SQL Server’s optimizer. It does all sorts of goofy things in there. Parallel nested loops are especially strange birds.

I thought for a bit about doing some videos about parallel nested loops, but I don’t think I could… I don’t think that even with all the time in the world and, like, Paul White’s notes and the SQL Server source code, I could do adequate justice to how weird parallel nested loops are. There is simply too much going on there.

But we have this query, and this is another situation where you might see an eager index spool arise. Like, just have a little subquery like this with an aggregate in it. And if we look at this query plan, and let’s just drag this way over here, something quite similar happened, right?

Here is our eager index spool. Here is our post table. So essentially the same thing happened.

Now, this is the index that we currently have on the post table, right? It’s on post type ID, then parent ID, then owner user ID. The part of the query where we got an eager index spool was on this.

So SQL Server is essentially complaining that we don’t have an index that helps with this. And again, for whatever reason, SQL Server could have just chosen a hash join or a merge join. But it didn’t.

It chose a nested loops join. It told us nothing about an index that might be missing. It just created that index down here. Now, the normal way of solving this problem — or, you know, mostly what I do — is I right-click on the eager index spool. And I look at two sections.

I look at the seek predicates. And this is what I would create as the key of an index to fix this problem. In this case, SQL Server wants an index on parent ID, right? We can see the parent ID column reference here, which makes sense because we are correlating the outer query with the inner query on parent ID, right?

Look at that. How nice. And the other thing you want to look at is, let’s go back to the query plan, and let’s get those things back up.

The other thing you want to look at is the output list, because the output list will tell you what columns you will probably need to include in your index. So look at the seek predicate. That’s the key of your index.

Look at the output columns. Those are the include columns in your index. 99% of the time, this will be good enough. You may have to think about things a little bit differently if you have, like, a top one with an order by or something, because the order by column might need to go in the key of the index, too. But for most cases, this will be good enough for SQL Server, at least to use the index.

I don’t know if that’s going to make your query as fast as you want it to be, but that’s what you could do. Now, there are ways that we could go about fixing this, of course. Let’s say we wanted to create this index to fix it.

We could say create this index on parent ID and then score, right? And that’s essentially what SQL Server was asking for in the eager index spool plan. It wanted a key column on parent ID.

Fine. It wanted us to include score and ID. Well, we have to explicitly reference score, because score is just a regular column in the table. We don’t need to explicitly reference ID in here, because ID is the clustered primary key on the table.
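Putting that together, the index being described would look something like this (the index name is made up):

```sql
-- Key = the spool's seek predicate (ParentId); include = the rest of
-- its output list (Score). Id needs no mention -- as the clustered
-- primary key, it rides along in every nonclustered index automatically.
CREATE NONCLUSTERED INDEX ix_Posts_ParentId
ON dbo.Posts (ParentId)
INCLUDE (Score);
```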

So that automatically gets built into any nonclustered indexes that we create. And now if we run this query, let’s just look at the estimated plan real quick. Make sure that we are spool free.

Look at that. We have no more eager index spool. And now if we run this query for real, it will not take 20 or 30 odd seconds to run. Oh, it should be a lot faster.

I don’t know what this thing is doing now. Ah, there we go. For some reason, that took 10 seconds. I’m going to have to have a talk with SQL Server. Did some…

There was some… Oh, see? This is the problem. This is the problem. This is why I hate… This is why I hate parallel merge joins. Look at that. Isn’t that ugly?

Look at this. I bet this repartition streams has an order by, and we have introduced inter-thread dependencies into our parallel plan. This is a terrible idea for SQL Server.

If you see situations like this, where you have operators that preserve or require order around exchanges in the parallel execution plans of your SQL Server queries, run screaming. So let’s look at…

This one has an order by on it. Does this one? It does. We have two order-preserving exchanges in a row. Under worse circumstances, this could have been really ugly. We could have had exchange spills.

We could have seen spills here and here and maybe… Actually, no, that’s about it. I doubt that one would have, but you could have seen these spill and have real big problems. This one probably came very close to it.

So be very mindful of that sort of thing. I’m going to go a little off script here. And what I’m going to do is just to see if my hypothesis is correct. I’m going to throw a hash join hint on this query because…

Actually, a hash join and a loop join hint, because I have a feeling SQL Server would tell me it couldn’t create a query plan just doing hash joins here. But let’s just see if this goes any better.

Maybe it will. Maybe it won’t. We’re going to find out together. All right. So yeah, this one took five seconds. All right. Notice that we don’t have terribly painful things happening in here anymore.

And now we get a very helpful bitmap up here. And this, rather than taking almost 10 seconds, this is down to five seconds. So a little off script, but, you know, if I don’t do that sometimes, I don’t know, I might appear a bit stodgy here.
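The hint combination used here looks something like this — a sketched query, since the real demo script isn’t shown. Listing two join types in OPTION gives the optimizer a menu rather than a single mandate, which avoids the "could not produce a query plan" error that an overly strict hint can trigger:

```sql
-- OPTION join hints constrain every join in the plan to the listed
-- types: here, each join must be either a hash join or a nested loops
-- join, and merge joins are off the table.
SELECT u.DisplayName, COUNT_BIG(*) AS Records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
GROUP BY u.DisplayName
OPTION (HASH JOIN, LOOP JOIN);
```

These are query-level hints; a join-level hint (e.g. `INNER HASH JOIN`) would constrain only one join but also forces the written join order, which is its own can of worms.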

But anyway, that’s one way of dealing with it. Another way of thinking about this is let’s actually let’s drop this index because what I want to do is show you something a little bit different than just indexing.

So let’s think about what this query is doing. We’re looking for this, right? The thing is we’re looking for answers.

But in the query logic — let me show you back up here — we care about where the answer score is greater than the max score on this. The thing is, we know something SQL Server doesn’t.

We know something about our data. And what we know about our data is that only posts with a post type ID of two can be answers.

And we already have an index where post type ID is the leading key column. So if we were to rewrite our query to do this instead, we wouldn’t need that other index, right?
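A hedged sketch of that rewrite — the transcript mentions the demo uses apply-style subqueries, so the exact script differs, but the shape of the idea is the same: state the PostTypeId = 2 predicate explicitly, so the existing index that leads on post type ID becomes seekable:

```sql
-- Domain knowledge encoded as a predicate: only PostTypeId = 2 rows
-- are answers, so telling SQL Server that lets it seek on the existing
-- index whose leading key column is PostTypeId.
SELECT q.Id, q.Title, a.MaxScore
FROM dbo.Posts AS q
CROSS APPLY
(
    SELECT MAX(a.Score) AS MaxScore
    FROM dbo.Posts AS a
    WHERE a.ParentId = q.Id
      AND a.PostTypeId = 2   -- the predicate the index wants
) AS a
WHERE q.PostTypeId = 1;      -- questions
```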

So if we actually take a minute to consider our query a little bit, this looks like a very similar plan to the one we had before just without the spool down in here. Now we have another index seek in there. So if we run this query and we wait, I can’t promise that this is going to be as good and fast as our hash join query.

It’s probably about a second slower. But we look at the query plan. So this finished in about 6.2 seconds. The hash join plan was about five or so seconds. But we really do improve this area in here, right?

There’s no more eager index spool. And this time we took advantage of our index on post type ID and parent ID. And we have sort of a double-hop seek where we have parent ID and post type ID being seeked into.

So sometimes when you’re trying to fix eager index spools, if you don’t have a good index in place at all for the query, you may need to create one. If you already have a good index in place, you probably need to use a FORCESEEK hint.

And if you have a nonclustered index that’s just a little bit off from what you’re trying to do, pay attention to the way that your query is written, because you might be missing some valuable information to give SQL Server in order to be able to use that index more effectively. In this case, we used our domain knowledge about the data in the Stack Overflow database to say, hey, we only care about comparing answer scores to other answer scores. We need to find just answers on the inner side of that aggregate that we’re comparing to.

So with that out of the way, I guess that’s the whole video. Cool. It only took 22 minutes and some odd seconds.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope to see you. Well, I hope you see me in the next video. If you don’t, well, it was nice. You might be dead.

That’s real sad. Yeesh. I don’t like the sound of that. Anyway, thank you for watching. Goodbye.


Indexing SQL Server Queries For Performance: Fixing non-SARGable Predicates




Thanks for watching!

Video Summary

In this video, I discuss the challenges of dealing with non-sargable predicates in SQL Server queries and how rearranging indexes can help improve performance. I share a practical example where a computed column is wrapped in a function within a WHERE clause, leading to suboptimal query execution. By creating an index that allows for better seeking, we demonstrate how to mitigate these issues without altering the underlying stored procedure, which might be immutable due to vendor restrictions or other constraints. This approach showcases a clever workaround using SQL Server’s indexing capabilities to optimize queries even when direct code changes are not possible.

Full Transcript

Erik Darling here. Lo and behold, with Darling Data. No acquisition letters yet. I applied to Y Combinator and I said, hey, anyone want to buy this thing? So far, someone sent me a bottle of lube and a can of pork and beans. Those are the only offers I’ve gotten so far. I don’t really know what to do with that. I’ll probably just keep making these videos in the meantime. So anyway, we’re going to sally forth with our series on indexing to improve query performance in your SQL Server queries. And in today’s video, we’re going to talk about how you can rearrange indexes to fix non-sargable predicates. I know that there’s, I don’t know, maybe depending on what your reading habits are or what your interests are in general, you may have seen blog posts about using computed columns to do this. We’re going to talk about computed columns a little bit later in this series because I had a comment about computed columns that now has set me ablaze.

So this is not that, this is not about using computed columns. So if you are expecting that, well, stick around because we’re going to do something different. Maybe you’ve never seen it before. Again, depends on your educational background a little bit for SQL Server things. But before we do that, we of course need to talk about your financial background and your financial backing of Darling Data Enterprises. If you would like to support the content that I create on this channel, there is a link down in the video description that says become a member. And by clicking on that link, you can give me four bucks a month to support this channel, which after taxes is a little over $3. So who is the real winner?

I don’t know. If you — you know, I realize that it’s Christmas, and while some of you might feel giving in the realm of $4, you may have already bought your Nana, or your aunt, or, I don’t know, a step-parent some dollar scratch-offs, or maybe a couple of $2 scratch-offs. And that $4 has evaporated from your wallet, from your financial world. In that case, you can always like or comment or subscribe.

I’m happy to see numbers of any sort go up, except blood pressure and cholesterol. Those are the only ones we like to see either go down or stay even. They’re the only things we care about.

If you need help with SQL Server, I am famous for performing Christmas miracles. So if you need any of this stuff done, if your SQL Server is slow — I don’t know, do you sell a lot of things on Christmas?

Probably not. But you know, it is the holiday season, when sales do tend to spike. So if you’re having SQL Server problems, I am a consultant that fixes them, and my rates are reasonable. So there’s really no reason for you to go anywhere else or talk to anyone else, because they won’t do as good of a job as me.

If you want some great SQL Server training all about performance tuning and other, well, really just performance tuning. I was going to say other things, but I pretty much stick to the performance tuning stuff because I don’t want to spread myself too thin. If I start pretending that I know about availability groups, I feel like you would see through that charade pretty quickly.

But you can get all of mine. It’s about 24, 25 hours of content for the rest of your life for about 150 USD when you use Yonder coupon code. The fully assembled link to perform all of these Christmas miracles is also in the video description.

So you don’t have to do too much work or typing because I’ve seen, I’ve seen your typing. Not impressive. Mavis Beacon would be incredibly depressed if she saw you typing.

Rolling in her CD-ROM drive. Because she is. More news on events as 2025 progresses. For now,

You must live in darkness. Sorry about that. But with that out of the way, let’s talk about fixing non-sargable predicates with indexes. Now, I think I dropped this index.

Yes. Well, if I didn’t before, I did now. So I’m going to start creating this and then I’m going to start talking. So I don’t, you don’t have to sit there and wait for an index to create. That’s just rude and crude and vile.

And for some reason, ZoomIt is taking its sweet time. All right, let’s move some of this stuff around a little bit. So it’s all on the same screen. We don’t need you struggling to read. I know I’ve, again, much like you’re typing, I’ve seen your reading and you are, you are not very good at either one.

So, as a consultant who gets called in to deal with performance issues to this very day — despite the bevy of knowledge distributed widely and freely across the internet — when you start wrapping the columns in your where clauses and your join clauses in functions, even the built-in ones in SQL Server, you are bound at some point to have that cause some performance issue. Big or small, it will creep up on you.

And the bigger your data gets, the worse these performance issues will get. I fix these things constantly. It is like an unwritten rule of my consulting engagements that I will be pulling ISNULL and COALESCE and SUBSTRING and REPLACE and all the other things out of joins and where clauses where people have done just plain dumb things with them.
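A couple of typical examples of the pattern and its fix, using Stack Overflow columns as stand-ins — the specific queries are illustrations, not the ones from the video:

```sql
-- Non-SARGable: the column is wrapped in a function, so the predicate
-- can't be used as an index seek.
SELECT c.Id
FROM dbo.Comments AS c
WHERE ISNULL(c.UserId, 0) = 22656;

-- SARGable: NULL can never equal 22656, so the ISNULL adds nothing.
SELECT c.Id
FROM dbo.Comments AS c
WHERE c.UserId = 22656;

-- Non-SARGable date predicate: YEAR() hides the column from the index.
SELECT c.Id
FROM dbo.Comments AS c
WHERE YEAR(c.CreationDate) = 2013;

-- SARGable range rewrite of the same logic:
SELECT c.Id
FROM dbo.Comments AS c
WHERE c.CreationDate >= '20130101'
  AND c.CreationDate <  '20140101';
```

The rule of thumb: leave the column bare on one side of the comparison and move the work to the other side.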

But what sucks is — gosh — I think the number of actually qualified third-party vendors who make products that interface with SQL Server is very, very small. Even Microsoft’s own software products that touch SQL Server have some of the worst code and database design and indexing I have ever seen in my life. It’s like the people who make software at Microsoft are here.

And the database team at Microsoft is here. And there’s just this silo bunker wall between them. There’s just no way for them to meet.

Even the people who work on Entity Framework just do things that, if anyone with a reasonable sense of databases saw what they were doing, they would just hit them with a four-by-four. They would, like Hacksaw Jim Duggan, just whack them.

I don’t know where they get this stuff from. It never ceases to amaze me. I think they just want to keep me in business. That’s what it feels like.

I mean, if that’s the case, thank you. I appreciate your contributions, but man, it’s astounding to watch. Now, this is not a giant fireworks demo. This is just to prove a point: you can make a difference even if you can’t change the code, which I think was the point I was going to make when I started talking about vendors and Microsoft and Entity Framework. Sometimes you can’t change the query.

Sometimes there are things that you are not allowed to change. Vendors might disallow you from changing stored procedures. They might say that puts you out of support.

You might not be able to figure out how to fix whatever Entity Framework query is being generated. Queries might come from binaries or DLLs or something built into your software, and you might not be able to do anything to change that, right?

What are you going to do? You’re hosed on that. You can’t change it. The vendor has to change it, and the vendor won’t do anything. Won’t lift a finger to help you.

Who are you? Anyway, let’s say we have this index, right? And we’re going to pretend that the stored procedure below is completely immutable. We are not allowed to change it.

We’re going to straddle a fine line here and change an index. All right. There’s a secret that you and I are going to have to keep. We might be violating a EULA somewhere.

I don’t know. But this is the index that we currently have on CommunityOwnedDate and Score. This is on the Posts table. In case you missed it (again, I know how your reading is), that’s the Posts table.

And I’m going to handicap this query a bit. Rather, I’m going to say I’m going to hobble this query a bit. I like Misery as a movie, and I like the term hobbling. So we’re going to hobble this query. I like Misery much better than golf.

In golf, you get a handicap; in Misery, you get your ankles broken with a sledgehammer. But we’re going to hobble this query a little bit to make sure that SQL Server can’t use batch mode to do anything better.

And we’re going to limit this to MAXDOP 1 so that I can show you just how profound a difference you can get with these changes, because, who knows, maybe you hired a crappy consultant who told you to set cost threshold for parallelism to like 350 or something for no reason.

Just a wild guess. They were like, oh, look at all those CX waits. Let’s fix that. Because this is the kind of help you get at unreasonable rates. When you pay Erik Darling, with his reasonable rates, you get good advice.

When you pay other people, I don’t know what you get. I’ve seen the results though. They’re not good. So we have this stored procedure, and this stored procedure is going to break a cardinal rule of querying.

It is going to wrap CommunityOwnedDate, the leading column in our index, in the COALESCE function. I don’t know why people do things like this. What do you think, nulls are going to throw an error?

You think SQL Server can’t handle nulls? It’s your stupid programming languages that throw those null exceptions. It’s not SQL Server. But people do dumb stuff like this all the time.
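To make the setup concrete, here is a rough sketch of the shape being described. The procedure name and parameters are my assumptions, not the demo’s actual code:

```sql
/*
    Sketch only: names are assumptions. COALESCE on the leading
    index column makes the predicate non-sargable, and MAXDOP 1
    is the self-imposed hobbling that keeps parallelism out of
    the picture.
*/
CREATE OR ALTER PROCEDURE dbo.CommunityPosts
    @CommunityOwnedDate datetime,
    @Score integer
AS
BEGIN
    SELECT TOP (1000)
        p.Id, p.Score, p.CommunityOwnedDate
    FROM dbo.Posts AS p
    WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= @CommunityOwnedDate
      AND p.Score >= @Score
    ORDER BY p.Score DESC
    OPTION (MAXDOP 1);
END;
```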

Don’t ask me why. I didn’t make them do it. But when we run this query and look at the execution plan, we’re going to be unhappy, aren’t we?

We’re not going to be thrilled with the performance of this thing. It takes 1.5 seconds and it scans our index. All right. We have a perfectly good index for seeking into the CommunityOwnedDate column, and yet we don’t.

We do not seek into this index. We scan the entire thing. It takes a second and a half, and we don’t like that. So here’s what we’re going to do, because we’re going to pretend that we’re allowed to do this.

We can’t change the stored procedure. We can’t just take out that COALESCE. We could make a copy of the stored procedure where the COALESCE is removed, run it side by side with the other stored procedure, and send the results to the software vendor, and they would say, no, no, we’re not helping you, which is what happens all the time because software vendors suck.

But let’s just say we were going to change this index, and we’re going to use one of my favorite index options in the world, DROP_EXISTING = ON, right? Just slide this index in place, drop it on in, which is good for us.
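A sketch of what that looks like, assuming the original index was keyed on (CommunityOwnedDate, Score) and named something like the below:

```sql
/*
    Sketch: the index name is an assumption. Same columns,
    reordered so the sargable Score predicate leads and can be
    seeked on; DROP_EXISTING swaps the new definition in place
    of the old one.
*/
CREATE INDEX community_score
ON dbo.Posts (Score, CommunityOwnedDate)
WITH (DROP_EXISTING = ON);
```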

So now when we run this query (sorry, when we run this stored procedure), it does not take a second and a half. It takes about a hundred milliseconds, because now we can seek into the index on the Score column.

And we just have that stupid residual predicate, right? We now have a seek predicate on Score, and we just have the awfulness that is the COALESCE function running its CASE expression on the CommunityOwnedDate column. That’s a residual predicate, but we can at least seek to the scores we care about.

And that’s the good part for us, right? We no longer have to worry about all that scanning of the index because of the COALESCE function. Now we can seek to the scores that we want and then evaluate CommunityOwnedDate second.

And there’s a lot less pain in that. There’s a lot less suffering. It’s far less overhead, because we have already done a seek to the limited number of values we care about, and applying the residual predicate to those rows is not a big deal. If the predicate on the Score column were less selective, it might be a seek across most of the table plus a residual predicate applied to most of the table.

That would potentially be equally as unpleasant as just a scan of the whole table. But we’re going to pretend that we’re a little bit smarter than that. So anyway, this is how you can rearrange your indexes to beat non-sargable predicates.

Often SQL Server will do the smart thing and choose better indexes when it can seek, because it thinks, ooh, look at this nice cheap seek I can do. SQL Server, being the cheapskate that it is, will often choose the smarter index in this scenario.

Even if you had another index hanging around. In this case, we changed an index, but if you run the SQL Server that the vendor software runs on, you might just create a new index.

And then, when you have to do a software upgrade, you just quietly drop or disable the custom indexes you’ve created, so the installer doesn’t explode and say, how dare you try to do better than us? Because, you know, you’re all idiots.
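The quiet-upgrade dance sketched out (the index name is an assumption):

```sql
/*
    Disabling keeps the index definition in metadata but drops
    its data, so a vendor installer that only checks definitions
    won't choke; a rebuild brings it back afterward.
*/
ALTER INDEX community_score ON dbo.Posts DISABLE;

/* ...run the vendor upgrade... */

ALTER INDEX community_score ON dbo.Posts REBUILD;
```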

I don’t know. I don’t understand how you get paid to do what you do. I don’t understand how your software company makes hundreds of millions of dollars a year. The product is garbage.

Ah, well, what can you do? SQL Server is $7,000 a core, and it still can’t handle a column wrapped in a function. One wonders how deep the rot goes sometimes, I suppose. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope that you are happy and healthy and feeling loved and appreciated by all those around you. I don’t know.

Maybe you have a cute dog or something. Hmm. It licks your hand, you take it for a walk. Sounds nice, right? Smoke a cigar.

Anyway, I’m gonna go record something else now. I think I have another video. Apparently I have 50,000 videos to record, so I’m gonna try to get that done. Anyway, thank you for watching.

Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Predicate Key Lookups



Thanks for watching!

Video Summary

In this video, I dive into the world of indexing for performance optimization, specifically focusing on fixing predicates in key lookups. I explore how these lookups can become problematic when queries suffer from parameter sensitivity issues—those pesky “sensitive Sally parameters” that make query plans unpredictable and inefficient. The goal is to reduce the time it takes for a user’s query to return results, ensuring every second counts. Throughout the video, I also highlight ways to support my channel through membership, offering training and consulting services for SQL Server performance tuning, making sure you have all the tools needed to tackle these challenges yourself.

Full Transcript

Erik Darling here with Darling Data. I’m a little fuzzy here. Let’s unfuzz myself. There we go. Now you can see everything in crisp, clear, high-definition detail. In today’s video, we’re going to talk about indexing for performance, specifically around fixing predicates in key lookups. Now, I’m going to talk about exactly what all that means when we get to the demos. But it is generally a sensible approach to take when index tuning, because the last thing you want to do every time you make a trip back to the clustered index is evaluate if a predicate passes one of your filters. You don’t want to do that. It’s not good for performance at all. It’s a bad time, especially when you have queries that suffer from parameter sensitivity issues. Which, oh boy, don’t we all have those sensitive Sally parameters. But before we get into that, let’s talk about things that I am sensitive to. Like money. I do need it for things. If you would like to become an appreciative member of this channel and support my endeavors in making this content for you, you can go to the video description and click the link to become a member. And you can become a member for as little as $4 a month, of which I will see $3.10 post taxes. If you have spent all your money on dirty women and booze or whatever else you spend money on (grass seeds, lawn mowing equipment, the great outdoors, I don’t know, hiking gear), you can like, you can comment, you can subscribe. All ways to keep me from bashing my head against the wall. It’s a great time.

If you need help with SQL Server because you are having problems like the ones that I describe in these videos, or maybe you don’t know what kind of performance problems you’re having and you’re just looking for someone to untangle all that for you: gosh darn it, I am the best consultant in the world. At least for SQL Server. For other things, probably not so much. I can do all of this stuff. And as always, my rates, they’re reasonable. You can get all of my training for $150, just about for the rest of your life, if you use that discount code at that training site up there. It’s a very good deal. You should get it. It’s not just for Black Friday. It’s for every day. So aren’t you lucky? No upcoming events until later in 2025. I’ve said this a million times. We’ll talk about this stuff later. And with that out of the way, let’s get on to talking about these here predicate key lookups. So our goal as query tuners, as people who attempt to make things faster in exchange for money, is to reduce the amount of time it takes from a user pushing a button until that user gets a result. I don’t care what other metric you reduce or increase in that endeavor. You could make logical reads go up or down. You can make CPU go up or down. You can make memory go up or down.

But as long as the duration, the wall clock time, the tick, tick, tick, tick, tick, tick, tick, tick that it takes before that query returns results to someone gets shorter, I think you’ve done a pretty good job. Sometimes we’ll have to make queries go parallel to get faster. Sometimes we’ll have to use more memory to avoid spilling to disk or something. Sometimes you might have to use more space in tempdb by putting some result in a temp table first. Sometimes we need to add indexes to our database in order to make queries faster. These are all tradeoffs. All things that represent tradeoffs when you are query tuning.

Yes, sometimes you have to give up a little of this stuff in order to make the query faster. It’s okay. This is what databases were built to do. Deal with these tradeoffs. The only real exception to any of this stuff is, of course, a query rewrite. If you can just change the way that a query is written fundamentally in order to make it faster, great. You have sacrificed very little except your time and knowledge and fingers and typing and probably some portion of your humanity.

You know, I’m not going to tell you to add a hundred indexes to a table to make some weird queries faster, but sometimes you do. Sometimes, if a table has just a clustered index on it and you’re like, well, gosh, we’re not just searching on that thing, you’re okay. You have some freedom to add other indexes in. Other times you have to use some judgment.

You may have to consider what indexes are already there first. Sometimes altering an existing index is a better choice. Sometimes there are a lot of unused or already duplicative indexes you could clean up before adding another one to the pile. There are many things that you could consider, things you could use the old noggin to jog through and figure out first.

Now, like I said, there are various things in queries that force SQL Server to do dumb things. Some of those dumb things are things like local variables, table variables, non-sargable predicates, overly complicated joins and where clauses and stuff. There are all sorts of things that you can untangle for free that don’t involve you adding another thing to the database, whether it’s an index or a temp table or something like that.

There are all sorts of things you can do that don’t have much of a tradeoff. But changing those things might change other stuff. You might get a parallel plan. You might use more memory, something like that. So, you know, I like rewriting a query, but sometimes all the query rewriting in the world isn’t going to change the state of the database in a way that makes that query any faster.

Now, key lookups represent a choice made by the optimizer involving scanning a clustered index, where all of the columns in the table are available. Sometimes it helps, instead of thinking of things as a clustered index, to think of a table as a clustered table if a clustered index exists on it. But the clustered index has all of the columns available in it.

And sometimes SQL Server thinks that it is cheaper to do one big scan of the table, with all the columns available, and be able to touch all of the data that it needs from one single place, than it is to seek or scan into a nonclustered index on the table and then go back to the clustered index to retrieve columns that are not part of that nonclustered index to satisfy the remainder of the query. Typically, lookups get chosen when a relatively small number of rows is expected, but all sorts of things play into this.

And sometimes the number of rows that are expected is not the number of rows that end up making that transversal in reality. That traversal in reality, I guess I should have said. I don’t know what a transversal is. All right. Ah, screw it. Bad estimates in general.

And, of course, self-inflicted bad estimates. Again, local variables, table variables, non-sargable predicates, things like that are all things that can contribute to you having those problems.

But we’re going to look at an issue with parameter sensitivity around lookups. A lot of the time, fixing lookups that exist just for columns you’re selecting is almost a last resort for me. There are a lot of other things I would rather do first.

But there’s only so much time in this video that I can talk about those other things. So, we already have this index created on the post table. And let’s pretend that this index was created by some long gone developer in the past.

And this index was great, either for the query below before we added more stuff to it, or for a different query altogether.

This index made perfect sense. And since this index made perfect sense for some other query, this query comes along and says, hey, you make perfect sense to me too. Now, there are a couple of things in this query that you might notice.

If you look up at the index definition, it is on Score, then CreationDate, then PostTypeId. And that does take care of most of what we’re doing in here.
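So the index being described looks roughly like this (the name is an assumption, not the demo’s actual name):

```sql
/* Sketch of the described index: Score leads, then the rest. */
CREATE INDEX long_gone_developer
ON dbo.Posts (Score, CreationDate, PostTypeId);
```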

Now, just make sure everyone keeps this in their head. If you have a clustered table or a clustered index on a table, it’s the same deal. The table is clustered.

The clustered index key column or columns will be part of every nonclustered index you create. In a non-unique nonclustered index, they are an additional key column. In a unique nonclustered index, they are an included column.

So the Id column on the Posts table is the clustered primary key, and it is part of this index up here. You can just pretend it’s right at the end.
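Sketched as DDL, the rule reads like this. The index names are assumptions; the comments spell out what SQL Server adds silently:

```sql
/*
    Sketch of the clustered-key rule, with assumed index names.
    Posts has a clustered primary key on Id.
*/

/* Non-unique nonclustered index: Id is silently appended as a
   trailing KEY column, as if you had written (..., Id). */
CREATE INDEX ix_nonunique
ON dbo.Posts (Score, CreationDate, PostTypeId);

/* Unique nonclustered index: Id rides along as an INCLUDED
   column instead. (Whether these columns are actually unique in
   real data is beside the point of the sketch.) */
CREATE UNIQUE INDEX ix_unique
ON dbo.Posts (Score, CreationDate, PostTypeId);
```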

So this does take care of most of what we care about. Now, the other thing you might notice is that I am doing SELECT *. The fun thing is, it doesn’t matter how many columns are involved in the select or how many columns are involved in the lookup that you might do.

Every lookup has the exact same optimizer cost. It doesn’t matter if you’re selecting one column or a thousand columns. Every optimizer estimate for that lookup will have the same cost.

It doesn’t matter how many columns you select. So if we run this query, rather run the store procedure with a set of parameters that will touch a small number of rows, this will run relatively quickly.

Right? 326 milliseconds, but we have this lookup over here. And you might look at these numbers under the lookup and think, gosh darn, that looks funny: 7 of 20,844,467.

That’s 20 million something. And the reason why that estimate looks so funny is because this lookup has a predicate. Now, every lookup will have at least a couple things in it.

Well, actually, that does depend a little bit. Every lookup will have at least this one thing in it: this seek predicate, which you can kind of see is sort of like a self-join, because we’re joining the Posts table to itself on the Id column.

That is why the clustered index key column or columns end up in your nonclustered indexes. So SQL Server can do neat stuff like this. It can take that ID and it can look up rows in the clustered index from the nonclustered index.

Now, there are two things in a lookup that may or may not be there. One is an output list. Actually, if you are just applying a predicate, that might show up in the output list as well.

This might always be there. But the important thing is that this one has a predicate up at the top. So you have the output list, which are columns that are being selected.

And you have the predicate, which is the additional filters being applied when we make a round trip back to the clustered index. So for every row that comes out of this nonclustered index here where we seek into it (since this is a loop join, it’s one row at a time). Boop, boop, boop, boop, boop.

One row comes out and then we evaluate another predicate there. So we’re able to do our initial set of filtering. We’re able to seek to some of the filters we care about here and apply a predicate to some other filters here. I guess there’s a little bit of overlap in there just because of the state of things, but that’s sort of beside the point.

But the seek here does allow us to get to some of the rows that we care about. But then we have to filter out additional rows that we care about. Look how much this reduces things by.

Well, we get 327,567 rows out of here. That means we make that many round trips down to the clustered index here to do those lookups. So we made 327,567 round trips.

And at the end of all of those round trips, we ended up with only seven matching rows came out of that. That’s why for the rest of the query, we only have those seven rows here. So we were able to do some of the filtering with the nonclustered index, but not all of the filtering.

One thing that’s sort of important to figure out is what’s going to happen when we pass in a different set of parameters. The set of parameters we passed in up here was a score of zero, meaning we’re going to find every score greater than zero, which is most of them. And we’re looking at this creation date.

So just the last couple months of 2013, which for the Stack Overflow 2013 database is actually pretty tiny, too. And a post type ID of one. Well, there are about 6 million questions in the Posts table.

So this thing doesn’t really filter stuff out. This thing does a pretty good job of filtering stuff out. And then this is what we evaluate in the lookup, because the OwnerUserId column is not in our index. If we run this with a set of parameters that has more rows coming out of that initial nonclustered index seek, this is where lookups start to get painful.

Because what was a pretty sensible plan that ran in like 300 something milliseconds for that initial set of parameters does not do well with this one. We end up with a lot more rows. Now, keep in mind, we have the same cardinality estimate here that we did in the last plan.

In the last plan, this was high. In this plan, the number of rows that we get out (where is it, actual number of rows) is 1884% higher than our estimate.

We make a lot more round trips back to the clustered index in order to get what we want from this thing. Remember, that’s still one row down here and then into here. One row at a time.

We do that and we end up with 27,869 matching rows from here. And you can see where the majority of the time goes: we have about a second there and about a second and a half there. So that’s three and a half seconds total.

And then a little additional time in the nested loop. And then for some reason, the gather streams has a lower number than this. And then we go into the sort and we spend about twenty five more milliseconds there. One thing that’s important to do is when you’re looking at parallel plans, try not to focus too much on the execution time of exchanges.

They are a weird black hole of mystery and the code and all this. The code and all the operator timing stuff is very strange around them. So try not to look at those too much.

Anyway, knowing what we know about indexing, we could do one of two things. We could change our index and add OwnerUserId as a last key column here. This would at least allow us to do all our filtering from one single index.

If you’re able to do this and get good enough performance, don’t sweat it. That’s totally fine. Adding one more column to the end of the key of an index is not going to change anything for any other queries that touch this index. But if we apply a little bit of domain knowledge, right, we might want to think about either adding a new index that better suits this query overall.

Or we might want to think about maybe if we have really good domain knowledge and like we’re like this index sucks and it, you know, never helps anything or rather it makes everything slow or blah, blah, blah, blah, blah. Every query that touches this index is awful. We might just we might just replace the index entirely with one that better satisfies what we’re searching for.

So in our query, OwnerUserId and PostTypeId are equality predicates, and then Score and CreationDate are inequality or range predicates. We’re doing greater than or equal to on both of these. And of course, if we do equality searches on OwnerUserId and PostTypeId, then Score is in order for us here, which means that when we run these two queries, these will be a lot faster in general.
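Sketched out, the two options look something like this (index names are assumptions, not the demo’s code):

```sql
/*
    Option 1: widen the existing index by adding the lookup's
    predicate column as a trailing key, swapped in place.
*/
CREATE INDEX long_gone_developer
ON dbo.Posts (Score, CreationDate, PostTypeId, OwnerUserId)
WITH (DROP_EXISTING = ON);

/*
    Option 2: replace it outright. Equality predicates
    (OwnerUserId, PostTypeId) lead, range predicates
    (Score, CreationDate) trail, which also hands rows back
    already ordered for the ORDER BY.
*/
CREATE INDEX better_suited
ON dbo.Posts (OwnerUserId, PostTypeId, Score, CreationDate);
```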

And we don’t have to we don’t have the same parameter sensitivity issues when these things run. One other thing that’s important for these, which, you know, might be important when in queries that you’re tuning, is that since we have our index set up correctly, we also don’t have to deal with sorting this data. Remember, our query from up here a little bit, if you’ll permit me to scroll, is asking.

Remember, we have our equality predicates on these two columns and we have our inequality or range predicates on these two columns. And then we’re ordering by these three columns. Now, ID is still going to be that last key column in the clustered index and the nonclustered index that we have because it’s a clustered primary key.

And then Score and CreationDate are going to be in order because we have equality predicates across the leading two columns in the index. So we avoid having to sort data altogether. We have a Top in here, but it’s not a Top N Sort.

And we do not have an explicit sort to put our data in the order that we care about, because the index does that for us, which is pretty cool. So when you’re troubleshooting query performance, whether it’s parameter sniffing or just bad estimates or whatever it is, and you have a key lookup in your plan that is evaluating a predicate like I showed you in the earlier query, and you get the actual execution plan and see a lot of time spent in that loop:

It is usually worth fixing the predicate part of that lookup by adding whatever column or columns are being evaluated in the predicate to whatever index is being used by the query already. You might also choose to add a new index that better satisfies the query as a whole. That’s also an option for you. Or you might want to replace the current index, because maybe it’s just not good for any searches that touch it.

But that’s a much, much more difficult thing to ascertain just from query, tuning a single query. You’d have to know the workload pretty well to make a call like that. But anyway, this is a good thing to fix.

This is honestly something I have to do quite a bit: fixing up indexes a little to make sure that they adequately service the queries that are touching them, because a lot of the time I’m dealing with indexes created by developers from all walks of life and all levels of experience with SQL Server. Some of them maybe knew what they were doing better than others.

But this is something that I do have to fix quite a bit. So hopefully you can use this knowledge to start fixing things for yourselves. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video, which will be about something. I forget what, for some reason, tab number five is hiding from me. So maybe that’ll just be a nice surprise for all of us when I, when I get to that.

So anyway, thank you and goodbye. Bye. Bye.


Indexing SQL Server Queries For Performance: Unpredictable Searches



Thanks for watching!

Video Summary

In this video, I delve into indexing strategies for handling unpredictable searches in SQL Server. Unpredictable searches are essentially any queries where the WHERE clause, JOIN conditions, or selected columns can vary widely—think dynamic SQL within stored procedures or ad hoc queries from front-end dashboards. I emphasize a common pitfall: using double wildcard searches on every column with a parameterized search string, which can lead to significant performance issues due to full table scans and implicit conversions. To address this, I introduce the `RECOMPILE` hint as an effective solution for many scenarios, explaining how it allows SQL Server to use actual parameter values in the query plan, thus avoiding caching plans that might not be optimal. Additionally, I discuss the challenges developers face when dealing with such queries, including the lazy approach of relying on `RECOMPILE`, and highlight upcoming features like Microsoft’s new OPPO (Optional Parameter Plan Optimization) feature for handling optional parameters more efficiently in future SQL Server versions.
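As a hedged sketch of the optional-parameter pattern the summary alludes to (the procedure, parameter, and column names are my assumptions, not the video’s code):

```sql
/*
    Sketch only: build predicates just for the parameters the
    caller actually supplied, and let OPTION (RECOMPILE) compile
    a plan for the actual values instead of caching one plan
    that has to cover every combination.
*/
CREATE OR ALTER PROCEDURE dbo.SearchPosts
    @OwnerUserId integer = NULL,
    @PostTypeId  integer = NULL
AS
BEGIN
    DECLARE @sql nvarchar(MAX) = N'
    SELECT TOP (100) p.Id, p.Score, p.Title
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    IF @OwnerUserId IS NOT NULL
        SET @sql += N' AND p.OwnerUserId = @OwnerUserId';

    IF @PostTypeId IS NOT NULL
        SET @sql += N' AND p.PostTypeId = @PostTypeId';

    SET @sql += N'
    ORDER BY p.Score DESC
    OPTION (RECOMPILE);';

    EXECUTE sys.sp_executesql
        @sql,
        N'@OwnerUserId integer, @PostTypeId integer',
        @OwnerUserId, @PostTypeId;
END;
```

Unused parameters still get declared for sp_executesql; that keeps the call site simple while the query text itself only mentions the predicates that matter.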

Full Transcript

Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Erik Darling here with Darling Data. As you’d well expect at this point, if you’re surprised by any of this, I don’t know what’s wrong with you. Maybe you should start taking notes about your day. You might have some sort of head knock that’s plaguing you in some way. I would suggest getting that checked out by a medical professional as well. At this point, I feel like this is a well-established set of facts. I am Erik Darling, and my company is Darling Data, and we talk about SQL Server. That is our goal. That is our sole goal. In today’s video, we are going to talk about indexing for performance with unpredictable searches. You can consider an unpredictable search as anything that generates a query. Intel. Shut up. Intel. God damn it. Anything that generates a query that you don’t know what the WHERE clause is going to be, or maybe you don’t know what the JOINs sometimes are going to be, or what columns it’s going to select, or something like that.

It could be dynamic SQL in a stored procedure, or it could be some ad hoc query generated by a front-end dashboard type thing that you have created, hopefully in a sober and thoughtful way. But that is what an unpredictable search is for me. Now, you have no idea what users are going to ask for, or maybe even what quantity they are going to ask for it in. There are all sorts of crazy things that you have to account for. But of course, before we talk about how you can account for that, we must talk about other accounting matters. More like my accounting matters. We are being honest here.

If you would like to support the content that I create on this channel, or somewhere near around this channel, for the low, low price of $4 a month, you can click the link in the video description. There we go. That’s the right chain of words. And you can become a member. $4 a month. That’s about $3.10 for me after YouTube takes out taxes.

If you are just too encumbered by other debts and ransoms and your money is already allocated fully to other endeavors, you can like, you can comment, you can subscribe. There are all sorts of things you can do to let me know that you love me and would die for me. If you look at the things that I talk about on this channel and the first thing you think is, gosh darn it, that Erik Darling sure can make SQL Server faster in exchange for money.

Perhaps I could do one of these things for you because I’m very helpful when it comes to these things. And not only am I helpful, but my rates are reasonable. And while we’re on the subject of reasonable, gosh darn it, have you ever wanted to get 24 hours of SQL Server Performance Tuning training for $150 and not have to pay another $150 every year for the rest of your natural life?

You can do that with my training. It’s amazing. You click on that link, you enter that discount code, and kaboom, you have what I just described. It’s pretty wonderful.

Mouse. No upcoming events 2025, blah, blah, blah. We’ll talk about that later. But now let’s talk about this indexing conundrum that we have. First, I’m going to close event viewer.

I was troubleshooting a blue screen earlier. I know. Do my talents end anywhere? Yes. Pretty much where the computer ends.

The limitation right there. Where the computer stops. No idea what I’m doing outside of that. It’s funny how that works.

So we’re going to talk about one of my favorite things. I’m going to move some of this text down a little bit there so it fits on the screen better. One of my favorite things that I help clients with is big, ugly queries. I know. Weird that a performance tuner likes big ugly queries.

I suppose it’s a lot easier to take a big ugly query and chop out all the easy stuff and make it go faster than to take what looks like a perfect query and tune that better. But, you know, some people just don’t know what they’re doing in either case. So that’s why people like me exist.

So one of the most common things that I see people start with looks something like this. What you see my head is mildly infringing upon. Where someone will have a parameter or whatever called search string.

And that search string will be used to double wildcard search every column in a table. This is one of the worst possible things you can do if you need to handle unpredictable searches in SQL Server. I beg, I beg you not to do this.

This will end up terribly. You may look at some of the columns in this list, like owner user ID and creation date and last activity date, and think, hmm, Erik, I don’t think those are strings. And I’d say, well, you are right, but you’re searching them like strings and all hell’s going to break loose.

It’s not going to be a good time for you or your SQL Server. So, like, you really shouldn’t be doing anything that even looks like this. The problem is somewhat obvious if you’ve been using SQL Server for long enough.
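To make the shape of it concrete, here’s a rough sketch of the anti-pattern, with a hypothetical procedure name and columns borrowed from the Stack Overflow Posts table:

```sql
/* Anti-pattern sketch: one @SearchString double-wildcarded against every
   column. The leading % defeats index seeks, and the non-string columns
   (OwnerUserId, the dates) get implicitly converted. Don't do this. */
CREATE OR ALTER PROCEDURE dbo.TheWorstSearch
    @SearchString nvarchar(140)
AS
BEGIN
    SELECT
        p.Id,
        p.Title,
        p.Score
    FROM dbo.Posts AS p
    WHERE p.Title            LIKE N'%' + @SearchString + N'%'
       OR p.OwnerUserId      LIKE N'%' + @SearchString + N'%' /* int, searched as a string */
       OR p.CreationDate     LIKE N'%' + @SearchString + N'%' /* datetime, same problem */
       OR p.LastActivityDate LIKE N'%' + @SearchString + N'%'
    ORDER BY p.Score DESC;
END;
```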

Double wildcard searches and searching columns with mismatching types will lead to not only big scans because of the double wildcard, but also implicit conversions. The title column actually is an nvarchar(250) or something column.

So that would at least not result in an implicit conversion right there, but the rest of them you are hosed on. It’s going to be a bad time. What you might think is that using some of the built-in functions in SQL Server might be more highly performative. Not performant, we don’t say that word on this channel.

I might beep that out later if I can figure out how to beep things out. You might think that there might be more performativeness if you use CHARINDEX or PATINDEX, but generally there is not. There are some edge cases where I’ve seen them do somewhat better, but it’s not really worth talking about.

But the problem is, the way that you deal with null parameters, if you use ISNULL or COALESCE or anything like that, is not a more clever scenario than doing something like this. Both of these end up with the exact same problems unless you use a statement-level recompile on your query, right?

Something like that. If you do this, you can get around most of the issues that come out with these unpredictable searches. Most of the time, this will be Mr. Fix-It for you, right?

This will do okay. And for a lot of people, this is the path of least resistance, and this is the simplest thing to do, because the alternative is writing a lot of dynamic SQL to cope with which parameters actually need to be part of the where clause. I’m going to show you an example of just using recompile, it being nice and easy. As long as you’re not allergic to compiling a query plan every time a thing runs, because you’re not burned out on CPU anyway, if you have CPU slack and you’re able to compile queries whenever you want, this is a perfectly fine thing, as long as the compile time for that query is not awful.

If every time you run this query with a recompile, it runs for a long time, but the time isn’t in the query plan, you might want to check the compilation time to make sure SQL Server isn’t going off on one of its little thought-cloud staring things. So there is that.
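For reference, the catch-all shape being described looks roughly like this (column and parameter names are illustrative, borrowed from the Posts table); swapping ISNULL or COALESCE in for the OR … IS NULL checks doesn’t change the behavior:

```sql
/* Catch-all search sketch: NULL parameters mean "don't filter on this".
   Without the recompile hint, one cached plan has to stay safe for every
   combination of NULLs. OPTION (RECOMPILE) lets the optimizer embed the
   runtime values and prune the dead branches. */
SELECT
    p.Id,
    p.Title,
    p.Score
FROM dbo.Posts AS p
WHERE (p.OwnerUserId      = @OwnerUserId       OR @OwnerUserId      IS NULL)
  AND (p.CreationDate    >= @CreationDate      OR @CreationDate     IS NULL)
  AND (p.LastActivityDate < @LastActivityDate  OR @LastActivityDate IS NULL)
ORDER BY p.Score DESC
OPTION (RECOMPILE);
```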

Now, the problem with developers is that they are often quite lazy in the database. They often think that they are far more clever than they are when it comes to doing things in the database. You know, SQL Server is an expensive piece of equipment, but people treat it kind of like a garbage dump for their worst code.

The recompile advice is generally good enough, but when you use it, again, just pay attention to compile times. There’s like a certain, like, you know, if it’s like, you know, a few hundred milliseconds, screw it. But like, if it’s, you know, getting up into like five, 10 seconds of compile time, it might be, you might have to think about alternate ways of doing this.

SQL Server right now doesn’t offer any great programmability or optimizer support for the types of queries I’m talking about. However, it came up in the SQL Server 2025 release notes that there is a new optimizer feature called OPPO, the Optional Parameter Plan Optimization, which is aimed at exactly this optional parameter problem.

So it looks like Microsoft is taking some steps to try to address this in SQL Server 2025. So, you know, sooner or later, you will probably see that in Azure SQL DB. And if Microsoft ever decides that it cares about managed instance again, you might even see it come to managed instance.

Microsoft’s track record with getting things cloud first has not been awesome lately. There are signs that there may be a return to that ethos, but who knows? It’s a bit weird.

But anyway, sometimes writing good queries does require extra typing and thinking and that’s often not what developers are famous for. You want to start throwing weird features at it like Hecaton and memory stuff. You feel like you need to partition your table when you don’t.

You want to start, you want to move to Postgres because you swear Postgres will just do better at everything. You’ve heard it’s this magical unicorn and everything’s better there. Or you want to start using NoSQL, you know, stick it all in Elasticsearch or MongoDB or whatever.

Or you decide that maybe you just need to build your own ORM. Start from scratch. It’s 18 to 24 months of work.

Far more interesting than five to 10 minutes of typing some extra characters into SSMS. So, say we have this query in a stored procedure. It’s a little more convenient to show it here than in some dumb front end that I would make badly anyway. And we stick a recompile hint at the end of this thing.

This will do okay. Without the recompile hint, this will go really, really poorly. You might notice that the time over here at the end is six minutes and 17 seconds. That is a very long query.

A lot of that is because of a bad memory estimate where the sort does take an additional like six minutes and three seconds. Because when we get up to this filter operator, we’re at 13 seconds, which still isn’t great. Right?

We can see very clearly that we spent a lot of time in other places in the query leading up to that six minute ordeal. But the main problem is without the recompile hint, the predicates in your search, and this doesn’t matter, again, if you use is null, coalesce, whatever other clever arrangement you think you’ve found that makes the optimizer do smarter stuff. That predicate is always going to look something like this.

This is because SQL Server has to cache a plan that’s safe for any outcome of these parameters being null or not. The recompile hint gets around this by allowing for something called the parameter embedding optimization, which allows SQL Server to take the parameter values and use them in the query as literals. So it doesn’t have to cache a plan that’s safe for anything.

It can just say, hey, here’s a plan for these values. This is as good as I can do, which may not always be great either. But that depends on a lot of other things like indexes and whatnot. So you should really avoid this sort of thing without the benefit of the option recompile hint, because it will go quite poorly.
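Roughly, the difference in the plan predicates looks like this (the exact value is illustrative):

```sql
/* Cached plan: the predicate keeps its null-safe shape for every column,
   because the plan has to be reusable for any parameter combination:

       [p].[OwnerUserId] = [@OwnerUserId] OR [@OwnerUserId] IS NULL

   With OPTION (RECOMPILE) and parameter embedding, the values become
   literals and the NULL branches disappear entirely:

       [p].[OwnerUserId] = (22656)
*/
```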

If we do use option recompile with this stored procedure, things will go generally okay. Now, the thing that I run into a lot is that with stored procedures like this, there’s usually a value that people think users will always search on. And they’ll design an index or two that makes sense for those types of queries.

But then as soon as you depart from that, as soon as that one value isn’t involved, things get really, really bad. That’s especially true without the recompile hint, because you cache a plan and SQL Server reuses it and you might have used the entirely wrong index for that query. So for our purposes and for the intent of the query, let’s pretend that we think we’re always going to be searching on owner user ID.

And we know that we’re always going to be ordering by score descending. That’s exactly what our query does up here. We have an equality predicate on owner user ID and we have an order by score descending.

This does get a little bit more complicated if you have dynamic sorting allowed as well. But we’ll talk a little more about that when we get it somewhat further down. So we can design an index that looks like this, that takes care of our immediate equality search and our ordering.

Right. So we can search to whatever user IDs we care about and we have the score descending column in order based on that equality search. And then we can put any secondary search columns over here. Now I’ve done that for the date columns.

These are going to be residual predicates. You cannot seek to these because the score column is in the way. Again, we’ll talk about that a little more in a moment. But since post type ID is incredibly unselective, it’s a very dense column, I have stuck that in the includes, because it doesn’t matter so much in this case.
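That index, as described, would look something like this (the name is made up):

```sql
/* Equality column first, the ORDER BY column next (descending), the
   secondary date columns after Score (reachable only as residual
   predicates), and the very dense PostTypeId down in the includes. */
CREATE INDEX owner_score
ON dbo.Posts
    (OwnerUserId, Score DESC, CreationDate, LastActivityDate)
INCLUDE
    (PostTypeId);
```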

So I already have this index created, I believe. I mean, it should anyway. Yes, I do.

Good. We got an error there. I’m smart. So if I go and run these two queries where I do use owner user ID, the plans that I get from these are perfectly fine. Right.

We get two seeks. SQL Server does okay enough cardinality estimation from these. No one is upset or hurting from this. Now, the parameters that I used for the very slow plan up here were actually the creation date and last activity date parameters that you see in here. These things.

So that’s what I passed in to get this slow query without OPTION (RECOMPILE). With OPTION (RECOMPILE),

SQL Server makes a much better choice overall. Just chooses to scan the clustered index. Doesn’t choose to reuse a plan.

Gets close enough to, you know, okay cardinality estimate. And like, you know, this thing finishing in under a second without it without a good supporting index. Totally reasonable.

Now, where things get tricky, of course, is that when you start writing queries like this that are unpredictable, you have no idea what set of indexes to roll out, because you have no idea what the most common sets of search criteria that people pass in are going to be. And it takes kind of a lot of logging and analytics to figure out what the most common searches are, whether they’re slow or not, and then how to index for them. And you can end up with a lot of different permutations of a similar index definition when you go down that route.

Now, this is a pretty simple thing because we’re just hitting one table, right? We’re just we’re only hitting the post table and we have all sorts of search parameters against that. So, but, you know, if we were to think about this for a query that might touch more than one table, you might be dealing with more than one search element and things might get really tricky.

Because now you have to think about indexing multiple queries and taking into account join keys and stuff like that. There’s all sorts of things that get really, really tough and complicated with this. So, rather, let’s say that, much like the $4 a month that you could use to become a member of this channel, you can’t afford to recompile this query every time.

The way to get around that is to write dynamic SQL, the safe parameterized kind that avoids SQL injection and has equivalent plan reuse functionality to any other stored procedure that you would run. Because we execute the query using sp_executesql, which is a stored procedure, you get the type of parameterization and plan reuse and all that other good stuff that you would find in other cases. And you would use that to build a custom where clause based on what parameters apply to the particular search that you’re running.
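A minimal sketch of that pattern, using the same illustrative parameters as before: only predicates for supplied parameters get appended, and sp_executesql keeps everything parameterized.

```sql
/* Build a WHERE clause from only the parameters that were supplied,
   then execute with sp_executesql for plan reuse and injection safety. */
DECLARE @sql nvarchar(MAX) = N'
SELECT p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE 1 = 1';

IF @OwnerUserId IS NOT NULL
    SET @sql += N'
AND p.OwnerUserId = @OwnerUserId';

IF @CreationDate IS NOT NULL
    SET @sql += N'
AND p.CreationDate >= @CreationDate';

SET @sql += N'
ORDER BY p.Score DESC;';

EXECUTE sys.sp_executesql
    @sql,
  N'@OwnerUserId integer, @CreationDate datetime',
    @OwnerUserId,
    @CreationDate;
```

Note that unused parameters still get passed to sp_executesql; that’s harmless, since they just never appear in the query text.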

So, if we recreate this with all that in mind, Oh, jumped around on me a little bit there. These things will all still be okay.

All right. We’re going to get, you know, pretty much equivalent performance to what we saw with the recompile. We get the two seeks here with the, you know, I mean, we do reuse this plan. So, the cardinality estimate is reused, right?

We can see that there is plan reuse with this. So, this guessed one of 82 and this is 5,000 of 82. So, the guess of 82, rather that cardinality estimate of 82 rows persisted there. And that also lives on with this.

If we executed more than one variation of this, that we would reuse the estimates for that plan, right? So, no big deal here. But again, coming back to what’s difficult with these things is knowing how often they run. Query store is pretty helpful for this sort of thing.

Because, well, if you’re smart, when you write dynamic SQL, you put in a comment with the name of the procedure that something comes from. You can use my free stored procedure, sp_QuickieStore, to search query text for this type of token. And you can find all the queries that run and see how many executions they have, which is a very useful thing.
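Something like this, assuming sp_QuickieStore’s @query_text_search parameter (check the procedure’s documentation for the exact name); the comment token is whatever convention you pick:

```sql
/* Tag the dynamic SQL with the calling module's name... */
SET @sql = N'
SELECT p.Id, p.Title, p.Score /* dbo.FlexibleSearch */
FROM dbo.Posts AS p
WHERE 1 = 1';

/* ...then hunt for it in Query Store later. */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'StackOverflow',
    @query_text_search = N'dbo.FlexibleSearch';
```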

So, the problem with rowstore indexes is that the order that the keys are defined in defines how queries can access stuff. Now, there are clients who I’ve been working with for years and I’ve been talking about indexing for years. And they’re still unclear on the fact that if you have an index like this that leads with owner user ID, that is not the same as if you have an index where owner user ID is like the second or third or fourth or fifth key column in the index.

Like, by that point, you have lost any useful sorting for searching stuff. Multi-column indexes really are only useful for either things that search for the leading key column or things that search for the leading key column and then other stuff. Right?

Or, you know, ordering and other things. But, you know, if we’re thinking about just being where clause centric here, this is really when, like, when rowstore indexes, multi-key rowstore indexes are very useful because you can seek to owner user ID and then you can seek to score or order by score. And then you can seek to creation date or last activity date or post type ID or ID.

You can seek across all the keys, but you have to start seeking with that first key column. Otherwise, it’s just a scan of everything, because the index isn’t ordered in a helpful way after that. So, for example, with our other query, where we need to search on these two columns, we’re not accessing these two columns first via where clauses or anything.

So everything is all out of sorts for those, for those searches. That’s why we ended up scanning the clustered index. So what you generally want to do is, if you can, and there are lots of considerations for this, and this is, you know, we’ve already kind of gotten to the 20 minute mark here.

And some of you may feel dissatisfied that I have not gone into all of the potential things that might come up by using columnstore. But the general move is creating a nonclustered columnstore index, right? And just because it doesn’t say nonclustered in here doesn’t mean it’s not one.

I can’t create a clustered columnstore index on the post table, because it already has a clustered rowstore index. So this is creating a nonclustered columnstore index.

So in general, if you create a columnstore index that spans the columns you care about searching in your query, things will end up a lot better, and the unpredictability doesn’t matter as much. The reason it doesn’t matter as much is because columnstore indexes don’t have that column-to-column dependency that rowstore indexes do.

Each one of these columns is stored in its own little vertical columnar structure within the columnstore index. And there are lots of little mini indexes, sort of, in all the different segments and row groups. So you can pick and choose which columns you want to hit.

So if you want to hit owner user ID and creation date, or you want to hit score and last activity date, it doesn’t make any difference to the columnstore index what order you search for things in or anything like that, because each one of these columns has independence from any other column that might precede it in the key of the index, unlike rowstore indexes.
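For this table, that would look something like the following (the index name is made up); any subset of these columns can be filtered without worrying about key order:

```sql
/* One nonclustered columnstore index over the searchable columns. Each
   column is compressed and stored independently in its own segments, so
   there's no leading-key dependency like a rowstore index has. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncsi_search
ON dbo.Posts
    (OwnerUserId, Score, CreationDate, LastActivityDate, PostTypeId);
```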

So you can give all of your search queries a single, good, highly compressed, very nice place to search for data from, without all the problems of creating multiple non-clustered rowstore indexes in order to try to satisfy every different permutation of a search. The way that I love to handle this sort of thing for clients, as long as the shoe fits with using columnstore in their database, comes with one big consideration: if we’re relying on query parallelism to make things fast and you’re on standard edition, we need to really carefully test the columnstore thing, because Microsoft hates people on standard edition, and it limits all columnstore parallelism to a DOP of 2.

You cannot exceed a DOP of 2, so if you have a query that runs really fast at DOP 8 using rowstore indexes in standard edition, and you start using a columnstore index and you’re like, wait a minute, my queries got slower, even though I’m using batch mode in the columnstore index.

Well, it’s because your query is limited to a DOP of 2, and you can’t do nothing about that. So the way that me, Erik Darling, with Darling Data, likes to handle unpredictable searches by using columnstore is typically a lot easier and more efficient than creating, like, 17 different rowstore indexes to account for everything that someone might search for.

So we’re going to wrap this one up here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I hope that you will continue watching.

I forget what the next video is about, but it probably, hopefully it won’t be as long as this one, because, you know, the tongue does get tired. And, you know, there’s a wise man who once said, the reason dogs have so many friends is because they wag their tails, not their tongues.

I guess that explains why I spend so much time alone. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Equality and Inequality Predicates




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of indexing queries for better performance in SQL Server, focusing specifically on equality versus inequality predicates. As we explore common scenarios and problems encountered when designing indexes, you’ll learn how to optimize your queries so they can take full advantage of those indexes. The discussion covers a range of topics, from understanding why certain index designs might not perform as expected due to non-selective columns required in every query, to strategies for rewriting both the index and the query itself to improve performance. By the end of this video, you’ll gain insights that can help you avoid common pitfalls and make more informed decisions when designing indexes for your databases.

Full Transcript

Erik Darling here with Darling Data. We’re going to embark on an adventure together, you and I. We’re going to delve deep into the caverns of how to index queries for better performance. I obviously can’t cover every single scenario, but I’m going to cover many of the most common ones and some of the most common problems that I see when I’m doing my consulting work. And hopefully you will get at least enough knowledge to be dangerous in your own work. In this first video, we’re going to talk about equality versus inequality predicates, because the way that you design your queries and the way that you design your indexes really has to go hand in hand, so that you make sure that your queries actually use your indexes well. And if you have queries that are not performing well, perhaps the way they’re written is the reason why you’re not taking full advantage of those wonderful indexes that you have. Then perhaps you can stop worrying about fragmentation and start worrying about real problems. Be a grand thing. But before we start talking about equality and inequality, I promise we’re not going to go off on any soapbox tangents here about the world at large. But before we do that, if you would like to become a member of this channel and support the channel and all the work that I do to bring you this wonderful content, there is a link at the very top of the video that says become a member, in which you can click on that very link.

And you can give me a minimum of $4 a month towards continuing to create this content. It’s a nice thing to do. If you’re in the market for nice things to do that are a little bit less expensive, you may like, you may comment, you may subscribe. You can do all of the above, none of the above. But of course, then you’d be mean. If you just need to be a mean, cruel person. Well, maybe this isn’t the channel for you. We’re all nice people here. If you are in the market for SQL Server consulting help, well, by golly and gosh, I do quite a bit of it. All these things over here to my right. I’m in the market to do them. And if you need something aside from maybe one of these things, well, you can always talk to me and tell me what it is.

And as always, my rates are reasonable. Training. I got some. You need some. It’s about $150 for the rest of your God-given life. Again, you can use that link and use the Spring Cleaning discount code, or visit the video description for a link that puts all of these things together for you.

So that you can just go right on ahead and not have to deal with copying and pasting anything. I think by the time these videos start airing, it will be 2025. It will be the beginning of the year. So I guess news about upcoming events will be forthcoming.

Won’t that be lovely for you and me? Because that means I’m getting out of the house. But with that out of the way, let’s do a little disco dance and talk about these inequality and equality predicates. Now, a little preamble on this stuff.

Pretty much since I started reading about indexes, when I started caring about SQL Server performance, the general wisdom has always seemed to be that you should design your indexes, the keys of your indexes, to support the most restrictive search predicates first.

That could either be, you know, like a unique column, a unique column of values, or just one that is highly selective in some way. And that’s not terrible starting advice.

If you’re a beginner out there in the world and you see that advice, that is not advice that you need to be terribly suspicious of. You can follow that advice up to a point and probably have pretty well good enough performance for a lot of your queries.

You know, I realize that there’s going to be a lot of questions about, you know, whether you should index for your where clause first or your join clause first or whatever. And the crazy thing is that there are going to be different scenarios where indexing for, like prioritizing different parts of your query for indexing can make a big difference.

But if I were to give you some just basic starter level advice to go along with indexing the most selective columns first, I would say that you should index for your where clause first and then prioritize other parts of the query. That’s not always going to work and that’s not always going to be true because sometimes you’re not going to have a where clause, you’re only going to have a join clause.

And sometimes other parts of the query might come into play as to how you want to design your indexes. Things like, you know, windowing functions, group by, order by, things, you know, columns you’re selecting, very specific restrictive filters on things.

All sorts of things come into play when you’re designing indexes, but we’re going to talk through a bunch of those during the course of the series, so I’m not going to talk too much about those here. One problem that I see pretty consistently is that a lot of databases end up designed with some very non-selective columns that are required for just about every query.

Think about something like soft deletes where most rows are not deleted. Most rows are in play. You might have a status column that has a variety of statuses in it that is still not very selective, but, you know, has outcomes in it that you need to, that most of your queries are going to be filtering specifically for.

Now, leaving filtered indexes specifically out for the moment, a lot of the columns that I see, rather, a lot of the indexes that I see, have those types of columns in them.

I don’t want to say included in them because I don’t want you to think that we’re just talking about included columns because those columns end up in various places in an index depending on how the index is laid out overall.

If it is a filtered index, then they usually end up in the include columns because they’re in the where clause of the index. And having them in the includes gives you a little bit more gusto than leaving them out entirely.

Of course, putting them in the key is fine too, but usually includes is good enough. But for all of these required-type columns, whether you have a multi-tenant type database and an organization ID, or the soft delete thing, or the status thing that you need to look up, there are all sorts of columns along those lines which are not very selective, which you do need in the index to make your queries fast, because you need your data organized by these things.

Some of the index design patterns that stem from that are rather unhealthy. Now, in this post, what I want to look at is both an index change and a query change that can help you out when you have, like, let’s say you have an index that maybe wasn’t designed great, but your query can be rewritten to take better advantage of that index or how to rewrite the index so that whatever way your query is written, you can do pretty well with it.

So I’ve already created two indexes and a constraint on the post table. Sorry. I’ve created two indexes, one on the badges table on name and then user ID, and this index is plenty good enough, and then one on the post table on post type ID and then owner user ID.

Post type ID is taking the place of sort of like a status column, because there are about eight different post types in the table, and some of them are much more selective than others. The problem is that for some types of posts, an index that leads with post type ID would be very selective, and for other types of posts, an index that leads with post type ID is going to be not very selective at all.

And then I also have a constraint on the post table telling SQL Server that every post type ID that exists in that column is greater than zero and less than nine. All right.
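Sketched out, that setup looks roughly like this (index and constraint names are made up):

```sql
/* Badges: equality on Name, then UserId. */
CREATE INDEX b ON dbo.Badges
    (Name, UserId);

/* Posts: the non-selective "status" stand-in leads, then OwnerUserId. */
CREATE INDEX p ON dbo.Posts
    (PostTypeId, OwnerUserId)
INCLUDE
    (Score);

/* Tell the optimizer the hard limits on PostTypeId. */
ALTER TABLE dbo.Posts
    ADD CONSTRAINT ck_posttypeid
    CHECK (PostTypeId > 0 AND PostTypeId < 9);
```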

These are hard limits that we have set on the post table. So, in this case, the post type ID column, like I said, it’s going to play the role of our non-selective column that every query is required to have filtered in there.

Granted, that’s not going to be the case with a lot of Stack Overflow queries, but, you know, just go along with me here. Let’s, like, just pretend that all the queries need to find a particular type of thing, but what else you might be searching for depends on what else the module that your users are executing is going to run.

Now, I have this query, and this query is not going to be very fast. I’m just going to walk you through what it does real quick. Let’s start down here a little bit.

So, we are selecting from the badges table, and the reason why we have the index that we have on the badges table is to grab, well, it partially helps with this, but really this part of the query is mostly assisted by the ID column on the users table being the clustered primary key. So, seeking to various user IDs is totally fine and easy.

The reason why we have the index that we have on the badges table is because we start filtering the badges table with an IN. I mean, I know it just says IN Popular Question. I guess before there were some more things in there, and I narrowed it down to this to make the demo work in a reasonable amount of time, but this is essentially an equality predicate on Popular Question.

And then for the post table, this is what we’re doing. And if you remember the index that I created, it was on post type ID and then owner user ID, and then it had score in the includes, which is obviously not going to be a completely ideal index for this situation, but the thing to pay attention to really is what’s going on in the where clause here.

We can just leave this out of the equation for the time being. Now, if we look at the plan that we get for this query, you’ll notice it doesn’t return any results, and I purposely have this not return any results because I want you to see the raw execution time of the query without returning any rows to SQL Server Management Studio.

That’s why I filtered to where that row number column equals zero down here. I want nothing to come back from this. Now, looking, like, more closely at the query plan, like I said before, getting to the user’s table is trivial.

It takes about eight milliseconds of time, but you’ll see that we spend about 32 and a half seconds executing the query. The majority of the time, if you’ll allow me to drag this over, is in this section of the query plan. And in this section of the query plan, we start with a scan on the post table.

And the reason why we start with a scan on the post table is because, with the inequality predicate the way we’ve written this, SQL Server won’t seek into that index that leads with post type ID. SQL Server ends up building an eager index spool based on this.

Or rather, we could seek into it. SQL Server just doesn’t. SQL Server just says, like, I don’t know. And this is even with that constraint on the table, where SQL Server knows that any value less than three has to be one or two. SQL Server could have converted that to equals one or equals two, but it doesn’t.

So we end up building an index at runtime. And that takes about 21 seconds, plus there’s about four seconds spent scanning the index. So the majority of the execution time really is in here.

Okay. Now, it would be nice if SQL Server did things a little bit more naturally here. And, like I said, it could choose to do this, but it doesn’t.

If we look at the estimated plan for this with the FORCESEEK hint on there, this looks a lot friendlier, right? This looks like a lot better of a situation, but it’s not. If I start this running, we are going to be waiting here for a very, very long time.

I have, in the past, attempted to let this thing run to completion, but it runs for about an hour before I give up on it. So this thing will just run and run and run and run and run. This actually doesn’t do any better with the FORCESEEK hint.

Sometimes it is useful to apply various hints to your queries to see why SQL Server doesn’t choose certain plans. You know, two things I find very interesting to do are FORCESEEK and OPTION (LOOP JOIN) at the end of your query, because those will really show you indexing issues with whatever query you’re running. Now, this thing has been running for about 45 seconds, and we haven’t gotten anything back from it.

The main problem with this query is what the index seek is doing. And actually, let me expand this out as well just so it’s a little bit more clear in here. Range columns.

So every time we go into the post table, we scan for everything that’s less than 3. And the trouble with that in our case is that everything less than 3 is the majority of the table. So every time we go and seek into that index, we end up with a big, big problem.

We have to read through about 17 million rows and then apply a residual predicate on the owner user ID. If you look at this, we actually do an index seek. Oh, wait, that’s the wrong one.

That’s the users table. I’m sorry. I highlighted the wrong thing. I was like, wait a minute. That doesn’t make sense. So we have a seek predicate on post type ID less than 3 and then a residual predicate on owner user ID. This is a bad time, because every single time we go into this and seek, we seek through 17 million rows and apply a residual predicate on owner user ID.

That is not a very good scenario. If we were searching on a very limited set of POST type IDs, that would be maybe okay. But the way that this thing is going, it’s very much not okay.

So the FORCESEEK hint does show us that this is a bad plan idea. So, like I was saying, let me start this running. And then we’re going to talk about the distribution of records for post type ID in the post table.

I actually have to put that in a new window probably because I’m building an index there. So like I was saying, we have to go through about 11 million records to find anything with a POST type ID of less than 3. 3 is down here with 167 records.

But the post type IDs that we care about, the ones that are less than 3, are 1 and 2, and SQL Server would have to scan through 17 million of those.

Now, what I was saying before, which I think I mumbled up a little bit, was that SQL Server could convert this logically because there’s a constraint on the table that says, like, you know, all of the values in here are greater than 0 and less than 9. SQL Server could logically translate less than 3 with everything that it knows about that data to equals 1 or equals 2.

But it doesn’t do that on its own. Right? It just doesn’t.

It can’t unfurl that logic on us. So I’m creating an alternate index here on owner user ID and then post type ID. And what I want to show you first is that if we get the estimated plan for this, this thing should start using the new index that we created, which is not badges underscore x.
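As a rough sketch, the alternate index being created probably looks something like this — the index name and the included column are assumptions on my part, not the exact demo script:

```sql
/* Hypothetical sketch: lead with the selective column instead.
   The name and INCLUDE list are illustrative. */
CREATE INDEX ix_posts_owneruserid_posttypeid
ON dbo.Posts (OwnerUserId, PostTypeId)
INCLUDE (Score);
```

The point of the key order here is that the equality-friendly, highly selective column comes first, and the inequality lands on the second key column.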

And if we do that, this will actually come back pretty quickly. At least it did at one point. Who knows how long it’ll take now.

But there we go. So that was about 8 seconds. And we can see that this is much improved. All right? This is no longer building an index. This is no longer, what do you call it, building an eager index spool, and it’s no longer taking 30-something seconds to execute this whole part of the plan.

And, you know, having the reversed index, where the key columns are flipped from the other one, actually does better here. Now, if we wanted to write this query in a way that would turn out well with our original index, what we could do is, actually, let me tweak this a little bit to say index equals not badges on this.

And if we look at the query plan for this, it is pretty okay. There is still no eager index spool being built in the estimated plan. And because we have separated out our search logic a bit, and we’ve said select this stuff where post type ID equals 1, and select this stuff where post type ID equals 2, we’ve done the optimizer’s job of unrolling less than 3. Because, again, with the constraint being greater than 0 and less than 9, less than 3 can only mean 1 or 2, or 1 and 2, depending on how you look at it.
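The shape of that rewrite, heavily simplified from the demo query — the @UserId parameter and the column list here are illustrative:

```sql
/* Hypothetical sketch of the UNION ALL rewrite: each branch gets
   an equality predicate the index can seek on. */
SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
  AND p.OwnerUserId = @UserId

UNION ALL

SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 2
  AND p.OwnerUserId = @UserId;
```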

But if we run the query and we tell SQL Server to use our original index, not badges, we still do fine with this because SQL Server does two pretty quick seeks, and this ends up taking roundabout the same amount of time as the query with the unchanged less than 3 predicate and the new index that we created that leads with owner user ID.

So with that index, we’re able to seek to each owner user ID and then filter out the post type IDs. Owner user ID is a far less dense column. It’s very, very unique amongst the 17 plus million rows in the post table.

So the highest count for a single user in this version of the database is around 28,000 rows, and seeking to a range of 28,000 rows for a single user ID and applying a predicate after that is pretty trivial as far as seeks go.

And then another thing that we could do is we could convert the where clause to say where post type ID is in 1, 2, and we could run this, and this should also take roundabout eight seconds. The query plan is slightly different for this one.

6, 7, 8, there we go. Okay, it was a little under 8 seconds, 7.8. Maybe the cache was a little bit better or something on that one. But rather than having two distinct seeks like we had with the query 1, union all, query 2 thing, we just have one seek into this.

And if we look at the seek predicate, we will have actually two seeks in here. One of them will be for post type ID 1, and the other will be for post type ID 2. It’s a little hard to get this with ZoomIt.

Let me see if I can… Yeah. So there’s the first seek for post type ID 1. And then if we expand this one out, we will see the second seek for post type ID 2.
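In other words, the IN version collapses to two seek ranges inside one operator — sketched simply, with an illustrative parameter and column list:

```sql
/* Hypothetical sketch: IN (1, 2) becomes two seek ranges,
   = 1 and = 2, inside a single index seek operator. */
SELECT p.OwnerUserId, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
  AND p.OwnerUserId = @UserId;
```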

So, depending on how your indexes are arranged, you can sometimes change the way that where clauses are written to take better advantage of indexes as they exist now. And if you’re allowed to change indexes but not queries, like if you’re working with a third-party vendor app and they say, no, you can’t change the queries, you can still do stuff with indexes if you’re smart.

One thing that you could do is change the order of some key columns, depending on how the queries are written, in order to have those indexes get used more intelligently by those queries. So there is that.

Anyway, that’s a little foray into inequality and equality predicates in queries, and how those tend to align with your index key column order, and the way that you design your indexes for those queries to be able to get seeked into.

Sometimes it does work. Sometimes you can write queries so that having a less selective column first works out okay. Other times you have to either rewrite the query or reshape the index so that the intent of your query is better serviced by the key columns in your index.

So with that out of the way, I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, which will be about, oh, what’s number three over here? Unpredictable search queries.

So don’t read this yet, because that’s what I’m talking about next. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked `OUTPUT` clause in SQL Server queries, explaining its basic functionality and more advanced uses. While the `OUTPUT` clause might seem mundane at first glance, it offers powerful capabilities for simplifying archival processes and maintaining data integrity during merges. I demonstrate how to use `OUTPUT` with both regular DML statements and merge operations, showcasing its ability to track changes in inserted and deleted rows, as well as access values from other tables involved in the modification query. By the end of this video, you’ll understand why it’s a valuable tool for optimizing your SQL Server queries and managing large datasets efficiently.

Full Transcript

Erik Darling here with Darling Data. Actually recording my final video for the evening, which I’m excited about because I can’t wait to take a shower. I know it seems weird, but once you get on these video kicks, you just got to keep going until you’ve exhausted either yourself or your content. This is going to be the final video about how to write SQL Server queries correctly. In this video, we’re going to talk about the output clause. Just like CTEs, there is very little that you need to actually know about output. If anyone ever tries to say that they have information worth selling about output, they are full of it, and you should, once again, throw something heavy at them. They’re not good people. In fact, they’re terrible people. They’re conning you out of money. This is free information that you should be able to find anywhere, because there’s not a whole lot that is very in-depth or very technical about it. If you like this content and you want to support this channel, you can actually spend money on it. I guess I’m kind of reversing course there a little bit. I actually don’t want money for this video. Watch the other videos, and if you decide you’re like, wow, this is a great channel, then you can give me four bucks a month by clicking the link in the video description that says, become a member of the channel.

And then you’re a member, and I get $3.10. If you don’t have the four bucks, if you spent it all on the ponies or a pony, you can comment, you can subscribe, and that won’t buy either of us a pony, but it’ll feel good. If you need help with SQL Server, if it’s slow, if it’s annoying, if it’s upsetting you in some way, I am available for hire. You can pay me money. In exchange for that money, I will fix your SQL Server problems. And as always, my rates are reasonable. Speaking of reasonable, reasonable man, the face of a reasonable man here. You can get all of my training for 150 US dollars for the rest of your life. That’s the URL, that’s the coupon code.

These are also helpfully glued together for you in the video description so that you don’t have to do much thinking because you’re probably tired of thinking. I will be out and about in 2025. I don’t know where yet, but when I get there, I’ll tell you. With that out of the way, let’s talk about the laughably mundane output.

Now, one of the most interesting things you can do with the output clause is use it to simplify archival processes. That’s going to be the first demo that I show you. And that’s really, if you stop watching there, I wouldn’t blame you. The rest of it, yeah.

Output isn’t very interesting, but maybe it will be to someone. So, the first thing is you can access the inserted and deleted tables to see modified values. Wow. So, you write an insert or an update or a delete. You can throw the output clause on there.

And you can see what changed. Sort of a helpful thing. If it’s a lot of rows, be careful.

But, you know, when there’s a lot of rows, you should be careful anyway. You don’t need to select star from that 24 billion row table. You’ll probably have a bad time.
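A minimal sketch of that first point — the table, column, and ID here are illustrative, not from the demo:

```sql
/* Hypothetical sketch: OUTPUT exposes the before (deleted) and
   after (inserted) images of rows the statement modifies. */
UPDATE u
SET    u.Reputation = u.Reputation + 1
OUTPUT deleted.Reputation AS old_reputation,
       inserted.Reputation AS new_reputation,
       inserted.Id
FROM dbo.Users AS u
WHERE u.Id = 22656;
```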

Output gets a lot more interesting when you use it with merge. But, of course, then you have to use merge. And if you never write a merge statement, your life will be complete. I’m telling you. You’re not going to miss out on much.

And the third thing about output is you need to be careful what you output data to. If you output to a client, whether it’s SQL Server Management Studio or whatever client you write that accesses SQL Server, you will force the query, or rather, the clearer way of putting it is you will disallow a parallel execution plan. The same thing will happen if you use output into a table variable. For very small inserts, updates, and deletes, you probably won’t notice this.

But if you’re using output for something like I’m showing you, you may notice it, because for archival processes, you are usually getting rid of, or modifying, a lot more rows. Outputting to a temp table or a normal user table does not have that problem.

Only outputting to a client and to table variables. So this is primarily how I use output. And I use it to make archival processes all one statement.

Usually with archival processes, you’re like, Oh, I don’t know, I’m going to update these rows to mark them as deleted. Or I’m going to delete these rows.

And then I’m going to insert these rows into the archive table and then go back and delete them. With output, you can do that all in one go. Because you can nest a delete statement and put that into a derived table. Note I’m not using a CTE here.

Just to spite you, CTE lover 11357. So you can nest the data modification, like a delete, and give it the output clause.

Look at these wonderful rectangles I’m drawing for you. There’s some symmetry in there, baby. And then you can select from that derived table. And you can insert, you can use that to drive an insert into an archive table.
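The pattern being described, sketched against the Votes table — the column list and the date filter are illustrative, not the exact demo script:

```sql
/* Hypothetical sketch: delete and archive in one statement by
   nesting DELETE ... OUTPUT inside a derived table, then
   using it to drive the INSERT into the archive table. */
INSERT dbo.Votes_Archive
    (Id, PostId, UserId, VoteTypeId, CreationDate)
SELECT d.Id, d.PostId, d.UserId, d.VoteTypeId, d.CreationDate
FROM
(
    DELETE v
    OUTPUT deleted.Id, deleted.PostId, deleted.UserId,
           deleted.VoteTypeId, deleted.CreationDate
    FROM dbo.Votes AS v
    WHERE v.CreationDate < '20110101'
) AS d;
```

Rows that don’t make it into the outer insert don’t get deleted either, which is what keeps the two tables consistent in a single statement.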

So if let’s say we do this, we’re already in the right database. And I’m going to put this into a transaction so that I can roll it back immediately. And I’m also going to turn on query plans because they’re going to be of some interest to us.

And we do this. Notice in our execution plan, we have a clustered index delete and then a clustered index insert. This is going to the votes table and this is going to the votes archive table.

And if we had supporting indexes, of course, this would be a lot faster. But you know, whatever. Not really the biggest deal here. And then I have a couple of validation queries where we make sure that everything that we cared about was deleted from votes and everything that we cared about was inserted into the votes archive table.

And that is the case when we finish the query. These are all gone from the votes table and they only exist in the votes archive table. So a very, very handy way of simplifying archival processes when you’re finally ready to purge some data out of those massively oversized tables of yours.

Now, this is originally from, partially anyway, from a blog post that Aaron Bertrand wrote that I’m stealing the table definition from because it was easy. But thanks, Aaron. You’re a sport. You’re a curling pro. And so what I’m going to do is just for the sake of making it easy to do stuff, I’m going to use a transaction here.

I’m going to show you what’s in the table called my table, and then I’m going to do a merge into my table. And the primary thing that I want to show you is that when you use output with merge, you have access to two things that you don’t have access to when you use output with a regular insert or update or delete. One is this special dollar-sign column called $action. There’s probably a fancy word for that that I don’t know.

And the second thing is that you can access data from other tables involved in the modification query to see what values are in those. This can be very handy if you want to do like a delete and then an insert and you want to maintain certain values from the table you’re deleting from in the other table. It’s pretty handy for some stuff like that.
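A sketch of both points together — the tables and columns here are illustrative, loosely modeled on the demo:

```sql
/* Hypothetical sketch: $action names the operation per row, and
   the source table's columns are visible in the OUTPUT clause. */
MERGE dbo.MyTable AS t
USING dbo.MySource AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Word = s.Word
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Word) VALUES (s.Id, s.Word)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, inserted.Id, deleted.Id, s.Word;
```

For deleted rows, the source columns come back null, since there was no matching source row, but for updates and inserts you get the source values right alongside the target’s before-and-after images.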

So what I’m going to do is run this whole thing. And really the results of this are what we care about. We don’t care so much about performance or anything. But when we start off, when we started off, our table, my table had the IDs one and four in it.

And when we ended, we had one, two, and three in it. And then this was the result of the output clause. And so we have an update where we updated ID one.

We have an insert for ID two and insert for ID three because we only had one and four before. We didn’t have a match for four. So we deleted row four.

Now what’s neat about this is this, that word column. Why did you change colors on me? Let’s correct that. Let’s make sure that’s the right color.

We only use pink in this house. This source dot word is from this. This was not anything that was in the table that we merged into. Right?

These values were not in there. They were only in here, but we could reference those with merge. So there are neat things you can do with output and merge that you can’t do with regular DML statements. So that’s the second thing that you should know about output clauses.

The third, and this is not, I don’t actually have to run any of these. I just have to show you the estimated plans for them. But if I make the target of my output query a temp table like this, we will be able to use at least a partially parallel execution plan here.

Right? There’s nothing stopping us from being able to read from a table in parallel. Of course, modifications like inserts and updates and deletes are always going to be part of a single threaded portion of the query plan.

Right? Those are not allowed to go parallel. Well, inserts are sometimes allowed to go parallel under the right conditions, but in this case, they are not.

And then over here, what I’m going to show you is the same thing, except now going to a table variable. And we’re going to output into that table variable. And this is going to show you the estimated plan again.

And this time we do not have a parallel read portion over here. And if we go to the properties, we’re going to see the non-parallel plan reason: table variable transactions do not support parallel nested transaction. I don’t know why transactions is plural here and singular there.

You will have to ask the summer intern at Microsoft who does all of the SQL Server development these days. The same thing will happen if you use delete and you output to no target except the client. That, in our case, is SSMS.

If we get the estimated plan for this, there will also be no parallel read portion here. And if we go to the properties, we will see a different non-parallel plan reason. This time, DML query returns output to client.
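To recap those two targets side by side — a sketch with illustrative names and predicates:

```sql
/* Hypothetical sketch: same OUTPUT, different targets.
   Only the temp table version can get a partially parallel plan. */
CREATE TABLE #out (Id int NOT NULL);
DECLARE @out table (Id int NOT NULL);

/* The read portion of this plan is allowed to go parallel. */
DELETE v
OUTPUT deleted.Id INTO #out
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;

/* Non-parallel plan reason: table variable transactions
   do not support parallel nested transactions. */
DELETE v
OUTPUT deleted.Id INTO @out
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;
```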

So that one is very clear about what’s, about why it cannot go parallel. And with that, that is literally everything you need to know about the output clause. Can simplify archival processes.

It’s cooler with merge because you can access a pseudo-column, which is probably what it’s actually called, named $action, that will tell you whether each row was an insert, an update, or a delete. And you can reference columns from the source of the merge, not just the target of the merge.

That’s something you can’t do with regular insert or update or delete queries. The third is that you should be very careful what target you use with output. Because depending on how much data you are modifying, you might really want some portion, the read portion of your query plan to happen with a parallel plan.

It could slow down quite a bit if you lose that. So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I am now going to go bathe and enjoy the company of my family until I go to sleep.

So with that, I’m also wrapping up the how to write queries correctly series. It’s a playlist. You can watch them all. You should watch them all, especially if you’re having trouble writing queries correctly. You just might learn a thing or two, or like 16, depending on where you’re at in life.

All right. Cool. We did it. Thanks for watching.
