Indexing SQL Server Queries For Performance: Eager Index Spools



Thanks for watching!

Video Summary

In this video, I delve into the intricacies of eager index spools in SQL Server query plans, particularly focusing on their appearance and impact. I explore why these spools often arise, especially when nested loops joins are involved, and how they can significantly degrade performance without any accompanying missing index requests or optimizer hints. By examining specific examples, including a detailed walkthrough of a problematic query, I highlight the inefficiencies and potential pitfalls of eager index spools, offering practical advice on how to identify and mitigate them for better query performance.

Full Transcript

Erik Darling, right here in the flesh with Darling Data. Darling Data, of course, has no flesh except me. I am the only flesh. In today’s video, we are going to talk more about how to fix query performance issues with indexes. And in this one, we’re going to talk about when an eager index spool arises in your query plan. Now, I’ve talked about eager index spools many times. You’ll find other videos on this channel talking about them. But the things that I want to outline about them, that I find interesting about them, are: one, they are not accompanied by a missing index request of any variety. There’s not going to be any green text in your query plan. And there’s not going to be anything in the DMVs saying, hey, I created an index for you while your query ran, you should create this index. That doesn’t happen. Two, they are very, very likely to show up on the inner side of nested loops joins when you don’t have a good index, but sometimes even when you do. Three, even if they are built in a parallel plan, reading from the table and building the index only happens on a single thread. So there’s a lot of wasted CPU threads in that whole thing.

And that can also make the table read and the index build take a very long time. Four, three, five and a half? B, Q, I don’t know, whatever. In parallel plans, eager index spool builds are accompanied by a wait called EXECSYNC, E-X-E-C-S-Y-N-C. But in a serial plan, a single-threaded plan, they emit no wait, even though we would be waiting on that spool to build. And that was something else. Oh yeah, building spools is highly inefficient. It is essentially a row-by-row build of the object. And Microsoft has made absolutely no progress in improving that scenario over the years. So there’s a lot of bad stuff about eager index spools, especially when they are built off very large tables.
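Because eager index spools never come with a missing index request, you have to go looking for them yourself. One way to do that is to search cached plan XML for the operator. This is an illustrative sketch, not a polished tool — the DMVs and showplan attribute names are standard, but you would want to add filtering for a real server:

```sql
/* Find cached plans that contain an eager index spool,
   worst ones by total elapsed time first */
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.execution_count,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist(
          '//RelOp[@PhysicalOp="Index Spool" and @LogicalOp="Eager Spool"]'
      ) = 1
ORDER BY qs.total_elapsed_time DESC;
```

Querying plan XML like this is not cheap on a busy server, so run it off-hours or against a filtered set of plans.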

So if you’re working with a query and you have big tables in your database and you see a nested loops join, and on the inner side of that nested loops join, you see an eager index spool, not a lazy table spool, an eager index spool, you can be pretty sure that either you don’t have a good index to support whatever predicate search you’re doing, whether it’s via join or where clause or a combination of both, or the optimizer is once again taking its nose beers and not using the index that you have. And I’m going to show you an example of that.

So with that out of the way, let’s talk about how you can buy me some regular beers, because I have enough nose problems without anything else. I don’t need your help. If you would like to support my videos, you can sign up for a membership. It is as low as $4 a month. That’s not even enough to get another knuckle tattoo.

You can join the 30 or so other people who have done so and who care enough about my endeavors and my mental well-being to pay me $4. There’s a link right there in the video description for you to do that. How convenient. Make it very convenient for you to hand over money. If you have spent your $4 on beers of any variety, you may like, you may comment, you may subscribe.

I would suggest waiting until you’re sober to do so, though, because the only thing worse than comments are intoxicated comments. We don’t want those, do we? Because comments are forever. If you need help with SQL Server, if you feel like SQL Server has indulged in too much of any substance and it is just not working out well for you, I am the best SQL Server consultant in the world, according to BeerGut Magazine, of course.

And if you need any of this stuff done, I am just the nose to do it. And as always, my rates are reasonable. If you would like some very high quality, very sober training, sobering training, maybe.

It’s like these videos. It’s fun. About SQL Server performance, you can get all of mine for about $150. That is for the rest of your life.

You don’t need to resubscribe or rebuy. That’s about it. That’s the URL. That’s the discount code. Of course, there is a fully assembled URL for you in the show video description. The show description.

Is this a show? I don’t know. I am showing you things, but… now I show up on your screen. It’s not a podcast, because I’m not just, like, you know, vocal frying into a microphone about dumb stuff. Anyway, more about upcoming events in 2025.

For now, I don’t know. Enjoy your alone time. With that out of the way, let us talk about eager index spools. Now, probably the most fascinating thing to me about eager index spools is how often the optimizer will put them into query plans, often to the detriment of performance.

In a sane world, and by sane, I mean like in a world in which the optimizer kept up with the times, and the optimizer was more aware of current storage trends. I realize that storage in the cloud is probably as awful as the old spinning disk storage that you used to have on your, you know, on-prem SQL servers.

But, you know, at the same time, it would be nice if the optimizer were a little bit more aware of these things and didn’t feel the way it feels about IO patterns sometimes. When you think about old hard drives, there was a big difference between sequential and random reads. On an SSD, there’s very little difference between the two. Right?

Because SSDs don’t do this like old hard drives do. Everything is kind of randomly scattered about. Or, you know, it’s memory chips anyway. There’s very little logical order in there. So the optimizer, not really up to date on those things.

And the optimizer will frequently pick nested loops join plans and put an eager index spool on the inner side of the plan. The outer side is up here. Then there’s the loop join.

And then you come down here. And this is where the spool goes. It’s not a very good time for your queries. In a sane, rational world, like the one that I was trying to describe there, the optimizer would just choose a merge join. Maybe not a merge join.

A hash join. Let’s say the optimizer would opt for a hash join and just throw a missing index request and move on. But no, SQL Server has to do something totally different. Now, there have been a couple weird occasions in my life where I have written queries in a way specifically to get an eager index spool.

And not just for the sake of a demo. There have been times in my life when having an eager index spool built was actually a better thing for the query than anything else. And the way that I ended up doing stuff like that was, you can see this join up here.

It’s just a regular old equality predicate joining Posts to Votes. But down here, I have written this as two inequality predicates. A greater-than-or-equal-to and a less-than-or-equal-to.

And since merge and hash joins require at least one equality predicate in order to work, this gets us a change in the execution plan. Now, I don’t recommend it for this particular query.
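The rewrite being described looks roughly like this, using the Stack Overflow Posts and Votes tables. This is a reconstruction of the shape, not the exact demo script:

```sql
/* Equality join: hash and merge joins are on the table */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON  v.PostId = p.Id;

/* Logically identical, but written as two inequalities.
   Merge and hash joins need at least one equality predicate,
   so only nested loops qualifies — which is what invites the
   eager index spool on the inner side */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON  v.PostId >= p.Id
    AND v.PostId <= p.Id;
```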

For this particular query, it is an absolute disaster. But this is the much better plan where we just have the single equality predicate. But if we rewrite that single equality predicate as two inequality predicates, all of a sudden SQL Server says the only thing I can do is use a nested loops join and have this on the inner side of the join.

So this is what I was talking about. Like in a sane and rational world, SQL Server would encounter a situation where it may have considered this in the past, and it would just do this instead.

Just do a hash join. Say there’s a missing index. Fine. Anyone can get on board with that.

What’s hard to get on board with is SQL Server building a 52 million row index in the middle of your query execution. Again, there were some very special edge cases in which I did that on purpose because it suited the workload better. In this case, not good.

Not a good time. So if I were to let that run and get an actual plan, that would run for about almost two minutes. So we don’t want to do that.

But what’s very funny, and I’m going to explain this query a little bit. Oops, what did I do? Oh, there’s already an object called P1. All right, well, we’re just going to call you P11. We’re not going to stop and mess with that. I guess I was practicing this one too much.

But we have this query here. And what I want to tell you about this query is that this is not a natural join in the Stack Overflow database. The users table, the ID column in the users table is a clustered primary key.

The ID column in the post table is also the clustered primary key. That is not how those tables relate. These tables relate by a column in the post table called owner user ID. The funny thing is that the ID column on the post table, like I said, is the clustered primary key.

So there is an index on it. So when I’m saying, like, joining one clustered primary key to another clustered primary key, you would think SQL Server would say, oh, I can do this in, like, a number of different ways. I can seek.

I can do a loop join and seek to everything that I care about. I could just scan it once and do a hash join. I could also scan it once and do a merge join, but we don’t talk about merge joins. But it doesn’t choose to do that.
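The join under discussion is roughly this shape — a reconstruction, since the transcript describes it rather than showing it. The point is that it joins the two clustered primary keys directly, which is not how these tables logically relate (that would be OwnerUserId):

```sql
/* Both sides are clustered primary keys, so both are indexed.
   The optimizer could seek, hash, or merge — yet it can still
   pick nested loops with an eager index spool on the inner side */
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.Id = u.Id;  /* not the natural p.OwnerUserId = u.Id join */
```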

It chooses to do this instead. And it chooses to build an eager index spool off of the clustered primary key here. This takes about, well, there’s four and a half seconds here. And this is where I’m going to show you a couple of the things that I mentioned about eager index spools.

One of them is that they read from this table and build the index spool on a single thread. So even though this is a DOP 8 query, we ain’t doing that at DOP 8, are we? We are doing that at DOP 1, 17 million rows.

Two is that, like, so this is in a parallel zone and that happens. So if we take the four and a half seconds it took to read from this table single threaded and subtract it from this, we end up with, what, like 15 seconds or so of time building the eager index spool.

There is no missing index request up here saying, hey, we could use an index. And I forget if EXECSYNC shows up in the query waits. It does.

There it is. This is the EXECSYNC wait that I was talking about that shows up in parallel execution plans. If this query ran at DOP 1, there would be no wait indicating that an eager index spool was built. We only get this in the parallel version of the plan.

So this is all highly inefficient, highly ugly. And the worst part is that we already have an index here. So sometimes when I’m tuning queries and I see an eager index spool, I’m like, dummies didn’t make an index.

And then I go look and I’m like, this is a perfectly good index. SQL Server. Toot toot. So sometimes what you have to do is you have to tell SQL Server, no, no, no. Trust me, buddy.

Put the spoon down. We should be seeking instead. So before I move on, I just want to say this query runs for 22.171 seconds in total. But if we run this query with a FORCESEEK hint, this should only take a couple seconds.

Oh, yeah, there’s already an object called P2. We’re going to make it P2.2 then. Again, I practice these too well. Sometimes I’m just too good at my job.

And this takes 1.5 seconds. And it turns out without adding an index or doing anything, we had a perfectly seekable index for SQL Server to use. Look at that.

We have a clustered index seek. Before we had a scan and we had a spool. SQL Server. Your trip to the powder room was ill-advised.
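The FORCESEEK hint mentioned above is a table hint, and it goes in the FROM clause. A minimal sketch against the same assumed join shape:

```sql
/* FORCESEEK tells the optimizer it must access this table
   via a seek; if no seekable plan exists, the query errors
   out rather than silently scanning or spooling */
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN dbo.Posts AS p WITH (FORCESEEK)
    ON p.Id = u.Id;
```

Because FORCESEEK fails the query outright when no seek plan is possible, it is worth testing before it goes anywhere near production code.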

So in a lot of cases, eager index spools are really just irritating missing index requests. Sometimes they are optimizer deficiencies and shortcomings. But in the grand scheme of things, I don’t know.

What do I prefer? A FORCESEEK hint, or having to sit there and wait while an index creates? I’ll take that FORCESEEK any day. So if you’ve already done your job and you’ve already indexed your SQL Server queries for performance and you’re not getting performance, take a look at those query plans.

You might see something awful like that. Now, let’s get this thing running. And we’re going to talk a little bit about this.

So a lot of the queries where I show off bad things that happen, I use either CROSS APPLY or OUTER APPLY, because the inner side of nested loops is an interesting place for SQL Server’s optimizer. It does all sorts of goofy things in there. Parallel nested loops are especially strange birds.

I thought for a bit about doing some videos about parallel nested loops, but I don’t think I could… I don’t think that even with all the time in the world and, like, Paul White’s notes and SQL Server’s source code that I could do adequate justice to how weird parallel nested loops are. There is simply too much going on there.

But we have this query, and this is another situation where you might see an eager index spool arise. Like, just have a little subquery like this with an aggregate in it. And if we look at this query plan, and let’s just drag this way over here, something quite similar happened, right?

Here is our eager index spool. Here is our Posts table. So essentially the same thing happened.
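The pattern is a correlated aggregate subquery, something along these lines — a sketch against the Stack Overflow Posts table, since the transcript describes the query rather than printing it:

```sql
/* A correlated MAX() subquery is a classic way to end up
   with an eager index spool on the inner side of a nested
   loops join: the inner aggregate runs once per outer row */
SELECT q.Id, q.Score
FROM dbo.Posts AS q
WHERE q.Score >
(
    SELECT MAX(a.Score)
    FROM dbo.Posts AS a
    WHERE a.ParentId = q.Id
);
```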

Now, this is the index that we currently have on the Posts table, right? It’s on post type ID, then parent ID, then owner user ID. The part of the query where we got an eager index spool was on this.

So SQL Server is essentially complaining that we don’t have an index that helps with this. And again, for whatever reason, SQL Server could have just chosen a hash join or a merge join. But it didn’t.

It chose a nested loops join. It told us nothing about an index that might be missing. It just created that index down here. Now, the normal way of solving this problem is, or, you know, mostly what I do is I right-click on the eager index spool. And I look at two sections.

I look at the seek predicates. And this is what I would create as the key of an index to fix this problem. In this case, SQL Server wants an index on parent ID, right? We can see the parent ID column reference here, which makes sense because we are correlating the outer query with the inner query on parent ID, right?

Look at that. How nice. And the other thing you want to look at is, let’s go back to the query plan, and let’s get those things back up.

The other thing you want to look at is the output list, because the output list will tell you what columns you will probably need to include in your index. So look at the seek predicate. That’s the key of your index.

Look at the output columns. Those are the include columns in your index. 99% of the time, this will be good enough. You may have to think about things a little bit differently if you have, like, a top one with an order by or something, because the order by column might need to go in the key of the index, too. But for most cases, this will be good enough for SQL Server, at least to use the index.

I don’t know if that’s going to make your query as fast as you want it to be, but that’s what you could do. Now, there are ways that we could go about fixing this, of course. Let’s say we wanted to create this index to fix it.

We could say create this index on parent ID and then score, right? And that’s essentially what SQL Server was asking for in the eager index spool plan. It wanted a key column on parent ID.

Fine. It wanted us to include score and ID. Well, we have to explicitly reference score, because score is just a regular column in the table. We don’t need to explicitly reference ID in here, because ID is the clustered primary key on the table.

So that automatically gets built into any nonclustered indexes that we create. And now if we run this query, let’s just look at the estimated plan real quick. Make sure that we are spool free.
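Putting the seek-predicate/output-list recipe together, the index being created here looks roughly like this (the index name is mine):

```sql
/* Key column comes from the spool's seek predicate,
   include column from its output list. The clustered PK
   column (Id) rides along in every nonclustered index
   automatically, so it doesn't need to be listed */
CREATE INDEX ix_Posts_ParentId
ON dbo.Posts (ParentId)
INCLUDE (Score);
```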

Look at that. We have no more eager index spool. And now if we run this query for real, it will not take 20 or 30 odd seconds to run. Oh, it should be a lot faster.

I don’t know what this thing is doing now. Ah, there we go. For some reason, that took 10 seconds. I’m going to have to have a talk with SQL Server. Did some…

There was some… Oh, see? This is the problem. This is the problem. This is why I hate… This is why I hate parallel merge joins. Look at that. Isn’t that ugly?

Look at this. I bet this repartition streams has an order by, and we have introduced intra-thread dependencies into our parallel plan. This is a terrible idea in SQL Server.

If you see situations like this, where you have order-preserving or order-requiring operators around exchanges in parallel execution plans in your SQL Server queries, run screaming. So let’s look at…

This one has an order by on it. Does this one? It does. We have two order-preserving exchanges in a row. Under worse circumstances, this could have been really ugly. We could have had exchange spills.

We could have seen spills here and here and maybe… Actually, no, that’s about it. I doubt that one would have, but you could have seen these spill and have real big problems. This one probably came very close to it.

So be very mindful of that sort of thing. I’m going to go a little off script here. And what I’m going to do is just to see if my hypothesis is correct. I’m going to throw a hash join hint on this query because…

Actually, hash join loop join because I have a feeling SQL Server would tell me it couldn’t create a query plan just doing hash joins here. But let’s just see if this goes any better.

Maybe it will. Maybe it won’t. We’re going to find out together. All right. So yeah, this one took five seconds. All right. Notice that we don’t have terribly painful things happening in here anymore.

And now we get a very helpful bitmap up here. And this, rather than taking almost 10 seconds, this is down to five seconds. So a little off script, but, you know, if I don’t do that sometimes, I don’t know, I might appear a bit stodgy here.
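For reference, the hint experiment above goes in an OPTION clause at the end of the statement; listing both HASH JOIN and LOOP JOIN tells the optimizer every join in the plan must be one of those two. A sketch, reusing the assumed correlated-subquery shape:

```sql
SELECT q.Id, q.Score
FROM dbo.Posts AS q
WHERE q.Score >
(
    SELECT MAX(a.Score)
    FROM dbo.Posts AS a
    WHERE a.ParentId = q.Id
)
OPTION (HASH JOIN, LOOP JOIN);  /* restrict the whole plan to
                                   hash and loop joins; merge
                                   is off the menu */
```

Query-level join hints apply to every join in the statement, which is exactly why LOOP JOIN is listed alongside HASH JOIN here: hash-only can leave the optimizer unable to build a plan at all.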

But anyway, that’s one way of dealing with it. Another way of thinking about this is let’s actually let’s drop this index because what I want to do is show you something a little bit different than just indexing.

So let’s think about what this query is doing. We’re looking for this, right? The thing is we’re looking for answers.

But in the query logic, we care about, showing you back up here, where the answer score is greater than the max score on this. The thing is, we know something SQL Server doesn’t.

We know something about our data. And what we know about our data is that only posts with a post type ID of two can be answers.

And we already have an index where post type ID is the leading key column. So if we were to rewrite our query to do this instead, we wouldn’t need that other index, right?
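The domain-knowledge rewrite adds a PostTypeId filter so the existing index keyed on (PostTypeId, ParentId, OwnerUserId) becomes seekable. Roughly:

```sql
/* Only posts with PostTypeId = 2 are answers, so adding that
   predicate lets the existing index leading on PostTypeId be
   used for a seek, instead of building an eager index spool */
SELECT q.Id, q.Score
FROM dbo.Posts AS q
WHERE q.Score >
(
    SELECT MAX(a.Score)
    FROM dbo.Posts AS a
    WHERE a.ParentId = q.Id
    AND   a.PostTypeId = 2
);
```

The predicate is logically redundant here only because of what we know about the data; SQL Server has no way to infer it on its own.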

So if we actually take a minute to consider our query a little bit, this looks like a very similar plan to the one we had before just without the spool down in here. Now we have another index seek in there. So if we run this query and we wait, I can’t promise that this is going to be as good and fast as our hash join query.

It’s probably about a second slower. But we look at the query plan. So this finished in about 6.2 seconds. The hash join plan was about five or so seconds. But we really do improve this area in here, right?

There’s no more eager index spool. And this time we took advantage of our index on post type ID and parent ID. And we have sort of a double hop seek where we have parent ID and post type ID being seeked into.

So sometimes when you’re trying to fix eager index spools, if you don’t have a good index in place at all for the query, you may need to create one. If you already have a good index in place, you probably need to use a FORCESEEK hint.

And if you have a nonclustered index that’s just a little bit off from what you’re trying to do, pay attention to the way that your query is written, because you might be missing some valuable information to give SQL Server in order to be able to use that index more effectively. In this case, we used our domain knowledge about the data in the Stack Overflow database to say, hey, we only care about comparing answer scores to other answer scores. We only need to find answers on the inner side of that aggregate that we’re comparing to.

So with that out of the way, I guess that’s the whole video. Cool. It only took 22 minutes and some odd seconds.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope to see you. Well, I hope you see me in the next video. If you don’t, well, it was nice. You might be dead.

That’s real sad. Yeesh. I don’t like the sound of that. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Fixing non-SARGable Predicates



Thanks for watching!

Video Summary

In this video, I discuss the challenges of dealing with non-SARGable predicates in SQL Server queries and how rearranging indexes can help improve performance. I share a practical example where a column is wrapped in a function within a WHERE clause, leading to suboptimal query execution. By creating an index that allows for better seeking, we demonstrate how to mitigate these issues without altering the underlying stored procedure, which might be immutable due to vendor restrictions or other constraints. This approach showcases a clever workaround using SQL Server’s indexing capabilities to optimize queries even when direct code changes are not possible.

Full Transcript

Erik Darling here. Lo and behold, with Darling Data. No acquisition letters yet. I applied to Y Combinator and I said, hey, anyone want to buy this thing? So far, someone sent me a bottle of lube and a can of pork and beans. Those are the only offers I’ve gotten so far. I don’t really know what to do with that. I’ll probably just keep making these videos in the meantime. So anyway, we’re going to sally forth with our series on indexing to improve query performance in your SQL Server queries. And in today’s video, we’re going to talk about how you can rearrange indexes to fix non-SARGable predicates. I know that, depending on what your reading habits are or what your interests are in general, you may have seen blog posts about using computed columns to do this. We’re going to talk about computed columns a little bit later in this series, because I had a comment about computed columns that has now set me ablaze.

So this is not that, this is not about using computed columns. So if you are expecting that, well, stick around because we’re going to do something different. Maybe you’ve never seen it before. Again, depends on your educational background a little bit for SQL Server things. But before we do that, we of course need to talk about your financial background and your financial backing of Darling Data Enterprises. If you would like to support the content that I create on this channel, there is a link down in the video description that says become a member. And by clicking on that link, you can give me four bucks a month to support this channel, which after taxes is a little over $3. So who is the real winner?

I don’t know. You know, I realize that it’s Christmas, and while some of you might feel giving in the realm of $4, you may have already bought your Nana or your aunt or, I don’t know, a step parent some dollar scratch-offs, or maybe a couple of $2 scratch-offs. And that $4 has evaporated from your wallet, from your financial world. In that case, you can always like or comment or subscribe.

I’m happy to see numbers of any sort go up, except blood pressure and cholesterol. Those are the only ones we like to see either go down or stay even. They’re the only things we care about.

If you need help with SQL Server, I am famous for performing Christmas miracles. So if you need any of this stuff done, if your SQL Server is slow, I don’t know. Do you sell a lot of things on Christmas?

Probably not. But you know, it is the holiday season, when sales do tend to spike. So if you are having SQL Server problems, I am a consultant that fixes them, and my rates are reasonable. So there’s really no reason for you to go anywhere else or talk to anyone else, because they won’t do as good of a job as me.

If you want some great SQL Server training all about performance tuning and other, well, really just performance tuning. I was going to say other things, but I pretty much stick to the performance tuning stuff because I don’t want to spread myself too thin. If I start pretending that I know about availability groups, I feel like you would see through that charade pretty quickly.

But you can get all of mine. It’s about 24, 25 hours of content for the rest of your life for about 150 USD when you use Yonder coupon code. The fully assembled link to perform all of these Christmas miracles is also in the video description.

So you don’t have to do too much work or typing, because I’ve seen your typing. Not impressive. Mavis Beacon would be incredibly depressed if she saw you typing.

Rolling in her CD-ROM drive. Because she is. More news on events as 2025 progresses. For now…

You must live in darkness. Sorry about that. But with that out of the way, let’s talk about fixing non-SARGable predicates with indexes. Now, I think I dropped this index.

Yes. Well, if I didn’t before, I did now. So I’m going to start creating this and then I’m going to start talking. So I don’t, you don’t have to sit there and wait for an index to create. That’s just rude and crude and vile.

And for some reason, ZoomIt is taking its sweet time. All right, let’s move some of this stuff around a little bit, so it’s all on the same screen. We don’t need you struggling to read. Again, much like your typing, I’ve seen your reading, and you are not very good at either one.

So, as a consultant who gets called in to deal with performance issues to this very day, I can tell you — despite the bevy of knowledge distributed widely and freely across the internet — that when you start wrapping columns in your where clauses and join clauses in functions, even the built-in ones in SQL Server, you are bound to at some point have that cause some performance issue. Big or small, it will creep up on you.

And the bigger your data gets, the worse these performance issues will get. I fix these things constantly. It is like an unwritten rule of my consulting engagements that I will be pulling ISNULL and COALESCE and SUBSTRING and REPLACE and all the other things out of joins and where clauses, where people have done just plain dumb things with them.

But what sucks is, gosh, I think the number of actually qualified third-party vendors who make products that interface with SQL Server is very, very small. Even Microsoft’s own software products that touch SQL Server have some of the worst code and database design and indexing I have ever seen in my life. It’s like the people who make software at Microsoft are here.

And the database team at Microsoft is here. And there is, like, this silo bunker wall between them. There’s just no way for them to meet.

Like even the people who work on Entity Framework, they just do things that, if anyone with a reasonable sense of databases saw what they were doing, they would just hit them with a two-by-four. They would, like, Hacksaw Jim Duggan, just whack them.

I don’t know where they get this stuff from. It never ceases to amaze me. I think they just want to keep me in business. That’s what it feels like.

I mean, if that’s the case, thank you. I appreciate your contributions, but man, it’s astounding to watch. Now, this is not a giant big fireworks demo. This is just to prove a point: that you can make a difference even if you can’t change the code, which I think was the point that I was going to make when I started talking about vendors and Microsoft and Entity Framework. Sometimes you can’t change the query.

Sometimes there are things that you are not allowed to change. Now, vendors might disallow you from changing stored procedures. They might say that puts you out of support.

You might not be able to figure out how to fix whatever Entity Framework query is being generated. You know, queries might come from binaries or DLLs or something built into your software. And you might not be able to do anything to change that, right?

Like, what are you going to do? You’re hosed on that. You can’t change that. The vendor has to change that. And, you know, the vendor won’t do anything. Won’t lift a finger to help you.

Who are you? Anyway, let’s say we have this index, right? And we’re going to pretend that the stored procedure below is completely immutable. We are not allowed to change it.

We’re going to straddle a fine line here and change an index. All right. There’s a secret that you and I are going to have to keep. We might be violating a EULA somewhere.

I don’t know. But this is the index that we have currently on community owned date and score. This is on the Posts table. In case… again, I know how your reading is. That’s the Posts table.

And let’s pretend that this query… and I’m going to handicap this query a bit. Rather, I’m going to say I’m going to hobble this query a bit. I like Misery as a movie, and I like the term hobbling. So we’re going to say we’re going to hobble this query, but I like Misery much better than golf.

In golf, you get a handicap. In Misery, you get your ankles broken with a sledgehammer. But we’re going to hobble this query a little bit to make sure that SQL Server can’t use batch mode to do anything better.
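The hobbling can be expressed as query hints. This is a sketch, not the demo’s actual script — the WHERE clause stands in for the stored procedure’s predicate, and the DISALLOW_BATCH_MODE hint requires SQL Server 2019 or later (on older builds, a lower database compatibility level does the same job):

```sql
/* Keep batch mode and parallelism out of the plan so the
   rowstore/serial behavior is what gets demonstrated */
SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= '20130101'
OPTION (MAXDOP 1, USE HINT('DISALLOW_BATCH_MODE'));
```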

And we’re going to limit this to MAXDOP 1, so that I can sort of show you just how profound a difference you can get with these changes. Because, you know, who knows, maybe you hired a crappy consultant who told you to set cost threshold for parallelism to, like, 350 or something for no reason.

Just a wild guess. They were like, oh, look at all those CX waits. Let’s fix that. Because this is the kind of help you get at unreasonable rates. When you pay Erik Darling, with his reasonable rates, you get good advice.

When you pay other people, I don't know what you get. I've seen the results, though. They're not good. So we have this stored procedure, and this stored procedure is going to break a cardinal rule of querying.

It is going to wrap the community owned date, the leading column in our index, in the coalesce function. I don't know why people do things like this. What do you think, nulls are going to throw an error?

You think SQL Server can't handle nulls? It's your stupid programming languages that throw those null exceptions. It's not SQL Server. But people do dumb stuff like this all the time.
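The dumb stuff in question looks roughly like this sketch. The procedure name, parameter names, and fallback date are my inventions; the point is the COALESCE wrapped around the leading key column of the existing index, which makes that predicate non-sargable:

```sql
/* Hypothetical sketch of the "immutable" procedure. Wrapping
   CommunityOwnedDate (the leading index key) in COALESCE means
   SQL Server can't seek on that column. */
CREATE OR ALTER PROCEDURE dbo.CommunityPosts
    @CommunityOwnedDate datetime,
    @Score integer
AS
BEGIN
    SELECT TOP (1000)
        p.*
    FROM dbo.Posts AS p
    WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= @CommunityOwnedDate
    AND   p.Score >= @Score
    ORDER BY p.Score DESC;
END;
```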

Don't ask me why. I didn't make them do it. But when we run this query and we look at the execution plan, we're going to be unhappy, aren't we?

We're not going to be thrilled with the performance of this thing. It takes 1.5 seconds and it scans our index. All right. We have a perfectly good index for seeking into the community owned date column, and yet we don't.

We do not seek into this index. We scan the entire thing. It takes a second and a half, and we don't like that. So here's what we're going to do, because we're going to pretend that we're allowed to do this.

We can't change the stored procedure. We can't just take out that coalesce. We could make a copy of the stored procedure where the coalesce is removed, and we could run it side by side with the other stored procedure. And we could send the results to a software vendor, and they would say, no, no, we're not helping you, which is what happens all the time, because software vendors suck.

But let's just say we were going to change this index, and we're going to use one of my favorite index options in the world: DROP_EXISTING = ON. Just slide this index in place, drop it on in, which is good for us.
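A sketch of that in-place swap, assuming the existing index is named community_score (a name I'm making up); DROP_EXISTING = ON rebuilds the index under the same name with the new definition:

```sql
/* Flip the key order: Score (the sargable predicate) now leads, and
   the COALESCE'd CommunityOwnedDate gets evaluated second, as a
   residual predicate. DROP_EXISTING = ON swaps the definition in place. */
CREATE INDEX community_score
ON dbo.Posts
    (Score, CommunityOwnedDate)
WITH (DROP_EXISTING = ON);
```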

So now when we run this query, or sorry, when we run this stored procedure, it does not take a second and a half. It takes about a hundred milliseconds, because now we can seek into the index on the score column.

And we just have that stupid residual predicate, right? We now have a seek predicate on score, and we just have the awfulness that is the coalesce function running its case expression on the community owned date column. That's a residual predicate, but we can at least seek to the scores we care about.

And that's the good part for us, right? We no longer have to worry about all that scanning of the index because of the coalesce function. Now we can seek to the scores that we want and then evaluate the community owned date second.

And there's a lot less pain in that. There's a lot less suffering in that. It's far less overhead, because we've already done a seek to the limited number of values we care about, and applying that residual predicate is not a big deal. If the score column had a less selective predicate on it, you know, it might be a seek of the whole table plus a residual predicate of the whole table.

That would potentially be equally as unpleasant as just a scan of the whole table. But we're going to pretend that we're a little bit smarter than that. So anyway, this is how you can rearrange your indexes to beat non-sargable predicates.

Often SQL Server will do the smart thing and choose better indexes when it can seek, because it's like, ooh, look at this nice cheap seek I can do. SQL Server, being the cheapskate that it is, will often choose the smarter index in this scenario.

Even if you had another index hanging around. In this case, we changed an index, but if you run the SQL Server that the vendor software runs on, you might just create a new index.

And then, you know, when you have to do a software upgrade, you just quietly drop or disable the custom indexes you've created, so the installer doesn't explode and say, how dare you try to do better than us? You know, because you're all idiots.

You know, I don't know. I don't understand how you get paid to do what you do. I don't understand how your software company makes hundreds of millions of dollars a year. The product is garbage.

Ah, well, what can you do? SQL Server is $7,000 a core, and it still can't handle a column wrapped in a function. So one wonders how deep the rot goes sometimes, I suppose. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope that you are happy and healthy and feeling loved and appreciated by all those around you. I don't know.

Maybe you have a cute dog or something. Hmm. Licks your hand, and you take it for a walk. Sounds nice, right? Smoke a cigar.

Anyway, I'm gonna go record something else now. I think I have another video. Apparently I have 50,000 videos to record. So I'm gonna try to get that done. Anyway, thank you for watching.

Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Predicate Key Lookups




Thanks for watching!

Video Summary

In this video, I dive into the world of indexing for performance optimization, specifically focusing on fixing predicates in key lookups. I explore how these lookups can become problematic when queries suffer from parameter sensitivity issues—those pesky “sensitive Sally parameters” that make query plans unpredictable and inefficient. The goal is to reduce the time it takes for a user’s query to return results, ensuring every second counts. Throughout the video, I also highlight ways to support my channel through membership, offering training and consulting services for SQL Server performance tuning, making sure you have all the tools needed to tackle these challenges yourself.

Full Transcript

Erik Darling here. Darling Data. I'm a little fuzzy here. Let's unfuzz myself. There we go. Now you can see everything in crisp, clear, high definition detail. In today's video, we're going to talk about indexing for performance, specifically around fixing predicates in key lookups. Now, I'm going to talk about exactly what all that means when we get to the demos. But it is generally a sensible approach to take when index tuning, because the last thing you want to do every time you make a trip back to the clustered index is evaluate if a predicate passes one of your filters. You don't want to do that. It's not good for performance at all. It's a bad time, especially when you have queries that suffer from parameter sensitivity issues. Which, oh boy, don't we all have those. Sensitive Sally parameters. But before we get into that, let's talk about things that I am sensitive to. Like money. I do need it for things. If you would like to become an appreciative member of this channel and support my endeavors in making this content for you, you can go to the video description and click the link to become a member. And you can become a member for as little as $4 a month, of which I will see $3.10 post taxes. If you have spent all your money on dirty women and booze, or whatever else you spend money on (grass seeds, lawn mowing equipment, the great outdoors, I don't know, hiking gear), you can like, you can comment, you can subscribe. All ways to keep me from bashing my head against the wall. It's a great time.

If you need help with SQL Server because you are having problems like the ones that I describe in these videos, or maybe you don't know what kind of performance problems you're having and you're just looking for someone to untangle all that for you: oh, gosh darn it, I am the best consultant in the world. At least for SQL Server. For other things, probably not so much. I can do all of this stuff. And as always, my rates, they're reasonable. You can get all of my training for $150, just about, for the rest of your life, if you use that discount code at that training site up there. It's a very good deal. You should get it. It's not just for Black Friday. It's for every day. So aren't you lucky? No upcoming events until later in 2025. I've said this a million times. We'll talk about this stuff later. And with that out of the way, let's get on to talking about these here predicate key lookups. So our goal as query tuners, as people who attempt to make things faster in exchange for money, is to reduce the amount of time it takes from a user pushing a button until that user gets a result. I don't care what other metric you reduce or increase in that endeavor. You could make logical reads go up or down. You can make CPU go up or down. You can make memory go up or down.

But as long as the duration, the wall clock time, the tick, tick, tick, tick, tick, tick, tick, tick that it takes before that query returns results to someone gets shorter, I think you've done a pretty good job. Sometimes we'll have to make queries go parallel to get faster. Sometimes we'll have to use more memory to avoid spilling to disk or something. Sometimes you might have to use more space in tempdb by putting some result in a temp table first. Sometimes we need to add indexes to our database in order to make queries faster. These are all tradeoffs, all things that represent tradeoffs when you are query tuning.

Yes, sometimes you have to give up a little of this stuff in order to make the query faster. It's okay. This is what databases were built to do: deal with these tradeoffs. The only real exception to any of this stuff is, of course, a query rewrite. If you can just change the way that a query is written fundamentally in order to make it faster, great. You have sacrificed very little except your time and knowledge and fingers and typing and probably some portion of your humanity.

You know, like I'm not going to tell you to add a hundred indexes to a table to make some weird queries faster, but sometimes you do. Sometimes if a table has, like, just a clustered index on it and you're like, well, gosh, we're not just searching on that thing, then okay. You have some freedom to add other indexes in. Other times you have to use some judgment.

You may have to consider what indexes are already there first. Sometimes altering an existing index is a better choice. Sometimes there are a lot of unused or duplicative indexes you could clean up before adding another one to the pile. There are many things that you could consider, things you could use the old noggin on and figure out first.

Now, like I said, there are various things in queries that force SQL Server to do dumb things. Some of those dumb things are things like local variables, table variables, non-sargable predicates, overly complicated joins and where clauses, and stuff. There are all sorts of things that, you know, you can untangle for free that don't involve you adding another thing to the database, whether it's an index or a temp table or something like that.

There are all sorts of things you can do that don't have much of a trade-off. But changing those things might change other stuff. You might get a parallel plan. You might use more memory, something like that. So, you know, I like rewriting a query, but sometimes all the query rewriting in the world isn't going to change the state of the database in a way that makes that query any faster.

Now, key lookups represent a choice made by the optimizer about scanning a clustered index, where all of the columns are available in the table. Sometimes it helps, instead of thinking of things as a clustered index, to think of the table as a clustered table, if a clustered index exists on it. But the clustered index has all of the columns available in it.

And sometimes SQL Server thinks that it is cheaper to do one big scan of the table, have all the columns available, and be able to touch all of the data that it needs from one single place, than it is to seek or scan into a nonclustered index also on the table, and then go back to the clustered index to retrieve columns that are not part of that nonclustered index to satisfy the remainder of the query. Typically, lookups get chosen when there's a relatively small number of rows expected, but, you know, all sorts of things play into this.

And sometimes the number of rows that are expected are not the number of rows that end up making that transversal in reality. That traversal in reality, I guess I should have said. I don't know what a transversal is. All right. Ah, screw it. Bad estimates in general.

And, of course, self-inflicted bad estimates. You know, again, local variables, table variables, non-sargable predicates, things like that, are all things that can contribute to you having those problems.

But we're going to look at an issue with parameter sensitivity around lookups. So, I think a lot of the time, fixing lookups that exist just for columns you're selecting is almost a last resort for me. There's a lot of other things I would rather do than that.

But there’s only so much time in this video that I can talk about those other things. So, we already have this index created on the post table. And let’s pretend that this index was created by some long gone developer in the past.

And this query was great. Sorry, this index was great, either, I don't know, for the query below before we added more stuff to it, or for a different query altogether.

This index made perfect sense. And since this index made perfect sense for some other query, this query comes along and says, hey, you make perfect sense to me too. Now, there are a couple of things in this query that you might notice.

If you look up at the index definition, it is on score and then creation date and then post type ID. And that does take care of most of what we’re doing in here. It even takes care of most of what we’re doing in here.

Now, just make sure that everyone keeps this in their head. If you have a clustered table, or a clustered index on a table, right, it's the same deal. The table is clustered.

The clustered index key column or columns will be part of every nonclustered index you create. In a non-unique nonclustered index, they are an additional key column. In a unique nonclustered index, they are an included column.

So the ID column on the post table is the clustered primary key. So this is part of this index up here. You can just pretend it’s right at the end here.
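To make that rule concrete, here's a sketch assuming Posts has a clustered primary key on Id (the index names are my inventions, and the unique example is purely illustrative):

```sql
/* Non-unique nonclustered index: the clustered key (Id) silently
   becomes the last key column, as if you'd written
   (Score, CreationDate, PostTypeId, Id). */
CREATE INDEX scd
ON dbo.Posts
    (Score, CreationDate, PostTypeId);

/* Unique nonclustered index: the clustered key (Id) rides along as an
   included column instead, since it isn't needed for uniqueness.
   (Illustrative only; these columns aren't actually unique in the
   Stack Overflow data, so this create would fail there.) */
CREATE UNIQUE INDEX scd_unique
ON dbo.Posts
    (Score, CreationDate, PostTypeId);
```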

So this does take care of most of what we care about. Now, the other thing you might notice is that I am doing select star. The fun thing is, it doesn’t matter how many columns are involved in the select or how many columns are involved in the lookup that you might do.

Every lookup has the exact same optimizer cost. It doesn’t matter if you’re selecting one column or a thousand columns. Every optimizer estimate for that lookup will have the same cost.

It doesn't matter how many columns you select. So if we run this query, or rather run the stored procedure, with a set of parameters that will touch a small number of rows, this will run relatively quickly.

Right? 326 milliseconds, but we have this lookup over here. And you might look at these numbers under the lookup and think, gosh darn, that looks funny. 7 of 20844467.

That’s 2.2 million something numbers. And the reason why that estimate looks so funny is because this lookup has a predicate. Now, every lookup will have at least a couple things in it.

Well, actually, that does depend a little bit. Every lookup will have at least this one thing in it: this seek predicate, which you can kind of see is sort of like a self-join, because we're joining the Posts table to itself on the ID column.

That is why the clustered index key column or columns end up in your nonclustered indexes. So SQL Server can do neat stuff like this. It can take that ID and it can look up rows in the clustered index from the nonclustered index.

Now, the two things in a lookup that may or may not be there. One is an output list. If you are just applying a predicate, actually, that might show up in the output list as well.

This might always be there. But the important thing is that this one has a predicate up at the top. So you have the output list, which are columns that are being selected.

And you have the predicate, which are additional filters that are being applied when we make a round trip back to the clustered index. So for every row that comes out of this nonclustered index here where we seek into it, every row that comes out, since this is a loop join, it’s one row. Boop, boop, boop, boop, boop, boop, boop, boop.

One row comes out, and then we evaluate another predicate there. So we're able to do our initial set of filtering. We're able to seek to some of the filters we care about here and apply a predicate to some other filters here. I guess there's a little bit of overlap in there just because of the state of things, but that's sort of beside the point.

But the seek here does allow us to get to some of the rows that we care about. But then we have to filter out additional rows that we care about. Look how much this reduces things by.

Well, we get 327,567 rows out of here. That means we make that many round trips down to the clustered index here to do those lookups. So we made 327,567 round trips.

And at the end of all of those round trips, we ended up with only seven matching rows coming out. That's why, for the rest of the query, we only have those seven rows here. So we were able to do some of the filtering with the nonclustered index, but not all of the filtering.

So, you know, one thing that's sort of important to figure out is what's going to happen when we pass in a different set of parameters. The set of parameters we passed in up here were a score of zero, which means we're going to find every score greater than zero, which is most all of them. And we're looking at a creation date.

So just the last couple months of 2013, which for the Stack Overflow 2013 database is actually pretty tiny, too. And a post type ID of one. Well, there's about 6 million questions in the post table.

So, you know, this thing doesn't really filter stuff out. This thing does a pretty good job of filtering stuff out. And then this is what we evaluate in the lookup, because this owner user ID column is not in our index. So if we run this for a set of parameters that has more rows come out of that initial nonclustered index seek, this is where lookups start to get painful.

Because what was a pretty sensible plan that ran in like 300 something milliseconds for that initial set of parameters does not do well with this one. We end up with a lot more rows. Now, keep in mind, we have the same cardinality estimate here that we did in the last plan.

In the last plan, this was high. In this plan, the number of rows that we get out, the actual number of rows, is 1884% higher than our estimate.

So we make a lot more round trips back to the clustered index in order to get what we want from this thing. Right. And remember, that's still one row down here, and then into here. Right. So one row at a time.

We do that, and we end up with twenty-seven thousand eight hundred and sixty-nine matching rows from here. And you can see the majority of the time is here: about a second there, about a second and a half there. So that's three and a half seconds total.

And then a little additional time in the nested loops join. And then, for some reason, the gather streams has a lower number than this. And then we go into the sort, and we spend about twenty-five more milliseconds there. One thing that's important when you're looking at parallel plans: try not to focus too much on the execution time of exchanges.

They are a weird black hole of mystery, and all the operator timing stuff is very strange around them. So try not to look at those too much.

Anyway, knowing what we know about indexing, we could do one of two things. We could change our index and add owner user ID as a last key column here. This would at least allow us to do all our filtering from one single index.
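Option one, sketched with a hypothetical index name: append the lookup's predicate column to the end of the existing key, so all of the filtering can happen in one index:

```sql
/* Add OwnerUserId as the last key column so the lookup's predicate
   can be evaluated in the nonclustered index, instead of via round
   trips back to the clustered index. */
CREATE INDEX scd
ON dbo.Posts
    (Score, CreationDate, PostTypeId, OwnerUserId)
WITH (DROP_EXISTING = ON);
```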

If you’re able to do this and get good enough performance, don’t sweat it. That’s totally fine. Adding one more column to the end of the key of an index is not going to change anything for any other queries that touch this index. But if we apply a little bit of domain knowledge, right, we might want to think about either adding a new index that better suits this query overall.

Or, if we have really good domain knowledge and we're like, this index sucks, it never helps anything, or rather it makes everything slow, blah, blah, blah, every query that touches this index is awful, then we might just replace the index entirely with one that better satisfies what we're searching for.

So, in our query, owner user ID and post type ID are equality predicates, and then score and creation date are inequality, or range, predicates. We're doing greater than or equal to on both of these. And then, of course, if we do equality searches on owner user ID and post type ID, then score is in order for us here, which means that when we run these two queries, these will be a lot faster in general.
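Option two, with the same hypothetical naming caveat: a replacement index that puts the equality predicates first and the range predicates after, which also hands the ordering back for free:

```sql
/* Equality columns (OwnerUserId, PostTypeId) lead; range columns
   (Score, CreationDate) follow. With equality seeks on the first two
   columns, rows come back already ordered by Score and CreationDate,
   so no explicit Sort operator is needed. */
CREATE INDEX scd
ON dbo.Posts
    (OwnerUserId, PostTypeId, Score, CreationDate)
WITH (DROP_EXISTING = ON);
```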

And we don't have the same parameter sensitivity issues when these things run. One other thing that's important here, which might be important in queries that you're tuning, is that since we have our index set up correctly, we also don't have to deal with sorting this data. Remember what our query from up here a little bit, if you'll permit me to scroll, is asking for.

Remember, we have our equality predicates on these two columns and we have our inequality or range predicates on these two columns. And then we’re ordering by these three columns. Now, ID is still going to be that last key column in the clustered index and the nonclustered index that we have because it’s a clustered primary key.

And then score and creation date are going to be in order, because we have searched with equality predicates across the leading two columns in the index. So we avoid having to sort data altogether. We have a top in here, but it is not a top N sort.

And we do not have an explicit sort to put our data in the order that we care about because the index does that for us, which is pretty cool. So when you’re troubleshooting query performance, whether it’s parameter sniffing or just bad estimates or whatever it is, and you have a key lookup in your plan that is evaluating a predicate like I showed you in the earlier query. And, you know, you get the actual execution plan and you spend a lot of time in that loop.

It is usually worth fixing the predicate part of that lookup by adding whatever column or columns are being evaluated in the predicate to whatever index is being used by the query already. You might also choose to add a new index that better satisfies the query as a whole. That's also an option for you. Or you might want to, you know, replace the current index, because maybe it's just not good for any searches that touch it.

But that’s a much, much more difficult thing to ascertain just from query, tuning a single query. You’d have to know the workload pretty well to make a call like that. But anyway, this is a good thing to fix.

This is honestly something that I have to do quite a bit: fixing up indexes a little bit to make sure that they adequately service the queries that are touching them. Because a lot of the time, I'm dealing with indexes created by developers from, you know, all walks of life and all levels of experience with SQL Server. Some of them maybe knew what they were doing better than others.

But this is something that I do have to fix quite a bit. So hopefully you can use this knowledge to start fixing things for yourselves. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something, and I will see you in the next video, which will be about something. I forget what; for some reason, tab number five is hiding from me. So maybe that'll just be a nice surprise for all of us when I get to that.

So anyway, thank you and goodbye. Bye. Bye.


Indexing SQL Server Queries For Performance: Unpredictable Searches




Thanks for watching!

Video Summary

In this video, I delve into indexing strategies for handling unpredictable searches in SQL Server. Unpredictable searches are essentially any queries where the WHERE clause, JOIN conditions, or selected columns can vary widely—think dynamic SQL within stored procedures or ad hoc queries from front-end dashboards. I emphasize a common pitfall: using double wildcard searches on every column with a parameterized search string, which can lead to significant performance issues due to full table scans and implicit conversions. To address this, I introduce the `RECOMPILE` hint as an effective solution for many scenarios, explaining how it allows SQL Server to use actual parameter values in the query plan, thus avoiding caching plans that might not be optimal. Additionally, I discuss the challenges developers face when dealing with such queries, including the lazy approach of relying on `RECOMPILE`, and highlight upcoming features like Microsoft’s new “OPPO” (optional parameter plan optimization) feature for handling optional parameters more efficiently in future SQL Server versions.

Full Transcript

Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Erik Darling here with Darling Data. As you'd well expect at this point. If you're surprised by any of this, I don't know what's wrong with you. Maybe you should start taking notes about your day. You might have some sort of head knock that's plaguing you in some way. I would suggest getting that checked out by a medical professional as well. At this point, I feel like this is a well-established set of facts. I am Erik Darling, and my company is Darling Data, and we talk about SQL Server. That is our goal. That is our whole goal. In today's video, we are going to talk about indexing for performance with unpredictable searches. You can consider an unpredictable search as anything that generates a query. (Intel. Shut up, Intel. God damn it.) Anything that generates a query where you don't know what the WHERE clause is going to be, or maybe you don't know what the JOINs are sometimes going to be, or what columns are going to be selected, or something like that.

It could be dynamic SQL in a store procedure, or it could be some ad hoc query generated by a front-end dashboard type thing that you have created, hopefully in a sober and thoughtful way. But that is what an unpredictable search is for me. Now, you have no idea what users are going to ask for, or maybe even what quantity they are going to ask for it in. There are all sorts of crazy things that you have to account for. But of course, before we talk about how you can account for that, we must talk about other accounting matters. More like my accounting matters. We are being honest here.

If you would like to support the content that I create on this channel, or somewhere near or around this channel, for the low, low price of $4 a month, you can click the link in the video description. There we go. That's the right chain of words. And you can become a member. $4 a month. That's about $3.10 for me after YouTube takes out taxes.

If you are just too encumbered by other debts and ransoms and your money is already allocated fully to other endeavors, you can like, you can comment, you can subscribe. There are all sorts of things you can do to let me know that you love me and would die for me. If you look at the things that I talk about on this channel and the first thing you think is, gosh darn it, that Erik Darling sure can make SQL Server faster in exchange for money.

Perhaps I could do one of these things for you because I’m very helpful when it comes to these things. And not only am I helpful, but my rates are reasonable. And while we’re on the subject of reasonable, gosh darn it, have you ever wanted to get 24 hours of SQL Server Performance Tuning training for $150 and not have to pay another $150 every year for the rest of your natural life?

You can do that with my training. It’s amazing. You click on that link, you enter that discount code, and kaboom, you have what I just described. It’s pretty wonderful.

Mouse. No upcoming events until later in 2025, blah, blah, blah. We'll talk about that later. But now let's talk about this indexing conundrum that we have. First, I'm going to close Event Viewer.

I was troubleshooting a blue screen earlier. I know. Do my talents end anywhere? Yes. Pretty much where the computer ends.

The limitation is right there, where the computer stops. No idea what I'm doing outside of that. It's funny how that works.

So we’re going to talk about one of my favorite things. I’m going to move some of this text down a little bit there so it fits on the screen better. One of my favorite things that I help clients with are big ugly queries. I know. Weird that a performance tuner likes big ugly queries.

I suppose it's a lot easier to take a big ugly query and chop out all the easy stuff and make it go faster than to take what looks like a perfect query and tune that better. But, you know, some people just don't know what they're doing in either case. So that's why people like me exist.

So one of the most common things that I see people start with looks something like this. What you see my head is mildly infringing upon. Where someone will have a parameter or whatever called search string.

And that search string will be used to double wildcard search every column in a table. This is one of the worst possible things you can do if you need to handle unpredictable searches in SQL Server. I beg, I beg you not to do this.

This will end up terribly. You may look at some of the columns in this list, like owner user ID and creation date and last activity date, and think, hmm, Erik, I don't think those are strings. And I'd say, well, you are right, but you're searching them like strings, and all hell's going to break loose.

It’s not going to be a good time for you or your SQL Server. So, like, you really shouldn’t be doing anything that even looks like this. The problem is somewhat obvious if you’ve been using SQL Server for long enough.

Double wildcard searches and searching columns with mismatched types will lead not only to big scans because of the double wildcard, but also to implicit conversions. The title column actually is an nvarchar(250) or something column.

So that would at least not result in an implicit conversion right there, but the rest of them you are hosed on. It’s going to be a bad time. What you might think is that using some of the built-in functions in SQL Server might be more highly performative — not performant, we don’t say that word on this channel.

I might beep that out later if I can figure out how to beep things out. You might think there might be more performativeness if you use CHARINDEX or PATINDEX, but generally there is not. There are some edge cases where I’ve seen them do somewhat better, but it’s not really worth talking about.

But the problem is, the way that you deal with null parameters — if you use ISNULL or COALESCE or anything like that — is not a more clever scenario than doing something like this. Both of these end up with the exact same problems unless you use a statement-level recompile on your query, right?

Something like that. If you do this, you can get around most of the issues that come out with these unpredictable searches. Most of the time, this will be Mr. Fix-It for you, right?

This will do okay. And for a lot of people, this is the path of least resistance, the simplest thing to do, because the alternative is writing a lot of dynamic SQL to cope with which parameters actually need to be part of the where clause. I’m going to show you an example of just using recompile, it being nice and easy. As long as you’re not allergic to compiling a query plan every time a thing runs — because you have CPU slack and you’re able to compile queries whenever you want — this is a perfectly fine thing, as long as the compile time for that query is not awful.
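As a sketch, the catch-all-plus-recompile pattern being described looks something like this — the names are assumptions, not the exact demo code:

```sql
/* Catch-all search: each predicate is skipped when its parameter is NULL.
   OPTION (RECOMPILE) lets the optimizer embed the actual parameter values
   as literals and prune the always-true branches. */
SELECT p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE (p.OwnerUserId       = @OwnerUserId       OR @OwnerUserId IS NULL)
AND   (p.CreationDate     >= @CreationDate      OR @CreationDate IS NULL)
AND   (p.LastActivityDate >= @LastActivityDate  OR @LastActivityDate IS NULL)
ORDER BY p.Score DESC
OPTION (RECOMPILE);
```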

If every time you run this query with a recompile, it runs for a long time, but the time isn’t in the query plan, you might want to check the compilation time to make sure SQL Server isn’t going off on one of those little thought-cloud staring things. So there is that.

Now, the problem with developers is that they are often quite lazy in the database. They often think that they are far more clever than they are when it comes to doing things in the database. You know, SQL Server is an expensive piece of equipment, but people treat it kind of like a garbage dump for their worst code.

The recompile advice is generally good enough, but when you use it, again, just pay attention to compile times. If it’s a few hundred milliseconds, screw it. But if it’s getting up into five or ten seconds of compile time, you might have to think about alternate ways of doing this.
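One way to check compile times, if Query Store is enabled, is a sketch like this (durations in sys.query_store_query are reported in microseconds):

```sql
/* Surface the queries with the worst average compile times from Query Store. */
SELECT TOP (10)
    q.query_id,
    qt.query_sql_text,
    q.count_compiles,
    q.avg_compile_duration / 1000. AS avg_compile_ms
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
  ON qt.query_text_id = q.query_text_id
ORDER BY q.avg_compile_duration DESC;
```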

SQL Server right now doesn’t offer any great programmability or optimizer support for the types of queries I’m talking about. However, there was a thing that came up in the SQL Server 2025 release notes: a new optimizer feature called OPPO, the optional parameter plan optimization — for the optional parameter problem, in other words.

So it looks like Microsoft is taking some steps to try to address this in SQL Server 2025. So, you know, sooner or later, you will probably see that in Azure SQL DB. And if Microsoft ever decides that it cares about managed instance again, you might even see it come to managed instance.

Microsoft’s track record with getting things cloud first has not been awesome lately. There are signs that there may be a return to that ethos, but who knows? It’s a bit weird.

But anyway, sometimes writing good queries does require extra typing and thinking, and that’s often not what developers are famous for. You want to start throwing weird features at it, like Hekaton and in-memory stuff. You feel like you need to partition your table when you don’t.

You want to start, you want to move to Postgres because you swear Postgres will just do better at everything. You’ve heard it’s this magical unicorn and everything’s better there. Or you want to start using NoSQL, you know, stick it all in Elasticsearch or MongoDB or whatever.

Or you decide that maybe you just need to build your own ORM. Start from scratch. It’s 18 to 24 months of work.

Far more interesting than five to ten minutes of typing some extra characters into SSMS. So, we have this query in a stored procedure, let’s say — it’s a little more convenient to show it here than in some dumb front end that I would make badly anyway — and we stick a recompile hint at the end of this thing.

This will do okay. Without the recompile hint, this will go really, really poorly. You might notice that the time over here at the end is six minutes and 17 seconds. That is a very long query.

A lot of that is because of a bad memory estimate where the sort does take an additional like six minutes and three seconds. Because when we get up to this filter operator, we’re at 13 seconds, which still isn’t great. Right?

We can see very clearly that we spent a lot of time in other places in the query leading up to that six minute ordeal. But the main problem is, without the recompile hint — and this doesn’t matter, again, if you use ISNULL, COALESCE, or whatever other clever arrangement you think you’ve found that makes the optimizer do smarter stuff — the predicates in your search are always going to look something like this.

This is because SQL Server has to cache a plan that’s safe for any outcome of these parameters being null or not. The recompile hint gets around this by allowing for something called the parameter embedding optimization, which allows SQL Server to take the parameter values and use them in the query as literals. So it doesn’t have to cache a plan that’s safe for anything.

It can just say, hey, there’s a plan for these values. This is as good as I can do — which may not always be great either. But that depends on a lot of other things like indexes and whatnot. So you should really avoid this sort of thing without the benefit of the OPTION (RECOMPILE) hint, because it will go quite poorly.

If we do use OPTION (RECOMPILE) with this stored procedure, things will go generally okay. Now, the thing that I run into a lot is that with stored procedures like this, there’s usually a value that people think users will always search on. And they’ll design an index or two that makes sense for those types of queries.

But then as soon as you depart from that, as soon as that one value isn’t involved, things get really, really bad. That’s especially true without the recompile hint, because you cache a plan and SQL Server reuses it and you might have used the entirely wrong index for that query. So for our purposes and for the intent of the query, let’s pretend that we think we’re always going to be searching on owner user ID.

And we know that we’re always going to be ordering by score descending. That’s exactly what our query does up here. We have an equality predicate on owner user ID and we have an order by score descending.

This does get a little bit more complicated if you have dynamic sorting allowed as well. But we’ll talk a little more about that when we get it somewhat further down. So we can design an index that looks like this, that takes care of our immediate equality search and our ordering.

Right. So we can seek to whatever user IDs we care about, and we have the score descending column in order based on that equality search. And then we can put any secondary search columns over here. Now I’ve done that for the date columns.

These are going to be residual predicates. You cannot seek to these because the score column is in the way. Again, we’ll talk about that a little more in a moment. But since post type ID is incredibly unselective — it’s a very dense column — I have stuck that in the includes, because it doesn’t matter so much in this case.
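The index being described would look something like this sketch (the index name and exact column list are assumptions):

```sql
/* Equality key first, then the ordering column, then secondary search
   columns as trailing keys; the very dense post type ID column goes
   in the includes. */
CREATE INDEX ix_posts_owneruser_score
ON dbo.Posts (OwnerUserId, Score DESC, CreationDate, LastActivityDate)
INCLUDE (PostTypeId);
```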

So I already have this index created, I believe. I mean, it should anyway. Yes, I do.

Good. We got an error there. I’m smart. So if I go and run these two queries where I do use owner user ID, the plans that I get from these are perfectly fine. Right.

We get two seeks. SQL Server does okay enough cardinality estimation from these. No one is upset or hurting from this. Now, the parameters that I used for the very slow plan up here —

these were actually the creation date and last activity date parameters that you see in here. These things. So that’s what I passed in to get this slow query without the OPTION (RECOMPILE). With OPTION (RECOMPILE),

SQL Server makes a much better choice overall. Just chooses to scan the clustered index. Doesn’t choose to reuse a plan.

Gets close enough to an okay cardinality estimate. And this thing finishing in under a second without a good supporting index? Totally reasonable.

Now, where things get tricky, of course, is that when you start writing queries like this that are unpredictable, you have no idea what set of indexes to roll out, because you have no idea what the most common sets of search criteria that people pass in are going to be. And it takes a lot of logging and analytics to figure out what the most common searches are, whether they’re slow or not, and then how to index for them. You can end up with a lot of different permutations of a similar index definition when you go down that route.

Now, this is a pretty simple thing because we’re just hitting one table, right? We’re only hitting the posts table, and we have all sorts of search parameters against it. But if we were to think about this for a query that might touch more than one table, you might be dealing with more than one search element, and things might get really tricky.

Because now you have to think about indexing multiple queries and taking into account join keys and stuff like that. There are all sorts of things that get really tough and complicated with this. So, let’s say you don’t have the ability — much like the $4 a month that you could use to become a member of this channel — let’s say you can’t afford to recompile this query every time.

The way to get around that is to write dynamic SQL — the safe, parameterized kind that avoids SQL injection and has plan reuse equivalent to any other stored procedure that you would run. Because we execute the query using sp_executesql, which is a system stored procedure, you get the type of parameterization and plan reuse and all that other good stuff that you would find in other cases. But you would use that to build a custom where clause based on which parameters apply to the particular search that you’re running.
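A minimal sketch of that dynamic SQL pattern — the procedure, parameter, and column names are assumptions, and a real version would handle more parameters:

```sql
CREATE OR ALTER PROCEDURE dbo.SearchPosts
    @OwnerUserId  int      = NULL,
    @CreationDate datetime = NULL
AS
BEGIN
    DECLARE @sql nvarchar(MAX) = N'
    /* dbo.SearchPosts */ /* comment token, searchable later */
    SELECT p.Id, p.Title, p.Score
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    /* Only predicates for supplied parameters make it into the where clause. */
    IF @OwnerUserId IS NOT NULL
        SET @sql += N'
    AND p.OwnerUserId = @OwnerUserId';

    IF @CreationDate IS NOT NULL
        SET @sql += N'
    AND p.CreationDate >= @CreationDate';

    SET @sql += N'
    ORDER BY p.Score DESC;';

    /* Parameterized execution: plan reuse per distinct where clause shape,
       and no concatenation of user values, so no SQL injection. */
    EXECUTE sys.sp_executesql
        @sql,
        N'@OwnerUserId int, @CreationDate datetime',
        @OwnerUserId,
        @CreationDate;
END;
```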

So, if we recreate this with all that in mind, Oh, jumped around on me a little bit there. These things will all still be okay.

All right. We’re going to get pretty much equivalent performance to what we saw with the recompile. We get the two seeks here, and, I mean, we do reuse this plan. So the cardinality estimate is reused, right?

We can see that there is plan reuse with this. This one guessed 1 of 82, and this one is 5,000 of 82. So the guess — rather, that cardinality estimate — of 82 rows persisted there. And that also lives on with this.

If we executed more than one variation of this, we would reuse the estimates for that plan, right? So, no big deal here. But again, coming back to what’s difficult with these things: knowing how often they run. Query Store is pretty helpful for this sort of thing.

Because, well, if you’re smart, when you write dynamic SQL, you put in a comment with the name of the procedure that something comes from. You can use my free stored procedure, sp_QuickieStore, to search query text for this type of token. And you can find all the queries that run and see how many executions they have, which is a very useful thing.
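For example, something like this sketch — it assumes the comment-token convention above; @query_text_search is an sp_QuickieStore parameter, but check the current documentation for exact usage:

```sql
/* Find every Query Store entry whose text contains the comment token. */
EXECUTE dbo.sp_QuickieStore
    @database_name     = N'StackOverflow',
    @query_text_search = N'dbo.SearchPosts';
```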

So, the problem with rowstore indexes is that the order that the keys are defined in defines how queries can access stuff. Now, there are clients who I’ve been working with for years and I’ve been talking about indexing for years. And they’re still unclear on the fact that if you have an index like this that leads with owner user ID, that is not the same as if you have an index where owner user ID is like the second or third or fourth or fifth key column in the index.

Like, by that point, you have lost any useful sorting for searching stuff. Multi-column indexes really are only useful for either things that search for the leading key column or things that search for the leading key column and then other stuff. Right?

Or, you know, ordering and other things. But if we’re thinking about just being where-clause centric here, this is really where multi-key rowstore indexes are very useful: you can seek to owner user ID, and then you can seek to score or order by score. And then you can seek to creation date or last activity date or post type ID or ID.

You can seek across all the keys, but you have to start seeking with that first key column. Otherwise, it’s just a scan of everything, because the data isn’t ordered in a helpful way after that. So, for example, with our other query where we need to search on these two columns, we’re not accessing these two columns first via where clauses or anything.

So everything is all out of sorts for those searches. That’s why we ended up scanning the clustered index. So what you generally want to do, if you can — and there are lots of considerations for this, and we’ve already kind of gotten to the 20-minute mark here —

And some of you may feel dissatisfied that I have not gone into all of the potential things that might come up by using columnstore. But it’s generally creating a non-clustered columnstore index, right? And just because it doesn’t say non-clustered in here doesn’t mean it’s not.

I can’t create a clustered — rather, I can’t create a clustered columnstore index on the post table, because it already has a clustered rowstore index. So this is creating a non-clustered columnstore index.

So in general, if you create a non-clustered columnstore index that spans the columns you care about searching in your query, things will end up a lot better, and the unpredictability doesn’t matter as much. The reason it doesn’t matter as much is that columnstore indexes don’t have the column-to-column dependency that rowstore indexes do.
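The kind of index being described, as a sketch (the name and column list are assumptions):

```sql
/* One non-clustered columnstore index covering all the searchable columns;
   each column is independently searchable, so key order stops mattering. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_posts_search
ON dbo.Posts
    (Id, OwnerUserId, Score, CreationDate, LastActivityDate, PostTypeId);
```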

Each one of these columns is stored in its own little vertical, columnar structure within the columnstore index. And there are lots of little mini indexes, sort of, across all the different segments and row groups. So you can pick and choose which columns you want to hit.

So if you want to hit owner user ID and creation date, or you want to hit score and last activity date, it doesn’t make any difference to the columnstore index what order you search for things in or anything like that, because each one of these columns has independence from any other column that might precede it in the key of the index, unlike rowstore indexes.

So you can give all of your search queries a single, good, highly compressed, very nice place to search for data from, without all the problems of creating multiple non-clustered rowstore indexes to try to satisfy every different permutation of a search. That’s the way that I love to handle this sort of thing for clients, as long as the shoe fits with using columnstore in their database. My biggest consideration is this: if we’re relying on query parallelism to make things fast and you’re on Standard Edition, we need to really carefully test the columnstore thing, because Microsoft hates people on Standard Edition, and it limits all columnstore parallelism to a DOP of 2.

You cannot exceed a DOP of 2. So if you have a query that runs really fast at DOP 8 using rowstore indexes in Standard Edition, and you start using a columnstore index, you might be like, wait a minute, my queries got slower, even though I’m using batch mode on the columnstore index.

Well, it’s because your query is limited to a DOP of 2, and you can’t do anything about that. So the way that me, Erik Darling, with Darling Data, likes to handle unpredictable searches — using columnstore — is typically a lot easier and more efficient than creating, like, 17 different rowstore indexes to account for everything that someone might search for.

So we’re going to wrap this one up here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I hope that you will continue watching.

I forget what the next video is about, but it probably, hopefully it won’t be as long as this one, because, you know, the tongue does get tired. And, you know, there’s a wise man who once said, the reason dogs have so many friends is because they wag their tails, not their tongues.

I guess that explains why I spend so much time alone. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Equality and Inequality Predicates




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of indexing queries for better performance in SQL Server, focusing specifically on equality versus inequality predicates. As we explore common scenarios and problems encountered when designing indexes, you’ll learn how to optimize your queries so they can take full advantage of those indexes. The discussion covers a range of topics, from understanding why certain index designs might not perform as expected due to non-selective columns required in every query, to strategies for rewriting both the index and the query itself to improve performance. By the end of this video, you’ll gain insights that can help you avoid common pitfalls and make more informed decisions when designing indexes for your databases.

Full Transcript

Erik Darling here with Darling Data. We’re going to embark on an adventure together, you and I. We’re going to delve deep into the caverns of how to index queries for better performance. I obviously can’t cover every single scenario, but I’m going to cover many of the most common ones and some of the most common problems that I see when I’m doing my consulting work. And hopefully you will get at least enough knowledge to be dangerous in your own work. In this first video, we’re going to talk about equality versus inequality predicates, because the way that you design your queries and the way that you design your indexes really have to go hand in hand, so that you make sure that your queries actually use your indexes well. And if you have queries that are not performing well, perhaps the way they’re written is the reason why you’re not taking full advantage of those wonderful indexes that you have. Perhaps you can stop worrying about fragmentation and start worrying about real problems. It would be a grand thing. But before we start talking about equality and inequality — I promise we’re not going to go off on any soapbox tangents here about the world at large — if you would like to become a member of this channel and support the channel and all the work that I do to bring you this wonderful content, there is a link at the very top of the video that says become a member, and you can click on that very link.

And you can give me a minimum of $4 a month towards continuing to create this content. It’s a nice thing to do. If you’re in the market for nice things to do that are a little bit less expensive, you may like, you may comment, you may subscribe. You can do all of the above, none of the above. But of course, then you’d be mean. If you just need to be a mean, cruel person. Well, maybe this isn’t the channel for you. We’re all nice people here. If you are in the market for SQL Server consulting help, well, by golly and gosh, I do quite a bit of it. All these things over here to my right. I’m in the market to do them. And if you need something aside from maybe one of these things, well, you can always talk to me and tell me what it is.

And as always, my rates are reasonable. Training: I got some. You need some. It’s about $150 for the rest of your God-given life. Again, you can use that link with the discount code Spring Cleaning, or visit the video description for a link that puts all of these things together for you.

So that you can just go right on ahead and not have to deal with copying and pasting anything. I think by the time these videos start airing, it will be 2025. It will be the beginning of the year. So I guess news about upcoming events will be forthcoming.

Won’t that be lovely for you and me? Because that means I’m getting out of the house. But with that out of the way, let’s do a little disco dance and talk about these inequality and equality predicates. Now, a little preamble on this stuff.

Pretty much since I started reading about indexes, when I started caring about SQL Server performance, the general wisdom has always seemed to be that you should design the key of your indexes to support the most restrictive search predicates first.

That could either be, you know, a unique column of values, or just one that is highly selective in some way. And that’s not terrible starting advice.

If you’re a beginner out there in the world and you see that advice, that is not advice that you need to be terribly suspicious of. You can follow that advice up to a point and probably have good enough performance for a lot of your queries.

You know, I realize that there’s going to be a lot of questions about, you know, whether you should index for your where clause first or your join clause first or whatever. And the crazy thing is that there are going to be different scenarios where indexing for, like prioritizing different parts of your query for indexing can make a big difference.

But if I were to give you some just basic starter level advice to go along with indexing the most selective columns first, I would say that you should index for your where clause first and then prioritize other parts of the query. That’s not always going to work and that’s not always going to be true because sometimes you’re not going to have a where clause, you’re only going to have a join clause.

And sometimes other parts of the query might come into play as to how you want to design your indexes. Things like, you know, windowing functions, group by, order by, things, you know, columns you’re selecting, very specific restrictive filters on things.

All sorts of things come into play when you’re designing indexes, but we’re going to talk through a bunch of those during the course of the series, so I’m not going to talk too much about those here. One problem that I see pretty consistently is that a lot of databases end up designed with some very non-selective columns that are required for just about every query.

Think about something like soft deletes, where most rows are not deleted. Most rows are in play. Or you might have a status column with a variety of statuses in it that is still not very selective, but has outcomes in it that most of your queries are going to be filtering specifically for.

Now, leaving filtered indexes specifically out for the moment, a lot of the columns that I see, rather, a lot of the indexes that I see, have those types of columns in them.

I don’t want to say included in them because I don’t want you to think that we’re just talking about included columns because those columns end up in various places in an index depending on how the index is laid out overall.

If it is a filtered index, then they usually end up in the include columns, because they’re in the where clause of the index. And having them in the includes gives you a little bit more gusto than leaving them out entirely.

Of course, putting them in the key is fine too, but usually includes is good enough. For all of these required-type columns — whether it’s, like, a multi-tenant type database where you have your organization ID, or the soft delete thing, or the status thing that you need to look up — there are all sorts of columns along those lines which are not very selective, but which you do need in the index to make your queries fast, because you need your data organized by these things.
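A sketch of that layout — low-selectivity but always-required columns tucked into the includes of an index that leads with more selective keys. The table and column names here are hypothetical:

```sql
/* Selective keys first; the soft-delete flag and tenant column ride along
   in the includes so the predicate can be applied without a lookup. */
CREATE INDEX ix_orders_customer_date
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (IsDeleted, OrgId);
```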

Some of the index design patterns that stem from that are rather unhealthy. Now, in this post, what I want to look at is both an index change and a query change that can help you out when you have, let’s say, an index that maybe wasn’t designed great, but your query can be rewritten to take better advantage of it — or how to rewrite the index so that whatever way your query is written, you can do pretty well with it.

So I’ve already created two indexes and a constraint on the post table. Sorry — I’ve created two indexes: one on the badges table, on name and then user ID, and this index is plenty good enough; and then one on the post table, on post type ID and then owner user ID.

Post type ID is taking the place of sort of a status column, because there are about eight different post types in the table. Some of them are much more selective than others, and the problem is that for some types of posts, an index that leads with post type ID would be very selective, and for other types of posts, an index that leads with post type ID is not going to be very selective at all.

And then I also have a constraint on the post table telling SQL Server that every post type ID that exists in that column is greater than zero and less than nine. All right.
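Reconstructed, that setup looks roughly like this sketch (the index and constraint names are assumptions):

```sql
CREATE INDEX ix_badges_name_userid
ON dbo.Badges (Name, UserId);

CREATE INDEX ix_posts_posttype_owneruser
ON dbo.Posts (PostTypeId, OwnerUserId)
INCLUDE (Score);

/* Tells the optimizer every post type ID is between 1 and 8 inclusive. */
ALTER TABLE dbo.Posts
ADD CONSTRAINT ck_posts_posttypeid
CHECK (PostTypeId > 0 AND PostTypeId < 9);
```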

These are hard limits that we have set on the post table. So, in this case, the post type ID column, like I said, it’s going to play the role of our non-selective column that every query is required to have filtered in there.

Granted, that’s not going to be the case with a lot of Stack Overflow queries, but, you know, just go along with me here. Let’s, like, just pretend that all the queries need to find a particular type of thing, but what else you might be searching for depends on what else the module that your users are executing is going to run.

Now, I have this query, and this query is not going to be very fast. I’m just going to walk you through what it does real quick. Let’s start down here a little bit.

So, we are selecting from the badges table, and the reason why we have the index that we have on the badges table is to grab, well, it partially helps with this, but really this part of the query is mostly assisted by the ID column on the users table being the clustered primary key. So, seeking to various user IDs is totally fine and easy.

The reason why we have the index that we have on the badges table is because we start filtering the badges table with an IN — I mean, I know it says IN (‘Popular Question’); I guess before there were some more things in there, and I narrowed it down to this to make the demo work in a reasonable amount of time — but this is essentially an equality predicate on Popular Question.

And then for the post table, this is what we’re doing. And if you remember the index that I created, it was on post type ID and then owner user ID, and then it had score in the includes, which is obviously not going to be a completely ideal index for this situation, but the thing to pay attention to really is what’s going on in the where clause here.

We can just leave this out of the equation for the time being. Now, if we look at the plan that we get for this query, you’ll notice it doesn’t return any results, and I purposely have this not return any results because I want you to see the raw execution time of the query without returning any rows to SQL Server Management Studio.

That’s why I filtered to where that row number column equals zero down here. I want nothing to come back from this. Now, looking, like, more closely at the query plan, like I said before, getting to the user’s table is trivial.

It takes about eight milliseconds of time, but you’ll see that we spend about 32 and a half seconds executing the query. The majority of the time, if you’ll allow me to drag this over, is in this section of the query plan. And in this section of the query plan, we start with a scan on the post table.

And the reason why we start with a scan on the post table is because, with the inequality predicate the way we’ve written this, SQL Server can’t seek into that index that leads with post type ID. SQL Server ends up building an eager index spool based on this.

Or rather, we could seek into it. SQL Server just doesn’t. SQL Server just says, like, I don’t know. And this is even with that constraint on the table, where SQL Server knows that any value less than three is going to be one or two. SQL Server could have converted that to equals one or equals two, but it doesn’t.

So we end up building an index at runtime. And this takes about 21 seconds, on top of about four seconds spent scanning the index. So the majority of the execution time really is in here.

Okay. Now, it would be nice if SQL Server did do things a little bit more naturally. And, like I said, it could choose to do this, but it doesn’t.

If we look at the estimated plan for this with the FORCESEEK hint on there, this looks a lot friendlier, right? This looks like a lot better of a situation, but it’s not. If I start this running, we are going to be waiting here for a very, very long time.

I have, in the past, attempted to let this thing run to completion, but it runs for about an hour before I give up on it. So this thing will just run and run and run and run and run. This actually doesn’t do any better with the FORCESEEK hint.

Sometimes it is useful to apply various hints to your queries to see why SQL Server doesn’t choose certain plans. Things I find very interesting to do are a FORCESEEK hint and OPTION (LOOP JOIN) at the end of your query, because that will really show you indexing issues with whatever query you’re running. Now, this thing has been running for about 45 seconds, and we haven’t gotten anything back from it.
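The diagnostic-hint idea looks something like this sketch — strictly for investigation, not something to leave in production code:

```sql
/* Forcing a seek and nested loops surfaces indexing problems the optimizer
   would otherwise route around with scans or hash joins. */
SELECT b.Name, p.Score
FROM dbo.Badges AS b
JOIN dbo.Posts AS p WITH (FORCESEEK)
  ON p.OwnerUserId = b.UserId
WHERE b.Name = N'Popular Question'
AND   p.PostTypeId < 3
OPTION (LOOP JOIN);
```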

The main problem with this query is if you look at the index seek, we are doing this. And actually, let me expand this out as well just so it’s a little bit more clear in here. Range columns.

Range columns. So for every time we go into the POST table, we scan for everything that’s less than 3. And the trouble with that in our case is that everything less than 3 is the majority of the table. So every time we go and seek into that index, we end up with a big, big problem.

Having to read through about 17 million rows and then doing a residual predicate on the owner user ID. If you look at the, look at this, we actually have a, do an index seek. Oh, wait, that’s the wrong one.

That’s the user’s table. I’m sorry. Highlighted the wrong thing. I was like, wait a minute. That doesn’t make sense. So we have a seek predicate on a POST type ID less than 3 and then a residual predicate on owner user ID. This is a bad time because every single time we go into this and seek, we seek through 17 million rows and apply a residual predicate on owner user ID.

That is not a very good scenario. If we were searching on a very limited set of POST type IDs, that would be maybe okay. But the way that this thing is going, it’s very much not okay.

So the FORCESEEK hint does show us that this is a bad plan idea. So, like I was saying, let me start this running. And then we’re going to talk about the distribution of records for POST type ID in the POST table.

I actually have to put that in a new window probably because I’m building an index there. So like I was saying, we have to go through about 11 million records to find anything with a POST type ID of less than 3. 3 is down here with 167 records.

But the POST type IDs that we care about, the ones that are less than 3, that’s 1 and 2. SQL Server would have to scan through 17 million of those.

Now, what I was saying before, which I think I mumbled up a little bit, was that SQL Server could convert this logically because there’s a constraint on the table that says, like, you know, all of the values in here are greater than 0 and less than 9. SQL Server could logically translate less than 3 with everything that it knows about that data to equals 1 or equals 2.

But it doesn’t do that on its own. Right? It just doesn’t.

It can’t unfurl that logic on us. So I’m creating an alternate index here on owner user ID and then post type ID. And what I want to show you first is that if we get the estimated plan for this, this thing should start using the new index that we created, which is not badges underscore x.
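A sketch of what that alternate index might look like. The index name is mine, not the demo’s:

```sql
/* Leading with the selective equality column (OwnerUserId), with the
   inequality column (PostTypeId) second, so each loop iteration can
   take a narrow seek instead of forcing an eager index spool. */
CREATE INDEX ix_owneruserid_posttypeid
    ON dbo.Posts (OwnerUserId, PostTypeId);
```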

And if we do that, this will actually come back pretty quickly. At least it did at one point. Who knows how long it’ll take now.

But there we go. So that was about 8 seconds. And we can see that this is much improved. All right? This is no longer building an index. The eager index spool is no longer taking 30-something seconds to execute this whole part of the plan.

And, you know, having the index with the reversed key order, compared to the other one, actually does better for us. Now, if we wanted to write this query in a way that would have it turn out well with our original index, what we could do is, actually, let me tweak this a little bit to say index equals not badges on this.

And if we look at the query plan for this, it is pretty okay. There is still no eager index spool being built in the estimated plan. And because we have separated out our search logic a bit, and we’ve said select this stuff where post type ID equals 1, and select this stuff where post type ID equals 2, we’ve done the optimizer’s job of unrolling less than 3. Because, again, with the constraint being greater than 0 and less than 9, less than 3 can only mean 1 or 2, or 1 and 2, depending on how you look at it.
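That rewrite, roughly. The column list and the user ID literal are simplified stand-ins for the demo query:

```sql
/* Splitting the inequality into two equality searches lets the
   original index that leads with PostTypeId take two narrow seeks. */
SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.OwnerUserId = 22656  /* illustrative user id */

UNION ALL

SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 2
AND   p.OwnerUserId = 22656;
```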

But if we run the query and we tell SQL Server to use our original index, not badges, we still do fine with this because SQL Server does two pretty quick seeks, and this ends up taking roundabout the same amount of time as the query with the unchanged less than 3 predicate and the new index that we created that leads with owner user ID.

So with that index, we’re able to seek to each owner user ID and then filter out the post type IDs. Owner user ID is a far less dense column. It’s very, very unique amongst the 17 plus million rows in the post table.

So the highest count for a single user in this version of the database is around 28,000 rows, and seeking to span a range of 28,000 rows for a single user ID and applying a predicate after that is pretty trivial as far as seeks go.

And then another thing that we could do is we could convert the where clause to say where post type ID is in 1, 2, and we could run this, and this should also take roundabout eight seconds. The query plan is slightly different for this one.
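The IN version, sketched the same way, with the same illustrative columns and user ID:

```sql
/* IN (1, 2) gets expanded into two seek predicates on a single seek
   operator, rather than the two separate seeks the UNION ALL produced. */
SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
AND   p.OwnerUserId = 22656;  /* illustrative user id */
```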

6, 7, 8, there we go. Okay, it was a little under 8 seconds, 7.8. Maybe the cache was a little bit better or something on that one. But rather than having two distinct seeks like we had with the query 1, union all, query 2 thing, we just have one seek into this.

And if we look at the seek predicate, we will have actually two seeks in here. One of them will be for post type ID 1, and the other will be for post type ID 2. It’s a little hard to get this with ZoomIt.

Let me see if I can… Yeah. So there’s the first seek for post type ID 1. And then if we expand this one out, we will see the second seek for post type ID 2.

So, depending on how your indexes are arranged, you can sometimes change the way that where clauses are written to take better advantage of indexes as they exist now. And sometimes you’re allowed to change indexes but not queries, like if you’re working with a third party vendor app, and they’re like, no, you can’t change the queries, but you can do stuff with indexes if you’re smart.

One thing that you could do is change the order of some key columns, depending on how the queries are written, in order to have those indexes get used more intelligently by those queries. So there is that.

Anyway, that’s a little foray into inequality and equality predicates in queries, how those tend to align with your index key column order, and the way that you design your indexes so that those queries are able to seek into them.

Sometimes it does work. Sometimes you can write queries so that having a less selective column first works out okay. Other times you have to either rewrite the query or reshape the index so that the intent of your query is better serviced by the key columns in your index.

So with that out of the way, I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, which will be about, oh, what’s number three over here? Unpredictable search queries.

So don’t read this yet, because that’s what I’m talking about next. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked `OUTPUT` clause in SQL Server queries, explaining its basic functionality and more advanced uses. While the `OUTPUT` clause might seem mundane at first glance, it offers powerful capabilities for simplifying archival processes and maintaining data integrity during merges. I demonstrate how to use `OUTPUT` with both regular DML statements and merge operations, showcasing its ability to track changes in inserted and deleted rows, as well as access values from other tables involved in the modification query. By the end of this video, you’ll understand why it’s a valuable tool for optimizing your SQL Server queries and managing large datasets efficiently.

Full Transcript

Erik Darling here with Darling Data. Actually recording my final video for the evening, which I’m excited about because I can’t wait to take a shower. I know it seems weird, but once you get on these video kicks, you just got to keep going until you’ve exhausted either yourself or your content. This is going to be the final video about how to write SQL Server queries correctly. In this video, we’re going to talk about the output clause. Just like CTE, there is very little that you need to actually know about output. If anyone ever tries to say that they have information worth selling about output, they are full of it and, once again, you should throw something heavy at them. They’re not good people. In fact, they’re terrible people. They’re conning you out of money. This is free information that you should be able to find anywhere because there’s not a whole lot that is very in-depth or very technical about it. If you like this content and you want to support this channel, you can actually spend money on it. I guess I’m kind of reversing course there a little bit. I actually don’t want money for this video. Watch the other videos and if you decide you’re like, wow, this is a great channel, then you can give me four bucks a month by clicking the link in the video description that says, become a member of the channel.

And then you’re a member, and I get $3.10. If you don’t have the four bucks, if you spent it all on the ponies or a pony, you can like, you can comment, you can subscribe, and that won’t buy either of us a pony, but it’ll feel good. If you need help with SQL Server, if it’s slow, if it’s annoying, if it’s upsetting you in some way, I am available for hire. You can pay me money. In exchange for that money, I will fix your SQL Server problems. And as always, my rates are reasonable. Speaking of reasonable: reasonable man, the face of a reasonable man here. You can get all of my training for 150 US dollars for the rest of your life. That’s the URL, that’s the coupon code.

These are also helpfully glued together for you in the video description so that you don’t have to do much thinking because you’re probably tired of thinking. I will be out and about in 2025. I don’t know where yet, but when I get there, I’ll tell you. With that out of the way, let’s talk about the laughably mundane output.

Now, one of the most interesting things you can do with the output clause is use it to simplify archival processes. That’s going to be the first demo that I show you. And that’s really, if you stop watching there, I wouldn’t blame you. The rest of it, yeah.

Output isn’t very interesting, but maybe it will be to someone. So, the first thing is you can access the inserted and deleted tables to see modified values. Wow. So, you write an insert or an update or a delete. You can throw the output clause on there.

And you can see what changed. Sort of a helpful thing. If it’s a lot of rows, be careful.

But, you know, when there’s a lot of rows, you should be careful anyway. You don’t need to select star from that 24 billion row table. You’ll probably have a bad time.

Output gets a lot more interesting when you use it with merge. But, of course, then you have to use merge. And if you never write a merge statement, your life will be complete. I’m telling you. You’re not going to miss out on much.

And the third thing about output is you need to be careful what you output data to. If you output to a client, whether it’s SQL Server Management Studio or whatever client you write that accesses SQL Server, you will force the query.

Or rather, the more clear way of putting it is you will disallow a parallel execution plan. The same thing will happen if you use output into a table variable. For very small inserts, updates, and deletes, you probably won’t notice this.

But if you’re using output for something like I’m showing you, you may notice it. Because for archival processes, you are usually getting rid of, or modifying, a lot more rows. Outputting to a temp table or a normal user table does not have that problem.

Only outputting to a client and to table variables. So this is primarily how I use output. And I use it to make archival processes all one statement.

Usually with archival processes, you’re like, Oh, I don’t know, I’m going to update these rows to mark them as deleted. Or I’m going to delete these rows.

And then I’m going to insert these rows into the archive table and then go back and delete them. With output, you can do that all in one go. Because you can nest a delete statement and put that into a derived table. Note I’m not using a CTE here.

Just despite you, CTE lover, 11357. So you can put, you can nest the data modification like delete. You can give it the output clause.

Look at these wonderful rectangles I’m drawing for you. There’s some symmetry in there, baby. And then you can select from that derived table. And you can insert, you can use that to drive an insert into an archive table.
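The pattern looks roughly like this. Table and column names follow the Stack Overflow demo database, and the date predicate is illustrative:

```sql
/* One statement: the DELETE runs inside a derived table, OUTPUT exposes
   the deleted rows, and the outer INSERT archives them. */
INSERT dbo.Votes_Archive
    (Id, PostId, UserId, VoteTypeId, CreationDate)
SELECT
    d.Id, d.PostId, d.UserId, d.VoteTypeId, d.CreationDate
FROM
(
    DELETE v
    OUTPUT
        Deleted.Id,
        Deleted.PostId,
        Deleted.UserId,
        Deleted.VoteTypeId,
        Deleted.CreationDate
    FROM dbo.Votes AS v
    WHERE v.CreationDate < '20110101'  /* illustrative cutoff */
) AS d;
```

One thing to know about this composable DML form: the outer insert target has some restrictions (no triggers, no foreign key participation, and so on), which plain two-statement archival doesn’t have.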

So if let’s say we do this, we’re already in the right database. And I’m going to put this into a transaction so that I can roll it back immediately. And I’m also going to turn on query plans because they’re going to be of some interest to us.

And we do this. Notice in our execution plan, we have a clustered index delete and then a clustered index insert. This is going to the votes table and this is going to the votes archive table.

And if we had supporting indexes, of course, this would be a lot faster. But you know, whatever. Not really the biggest deal here. And then I have a couple of validation queries where we make sure that everything that we cared about was deleted from the votes table and everything that we cared about was inserted into the votes archive table.

And that is the case when we finish the query. These are all gone from the votes table and they only exist in the votes archive table. So a very, very handy way of simplifying archival processes when you’re finally ready to purge some data out of those massively oversized tables of yours.

Now, this is originally from, partially anyway, from a blog post that Aaron Bertrand wrote that I’m stealing the table definition from because it was easy. But thanks, Aaron. You’re a sport. You’re a curling pro. And so what I’m going to do is just for the sake of making it easy to do stuff, I’m going to use a transaction here.

I’m going to show you what’s in the table called my table and then I’m going to do a merge into my table. And the primary thing that I want to show you is that when you use output with merge, you have access to two things that you don’t have access to when you use output with a regular insert or update or delete. One is this special dollar sign column called $action. There’s probably a fancy word for that that I don’t know.

And the second thing is that you can access data from other tables involved in the modification query to see what values are in those. This can be very handy if you want to do like a delete and then an insert and you want to maintain certain values from the table you’re deleting from in the other table. It’s pretty handy for some stuff like that.
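A sketch of that, with illustrative table and column names rather than the exact demo objects:

```sql
/* OUTPUT with MERGE gets two extras that plain DML doesn't:
   the $action pseudo-column, and columns from the source table. */
MERGE dbo.MyTable AS t
USING dbo.MySource AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Word = s.Word
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Word) VALUES (s.Id, s.Word)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT
    $action,       /* INSERT, UPDATE, or DELETE */
    Inserted.Id,
    Deleted.Id,
    s.Word;        /* a source column: not available with plain DML OUTPUT */
```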

So what I’m going to do is run this whole thing. And really the results of this are what we care about. We don’t care so much about performance or anything. But when we start off, when we started off, our table, my table had the IDs one and four in it.

And when we ended, we had one, two and three in it. And then this was the output, the result of the output clause. And so we have an update where we updated ID one.

We have an insert for ID two and insert for ID three because we only had one and four before. We didn’t have a match for four. So we deleted row four.

Now what’s neat about this is this, that word column. Why did you change colors on me? Let’s correct that. Let’s make sure that’s the right color.

We only use pink in this house. This source dot word is from this. This was not anything that was in the table that we merged into. Right?

These values were not in there. They were only in here, but we could reference those with merge. So there are neat things you can do with output and merge that you can’t do with regular DML statements. So that’s the second thing that you should know about output clauses.

The third, and this is not, I don’t actually have to run any of these. I just have to show you the estimated plans for them. But if I make the target of my output query a temp table like this, we will be able to use at least a partially parallel execution plan here.

Right? There’s nothing stopping us from being able to read from a table in parallel. Of course, modifications like inserts and updates and deletes are generally going to be part of a single threaded portion of the query plan.

Right? Those are not allowed to go parallel. Well, inserts are sometimes allowed to go parallel under the right conditions. But in this case, they are not.

And then over here, what I’m going to show you is the same thing, except now going to a table variable. And we’re going to output into that table variable. And this is going to show you the estimated plan again.

And this time we do not have a parallel read portion over here. And if we go to the properties, we’re going to see there, non-parallel plan reason, table variable transactions do not support parallel nested transaction. I don’t know why transaction is, transactions is plural here and singular here.

You will have to ask the summer intern at Microsoft who does all of the SQL Server development these days. The same thing will happen if you use delete and you output to no target except the client. That’s, in our case, SSMS.

If we get the estimated plan for this, there will also be no parallel read portion here. And if we go to the properties, we will see a different non-parallel plan reason. This time, DML query returns output to client.

So that one is very clear about what’s, about why it cannot go parallel. And with that, that is literally everything you need to know about the output clause. Can simplify archival processes.

It’s cooler with merge because you can access a column called $action, a pseudo-column, which is probably what it’s actually called, that will tell you if it was an insert, an update, or a delete. And you can reference columns from the source of the merge, not just the target of the merge.

That’s something you can’t do with regular insert or update or delete queries. The third is that you should be very careful what target you use with output. Because depending on how much data you are modifying, you might really want some portion, the read portion of your query plan to happen with a parallel plan.

It could slow down quite a bit if you lose that. So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I am now going to go bathe and enjoy the company of my family until I go to sleep.

So with that, I’m also wrapping up the how to write query correctly series. It’s a playlist. You can watch them all. You should watch them all, especially if you’re having trouble writing queries correctly. You just might learn a thing or two or like 16, depending on where you’re at in life.

All right. Cool. We did it. Thanks for watching.


How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About CTEs




Thanks for watching!

Video Summary

In this video, I dive into the world of Common Table Expressions (CTE) in SQL Server, providing a comprehensive yet critical analysis of their use and limitations. Erik Darling from Darling Data shares his extensive experience and expertise to show that while CTEs are useful for certain scenarios, they often come with significant drawbacks, such as redundant queries and performance issues due to single-threaded recursion. He emphasizes the importance of good formatting and comments over the supposed readability benefits of CTEs, encouraging viewers to focus on more effective query writing techniques like APPLY operators for simpler tasks.

Full Transcript

Erik Darling here with Darling Data, and in this video we’re going to continue on with our How to Write Queries Correctly series. And in this one I’m going to tell you, I’m going to teach you everything you need to know, literally everything you need to know about CTE for free. If anyone ever tries to convince you that they have something worth selling you, for you to learn about CTE, laugh in their face, throw something heavy at them. This is it. This is all you need to know about CTE.

Alright? So, before we do that, if you would like to spend $4 a month on this channel, you can. There’s a link in the video description where you can click that says, become a member, and then you become a member. And I get $3.10 a month after taxes. If you are just like, no way, $4.00, you can like, you can comment, you can subscribe.

All very valid ways to show your undying love for me. Consulting! I do it. A lot of it. And according to my wife I need to do more of it, because it keeps me in my office. So, if you want to make my wife happy, you can hire me to do any one of these things, and more, with your SQL Server or servers.

And as always, my rates are reasonable. If you would like to get some very high quality, very low cost SQL Server training, none of it about CTE, at least, you know, not in a very elementary way, because I would never charge you for that. Save the good stuff for training.

You can get all of mine for $150 US dollars for the rest of your life. It’s not quite free, but if you live a long time, it’s close to it. There is also a link for that stuff in the video description.

You know, 2025, I’ll be on the road again for now. Give me a break. So, with that out of the way, let’s talk about the laughability of CTE. So, one, they’re just inline views, and you need them for some types of queries.

Unlike derived tables, you can reference them multiple times in a query. But this is where they start to cause problems. You can write modifications against them, which is mostly useful for doing ordered deletes.

For some reason, you can’t put a top with an order by directly on a delete. It’s very strange. You can write recursive ones, but it’s really annoying.

And if you find yourself having to write recursive CTE a lot, you should probably consider a different data model. You can often do things a lot more cleanly with apply than you can with CTE. CTE require you to stack things one above the other and keep doing things.

With apply, you can just have a nice, neat query where you just have stuff very nice and streamlined and inlined without having multiple steps that someone has to read through and remember the column names and the CTE names. It’s all very aggravating.

So, like I said, the big gap that CTE were meant to fill in SQL Server is that, with derived tables, you might notice some red squiggles here. That’s because we can’t talk to the derived expression x more than once.

Right? We just can’t do that. It just doesn’t work for us. Even if I name that correctly.

There we go. Even if I name that correctly, we still get red squiggles. And if I try to run this, it’ll say invalid object name. Next one, who are you? I don’t know.

But we can do that with CTE. The problem, as always, with CTE is every time you re-reference them, SQL Server runs the query in them again. If you look at this, we touch the users table not once, but twice.

And if we comment this in and we add a third join to our correctly named CTE, we will now see that we touch the users table three times. A one, a two, a three.
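The multiple-reference behavior being described looks something like this. The filter and the join condition are illustrative:

```sql
/* Unlike a derived table, the CTE can be referenced twice. But each
   reference re-runs the underlying query: Users gets read twice here. */
WITH u AS
(
    SELECT us.Id, us.Reputation
    FROM dbo.Users AS us
    WHERE us.Reputation > 100000
)
SELECT a.Id, b.Id
FROM u AS a
JOIN u AS b
    ON b.Id = a.Id + 1;  /* illustrative join condition */
```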

So, like I said, the thing that, like, Microsoft was like, cool, we have CTE now. Now you can reference them multiple times. No way to materialize them.

Other database engines give you that. Microsoft, as of this recording, does not. Maybe that’s a secret thing in SQL Server 2025. I don’t know. Couldn’t tell you.

I don’t know how much work goes into CTE. Microsoft is clearly busy with very important things involving fabric and AI that will go absolutely nowhere. So, we have that to look forward to.

Thanks. So, this is what I was talking about with deletes. You cannot do this, right? You can’t have that in there. But we can have that in here.

If we add an order by, u.id, this is completely valid where this is not, right? This will say, bleh, no. No ordering.

No ordering for u. This thing, though, this will give us a nice ordered delete, and we don’t have to worry about anything in there. There’s no sort in this because I’m ordering by the id column. I think when I was messing with this before, I think I deleted the order by by accident.

But if we were to order by a column that is not supported by an index, you would see it gets sorted there. When we use id, that is the clustered primary key of the users table. So, it’s already ordered for us.
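The ordered-delete pattern, sketched with an illustrative row count:

```sql
/* DELETE won't accept TOP with ORDER BY directly, but a CTE that
   applies the ordering can be the delete target. */
WITH d AS
(
    SELECT TOP (1000) u.Id
    FROM dbo.Users AS u
    ORDER BY u.Id
)
DELETE FROM d;
```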

So, we can delete things nice and orderly this way. Sometimes, you need a CTE. You could also do this with a derived table. My style guide says it doesn’t matter because for a query this simple, it’s just not that big of a deal, right?

You generate a row number, and you want to filter out to where that row number equals something. There’s no way in SQL Server to do that all in one step, right? You can’t do that all in one go for some reason.

And SQL Server, Microsoft makes you write a two-step query and generate everything. And then at the very end of that query, filter out everything in there, right? So, that’s a lot of fun, right?
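That two-step shape, roughly. The partitioning and ordering columns are illustrative:

```sql
/* Step one generates the row number; step two filters on it, because
   a window function can't go in the WHERE clause of its own query. */
WITH x AS
(
    SELECT
        u.Id,
        u.DisplayName,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY u.Location
                ORDER BY u.Reputation DESC
            )
    FROM dbo.Users AS u
)
SELECT x.Id, x.DisplayName
FROM x
WHERE x.n = 1;
```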

Sometimes, you need to use them because Microsoft won’t implement useful things that make your life easier. You get big data clusters, synapse, managed instance. Why would anyone want these things?

They’re garbage. They’re stupid things. And, of course, you can write recursive CTE using CTE. Of course, you couldn’t write a recursive CTE without a CTE because there’s no just plain recursive.

You need that. But, you know, I did go a little bit more into performance details on this in the other video in this series about CTE, which you can find if you look at the playlist.

It’s all in there. But you start off with this part. And this is where you write the anchor portion of your CTE. And this is where you write the recursive portion of your CTE. And if you run this query, well, actually, let’s look at the query plan.

I forget. I don’t think I have indexes that make this good. So this would run for a long time. One of the big problems of CTE is that, recursive CTE, rather, is that the recursive part of the query is forced to run single-threaded.

And it can be very, very slow if you don’t have good indexes in place to support it. And another thing is that usually you’re going to see a lot of nested loops inside of your recursive CTE. And if you don’t have good indexes, you’re going to see eager index spools inside of here.

And that is a surefire sign that they are going to be terribly, terribly slow. So not a lot to say about the recursive CTE. Again, if this is something you need to do frequently, you probably need a better data model.
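For reference, the anchor-plus-recursive shape being walked through looks like this. The employee/manager hierarchy is an illustrative stand-in, not a demo table:

```sql
WITH managers AS
(
    /* anchor: the starting rows */
    SELECT e.Id, e.ManagerId, depth = 0
    FROM dbo.Employees AS e
    WHERE e.ManagerId IS NULL

    UNION ALL

    /* recursive: references the CTE itself; this part runs single-threaded,
       and without supporting indexes tends to sprout eager index spools */
    SELECT e.Id, e.ManagerId, depth = m.depth + 1
    FROM dbo.Employees AS e
    JOIN managers AS m
        ON m.Id = e.ManagerId
)
SELECT m.Id, m.ManagerId, m.depth
FROM managers AS m;
```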

Now, for anyone who says they make queries more readable, again, for like the 10 billionth time, they don’t. They don’t make them more readable. They don’t make them more understandable.

What makes them, what makes queries readable is good formatting. And what makes queries understandable is good commenting. Writing a CTE does absolutely nothing to make a query either one of those things. It does not help.

There is still no explanation for what you did. There is still just wild formatting where you, for some reason, it’s like select and then like columns and then from is on the same line. And then join is on the same line, but on is down.

And like just format your queries and people will find them readable and make comments on your queries and people will find them understandable. It’s a fantastic thing. It’s a wonderful thing to do.

Otherwise, you’re just writing gobbledygook. So one thing that some people will do with CTE is stuff that they could do with apply very easily. This is something that Itzik talks about sometimes in his training: people just want to do simple things.

Like just get the year from a date or datetime or whatever column. Yeah, fire.

Big city living folks, big city living. And then let’s say you wanted to get the following year, you could write another CTE. And then you could get all this stuff.

And then, I mean, granted, this is just sort of a goofy example, but it gets to the mentality of people who use CTE for just everything. Because they have it in their goofy heads that they think it’s going to make the query readable and understandable for everyone. Where you could just do something very simple like this and just cross apply some values to that year.

And then you have that expression available to you in the select list and your query becomes a lot more compact and you don’t need 17 different steps to express yourself or figure out or write your logic out. You can do things very easily and more compactly when you use better forms of query writing. So this is literally everything you need to know about CTE.
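The apply version being pointed at is something like this. The column and alias names are illustrative:

```sql
/* CROSS APPLY (VALUES ...) names the expression once, and the alias is
   then usable in the select list and elsewhere without stacking CTEs. */
SELECT
    u.Id,
    y.SignupYear,
    NextYear = y.SignupYear + 1
FROM dbo.Users AS u
CROSS APPLY
(
    VALUES (YEAR(u.CreationDate))
) AS y (SignupYear);
```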

There is nothing magical or mystical about them. There is nothing fantastic about them. Perhaps someday materialization will come to SQL Server, but we don’t have that now. Fabric did just get Azure SQL database.

Wow. Hold your applause. No, I mean it. Hold your applause.

Anyway, thank you for watching. I hope you enjoyed yourselves. Honestly, I hope you learned nothing. Because this is all very generic information about CTE, but it is all the information you need about CTE. Alright.

I will see you in the next video about the output clause, which will be literally everything you need to know about the output clause. Alright. Have a great day. It’s pretty salty.

It’s clear!


How To Write SQL Server Queries Correctly: Case Expressions




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of case expressions in SQL Server queries, explaining why they can be problematic for performance tuning and optimization. I share insights from my experience as a database professional and highlight how seemingly innocent use of case expressions can lead to complex query plans and significant execution times. By walking through examples and referencing Aaron Bertrand’s blog post on the “Dirty Secrets of the Case Expression,” I aim to help you understand why it’s crucial to be mindful of where and when you use these constructs in your queries, especially in joins and where clauses.

Full Transcript

Erik Darling here with the old Darling Data. Having a lot of fun. A lot of fun. I don’t know how anyone doesn’t have fun with databases. They’re just the most amusing things one could possibly conceive of. Barrel of laughs. In today’s video, we’re going to get ever closer to the close of our series about how to write queries correctly. And in this one, we’re going to cover case expressions. Not case statements. Case is not a statement. Case is an expression. You’re just going to have to deal with that fact. Or else, I don’t know, I’ll drop a bag of hammers on you. If you’re that gung-ho about calling case a statement, it might have no effect on your skull. It’s probably rock solid anyway. If you like this content, and you’re like, wow, where would I be without this content? Well, for four bucks a month, you can support this content. And I can, you know, carry the two. I’ll make about $3.10 after taxes on that.

Give or take. So, that’d be nice. If you don’t have $4 to turn into the alchemy of taxation to hand over to me, you can like, you can comment, you can subscribe. And there’s no tax on that. That is free love. If you are in need of SQL Server consulting, if the things that I say in these videos resonate with you, and you think that perhaps my wit and wisdom would be of some benefit to the SQL Servers that you care for and feed, well, you can hire me to do all sorts of things. And as always, my rates are reasonable. I also have training. And you could have that training for about $150 for the rest of your life. Not a subscription. You can go to that link and plug in that coupon code. And through, again, magical transformations. Actually, reverse taxation. You actually get money off of things. I guess that’s a tax on me.

Boy, this is a rough mental awakening for me. Me giving you discounts is a tax on me. That’s messed up. But you can get all of my training for, yeah, that money, that URL, that code. It’s amazing. No upcoming events, because of end of year, family, things like that. I’ll be around in 2025. With all that out of the way, let’s talk about these here case expressions. Now, one thing that you should know off the bat is that green screens later in the day are hard. That’s why I have strange things happening behind this armpit. Do not look behind the armpit.

What you should actually know off the bat is that some things internally, even though they appear to be built-in functions that SQL Server has (the things that turn pink when you type them), are just wrappers for case expressions. TRANSLATE is not one of them, though. When I was first looking into this, I half expected TRANSLATE to be a case expression, but it is not. If we look at the query plan and we look at the compute scalar, the defined values in there will not say anything about a case expression.

But if we dig into some of these other things, like IIF, and we look at the execution plan here, and we look at the compute scalar, we will see that internally, IIF is just doing a case expression. You can see where that starts right here. So IIF is just a case expression. This is one of the reasons why, when I start having to warn people about using functions in where clauses and joins, I also need to warn them about using case in where clauses and joins, because guess what?

They both have the same performance problems. It’s amazing. It doesn’t matter if you explicitly write case. Some of the functions that you might use will use case anyway, behind the scenes. They’ll just magically crop up there for you. Another one of them is COALESCE. If we do this, and we look at the execution plan, and we look at the predicate that gets applied to the table, that is a case expression too.

Case when this is not null, then, blah, blah, blah, end. See? COALESCE. Inside, it’s just a case expression. CHOOSE, similarly, is a case expression. If we look at the query plan here, and we look at the compute scalar, we will once again see the defined values show us a case expression.

So, if you can, because you’re the one writing queries, you should probably not put things like IIF, and COALESCE, and CHOOSE in your where clause, because they are case expressions, and they will have performance problems. Other built-in functions, like ISNULL and REPLACE and stuff like that, are not case expressions inside, but they have very similar performance problems.
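A quick sketch of the built-ins being discussed; run these with an actual execution plan and check the Defined Values of the Compute Scalar operators to see the expansions:

```sql
/* These built-ins expand to CASE expressions internally */
SELECT IIF(1 = 1, 'yes', 'no');           /* CASE WHEN 1 = 1 THEN 'yes' ELSE 'no' END */
SELECT COALESCE(NULL, 'fallback');        /* CASE WHEN ... IS NOT NULL THEN ... ELSE 'fallback' END */
SELECT CHOOSE(2, 'one', 'two', 'three');  /* a CASE keyed on the index argument */

/* These do not expand to CASE, but cause similar
   index-usage problems in WHERE clauses and joins */
SELECT ISNULL(NULL, 'fallback');
SELECT TRANSLATE('abc', 'a', 'z');
```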

All of a sudden, using indexes becomes difficult for SQL Server. Aaron Bertrand, who, I don’t know, if you ask me, I don’t talk to him enough. If you ask him, I talk to him too much.

So, there’s two sides to every coin. I don’t know what Canadian coins are called. But he has a blog post, anyway, called The Dirty Secrets of the Case Expression.

Thank you, Aaron, for not calling it a case statement. And this is an example that he has in his blog post. There are more if you use the search engine of your choosing. I don’t care what it is.

And if you search for Aaron Bertrand, Dirty Secrets of the Case Expression, you will find his post. This is the most amusing one that I think exists in there: when you declare a variable called i, and you say, case when i equals one, then one, else the min of one divided by zero.

And you get a divide by zero error, even though this part never should have been explored, right? Because this equals one. If you rewrite that a little bit, and you say, else select min, that will work.

Case figures that one out. Case doesn’t go that far along in that. But it’s an interesting thing, where case does not always short circuit where and when you think it might. All sorts of things can happen in there.
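The example being walked through looks like this (adapted from Aaron Bertrand’s post):

```sql
DECLARE @i int = 1;

/* Throws a divide-by-zero error even though @i = 1 and the
   ELSE branch logically never needs to run: the aggregate
   gets evaluated before the CASE short circuits */
SELECT CASE WHEN @i = 1 THEN 1 ELSE MIN(1/0) END;

/* Works: wrapping the aggregate in a subquery
   delays its evaluation */
SELECT CASE WHEN @i = 1 THEN 1 ELSE (SELECT MIN(1/0)) END;
```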

Now, let’s talk about performance, though. Because that’s what we care the most about. We are, for the most part, performance tuners. We’re pretty good at it at this point, I think.

At least I hope, anyway. That’s what we get paid for. I see a lot of queries with a lot of problems. And a lot of them look something like this.

And notice that there’s a join with a case expression here. All right, so we’re going to join depending on whether these conditions match.

If one equals case when this, then one, and when this condition, then one, but if we don’t meet those conditions, then zero. And then another thing where one equals case when these conditions are true, then one, else zero, end.

And I’m not actually going to run this query now, because we’re already around the eight minute mark.
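A hedged sketch of the anti-pattern being described; the table, columns, and predicates here are illustrative stand-ins, not the exact demo query:

```sql
/* Join predicates buried inside CASE expressions,
   which SQL Server can't turn into index seeks */
SELECT c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON  1 = CASE WHEN p.OwnerUserId = u.Id
                AND p.Score > 0
               THEN 1 ELSE 0 END
  AND 1 = CASE WHEN p.PostTypeId = 1
                AND p.AnswerCount > 0
               THEN 1 ELSE 0 END
WHERE u.Reputation > 100000;
```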

And this query takes a full minute and one second to execute. The query plan is not terribly interesting. Just note that everything is fine until we try to apply that expression right here. All right, a whole bunch of bad things get involved at the join.

And this portion of the plan eats up the majority of the time. It would be nice if I could draw different shapes with ZoomIt, but I guess making two rectangles isn’t that much of a challenge for me.

It’s not that difficult. It would just be nice if I could like draw better in general. If I could draw better, I would probably have a different career, honestly.

But this query plan is, of course, a disaster. And you should not emulate this kind of query. This query is no good.

For the sake of those who may, I don’t know, be watching and not listening, maybe you take visual cues better than you take verbal cues. We are going to say no.

No. No. If I was better at art, I would make this a skull. But I’m not very good at art.

So, er, er, er. And we’ll get some things here that hopefully don’t look too much like other things. Eh.

We’re getting good at this. We’re getting better at this. Every day. Every day. We want to avoid this. This is bad. Okay. This is not what you want to do with your queries. This is frowny face.

Bad for performance. But, okay. If all this is not enough of a clue, I don’t know what to do for you. I’m out of ideas.

So, one way that you could rewrite this query would be something like this. You don’t have to do this, but to me, this just looks kind of nice. We’re going to put the predicates that we care about for the users table in here.

Right? Because that’s good. We care about that. And we’re going to cross apply to a union all. And if you’ve been following this series, separating out queries like this is a very common, very useful query pattern.

Even if it doesn’t solve the precise performance problem that you’re after, it can certainly illuminate why you are having a performance problem with one or both of these queries, or with the sets of predicates in here.

Perhaps it’s an indexing issue or something else that’s just driving things amok. But when we run this query, this thing will let me highlight things. This returns instantly.

Right? We no longer have to wait a very long time. We no longer have a giant parallel plan with a spool in it and predicates figured out at the join clause.

We have an apply nested loops plan. Again, we can tell that the apply nested loops kicked in because we have this outer references thing in the nested loops join, which means that our join predicates were sent out as seek predicates to here and here.

Somewhat obviously, because it’s the same index name: we hit the same index twice and seek into it, and all is well. And we get back the result that we care about immediately. So, when you’re writing queries, putting things like case expressions in joins and where clauses may seem like a convenient vehicle for you to express the logic that you care about, but it is often not a very convenient vehicle for SQL Server to optimize.
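A hedged sketch of the shape of that rewrite, using illustrative stand-in tables and predicates rather than the exact demo query:

```sql
/* The predicates come out of the CASE expressions; each former
   branch becomes its own query, glued together with UNION ALL
   inside a CROSS APPLY */
SELECT c = COUNT_BIG(*)
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Id
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    AND   p.Score > 0

    UNION ALL

    SELECT p.Id
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    AND   p.PostTypeId = 1
) AS pa
WHERE u.Reputation > 100000;
```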

Again, you should write every query like performance matters, because someday it might. Even if it doesn’t today, someday performance might be of critical importance to this query, or to whoever’s running this query.

And you don’t want it to be slow. And you don’t want one of those comments up at the top with your full name and a date and all that stuff in there, where you make notes about things that you swear work, but don’t actually work.

So, these may seem very convenient for you, but these are very bad for SQL Server. Much better ways exist to write queries that SQL Server can interpret into a much better, much faster execution plan for you.

So, one thing that I talked about, well, not at the beginning of this post, obviously, because this is the culmination of several things, is handling or. This video on case expressions really ties in with the two videos on where clauses with or in them and join clauses with or in them.

They have very similar performance implications, and sometimes even query plan implications. So, one thing that I haven’t covered in these, that I’m going to cover in a longer format about dynamic SQL, is how to handle cases where you want to have optional parameters passed in to a stored procedure.

Sometimes they are null. Sometimes they are used. And there are different ways to deal with that. But please, at least for the time being, keep case expressions out of your joins and your where clauses, and be very diligent in how you write case expressions, even in your select list, because they may not short circuit when you expect them to.

SQL Server may do some additional exploring in those case expressions that you are not aware of until something weird happens. So, be careful there.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video on CTEs. I know that I already did one in this series, but the next two videos, on common table expressions and output, are going to be literally everything you need to know about CTEs and output.

Do not buy anything from anyone purporting to teach you about common table expressions and output. There is nothing worth paying for to learn about them. Okay?

Spend no money on these things. They’re not worth it. Anyway, I will see you in those videos. Au revoir.


How To Write SQL Server Queries Correctly: Joins With OR Clauses Part 2




Thanks for watching!

Video Summary

In this video, I dive into the intricacies of joins with OR clauses in SQL Server, specifically using examples from Stack Overflow’s database design to illustrate common pitfalls and suboptimal query plans. We explore how these join conditions can lead to inefficient execution plans, such as constant scans and nested loops joins, which often result in unnecessary row counts and sorting operations. I also discuss potential solutions, including query rewrites that leverage EXISTS clauses or CROSS APPLY operators, demonstrating how these changes can significantly improve performance. By the end of this video, you’ll understand why normalization is crucial for avoiding these issues and learn techniques to optimize your queries when OR clauses are necessary.

Full Transcript

Erik Darling here with Darling Data. All excited to talk more about joins with OR clauses. It’s going to be a real grand day for all of us here. This is, of course, part two of the joins with OR clauses videos. In the first one, we talked about some simple stuff; it kind of set up the things that SQL Server can and can’t do when there are OR clauses in a simple join. Now we’re going to talk about more complicated stuff and, of course, how you can fix the more complicated stuff. And, of course, because I like talking about it, we’re going to talk about shortcomings in SQL Server’s Query Optimizer. Because it’s got those. Lord knows. Anyway, before we do that, if you would like to slap four bucks a month on my rent check, you can click the link down in the video description and do that. Maybe it’ll make you feel good.

You know, or maybe not. I don’t know. Maybe, hopefully, you just forget about it for a while. That’s what we’re all hoping for. If four bucks a month is not what you have in mind for me, you can like, you can comment, you can subscribe. All sorts of great things. If you are in need of consulting for your SQL Server, all of these things, I am the best in the world at them.

Anyone who tells you otherwise is a dirty, rotten liar. And as always, my rates are reasonable. Training. I’ve got it. 24 or so hours of it. Beginner, intermediate, expert. All of those things. You can get it for about 150 US dollars. That’s the link. That’s the coupon code.

And they are helpfully glued together for you also in the video description. Lucky, lucky you. No upcoming events until 2025, probably later in the year. But for now, we can just talk about what we came here to talk about, which is joins with or clauses. So joins with or clauses are one of my biggest pet peeves, because the optimizer does have space to do this better.

It just isn’t there, right? It just doesn’t do it. There’s just no help for you when a join has an or clause in it. There’s not like a special thing you have to do or there’s not a hint. There’s not like a trace flag. All there is is query rewrites. And those query rewrites are annoying because SQL Server’s optimizer could figure it out.

Just doesn’t. It’s lazy. Very lazy, lazy optimizer sometimes. So I’ve created some very generous indexes on the votes table and two on the post table. And what we’re going to look at is the query pattern here.

Now, it is a little bit confusing the first time you see it, because it is, well, not the most straightforward thing in the world. But I probably should have started this running before I started talking too much, right? OK, seven seconds. Not too bad.

But here’s the execution plan. And this is one of those query plan patterns that I try to train people to look for, because it is always a sign that something bad is happening in your query. And that something bad is going to be a join with an or clause.

So we’re doing, I mean, a little bit of a funny query here, but it’s one that you have to do with the Stack Overflow database. Because, you know, everyone gives credit to Stack Overflow for being this cool, smart, rogue band of geniuses doing everything on the cutting edge of cool and fun when it first started. But they were crap at database design.

Crappy, just awful. They should have hired someone who knew what they were doing, because the post table has all the posts in it. It has all the questions. It has all the answers. It has all the everything else.

And what’s really crappy, oh my God, is that the version of the post table that you see in the Stack Overflow data explorer, and the version of the post table that you see if you download the data dump, is nowhere near reality. If you saw the actual table in there, it is so much wider and more denormalized and awful.

Even if you know nothing about normalization, it would make your head spin. But we have to join the post table to itself, because of questions. When you join them, you need to join the post table to itself to find which questions have accepted answers.

Great. Good. Good plan. Self joins. Hey, I like those sometimes. Sometimes you do have to join yourself. But then when we join to the votes table, what we’re saying is that either the question ID equals the post ID in the votes table, or the accepted answer ID equals the post ID in the votes table.

So we’re looking to get votes on both the question (that’s the ID) and the answer (which is the accepted answer ID). And, I don’t know, I forget why I threw this on here. I think it tidied things up a little bit. Maybe without it, it’d be seven seconds, or like 70 seconds.
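The join shape being described looks roughly like this (a hedged sketch; the where clause predicates are covered next, and the exact demo query may differ):

```sql
/* Self join Posts to pair questions with their accepted answers,
   then join Votes with an OR: votes on either the question
   or its accepted answer */
SELECT c = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
  ON p2.AcceptedAnswerId = p.Id
JOIN dbo.Votes AS v
  ON (v.PostId = p2.Id
  OR  v.PostId = p2.AcceptedAnswerId);
```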

But then we’re looking for where p.PostTypeId equals two, and where p2.AcceptedAnswerId is greater than zero. So we’ve done everything that we can there. Now, here’s what makes this challenging.

What the optimizer could do is something like this. It could unroll the join from being this set of conditions to being this set of conditions. Right. Like it could do that because it is fairly equivalent. It could also do this. But it kind of doesn’t.

That could be a good answer. So what’s really interesting, but sort of difficult to describe well to the optimizer, is that any given question can only have one accepted answer. So there’s no many-to-many relationship with that join.

So these are things that the optimizer could do, but it doesn’t. And this is the query plan pattern that I need to warn you about, and that I want to show you. This is what happens.

So we do two seeks into the post table. This does our join; it finds our questions, or rather matches our answers up with the questions that accepted them, and all that other good stuff. And then this is the join to the votes table. Now, what I want you to keep an eye on is the number of rows that come out of this: 3,818,497.

OK, remember that number, because you’re going to see it again. Way over here in the query plan is what you will most commonly see in a query plan that has a bad join with an or clause in it (which means any join with an or clause). You’re going to see a constant scan.

With 3,818,497 rows coming out of it, and then another constant scan with 3,818,497 rows coming out of it. That is the result of this, twice. So we have one set for the p.Id column in the post table and one set for the AcceptedAnswerId column in the post table.

We have that many rows for both after we join those two tables together. That’s how many rows are produced. So each one of those columns gets produced by these constant scan operators. Notice these don’t touch a table. They sort of come out of nowhere, but they really get fueled from here.

So then SQL Server says, I have to put those two results together. It essentially does a union. Union all, sorry. So this number doubles, right.

At least my math is correct. And then it sorts all that data. It puts all of those rows in order, so that it can merge duplicates. And note the estimates in here are all pretty wonky.

Right. One of one, two of two. So SQL Server expects two rows to come across all this stuff.

We get way more than that. Now, I think Microsoft could start a little bit earlier in all this stuff by trying to get better estimates out of this part of the plan, because then maybe it wouldn’t choose a nested loops join for this part of the plan.

It’s also entirely possible that this is an optimizer limitation, and that SQL Server could not possibly do any other kind of join here. But then we seek into the votes table, and we seek on this expression, Expr1024. Or sorry.

We have a between thing, because the intervals get merged. So we go and find greater than Expr1023 and less than Expr1024. Those are going to be the start and end points of the constant scan results after the intervals are merged.

So every time we loop in, we go and find that range of values. Now, it’s bad enough that we have this fake result set from the constant scans that we loop into and find stuff in.

But then there’s sort of this other nested loops join here that also takes the stuff that comes out of here and goes through all this. So even though this loop starts here, it really starts over here. This all ends up taking about eight seconds, or seven and a half seconds, which is pretty slow for a query.

And like I said, SQL Server could unroll all this stuff and do it differently. But it’s hard to describe this to the optimizer without a lot of really difficult constraining that every question, if it has an accepted answer, can only have one accepted answer.

If we were to use the parent ID column instead of accepted answer, it would be a slightly different story because one question can have many answers, but only one of them can be the answer. So this is a great reason to normalize your data. This is one place where the Stack Overflow data design team failed.

Right. Just a big thumbs down that every question and every answer are in the same post table. I do have some stuff about what happens if you split those out, and I do have some other stuff about what you do if you put the body column on another table. But I don’t know. I feel like that’s worth money.

So that might not go on YouTube. Now, part of the reason why this optimization space is difficult is because you would have to unroll the query to look like one of these two things. Right. Where bounty amount is null, and exists.

Select this on this. But this will give you identical results to what you got before, without all that crazy stuff going on in it. This finishes in about 207 milliseconds.
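A hedged sketch of what that unrolled EXISTS version might look like, omitting some of the demo’s predicates:

```sql
/* The OR moves into an EXISTS, so SQL Server only needs to find
   one matching vote per row instead of joining and deduplicating */
SELECT c = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
  ON p2.AcceptedAnswerId = p.Id
WHERE p.PostTypeId = 2
AND   p2.AcceptedAnswerId > 0
AND   EXISTS
      (
          SELECT 1/0
          FROM dbo.Votes AS v
          WHERE v.PostId = p.Id
          OR    v.PostId = p2.Id
      );
```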

And that’s, you know, arguably a lot better. Right. And this filter is a startup expression predicate. So this is not the awful kind of filter where all the rows come out and then try to pass the filter.

This filter only lets rows pass if anything meets that criteria. So that’s a nice thing in there. Right. This is all good stuff. SQL Server handles this really well.

And because the where clause in here is really only for one set of things, it turns out a lot better. Now, you could also rewrite that in this way, which is even more complicated, where you could use sort of a double exists with a union all. Right. So rather than write this or clause, you could write it this way, and you could get good results back from this one.

This query plan, it’s a little bit different. It takes about 385 milliseconds. There’s one filter at the end here, which I am not crazy about.

Where this is not null or this is not null. Usually this is the kind of stuff that I worry about in query plans. But in this case, it’s OK.

We get a couple of adaptive joins over here between the post table. Well, sorry, between the votes table and the post table. So that all turns out OK. And then, what if we needed columns from the inner part?

So, like we talked about with exists and not exists and joins and all that stuff, one thing I told you was that when you use exists, you can’t project columns out of it. So if this were a more complicated query... you know, for a lot of queries, I use count big just to make sure that I’m getting the right number of rows back.

Just a basic starting point: are these results correct? But one thing about normal queries is that people want to see data, right? They don’t just want a count.

They’re not just like, oh, well, yeah, if you just show me counts, I’m happy. They want to see the actual stuff in there. So if you need to rewrite a query like this and you need to project rows out, cross apply is a really handy way of getting the same performance. But you can project columns out of the cross apply.

I don’t know why there’s a space in here. Clearly, someone is sabotaging me. But if we write the query like this with the cross apply, it’s also pretty quick. 345 milliseconds.

And if we needed columns to come out of this part of the query, we could get them. So, like you can see, we count the p.Id column that comes out of the apply in here, which is a union all in here. So we’re counting.

We’re able to reference the p.Id column there, where if we try to do this up here, we couldn’t. So, are these queries bigger and longer and more complicated? Yes.
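A hedged sketch of the cross apply version, which lets you project columns out of the apply; names are illustrative:

```sql
/* Each side of the former OR becomes its own query inside
   a CROSS APPLY with UNION ALL, and columns from the apply
   can be referenced in the outer query */
SELECT c = COUNT_BIG(v.PostId)
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
  ON p2.AcceptedAnswerId = p.Id
CROSS APPLY
(
    SELECT v.PostId
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id

    UNION ALL

    SELECT v.PostId
    FROM dbo.Votes AS v
    WHERE v.PostId = p2.Id
) AS v
WHERE p.PostTypeId = 2
AND   p2.AcceptedAnswerId > 0;
```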

But sometimes that’s what you need. That’s the kind of query you need to write for the optimizer to fully understand what you’re doing. You know, one thing that I say quite frequently to clients, in training, and to random passersby on the street, is that anything you do to make your job easier makes SQL Server’s job harder. Sometimes you do things that make your life easier.

And SQL Server is like, oh, yeah, well, then I’m going to make my life easy and take this query plan. When you start writing things out more verbosely, sometimes you can give SQL Server a better understanding of what you need to do, what data you need, what data you’re trying to touch. And SQL Server can come up with better plans based on that.

Shorter, simpler queries are not always faster queries. You know, there are a lot of people who seem to feel that way, and they write these tiny little compact queries. Some of them just on a single line, very long lines.

But writing simple, compact queries is good for simple, compact logic. When your logic gets very complicated, sometimes very short queries that just try to compact everything down don’t work out. It’s not code golf.

Sometimes the optimizer can’t do a lot with those. Sometimes you have to write out more specifically what you’re trying to do. And sometimes your queries might look like this. Sometimes your queries might have extra touches of tables in them.

But don’t be afraid of trying this stuff, because you might find queries perform much better when you are much more explicit with SQL Server about what you’re after. So don’t just stop at the simple, compact query because you’re happy with the simple, compact query. Check the performance.

Look at the query plan. See what happens when you spread things out a little bit. Give SQL Server a little room to think and breathe. You don’t have to smother it all the time with these weird queries. Now, one thing that I’ve talked about a few times, maybe not in this video, but in the last video before I decided to split this up into two videos (that’s why there might be something up in there that seems a little weird), is case expressions in joins and where clauses.

We’re going to talk about that next. That is why this is 12 and the little tab next to it says 13 case expressions. So we’re going to talk about those next.

And we’re also going to look at some rewrites for those. Some of those look similar to these; others look similar to other things that we’ve talked about in the series. But it helps reinforce that you’re not writing SQL Server queries correctly and you need to start.

So, sorry to break it to you, but you know what? At least it’s not terminal. Just temporary.

It can be cured. Don’t worry. Without surgery, hopefully. Yeah, hammer. Alright, cool. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video about case expressions.

And you’re going to learn a lot. So watch it and don’t skip stuff. You heard?

Alright, goodbye.


How To Write SQL Server Queries Correctly: Joins With OR Clauses Part 1




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of joins with OR clauses in SQL Server, demonstrating how they can sometimes lead to unexpected behavior and performance issues. Starting with a simple inner join setup between comments and posts tables, I illustrate how SQL Server infers join conditions from the WHERE clause, making the query efficient when both columns are involved. However, introducing an OR condition that splits the join across different tables causes SQL Server to struggle, resulting in slower execution plans. To address this, I show a workaround by adjusting the OR clause to involve only one table, allowing SQL Server to seek into indexes more effectively and improve performance. This video is part one of a series on joins with OR clauses, setting up the groundwork for exploring more complex join scenarios like left joins in future parts.

Full Transcript

Erik Darling here with Darling Data. Got a barbell next to a dumbbell. Anyway, today’s video we’re going to talk about joins with or clauses. I’m actually going to do this in two pieces. The first piece just kind of sets it up a little bit and gets some of the easier stuff out of the way. And then the second part will go into more detail. I tried to do it in one and it just went on too long. Even for me. I actually apologized to my own self for my length for once. So that was a fun change of pace. If you would like to support this channel by giving a dollar sign a finger mustache, you can click the link down in the video description and do that for as little as $4 a month.

Training. You probably need it. If you’re here, you definitely need it. You’re probably not getting enough of it. It’s like vitamins. You can get all of mine for about $150 by using that coupon code at that website up there. That’s about it. There’s also a link for that down in the video description. It’s very good stuff. Paul White tech reviewed it. So I don’t know. Maybe that’ll convince you. No upcoming events until 2025. With that out of the way, let’s talk about joins with OR clauses. So SQL Server has funny ways of handling OR expressions sometimes. Sometimes it can infer certain things depending on how you set your query up. And other times it cannot.

So I think a good way to start this off is by showing you this query. The top one is just there to get a quick result back. But we have a join between comments and posts. And the join is on p.ownerUserId equals c.userId. And we have a where clause to say where ownerUserId equals 22656. If we run this query, SQL Server is able to infer that we meant that we only want to see userId 22656 or we only care about ownerUserId 22656.
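The setup query described above looks roughly like this. This is a sketch against the Stack Overflow schema; the select list and exact index shapes are assumptions, and the "top" warm-up query from the video is omitted:

```sql
/* Inner join between Comments and Posts, filtered on the join column.
   Because the join is ON p.OwnerUserId = c.UserId, SQL Server can infer
   that c.UserId must also equal 22656, and it filters both sides right
   at the index access rather than at the join. */
SELECT
    c.Id,
    p.Id
FROM dbo.Comments AS c
JOIN dbo.Posts AS p
    ON p.OwnerUserId = c.UserId
WHERE p.OwnerUserId = 22656;
```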

Because what happens is SQL Server tells us we don’t have a join predicate. We get this stupid little warning right here. And it says, yeah, guess what? No join predicate. Shame on you. You’re an idiot. The number of times I’ve seen this in a query plan and been like, huh, do we really not have a join predicate?

And been like, oh, no, no, no, we totally have a join predicate is pretty high at this point. Microsoft summer interns. What would you do without them? But if we look a little bit more closely at the plan, we will see a predicate on this side of the join that says where ownerUserId equals 22656.

And we will see a predicate on this side of the join that says where userId equals 22656. And so all of the results that come back from the post table and from the comments table have a join condition or rather have a value in their join column that is the same. So there is actually no need to do anything with that.

Right. We don’t need to do any additional join conditioning. We have met the condition of our join right when we touch the indexes. So that’s where SQL Server is kind of good at stuff.

You know, we touch the indexes on both sides. We do the filtering right there. And then everything that we need to mush together is the same value. It’s two sets of 22656.

And there’s like, what do you join there? It’s already joined for you. We have implied the join via the where clause. Where the optimizer is somewhat less good at figuring these things out is, of course, when you get separate columns involved.

Right. So if we look at this query. Right.

And I have these FORCESEEK hints commented out for the time being, for dramatic reasons. We’re joining comments to votes on c.postid equals v.postid. And we’re going to say where c.postid equals 838 or v.userid equals 831.
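The problem query, roughly as described. This is a sketch with literals taken from the transcript; the select list is an assumption:

```sql
/* The OR predicate references columns from two different tables,
   so SQL Server cannot push either filter down into an index seek.
   The whole predicate has to be evaluated at the join. */
SELECT
    c.Id,
    v.Id
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.PostId = v.PostId
WHERE c.PostId = 838
OR    v.UserId = 831;
```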

Right. And if we run this query. Do do.

It takes a few seconds. We get 42 rows back. SQL Server has chosen the most adaptive of joins. We scan here for about a second and a half. We scan here for about half a second.

And in total, we take about 2.6 seconds to figure this all out. Now, if we un-comment these FORCESEEK hints and rerun this, SQL Server says, there is no way we could do that. It is impossible.

Why have you tried to do this to me? You are trying to bend the rules of space time. You have divided by zero. You are not fit for this planet.

It thinks you’ve seen too many Doctor Strange movies. But this is kind of a funny thing, because when you look at what SQL Server does in the execution plan, the entire OR predicate is evaluated here at the join.

We have this whole OR clause thing in here. Now, since we know something about our data that SQL Server doesn’t, what we can do is flip that so that both of these columns come from the votes table. Right?

Now we’re saying where v dot on both. We get the same results back, right? This is 41, and this is 41. But a much quicker 41.

Now SQL Server is able to seek into two indexes on the votes table and a third index on the comments table. And we are able to get back our results. The only change I’m making is that I’m using the post ID column from the votes table instead of the post ID column from the comments table.

Right? So before it was c.post ID and v.user ID. Now it’s v.post ID and v.user ID.
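The rewrite, sketched the same way as before. Since the join already guarantees the two post ID columns are equal, only the table prefix changes:

```sql
/* Because the join guarantees c.PostId = v.PostId, the filter can be
   written against v.PostId instead. Now the whole OR predicate sits on
   the Votes table, and SQL Server can seek into indexes on Votes. */
SELECT
    c.Id,
    v.Id
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.PostId = v.PostId
WHERE v.PostId = 838
OR    v.UserId = 831;
```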

So when you comment on or you vote on something, the post ID is logged. Right? And since we’re joining on post ID, it’s going to be the same for both.

We didn’t change the v.user ID. If the v.user ID were the problem, that would not give us correct results. Or rather, that would pose a different set of issues.

Sort of like what we talked about with the where-or-clause stuff in the last one. But if we just change c.post ID to v.post ID, then we’re getting all the same stuff back, because we’re joining on post ID. Right?

That’s all we care about. But for some reason, this or clause getting slightly different tables involved really screws SQL Server up. So when you’re writing queries, these are the little things that you need to tweak and experiment with to make sure that you are not missing anything obvious.

A lot of times when I talk about query complexity, people start picturing that big, you know, monolithic query that spans 17 scrolls through SSMS. And even if you make the font in SQL Server Management Studio like 4%, it still doesn’t fit on one screen. That, of course, is a different brand of hell.

But complexity in queries can arise in many small interesting places. You know, IN clauses, OR clauses. Anything that adds a little bit of drama to your queries can certainly make things somewhat strange in there.

So that sets things up. Now, this was fairly simple, because this was an inner join with just a where clause that had one OR in it. A simple inner join, right? Just one table to another.

The where clause was only on a couple columns with one or condition. What we’re going to look at next is more complicated join scenarios with left joins and stuff. So we have that to look forward to.

Anyway, I hope you enjoyed yourselves. I hope you learned something. And I will see you in part two of Joins with OR Clauses, when we will get deeper into the subject. And I will show you more awful things that can happen and how you can fix them.

Because what’s the point in just showing you bad things that can happen if I don’t show you how to fix the bad things? It doesn’t seem like very good training if all I do is tell you that bad things happen. It sounds like other training I’ve seen.

Bad things happen. How do you fix it? It’s hard. Yeah, you know, life’s hard. Screw it.

All right. Thanks for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.