When Subquery Query Plans Are Suboptimal In SQL Server (And How To Fix Them)

Video Summary

In this video, I return after a three-week hiatus to Paris, sharing my experiences with you as I recorded content before embarking on my trip. The video delves into the nuances of suboptimal subquery plans in SQL Server, explaining how these plans can lead to excessive row returns and poor query performance. I walk through an example where a query initially returns 4.8 million rows, far more than necessary for practical use, highlighting the issues with row goals and estimated versus actual row counts. By tweaking the query and experimenting with different hints and index usage, we explore how to optimize these plans for better performance, demonstrating that sometimes subqueries can be effectively tuned rather than outright avoided.

Full Transcript

Erik Darling here with Darling Data, and this is my first video back after a three-week hiatus in Paris. And I know that it was seamless to you because I recorded three weeks’ worth of content before going away. Intel is back up to its old tricks. Not being able to find drivers. Some things never change, I guess. But if I look any different or I sound any different, if I have perhaps developed a slight French accent, that’s why. So anyway, today’s video, we’re going to get gently back into things and talk about when sub-query plans are suboptimal. I know, happens to the best of us, especially when you have to deal with SQL Server being its silly self. But anyway, let’s carry on with a little bit about how you and I can cooperate together. If you would like to become a member of this channel, a paid member, you can join like 50 other people who have been so kind as to donate $4 a month to the content that I produce here. There’s a link right down in the video description where you can do that. Very easy, very clickable, like me. If $4 a month is just too rich for your blood, wherever your blood may originate from, you can like, you can comment, you can subscribe, and you can make different numbers go up in different ways.

that make me a happy, healthy human being. If you need help with SQL Server, if the things that I talk about during these videos, ring a bell, and you’re like, wow, my SQL Server could use some help in that area. I do all of these things and more. And as always, my rates are reasonable. If you would like some more reasonable stuff from me, you can get all of my training content, performance tuning training content, I don’t have anything else, for about $150 USD. It’s all available at that URL up there, with that discount code down there. And of course, as usual, there’s a very clickable link down in the video description for you to do all that. Upcoming events, there is still SQL Saturday, New York City 2025. Taking place May 10th at the Microsoft offices in Times Square. So if you haven’t been robbed by someone dressed as a Sesame Street character in a while, or someone dressed as Spider-Man in a while, come to Times Square, come to Times Square, get your money’s worth, or lose your money’s worth, I don’t know, whatever. There’s probably a joke about congestion pricing in there somewhere. I just don’t know what it is yet.

But anyway, let’s talk about these suboptimal subquery plans. Now, I’m going to show you the starting version of this query, because there are some important things in there for you to understand about it. But this is not going to be the final query that we end up with. Now, this query runs for about 12 seconds, and it returns 4.8 million rows, which is far more rows than you will ever need to return to an end user for them to do anything reasonable with. Way too many, right? Way too many, right? It’s an insane number of rows. Why would anyone ever need that many rows for anything, right? Unless you’re going to put it in an Excel file and do something else with it, right? It’s just an absurd number of rows.

So this is obviously not a very good example of when subqueries have problems, because we return a lot of rows, and stuff like this part of the plan are completely sensible. So these two index seeks into the votes table, which have absolutely correct estimates, right? Absolutely. We had 100% on both of those, come from this part of the query, right? We have an upvote count and a downvote count. And of course, upvotes are signified by a vote type of 2, and downvotes are signified by a vote type of 3.

So, like, this is the part of the plan that we’re going to concentrate on for now, or rather for most of the video. But then I just want you to understand down in here is where the rest of the stuff happens. So, like, in the rest of the query, we have another subquery that goes to the badges table, and another subquery that goes to the comments table.

And down here in the query, we join users to posts, right? So there’s some stuff going on in here. But all of that, like, the two things that hit the votes table are up here, right? So, like, the order that we wrote the query in is not the order that SQL Server arranged the query plan in, which is going to come in handy for us to understand later.

And then down in this part of the query, this is where all the rest of the stuff happens. We hit badges, we hit users, we hit posts, and we hit comments, right? And SQL Server, for better or worse, has chosen parallel merge joins for all of these things.

But we don’t necessarily care about this part. Most queries that you’re going to have to tune are going to limit rows in some way, whether it’s by top or offset fetch. And for the sake of typing, we’re going to change that up here to top 5,000.

Oh, not 54,000, top 5,000. And that’s going to be nice and sensible. The thing is, when we change the query to the top 5,000 version, things don’t necessarily get all that much better.

So let’s run this, and let’s wait very patiently for SQL Server’s suboptimal optimizer to come up with a query and query plan for us and return some results. So the query that I ran that returned 4.8 million rows, the query execution time was about 12 seconds. And we had a big parallel plan with lots of joins and stuff.

In this plan, SQL Server has changed its mind about parallelism. It has not changed its mind about this part of the plan, though. Notice we still have a merge join.

The two subqueries that hit votes are still up here. The other funny thing that has now happened to our query is we have very, very bad estimates in here. But these are not bad estimates in the typical sense.

They’re not like SQL Server had bad statistics or out-of-date statistics. Of course, I just created these indexes, so the statistics are full scan up to date. The poor estimates here come from row goals.

Now, whenever you do something like top or offset fetch, sometimes exists or not exists. There are all sorts of things in SQL linguistically that add something called a row goal to your query plan. And if we right-click on these operators, what you’re going to see, not in the thing up here.

This is where it doesn’t show you anything useful. But down in here, we have an estimated rows without row goal. And this estimate is correct, right?

See, this is the right number of rows. What SQL Server made a bad estimate on is how many rows it would have to read out of here in order to get, in order to meet the row goal that we set by saying top 5,000. So, and it did that for both of these, right?

So, if we look at this one, this one will also have an estimated rows without row goal. And this is estimated rows without the row goal. And that is absolutely correct. But where things fell off was, of course, between the estimates and the actuals.

That’s a little bit easier to see in here, where you have the number of rows hiding somewhere. And then the estimated number of rows. We have the actual number of rows read up here.

Sorry about that. Got a little bit lost in my own thoughts. And then we have the estimated number of rows to be read. But this is, of course, without the row goal, which it doesn’t say. But the estimated number of rows is 3,300.

But the actual number of rows is 2, 1, 4, 1, 8, 1, 9. That is a seven-digit number of rows. So, we read a lot more rows than we thought we would. The row goal is kind of messed up in that.

But none of the other index accesses in here, if we look at the properties, at least, yeah. So, this one does. This one does not.

This one does not. And this one should not either. So, none of these have the estimated rows of the row goal. The post one does. So, where things get interesting is that we got a, probably when we shouldn’t have got a serial execution plan there.

Now, I’ve created an index on the post table, which it should have used. Because this index fully covers everything that we want to do. But SQL Server costed this index out of existence. Now, let’s look at the estimated cost for this plan.

Right? SQL Server thinks it would cost 48 query bucks. So, my cost threshold for parallelism is 50. So, this costs just under the cost threshold for parallelism.

So, SQL Server was like, ah, not quite parallel plan territory. Also, it didn’t use my nice index on the post table. On the post table, it uses the clustered primary key.

Right? So, for some strange reason. And that strange reason can be revealed by adding a hint to the post table. Now, the index that I created up here is called P.

So, we’re going to, down here, we’re going to say with index equals P. And then, we’re going to rerun this. And I’m going to show you something kind of interesting.

Because sometimes these are the wrestling matches that you have to have with the optimizer in order to get queries to perform well. Now, this query plan is parallel. And this takes, well, I don’t know.

Let’s just say about half as long. The last one was like 9 point something seconds. This is 5 and a half seconds. So, this one’s about twice as fast. Not quite. I realize my percentage math is not great, especially on the spot.

But if we look over here, the part that SQL Server got this query expensive enough to be eligible for a parallel plan was using the nonclustered index. And SQL Server deciding that it needed to resort the data that came out of that index to make this join more efficient.

So, it put the data that came out of our nonclustered index in order by the ID column in the post table. So, it reordered the data.

And now this query plan, actually, I forgot to show you this, has a cost of 596 query bucks. So, this is why SQL Server didn’t choose the nonclustered index in the first place. SQL Server thought that choosing the nonclustered index and sorting that data would make things too expensive, was way more expensive than just using the clustered index.

Of course, we can see a big difference in execution time here, right? So, again, about twice as fast when we get a parallel plan. We still have problems up here, though.

The problems that we have up here are that even though we’re doing seeks into these, right, we’re still doing seeks into our index, the row goal is still making a bad guess at the number of rows it’s going to take to satisfy the row goal set by the top 5,000, okay?

So, one thing that you can do if you want to affect how SQL Server is doing things and you’re very confident not only in your indexes but also in the plan shape that you’re after generally is you could say option loop join.

And what SQL Server will do is give us the plan shape that we would really want from queries that perform sub-queries. In other words, we don’t want SQL Server doing this part of the plan before we’ve done the part of the plan that actually does all our filtering, right?

We don’t want SQL Server doing this stuff. So, if we run this query again with the option loop join hint, we’re going to get the plan shape that we’re after. The plan shape that we’re after comes up here where all the stuff that we’re like doing our initial filtering on, right?

Like this is really the crux of our query. Selecting from users, joining the posts, and then filtering the post table on these creation date columns. So, this is really where we want SQL Server to start and narrow down the rows as much as possible.

And this is the plan shape that we’re after where taking that result, we now do loop joins across all of this stuff. And this query takes about one second, right?

That’s much better than the five seconds that it took when SQL Server chose a bad plan shape. And that’s much faster than the, well, I guess it was the same plan shape. So, we’re going to say this is a much better plan shape generally for our query, right?

And I forget, actually, you know what? We’re going to do a little experiment here. We’re just going to see what happens when I take that index hint out of the mix.

Let’s just do a little experiment because I actually forget what happens. So, this one takes actually 1.6 seconds. So, this one’s slowed down by about half a second.

And SQL Server goes back to scanning the clustered index on the post table. So, we really, but we still get a parallel plan, right? So, at least as far as query tuning goes, this is still better than what we were getting before. I’m going to leave this index hint in because I generally like the speed that I get with this index hint.

If you are not confident that the query plan shape should only ever use loop joins and you’re thinking to yourself, well, I want the optimizer to be free to choose other types of joins. One thing that you can do is set more row goals.

Now, let’s think logically for a minute about how many rows a subquery can return. One, especially when we’re doing an aggregate like count or sum or average or min or max or something.

Those are all things that have, that really just return one row. If a subquery ever returned more than one row, well, guess what? Throw an error. So, what we can do is rather than say SQL Server, you always have to use loop joins. We can say, SQL Server, we want more row goals.

Give us our row goals. And if we put a top one in both of these, SQL Server will finally see the light and will give us a query, give us the query plan shape that we want without having to say you always have to use a loop join.

Now, SQL Server does naturally choose loop joins here. But now, if we look down to where we access the votes table, we do have a top for both of these, right? It’s a tiny little top here and a tiny little top here.

And we have the same index seeks that we got before. Now, granted, the estimates in this query is still off a little bit, right? Like, I mean, we’re under by 10% here. We’re over by 43% here.

We’re over by, wow, like 1,700% here. We were, you know, over by 5,500% here. We’re over by 24% here.

Math, math, math, math, math. But we get a fast enough query with a good enough plan shape that we want. So whenever you’re tuning queries or whenever you’re, you know, having to deal with someone who’s like, wow, subqueries always suck.

Never use subqueries. Sometimes they just don’t know enough about databases to be able to say that. And they don’t know enough about query tuning to be able to, you know, make that make that judgment across the board about every subquery ever.

Sometimes you just need to give subqueries a little bit of help. So in the future, whenever you’re looking at LinkedIn or whatever other social media you like, and, you know, some, you know, data engineers like starts posting one of those, you know, LLM lists of top 10 things not to do with a query.

And you see things like avoid distinct and avoid select star and like the same list of things and like avoid subqueries, use CTE. You can just firmly know in your head that they are complete ding dong idiots, regardless of which giant company they work for.

Specifically, probably meta, which has like the highest quantity of idiot data engineers on the planet. You can say to yourself, hey, you’re wrong about all that stuff. Because Erik Darling taught me that you’re all wrong duty heads.

So you have that to look forward to. And you can, you can point them to this video where I will, I will gladly, I will gladly fight with them. Anyway, sometimes some queries just need a little bit of help.

Sometimes you have to grease the wheels a little bit, but you can always make a query better and faster. And it does not always involve taking subqueries out or adding in CTE or whatever other nonsense people tell you about on the internet.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will pick fights with data engineers online because it’s fun to do because they’re all cock-a-doodie heads.

Anyway, cool. All right. Goodbye.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.