Replacing Subqueries With Conditional Aggregates In SQL Server

Video Summary

In this video, I delve into the world of optimizing SQL Server queries by replacing multiple subqueries with conditional aggregates. You’ll learn how to transform complex and repetitive subquery logic into more efficient join operations, significantly improving query performance without altering your database schema. Alongside practical examples, I walk you through a real-world scenario where we compare the execution plans and results before and after optimization, demonstrating not only speed improvements but also ensuring correct results. Whether you’re looking to enhance your SQL Server skills or just curious about how to make your queries run faster, this video offers valuable insights and tips that can help you become a more efficient database consultant.

Full Transcript

Erik Darling here with Darling Data. And we’re going to take a little break from talking about store procedures in this video. Though you might see this pattern in store procedures. Bats was sick of hearing me go on and on about store procedures and he told me in no uncertain terms, we should talk about something else today because it is Friday. It’s time to talk about something slightly different. So this is, today we’re going to talk about replacing sub queries. Sub queries, notice sub queries there is pluralized. Not, I’m not talking about replacing a single sub query because SQL Server’s optimizer does convert sub queries to joins, but multiple sub queries where you’re calculating almost kind of the same thing over and over again with conditional aggregates. All right. So, um, rather than making multiple trips with like the same query or very similar queries, uh, to a set of tables, you’re just going to replace that with some joinery. Um, but before we do that, um, but before we do that, um, we are going to talk about how you, you could replace money in my bank account with money, uh, from your bank account. You can do that in a number of, of very sweet, kind, tender ways. Uh, if you would like to support my endeavors to bring you this fantastic SQL Server content, uh, right, right in the video description, I want to make very clear the video description holds the answers to all of your link clicking questions. Uh, you can see, sign up for a channel membership and for as few as $4 a month, you too could feed a starving SQL Server consultant. Uh, if you like my content, but you don’t have $4 a month, um, I don’t know, I guess tax season is upon us. Uh, you can like, you can comment, you can subscribe. And if you would like to ask me questions, if you would, if you think, hmm, this, this Erik Darling seems to know a thing or two about SQL Server, I might, I have a question about SQL Server. You can follow this link, which is again, is in the video description.

And you can, you can get right to where you, you ask me the questions. If you need more help with SQL Server, uh, once again, I am the best SQL Server consultant in the world, uh, with the, the exception of one small island state in the, in the Pacific, which I’m, I’m tired of naming, saying by name. Uh, I don’t know. I don’t know. I don’t know. I’m gonna feel good about myself. Uh, I, uh, health checks, performance analysis, hands-on query, index, server, everything tuning. Uh, if you’re having a performance emergency, or if your developers need training because you keep having performance emergencies, well, golly and gosh, my rates are reasonable.

Uh, if you would like to buy some training from me, you can also get that for a very reasonable rate. Uh, but 150 US dollars for life. You can get 24 hours of it. Again, a fully assembled link for you to make that purchase is down in the video description. God, I make this so easy for you. I don’t understand why, uh, why you act like it’s difficult.

Upcoming events we have, and Bats is very excited about this because, uh, Bats is actually going to get some Pez put back in him for, for this one. Uh, we have SQL Saturday 2025 taking place, uh, in New York City. That is a New York City version of SQL Saturday. Uh, Bats is terribly empty right now. Uh, on May the 10th with a pre-con, uh, given by Andreas Volter on May the 9th.

Uh, you should, you should obviously attend both because it will do you a lot of good. Get out into the world to visit, maybe, maybe not a new place, maybe a familiar place. Uh, maybe see some familiar faces, maybe make some new friends. I mean, who, who knows?

You take public transit in New York City and just, you know, you could, you could meet all sorts of fascinating, interesting creatures. But, but, but, but, but, but, but, but, let’s go talk about replacing subqueries, multiple subqueries, with conditional aggregates. So I’ve been seeing this pattern in a lot of client code, like, over the years.

And for some reason it just never dawned on me. Let’s talk about it. I don’t know why. It’s a weird thing. Because it’s a thing that I’ve spent a lot of time rewriting in various, various situations, in various ways.

And, I don’t know, maybe I’m just finally sick of doing it. But, here we are going to talk about how, how, what it looks like and how you can fix it. Uh, so the first thing I’m going to do is just, um, clean up this temp table stuff.

And then I’m going to dump some rows into the temp table. Uh, you see we get 13,265 rows. And, uh, so what, this is what I see a lot is, um, like maybe for some reason, like you had all these subqueries in the initial insert.

And you were like, oh, this insert’s taking forever. Oh, I hate this. Oh, good.

Can, this is all messed up. And, so like, I don’t know, for some reason, someone thinks it’s a great idea to set, like, do like an initial insert to get some primary key values. And then come back and either update the table and update all the remaining columns.

Or, depending on how tricky the logic is, maybe spread that out over multiple updates. So, like, just as an example, like the logic for, come on, zoom it. Five times I hit control and one before this thing responded and zoomed.

Uh, like, like, like stuff like this can be kind of tricky to, to figure out for some people. So, like, sometimes I’ve seen, like, like other updates get pushed in, or rather other, um, aggregates like this get pushed into separate updates. So, you’re not just updating the table once after you insert some rows.

It’s just like, like multiple, multiple update steps, which just drags on and on and on and on and on. Um, you know, especially with the bigger temp tables, right? Right. So, uh, the insert for, I’m assuming it being a real, real problem.

Uh, the insert for this did not take very long at all. Just a hundred and seventy something milliseconds. Uh, not bad, right? It’s fine.

But then, uh, what we’re going to do is we’re going to do the update. And then we’re just going to sanity check the update by looking at John Skeet stats. Because John Skeet stats, uh, are a great way to sanity check most things. So, let’s run this.

And we’re going to wait for this update to finish. Because that’s, that’s just what we have to do. We don’t have a lot of choices. We must wait for the update to complete so that we can judge its efficacy, don’t we? All right.

So, this is the query plan for that. And this is not really the great kind of query plan. Like, this is, you know, granted, we could probably do some indexing work on this to make things a little bit faster. But, uh, let’s just say, I don’t know, maybe we, maybe we just want this thing to be faster without creating indexes.

And, you know, this, like, you know, having to deal with all that stuff. Maybe we just want to rewrite the query. And we want things to be magically faster without, like, changing the structure of the database.

Maybe we’re not allowed to for some reason. But, anyway, uh, this whole thing takes about 9.1 seconds, right? Not a good situation.

Nine seconds. We are unhappy with nine seconds. But what I want to do before we, before I forget, is grab John Skeet’s stuff from up here. And let’s just drop this into a little comment here.

And if you’ll bear with me for just a moment, I’m going to line some of these columns up with the magic of tabs. So that we have something that we can look at quickly a little bit later. So what you’re usually better off doing is something like this, right?

So let’s just clear the data out of that table. And what we’re going to do is we’re going to insert everything into the table all at once. But the way that we’re going to handle the aggregates is with this sort of conditional case logic, right?

So for the top question score, we’re going to say get the max score where post type ID equals one. Same thing for answer score. For total question score, we’re going to do that with sum instead of max.

And then for questions and answers, we’re going to do the same thing with sum. But rather than using the score column, we’re just going to use the number one, right? So every time post type ID is one, we’ll add one to whatever we’re summing up.

And the same thing for when post type ID equals two, then we’ll add that. Otherwise, we are adding zero to the total, right? And when we do this, we only need one join to the user’s table.

The thing is, this does have to be a left join because an inner join would restrict rows where people don’t have stuff in there. And there might be people who, for some reason, have a reputation of over 100,000 or over, sorry, over 10,000, right?

That’s a five digit number. So the comma goes there. And you might, so like this might actually get people out, right? Like maybe they have a high reputation for something else. I don’t know, their test accounts or something.

I don’t know. I don’t know. I don’t know what to tell you there. But when I tested this and I had an inner join there, we had fewer rows ended up in the 10 table.

And I said, that’s not good. Eric, you can’t show that to people. You can’t go out in public like that, not wearing pants. So yeah, this does have to be a left join because remember, subqueries do not eliminate rows, right?

So when you do a subquery like we did for these up here, these all end up as left joins, right? These are all left joins because they’re not filtering queries, right? There’s no rows.

No, these are not allowed to filter rows out from the results, right? So these just find the columns in the temp table and do an update based on what’s in there.

They don’t like remove rows from the temp table. They don’t like, there’s nothing gets restricted. So let’s, I think I already truncated this, but let’s do that once more just in case. And remember this, that whole thing above took about nine something seconds, right?

9.1 seconds plus the hundred milliseconds to get stuff to insert it into the temp table initially. And so let’s run this.

And this should run just in a, just about a second and a half or so. And I want to grab this from up here and just drop this down here so we can do a comparison to make sure things are correct.

Let’s make a little note, original result. And now let’s go look at the execution plan. And of course, even without doing any index tuning, we bring this down to about one, like about a second and a half, right?

1.6 seconds down from 9.2 ish seconds just by like eliminating the multiple, like, like hitting the post table multiple times and just doing a single join with that conditional logic.

Now, if we go visit the results and we say copy with headers, oops, for some reason I hit control and E rather than control and R and I don’t need to execute that thing all over again. That would be no fun.

But let’s line this stuff up and let’s just make sure our results are accurate or rather let’s just make sure our results match because that’s a pretty good sign that we tuned the query and we got things correct.

And if you look across here, everything lines up just perfect. And so this is not only faster, but yields correct results. Two very important things when it comes to query tuning. Some people might tell you that they tuned a query, but then you look at the results and they are completely different.

Of course, you do need to be a little bit careful. And when you, if you tell someone that the results are different because all sorts of things can make results look like they’re wrong, but they’re actually correct. I think, you know, and maybe, I don’t know if that’s worth a different video, but like, you know, like obviously different query plans, serial versus parallel, even like running the same parallel plan over and over again.

If you’re not ordering by something deterministic, the, you know, the results are going to come back in all different, like, like all different ways, right? If you’re ordering by a non-deterministic column where you just don’t have an order by on the, the comparing query at all, like the results might look all different.

Like you might have all different IDs coming back in different orders, but like the results for those IDs are all correct across. And then like, you actually do have the same set of IDs for both, just not in the same order.

So whenever you’re comparing things like that, you know, of course intersect and accept are very useful for that sort of exercise, but also just make sure that you’re ordering the final results when you’re comparing them to make sure that like, you know, like other things aren’t interfering with the result order that, so that they look like they’re wrong.

But anyway, this is a great way to replace multiple subqueries with conditional aggregates. Make your SQL Server queries go faster, get correct results and look like a big hero, right? Go from 9.2 second query to a 1.6 second query.

Gosh, you’ll, you’ll look, you’ll look real good in your next performance review. Don’t you? Don’t you?

Sparkling. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you, well, like I said, this video drops on Friday, so I do hope you have a great weekend.

And I will see you, see you Monday with an office hours video, so adios.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.