A Fun Plan Shape With Aggregates In SQL Server

A Fun Plan Shape With Aggregates In SQL Server


Video Summary

In this video, I delve into a fascinating query optimization scenario that caught me off guard due to its clever and efficient plan shape. While exploring a query on the Stack Overflow database for an advanced databases course taught by Andy Pablo at Carnegie Mellon University, I stumbled upon this intriguing SQL Server query transformation. The optimizer’s ability to unnest subqueries in a way that avoids redundant table scans is particularly noteworthy. This optimization not only demonstrates the power of SQL Server’s query optimizer but also piques my curiosity about how other database systems handle similar scenarios. I discuss the specific trace flags and hints involved, as well as the unique plan shape that results from these optimizations. By sharing this insight, I hope to inspire further exploration into query optimization techniques across different databases.

Full Transcript

Erik Darling here with Darling Data, and we’re going to be looking at what I think is a really fun plan shape in today’s video. And I came across this because I was watching a video series. So a little bit, little background is a really smart fellow named Andy Pablo who works for Carnegie Mellon University. He is the professor of databaseology there. And he, every, you know, semester, he does like an intro to databases course and advanced databases course. He’s running running running one right now on, uh, query optimization and they’re all, they’re all available for free on YouTube. So if you list, look, look up like CMU database group, you’re bound to come across that stuff on YouTube. Uh, and he was, there was a video where he was talking about like, uh, you know, different query optimizers and like stuff that they’re good at and stuff that they’re maybe not good at. And, um, one of the things he said, is that like, uh, like Postgres and Oracle are, or some collection of databases are better at SQL Server, uh, at this one specific like type of query, like unnesting subquery thing. And I was like, whoa, I’m the one who gets to say Microsoft’s not good at stuff here. Not, not you. I can say whatever he wants, but, uh, yeah. So I was like, hmm, well, this, this, I gotta see. So I went and I, I wrote what I thought was like a pretty reasonable approximation of the query using the Stack Overflow database.

Now I was quite surprised by the plan shape because it wasn’t bad. And, uh, I think it actually has kind of a cool optimization in it. So we’re going to talk about that. Um, before we get into all that stuff though, uh, you, you may have noticed this slide has finally changed for the first time in like months. And the change here is that, uh, I will be, uh, accepting office hours style questions anonymously. Uh, if you go to that link right there, this, this will be in the video description.

Of course, like everything else is like when in doubt, refer to the video description. This will be in the video description. So if you, uh, would, if you would like to ask me anonymous questions that you would like answered on YouTube, this is given, keep in mind, this is not private consulting help. If you need that, you have to hire me and pay me. But if you want to ask an anonymous question, I will do my very best to answer it here on YouTube. Uh, but aside from that, all the same stuff applies. If you would like to support my endeavors with this channel, you can sign up for a membership again. Link for that is in the video description.

If you do not have the four bucks a month that it costs to show me a little bit of cup of coffee, love, uh, you can like, you can comment, you could subscribe. And now, now you can take up more of my time by asking me questions too. So lucky you, it’s your big day. Uh, it’s your turn to shine. You absolute champion. Uh, if you need help with your SQL Server, I am the best SQL Server consultant in the world. If anyone tells you different, they are liars, they are charlatans, they are frauds. Uh, do not, do not buy their line of nonsense.

Uh, so if you need help, these are my services and my rates are reasonable. If you would like some other reasonable stuff from me, like say SQL Server training, you can get all of mine. It’s about 24 hours of content. Uh, again, refer to the video description for, for, for more links. Uh, you can get all 24 hours of mine, uh, for about 150 US dollars. It is the best deal on the internet as far as I’m concerned, uh, upcoming events. Uh, I’ll have more of these soon.

I’m waiting on dates to get locked down for some stuff, but right now, SQL Saturday at New York City is happening on May 10th, 2025 at the Microsoft offices in Times Square. Uh, it’s going to be a lot of fun. Um, I, I’d like to say something funny, like I’m signing autographs, but, uh, A, no one wants my autograph and B, I’m not signing autographs. So we’re just going to have to deal with that. Anyway, let’s get on and let’s look at this fun plan shape.

Now, uh, I’ve got some stuff down at the bottom and the stuff down at the bottom, if we talk about this from the bottom up, uh, we’ve got some, we’ve got a couple of trace flags there. Um, one of them, 8619 is there. So we’ll output in the messages tab, uh, applied transformation rules to the query. Because I want to show you one of those, uh, we’ve got 3604, which will means that stuff will get output to the console, the messages tab.

I’ve got a recompile hint. So SQL Server has to apply rules every single time. And I’ve got the query compat level set to 160, not because you need 160, not because you need the new cardinality estimator for this plan shape to happen. And only because when I use 160, it runs in batch mode and it runs a little bit faster than it does in row mode.

And sometimes time is of the essence, isn’t it? But the plan shape that I want to show you today is, uh, from this section right in here. Now, uh, you know, you have this section of the query up here, and this is just a simple join from the post table to the votes table.

And then down here, you have this thing saying where the score in the post table is equal to the max score in the post table, right? So this is another reference to the post table. We have one, two references to the post table here. Now, what I think is really interesting about this query plan optimization is that when you look at it, you might be expecting that we will have, uh, two references to the post table in the query plan, right?

Because after all, we’re selecting from the post table up here. And we are saying where the score in the post table is equal to the max score in the post table. So you would expect to see where you expect to have to touch the post table once for the from, and then again for this from, but you, you actually don’t have that.

If you go and look at the execution plan, you have this, right? You touch the post table once here, you touch the votes table once here, and there is no other reference to the post table in here.

SQL Server applies kind of a neat transformation where, um, it grabs the top one with ties in here, and it sorts, uh, the data for that by score descending here. So we pull out all of, um, we pull out all of the, the score table stuff here, and that’s, that’s all, that’s actually all that we need and all that we use in the plan, right?

We don’t, we don’t have to actually do anything. SQL Server takes the, takes this query and transforms it into just a couple aggregates. And then way over here, uh, we have a filter, but, and you would think maybe, maybe this filter is where we’re applying this, but, but we’re not.

The thing that we’re filtering on is for the count. So like part of the query that I have down here at the bottom is where count big is greater than one, right? And that’s what this filter is applying.

We are not actually applying any filter regarding that, comparing the min score to the max score, right? Like that, like, or rather, sorry, just comparing the score to the max score. We don’t actually need a separate access of the post table to do this.

Like the reason why this plan shape caught me off guard is because A, it’s pretty good. It’s a pretty smart one, right? It’s, and this is not new. Uh, Paul White blogged about, uh, something similar 15 years ago.

This is not a new addition to SQL Server. This is not a new thing, but, uh, you can see the rules that got applied over here. And, uh, where is, uh, believe it’s this one, uh, that does that, uh, that does the, the, the, this is the rule behind responsible for the, the transformation, uh, that does this.

Right? So we actually just kind of aggregate this data once and then get the min from, from that as well. So SQL Server is able to do something really smart here and like not have to touch the post table twice.

And this got me thinking like, what are these other query engines doing? That’s better than this. Like I’m, I’m now, I’m now I’m just absolutely curious, furiously curious is what these other things do. So I don’t know.

I’m going to have to, maybe I have to go figure that out. Cause I, cause I’m, I’m just very interested now. Now there are a couple other things that I want to show you here. Uh, one of them is this is obvious.

This would obviously change the meaning of the query because this is correlating, uh, where the owner user ID out here equals the owner user user ID up here. This doesn’t, this will change the query results. Right.

Um, I’m actually not sure if I want to run this with it to actually get results. It might be smarter to just get the estimated plan. But notice that like, even with that sort of additional complication added, added in SQL Server doesn’t still doesn’t need to touch the post table twice. What does get added is this segment operator.

So sort of like when you have a row number with a partition by SQL Server adds a segment operator to mark those segments and do give you like the partitioning, uh, groups for like row number dense rank or whatever. So SQL Server just adds a segment operator to this. You still have the exact same plan shape just with this one additional operator added in.

Now this is only, uh, available for direct equality predicates. So like where score equals something, if you were to change this to like greater than or equal to, or less than or equal to, or any other, uh, non equality part, or even like not equal to, or something. Uh, then SQL Server would no longer have that available to it, right?

Like now we have two accesses of the post table where SQL Server like aggregates to get one part of it here. And then does like this whole part where it aggregates to get the next part of it. So there’s like, there, it can’t be applied to every single query form.

It’s really only a quality predicates right here. But usually this is, I don’t know, I don’t know, this is the way most of these queries get written anyway, unless, unless you’re dealing with like average and you want to find like where something’s greater than the average. Finding where something is greater than the max is pretty, uh, pretty kind of a stupid query actually.

And just finding anything that’s greater than the minimum, maybe a little bit more realistic, but usually when you see stuff like this, they are using averages. And, you know, usually you don’t want to find things that are like exactly average. You want to find things that are either like above average or below average by, you know, some amount.

So anyway, I thought this was a cool optimization. I thought this was, um, not, not the worst thing that I’ve ever seen SQL servers query optimizer do. And, uh, yeah, so no, I, I was, I was honestly, again, just a little bit puzzled as to what these other query engines could be doing.

That’s so much better. Uh, especially because in the video there was like this little teaser moment where he’s just like, but, but Microsoft’s about to get better at this. And now I’m like, well, I mean, it would be hard for you to get much better than this.

I think, I think this is a pretty ideal situation for the optimizer. So we’re going to give Microsoft a thumbs up for this query optimization. We’re going to say, good job, optimizer team.

You did a good. We’re proud of you. Keep up the good work. Keep stuff like this. Great to see. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video, which will be, um, well about something else entirely. Because, uh, there’s only so often you can discover, or there’s only so often that you can be introduced to, uh, to new plant shapes, I guess.

Um, at, at, at this point in my life. So, yep. All right.

Well, we’re out of here. All right. Thank you. All right.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.