Scalar Aggregates Can Use Hash Matches With Batch Mode

GREAT

Video Summary

In this video, I share a minor yet intriguing discovery related to SQL Server 2019 and its batch mode for rowstore operations. Specifically, I delve into how the query optimizer behaves differently when running scale R aggregate queries—those that should return only one result—and how adding a phony GROUP BY clause can lead to unexpected execution plans. The video highlights the transition from row mode to batch mode in SQL Server 2019 and explains why this change might impact memory grants and stop-and-go operators, offering insights for those moving to or already using this version of SQL Server.

Full Transcript

Happy Sunday, isn’t it? What a, what a day, what a time, what a date time to be alive. The year get date, we are, we are alive and well and happy. I wanted to record a video before I go off to do other things about a very mild, minor discovery. I’m probably late to this discovery by months. But I don’t care. That never stopped me before. When I, I’ll, I’ll discover new things all day. Just the other day, I discovered South Brooklyn. It’s amazing. Can’t believe how well it was developed for something that I just discovered. Anyway, so with SQL Server 2019, we get batch mode for Roastore if we’re on Enterprise Edition and then compat level 150 and all sorts of things. other good stuff. But yeah, it used to be. Now I learned this from a, a Craig Friedman talk on the query processor, like years ago, that the only option when we run a, a scale R aggregate, meaning an aggregate that will only, only ever return one result. Like if I select some score from comments with this where clause, we’re only going to get one line back or one row back or whatever. But then we’re going to get one line back or whatever. But we’re going to get one line back or one row back or whatever. But we’re going to get one line back or one row back or whatever.

But if I were to say like group by user ID or group by post, then we get a whole list of scores. I mean, without, if we don’t put those columns in the select list, then, you know, it’s not very helpful because we don’t know which user, which, which posts we have the sum of scores for. But we could still do that and return a bunch of sums back, which is not helpful, but something we could do. With the old way of doing things in row mode plans, the only option available to the optimizer when you run a query like this is to use a stream aggregate. We have one stream there. This is a partial aggregate and then one stream over here and that’s a global aggregate and whatever. That’s, that’s nice. That’s all well and good. And I don’t know. It’s interesting. Now, what’s even more interesting, I think, is that even if we add a phony group by and we tell the optimizer, we’re like, buddy, listen, I want you to use a hash group on this. And we run this query.

Well, we, we don’t get an error. A lot of times if you add an option hint to a query and the optimizer can’t come up with a feasible execution plan, you’ll get that error. That’s like, yeah, I couldn’t generate a good plan. I can generate a plan based on those hints. Please remove them and try again. We don’t get that. We don’t get that here, but we also don’t get the hash group. We get one to stream aggregate. So we’re at, we have a very disobedient optimizer. We need to spank this optimizer. Optimizer needs a spank and probably a grounding and take its iPad away or something. But this all changes with SQL Server 2019, assuming you are on enterprise edition and also perhaps assuming that you are in compat level 150.

Now, if we run these queries, right, we select some creation date and slept blah, blah, blah, blah, blah, blah, blah, blah, option hash group. Now, when we run those, we can see that the optimizer had a hash match aggregate in the plan. We did not do the, the partial stream and then go serial and then another than the full stream aggregate. We have just a single hash match aggregate in here. It’s exciting stuff, right? Why, why do we have that? Well, in compat level 140, we were in row mode, but now the magic of compat level 150, our, our clustered index scan over here is, uh, is in batch mode.

Nice, nice. And our hash match aggregate over here is in batch mode. Nice, nice. There we go. All right. So why is that interesting? Why does that make a difference? Why is this a big deal? Well, there’s a couple things at play here. Uh, internally, uh, stream aggregates are, uh, non-blocking, meaning that, uh, like with, so like if you have a, a hash aggregate or a hash join, there is a pause within the query for the hash table to get built. And then when things begin probing, things carry on in the plan. That’s what they’re called blocking or stop and go operators.

Uh, stream aggregates don’t have that. The other thing about stream aggregates is that stream aggregates don’t require a memory grant where hash aggregates do because we need some scratch space to write all that stuff down in. So if you have queries where you’re, uh, uh, performing a scalar aggregate and, uh, perhaps where, uh, you had, you know, some reliance on there being a streaming operation or perhaps just where you didn’t have a memory grant before, you may find yourself having memory grants. Now you might find yourself having stop and go operations in your, in your query plans now, because with this additional choice available, we change the query plans that are available for these queries. So is this incredibly interesting?

Eh, it’s sort of interesting. Um, is this incredibly dangerous? Probably not any more dangerous than any of the, of the other possibilities that, uh, become, become available with, uh, batch mode on rowstore. But it is something interesting to consider. Um, you know, if, if you, especially if you are getting, uh, scalar aggregates for very large data sets, you may find that those, uh, those hash match aggregates ask for potentially large amounts of memory.

That could, that could change the, uh, the face of your workload if this is happening, like, during some overnight process that, like, populates, I don’t know, let’s just call it like ETL ish, right? Like you might load a bunch of summary data into a table. You might actually have an ETL process. You might actually do something. Uh, and like, you might even try to make it concurrent.

So you’re like, I’m going to do a bunch of these sums at once and I’m just going to send them on and, you know, pass stuff over. Whatever it is, whatever it is, it doesn’t matter. Just, it’s different. It’s new. It’s different. And I want to tell you about it. So I want you to be aware of it and I want you to not be astounded or shocked or dismayed when you find these new things in your query plans as you all slowly move to SQL Server 2019, which I think you should, because that’ll give me more interesting stuff to do, other than like, we must fix the function. We, this, this, this, this, this, this, this, this, this, this, this, this, this, this, this, this, this, and we got an inappropriate joy.

Like, there’s a lot of stuff that I, I wish we were all on SQL Server 2019. So I could, I could, I could start fixing more, more, more things than this sort of groundhog day stuff that it’s been a problem in SQL Server forever and ever. Anyway, I’m not going to turn down groundhog day work. Just keep in mind that, um, you know, SQL Server 2019 does fix a lot of the groundhog day stuff that has, has been shocking and dismaying and annoying and aggravating people for 20 years now.

So I’m excited. I can’t wait. I can’t wait for, uh, for it to be widely adopted or who knows, maybe by the time it’s, it’s even close to widely adopted, we’ll be on like SQL Server 2025 or something. Maybe I’ll have opened a gym by then. Maybe I’ll just not even be looking at SQL Server anymore. Who knows? Who knows? I don’t know.

Um, but heck, I’m optimistic about the future. All right. Goodbye. See you in the next video.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

GREAT

Video Summary

Full Transcript

Going Further

Related Posts