How To Think Like A Batch (Mode) In SQL Server

Video Summary

In this video, I delve into the intricacies of batch mode friendliness in SQL Server queries, sharing practical examples and strategies to optimize your query plans. Starting with a common annoyance in a query plan, I demonstrate how certain operations can hinder batch mode execution, leading to suboptimal performance. By exploring alternative query structures and pre-aggregating data, we achieve more favorable execution plans that leverage batch mode effectively. The video also touches on the importance of understanding where batch mode excels and aligning your queries accordingly, providing valuable insights for anyone looking to improve their SQL Server performance tuning skills.

Full Transcript

Erik Darling here with Darling Data. And boy, I got a video for you today. We’re going to talk a little bit more about batch mode friendliness. I don’t know, these seem to do fairly well as far as peaking interest from people. So what the hell? Let’s do it some more. Before we get into that, the usual ol’ spiel schnitzel. If you want to support the content that I produce on this channel, you can sign up for a membership. It’s a good deal. It’s a good deal. Because then I’ll keep doing it and not just retreat into a cave and keep it all to myself on stone tablets. If you want to ask me questions from my Office Hours episodes, there’s a good link for doing that down in the video description. I have enhanced the link to give you more opportunities to add detail to your questions as well. If you need consulting, you know, consulting, looking at your SQL Server and you’re thinking, gosh darn, this thing is slow. I promise you, I can do all of these things and I can do them all well. And according to Beer Gut Magazine, I do them better than anyone outside of New Zealand. So you can hire me. And as always, my rates are reasonable. You can get my performance tuning training for about 150 US dollars.

That link and that discount code also fully assembled for you down in the video description. And of course, my new T-SQL trainings. The beginner content is fully published. There is about 23 hours of it. If you’re going to pass and attending Kendra Little and I’s pre-cons, you will of course get access to all this material. It is on pre-sale still at 250 bucks. But after the summer, when I have regained consciousness, the price will go up to 500. So I would really suggest that you get in on these purchases now. They will be far less negotiable than the everything bundle. And of course, this summer, I am also traveling a bit. The nice folks at Redgate have decided to pull me kicking and screaming from my home where I do these recordings. I will be in New York City, which I guess is not too far from home. August 18th and 19th. Dallas, Texas, which is moderately far from home. September 15th and 16th. And then Utrecht in the Netherlands, which is slightly more moderately far away from home than Dallas, I guess. And of course, you know what? Now that I think about it, Utrecht and Seattle are kind of about the same flight wise.

So I will also be slightly more moderately far away from home for past data community summit in November, the 17th to 21st. With that out of the way, though, let’s talk about some batch mode stuff here. So the first query that I want to show you has kind of an annoyance in it, right? We’ve got it. We’ve oh, oh, dear. I don’t know what button that was. Did did zoom it have a problem? Are we not? Are we not doing the zoom it show here? The zoom it show works now. Great. Some reason SP who ran and that just should never happen.

We’ve got a little bit of an annoyance in this query. Part of our join clause is looking for where user ID is no or we’re giving SQL service say I really want to join on this. But if the user ID column is no on the comments table, we can we can we can leave that in. The thing is, and I can only get an estimated plan for this because when I when I’ve tried to run it and get an actual execution plan, it is horrible.

So what we do in this query plan is nothing very batch mode friendly. We scan the clustered index of the post table and the comments table. And we we sort the entire post sort the post table. I mean, not the entire table. We sort the columns that come out of the post table. But, you know, we’re selecting, you know, owner user ID, post type ID and score.

And we are ordering by post type ID ascending. And then we go into a nested loops join nested loops joins no batch mode there. Uh, lazy table spoons, no batch mode there. Uh, stream aggregates no batch mode there.

Uh, these rocking with rowstore, right? We are not getting anything batchy at all in this plan. Uh, worse is if I tried to tell SQL Server, I would, I would prefer a hash join here, my friend, uh, we will get an error back. You’ll get this, all this red text telling us that the query processor could not produce a query plan and that we should resubmit the query.

Well, okay. We could do that or we could, could, could try some stuff, right? Now, what I find interesting about this particular query format is not so much the SQL Server can’t figure out like, like how to use batch mode or how to use a parallel plan or how to use a, uh, hash join plan. What I find interesting is that if we look at the post table and the owner and the, and the look at where owner user ID is null.

And we look at the comments table to see where user ID is null, which is effectively measuring the, or rather, uh, storing the same type of data, right? It’s like whoever owned the comment or whoever owned the post, there are no nulls in the post table that we have a zero there. And we have 336,000 null user IDs in the comments table.

Very interesting. So if we change the query to look like this instead, right? If we say, uh, where owner user ID is null and, uh, you see that user ID is null.

Uh, we can at least achieve a hash join plan here. However, this query will run for a very long time. It is not a good time.

Uh, we do, uh, was this the one? No, this is not the one. So, uh, there is still no sign of batch mode, uh, in this plan until we get to the very end where we have, uh, some batch mode on this final hash match aggregate. The thing is that we do all this other work in row modes and that’s, that is not exactly what we want.

We want more batch mode happening in our plan, not just one operator. That’s, it’s not terribly helpful. So, uh, what we can do, or actually before we do that, what I want to show you is without the hash join hint, I want to show you, uh, potentially one of the most deeply offensive query plans that you might ever see in your life.

Uh, and this is all row mode. Uh, I can promise you that, uh, there is a scan of both tables. Uh, then we repartition streams again, no batch mode here.

And then SQL servers like, Oh, a merge join. Yeah, great. I have no sorted input. So I’ll just sort both of these inputs. I’m just going to sort both of these to use a merge join worse, worse, worse, worse.

This is a many to many merge join. Uh, and, uh, I, and I tried running this one too. This one ran for about 30 something minutes before I was like, you know what?

Um, I think, I think I just need to go record something at this point. Cause I’m getting tired. I’m starting to get exhausted. So we have this all row mode work.

And then of course we have our one loan hash match aggregate occurring in batch mode here. Uh, it’s not that none of the operators leading up to the merge join support batch mode. We just don’t get it.

Right. And like, like, like merge join, it doesn’t support batch mode. And again, like the parallel exchanges, like the repartition streams and the gather streams over here don’t support batch mode. But of course with batch mode on rowstore, uh, we do, we do have support for reading from tables and sorting and computing scalars or computering scalars in batch mode, but we’re just not getting it here.

SQL Server does not naturally choose a very batch mode is plan here, which is not good for, not good for batch coin. I’ll be honest with you. I’m not having a good time with that one. So, uh, what we can do in order to make this query more batch mode friendly is do some pre aggregating on our own.

One of the big downfalls of even this plan with the hash join hint is if you notice, uh, SQL server does not make any attempt, uh, prior to the join to like group data together. Like we have two columns, owner, user ID and user ID, which are lousy with duplicates.

There are so many duplicates in there. Just an incredible amount of duplicates. Right.

Like just wild with them. Um, but SQL Server does not make any attempt to do a pre aggregate. Like it just like, no, I want to, I feel like fully joining all these things together. I want, I want all the rows joined together.

This is going to be great. It’s not great. It’s not a good time. Even the hash join plan runs a very, very long time. Now what we can do is we can force SQL servers hand a little bit and we can do some pre aggregating of our own.

Uh, we still need to do some outer aggregating, but it’s okay. Cause it’s a little bit of pre aggregating that goes a long way. So for example, we can, uh, select this stuff and we can do a little bit of pre aggregation on some of our columns and we can do like a group by an owner, user ID and post type ID out here, and then we can join that to another pre aggregated result here where we group by user ID and then do our join.

And then finally out here, group by post type ID. And when we do that, now why is, why are you not properly terminated? When we do that, we get a much more favorable execution plan. I’m just going to show you the estimated plan first.

Where we do get much more batch mode. We do not get full batch mode the way that we might want, but we get much more of it. And the, like another thing that we get is of course, like using our smart brains, we, we, or at least we’re, we’re using some smart brains out there.

I mean, I don’t know. Some, sometimes I have good ideas. Uh, but when we look at this plan, right, where we do the pre aggregation SQL Server is like, oh yeah, I think I can use batch mode here.

And when I do this aggregate, oh yeah, I can, I can, I can do batch mode there. And, and, and, but I know there’s still no batch mode for the repartition streams, but, uh, SQL Server is still does some fairly batchy things leading up to, uh, all of the, uh, this work.

Now, unfortunately. Unfortunately, the hash join that we do is in row mode, but that’s not the end of the world here. We still see a really, really big performance increase, uh, and, uh, the estimated execution mode of that, this final hash join, which is now going to have like much less work to do.

Cause we pre aggregated a bunch of stuff before the join. We’re going to have, we’re going to be leaning on this thing much, much less because we’re going to have far fewer things to, to go and aggregate altogether. And, and finally, when we run this query, this one actually finishes.

Like it, it actually just completes without me having to do much. And the, like you can, you can see like, you know, sure. There’s, is there stuff we could do for this thing?

Yeah. Maybe, maybe we could add in some indexes. Maybe we could tweak indexes a little bit. Maybe we could even like add columnstore indexes. I don’t know. We could, we could do maybe stuff that might be useful here, but just like doing things that are more agreeable to batch mode, like, like doing like the early group buys and joining the results of that.

And like, you know, just things along those lines, like think like things that batch mode is into doing, which is like, you know, I want to group by this stuff and aggregate this stuff and get some sums and some counts and some other, like, you know, data warehousey analytically things.

Batch mode is just like, oh yeah, no, I get it. I’m picking up what you’re putting down. I think, I think, yeah, I think, I think I’d be useful here. Yeah. I’m going to get in on this game. So when we do this, like, like a lot more stuff, like I said, happens in batch mode.

The, the, the hash join does not happen in batch mode, but that’s, that’s okay. Because like, so, you know, going back to like reading operator times in these plans, all of the batch mode operators, like, like these, like, they’re just going to show the time underneath the operator in here for like the wall clock time for what they did.

So, you know, it’s like 416 milliseconds, 186 milliseconds. And then like, when we get to the, the repartition streams, which is in row mode, then we get like, like the total time spent added up until you get to here.

And that’s the same thing with this, this side of the branch too, since this is batch mode and this is batch mode and this is batch mode. These things are all sort of individual, all individual.

But like, like one thing I do want to repeat here is that like, like looking at operator times for a parallel exchanges is not something that you should spend a lot of time with. But the point is we don’t spend 1.5 seconds in the hash join.

It’s like 1.5 seconds minus like 500 and minus like, I don’t know, like 186 plus 416. So like, you know, there’s, there’s some stuff in there that, you know, is, is okay. Right.

It’s like, we didn’t spend 1.5 seconds here or here or here. Like we just like, it’s, it’s cute. It’s cumulative, but the whole query finishes in about just 1.6 seconds. So when you are trying to write batch mode, friendly queries, again, the, the, the closer you write your query to align with things that batch mode is good at, the better the chance you have of getting batch mode operators in your query.

So just, you know, try to think like, like a batch, right? Like think where is batch mode useful and then try to align your queries to that rather than just like, you know, writing your typical query and wondering why batch mode doesn’t show up.

You might, I don’t know, maybe you’re, you, maybe you’re like already a batch mode person like by accident. And the way that you write queries is very batch mode friendly. But, you know, like if you’re coming from like, you know, sort of like, like role mode query tuning stuff, you might be really into doing like, you know, like, like apply and things like that, where, you know, you’re like into like trying to find these like navigational seek-y strategies, maybe like getting like parallel nested loops pushed in there.

But when you’re trying to get batch mode happening in your query plans, you’re, what you really should focus on is truly thinking in terms of where batch mode excels and what batch mode is good at.

And then like writing your queries to try to conform to that. Anyway, that’s good here. I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where I will say something that is, I don’t know.

I don’t think it should be all that, all that weird, but maybe, maybe you’ll think it’s weird. Anyway, thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.