Spotting Batch Mode Opportunities in SQL Server Query Execution Plans
Video Summary
In this video, I delve into batch mode opportunities in SQL Server, exploring how to identify and leverage them for performance gains. After a quick word on my consulting services and training courses (you can support this channel by signing up for a membership or asking me questions during office hours), I move on to the technical content: execution plans from previous videos where SQL Server missed out on using batch mode, even though it was clearly beneficial. By adding a simple columnstore index to a helper table and enabling trace flag 7418, I demonstrate how these changes can transform row mode operations into batch mode, significantly reducing query times. Through detailed explanations of the execution plans and performance metrics, I aim to help you spot similar opportunities in your own queries.
Full Transcript
Erik Darling here with Darling Data, and in this video we are going to talk about batch mode opportunities. It's kind of like The Clash song, except way less catchy and cool, right? Anyway, my life... started out so promising. If you would like to support this channel, you can do that: you can sign up for a membership. If you would like to ask me questions for office hours, you can do that at that link. Both of these links are down in the video description. If you would like help with your SQL Server (perhaps you need help with your batch mode opportunities), I am available for consulting. You can hire me. I'll show up. I will wear this Adidas shirt, and I will be just as clean and kempt as I am in these videos. Not drunk, and it'll be fun. And as always, my rates are reasonable. If you would like to buy my performance tuning training, there's 24 hours of it, just for you. Aren't you special? Look at you. Special little thing, you are. You can get all of it for about 150 bucks for life. You go to that URL and plug in that discount code.
And this is also helpfully assembled for you down in the video description. My T-SQL course, Learn T-SQL with Erik (that's me!), is also available. Just about 23 hours of the beginner content is fully published. If you are coming to PASS Data Community Summit and attending the pre-cons Kendra Little and I are giving there, you will, of course, get access to this material, as I consider it companion content to what will be going on there. The advanced stuff is being worked on currently. That'll all start going up after the summer.
And the other thing going up after the summer is the price. It'll go up from 250 bucks to 500 bucks. So you should buy it now while it is still 250 bucks. Speaking of the summer, gosh, how am I going to get all this done? Redgate is taking me on a partial world tour. You know, mostly small clubs and venues.
New York City, August 18th to 19th. Dallas, Texas, September 15th to 16th. And the hamlet of Utrecht in the Netherlands, October 1st and 2nd. That all leads up to PASS Data Community Summit, where the aforementioned T-SQL pre-cons, plural, will be taking place.
But with that out of the way, let's talk about spotting opportunities for batch mode. Now, this was the query that I ran in the last video, where I said, no, that default cardinality estimator sure didn't do so well. We'll come back to the execution plan for it.
I've already run it, because golly and gosh, why sit through that 8.5 seconds again? I guess maybe Windows Update wasn't doing something in the background when I ran this one. It was 300 milliseconds or so faster, so I don't know.
We got a speed boost from something. Sure wasn't Microsoft. But looking at this execution plan, there are things that I do not love about it. For example, all of this stuff happens in row mode. Like, SQL Server...
It is the year 2025. We are in database compatibility level 160. I'm using Developer Edition, which is an Enterprise Edition equivalent SKU of SQL Server. The batch mode on rowstore feature is there.
It should use it. Why wouldn't you use it here? We are scanning just about 53 million rows. Why on earth would you leave this to row mode? What is on your mind, SQL Server? Gosh. And it does it all throughout the plan. Another way that you can tell you're not really getting batch mode on rowstore is that we still have these repartition streams operators. Now, sometimes these can still show up in mixed batch and row mode plans.
But since none of the parallel exchanges support batch mode, we know that these operators are not happening in batch mode. Neither are these compute scalars. If you squint really hard, or I zoom in like a reasonable presenter should, we will see that these occur in row mode, as does this big old whopping hash join here.
Gigantic hash join. Huge hash join. Right?
52 million rows come in from each side. And what do we do? Row mode. SQL Server. Smackity smackity smack. What is on your mind, buddy? Let’s try this again.
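Before blaming SQL Server too hard, it's worth double-checking the prerequisites mentioned above. A quick sketch of how you might do that, where `N'StackOverflow'` is a placeholder database name for whatever you're testing against:

```sql
-- Sanity-check the batch mode on rowstore prerequisites.
-- N'StackOverflow' is a placeholder; swap in your own database name.
SELECT d.name,
       d.compatibility_level   -- batch mode on rowstore wants 150 or higher
FROM sys.databases AS d
WHERE d.name = N'StackOverflow';

-- The feature also wants Enterprise Edition (Developer is equivalent):
SELECT SERVERPROPERTY('Edition') AS edition;
```

If the compatibility level and edition check out and you're still seeing row mode everywhere, that's when the tricks in the rest of this video come into play.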
Let's give SQL Server some ideas about itself. Let's say, hey, SQL Server, what might be a good idea here? Now, this columnstore helper table is a completely empty table. I'm just going to type out a quick demo: select from dbo.ColumnstoreHelper, just because I want you to see the execution plan here. We return no rows from this. The execution plan shows that we have a zero row clustered columnstore object in our database.
You can do this with a temp table or whatever other kind of thing you want to slap a clustered columnstore index on. But all we’re going to do here is say left join to our columnstore helper on one equals zero. One can never equal zero.
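A minimal sketch of that setup. The table and column names here are my placeholders, not necessarily what's in the video; any empty table with a clustered columnstore index on it will do:

```sql
-- An empty "helper" table with a clustered columnstore index on it.
CREATE TABLE dbo.ColumnstoreHelper (id integer NOT NULL);
CREATE CLUSTERED COLUMNSTORE INDEX cci
    ON dbo.ColumnstoreHelper;

-- Bolt it onto the query with a join condition that can never be true,
-- so no rows come back, but the optimizer now sees a columnstore object
-- and will consider batch mode for the rest of the plan.
SELECT v.Id
FROM dbo.Votes AS v
LEFT JOIN dbo.ColumnstoreHelper AS ch
    ON 1 = 0;
```

Because the join can never match, the results are unchanged; only the optimizer's choices are.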
But there is now an object with a columnstore index on it somewhere in or around our query. And so the optimizer is going to think somewhat differently about things. So if we run this... remember, that was about eight and a half seconds, right?
We’re going to just go with it there. Now, what are we down to? 2.4 seconds. Jeez, SQL Server.
I think batch mode might have been a good call here. What do you think? What do you think? How do you feel about that one, SQL Server? Should we have used batch mode? Was that a good idea?
Well, probably. So now, we still have a scan of the Votes table on both sides, because we don't have a where clause on the Votes table that we could use to filter rows out. But you might notice that this takes about 800 milliseconds. Before, it was like 1.7 seconds. This one takes just about the same time, right? It's off by 20 milliseconds here. Not that big of a deal.
But, you know, especially considering what these times were before. And all of these things are happening now in batch mode, right? We have a batch here. We have a batch here. Yeehaw.
Look at us. Good job. Even our compute scalars are happening in batch mode. And that big giant hash join that we were doing before in row mode, with almost 53 million rows coming in from each side, is happening in batch mode. Even this top end sort is happening in batch mode.
Now, like I said before, the parallel exchanges do not support batch mode. Boo. Neither does the top operator. So the plan timing in this one looks a little funny, because for batch mode operators (which is all this stuff), the times that you see in there are just the wall clock time spent in that specific operator. In a rowstore query, it's cumulative going from right to left; the child operators build it up.
I'm going to show you a way to change that in a second. So these numbers are all just for the individual operators.
But by the time we get over here to these row mode operators, these ones add up all the times for the stuff that happened before them. So the 2.4 seconds you see here and here is not 2.4 seconds apiece. Remember, this whole query finished in about 2.4 seconds.
We can validate that by going to the properties and looking at the query time stats, and seeing that there was about 18 seconds of CPU time and 2.4 seconds of elapsed time. So one good sign that your queries could do with some batch mode is when you have gigantic scans of tables, especially in parallel, and big old hash joins, but they're happening in row mode.
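If you want to hunt for that sign across a whole server rather than one query at a time, something like this plan cache query can surface candidates. It is a sketch only: showplan XML details vary between versions, and cached plans expose the estimated execution mode rather than the actual one.

```sql
-- Sketch: find cached plans containing a row mode hash join, sorted by
-- CPU, as candidates to investigate for batch mode opportunities.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qs.total_worker_time,
    qs.execution_count,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp = "Hash Match" and @EstimatedExecutionMode = "Row"]') = 1
ORDER BY qs.total_worker_time DESC;
```

Anything this turns up still needs eyeballing; a row mode hash join over a few thousand rows is not the same problem as one over 53 million.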
It's usually not what you want, right? Especially if you have any say over it. Batch mode really helps as more rows get involved, because row mode does exactly what it sounds like: it processes a row. Even though we're not using a cursor or a loop or something like that, query plan operators work iteratively, which is why they are often called iterators. They iterate over rows. And SQL Server in row mode pipelines all this stuff, so it's one row, and one row, and one row.
Granted, that happens pretty quickly, because the people who made SQL Server were pretty good programmers or something like that. But batch mode is much faster here, because batch mode processes up to 900 rows at a time, depending on the size of those rows. It sticks all those rows in CPU registers and uses something called SIMD (single instruction, multiple data) to run CPU instructions over batches of rows at a time. Which, when you have many millions of rows, is typically a good idea, because it takes a lot of the CPU-boundness out of your queries. So let's look at another example that builds on this one. I'm using a trace flag here, 7418.
This one came out in SQL Server 2022. This trace flag is, you know, technically undocumented and unsupported. So don't go messing around in production with this one, because who knows, right? I can't tell you everything that it does or affects, or whether it might cause stack dumps or assertion errors or whatever other product failures. I can't tell you. So, for demonstration and testing purposes only, we are going to run this query, which builds on the query that we were just running.
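For completeness, the usual session-scoped way to toggle a trace flag looks like this. Again, 7418 is undocumented, so this is strictly for a test box, and whether it fully behaves at session scope is something to verify for yourself:

```sql
-- Demonstration and testing only: trace flag 7418 is undocumented.
DBCC TRACEON (7418);    -- session scope (no -1 argument, so not global)

-- ... run the query and look at its actual execution plan ...

DBCC TRACEOFF (7418);
```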
So what this is trying to do is add in some more information about missing IDs in the Votes table. Let's say that we wanted to summarize all of the missing ranges in here. We want to find the start of the range where things go missing, and the end of the range where things go missing.
This is a query that will do that. We have our ID plus one. We have our min ID minus one.
And we have our not exists query here in order to find the non-matching rows, with our terribly non-sargable predicate here. And then we have this final predicate on our query in order to exclude the last value in the table, because that's not actually a missing one. We just don't need that last bit.
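The shape being described is the classic gaps pattern. Here's a sketch against the Stack Overflow Votes table; the exact query in the video may differ, but the pieces he calls out (ID plus one, min ID minus one, the not exists, and the final predicate) line up like this:

```sql
-- For each Id that starts a gap, find where the gap ends.
SELECT
    v.Id + 1 AS gap_start,            -- our ID plus one
    (SELECT MIN(v2.Id) - 1            -- our min ID minus one
     FROM dbo.Votes AS v2
     WHERE v2.Id > v.Id) AS gap_end
FROM dbo.Votes AS v
WHERE NOT EXISTS                      -- find the non-matching rows
      (SELECT 1
       FROM dbo.Votes AS v3
       WHERE v3.Id = v.Id + 1)
  AND v.Id < (SELECT MAX(v4.Id)      -- skip the table's last value
              FROM dbo.Votes AS v4);
```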
So I'm going to run this whole thing at once. There are two things that I want to show you, and the query plan is the most important part. We're dumping this into a temp table, so it doesn't matter much what else we're doing with it, right? We're not doing anything else terribly interesting. We put 5.4 million rows into a temp table, but this execution plan also happens entirely in row mode. I know it's a little hard to see here, but if you understand the pattern of what you're going to see in these tooltips, everything is happening in row mode.
Even, once again, this gigantic hash join between two tables and all the work that gets done in here. So this is very similar to before. The important thing is that most of the operators in this query plan are not eligible for batch mode, but a small segment of them are. Granted, any of the data acquisition operators, like clustered index scans, are absolutely eligible. But things like top, stream aggregate, and the parallel exchanges (repartition streams, distribute streams, and gather streams over here) aren't. Nested loops join? Sure isn't, but I wish it was. It'd be so cool if it was. Boy, I wish we had batch mode nested loops. I don't know.
Maybe that's just really hard to do. But this whole thing, once again... well, this one actually takes a little bit longer, right? If we go over to the very end here, let's look at the properties, and let's look at the query time stats.
We've actually been lied to a little bit. The elapsed time on this was actually almost 14 seconds, 13.7. Why doesn't that show up here appropriately?
Well, you know, like I said, the wall clock time on parallel exchanges is bonkers. Bonkers. Not in a good way. Not "this is going to be a fun night" bonkers. More "oh, I'm getting arrested, cool" bonkers. So you might have a query plan where a lot of the operators are not eligible for batch mode, but you might still spot the pattern. The last one I showed you was very simplified.
This is that same section of the query plan, but with a bunch of stuff around it, because I want to teach you what to focus in on, which is this pattern in here. Thank you, tooltip, for showing up uninvited. So it's this pattern in here. If we think about the amount of wall clock time that we spend in this plan, a lot of it is right in here. Right? There's a lot going on in here.
So let's do what we did before. Right? I have trace flag 7418 on, so this query plan is showing, for every operator, only the wall clock time of that operator itself. Even the row mode ones are only showing the wall clock time that they consumed. Did I say that? I think I forgot to say that about the trace flag, but that's what this trace flag does. It makes it so that all of the operators in the query plan will only show the wall clock time that they are responsible for. So it makes row mode plans act like batch mode plans in that timing regard. All the stuff that you see in here, even though it's not happening in batch mode, just uses that timing.
So let's put this in now. We have just a hair under 14 seconds for this, right? Remember, this lied to us: when we looked at the query time stats, the wall clock time was almost 14 seconds. So let's put this in. We're going to tag in our columnstore helper friend here, run this, and see what happens.
So this no longer takes almost 14 seconds. If you look at the query time stats for this, now we are down to six and a half seconds. And the sort of annoying thing is that the plan we get reuses a lot of the operators from the last plan.
Right. This whole section is still identical, and this whole section is still identical, but this section in here is now all batch mode. And we can tell it's all batch mode because the repartition streams that used to be in here are gone.
So it's a little annoying. We were like, hey, SQL Server, batch mode would be really cool to do here, wouldn't it? And SQL Server was like, I gotcha.
But then, stream aggregate does not support batch mode, right? This thing: no batch mode. This thing, I don't know, it could get batch mode; it's possible. But this thing, no, it just uses row mode. But up here, this scan of the Votes table uses batch mode.
This scan of the Votes table uses batch mode, even though the storage is rowstore. Don't get confused there. The compute scalars are both in batch mode, and this hash join happens in batch mode. So we were able to at least affect part of the plan with that columnstore index being in there.
We didn't get a full batch mode on rowstore plan, because batch mode on rowstore, of course, goes much deeper into the query optimizer than just sort of tricking SQL Server into, ooh, you tripped and fell and landed on some batch mode. So we got at least partial batch mode in here, which improved things, but we don't get the full batch mode experience.
There are, of course, ways that we could change this query to probably make it a bit more batch mode friendly. But that sounds like the subject for another video. That sounds like a great video, Erik.
So we're going to call this one here. Thank you for watching. I hope you enjoyed yourselves, I hope you learned something, and I will see you over in the next video, where... I have not decided what I'm going to do next, so it'll surprise you as much as it surprises me.
All right. Thank you for watching. Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.