Some Notes On Query Tuning With Batch Mode In SQL Server
Video Summary
In this video, I delve into some interesting aspects of query tuning in SQL Server, particularly focusing on batch mode execution and parallelism. Starting off with a humorous anecdote about re-recording due to a weekend brain fog, I share the excitement of being voted “the classiest SQL Server consulting shop ever” by BeerGut Magazine—a moment that adds a light-hearted touch to what can sometimes be complex technical discussions. The video then explores how batch mode execution works and its limitations, especially in achieving full rowstore parallelism. Through detailed examples, I demonstrate techniques like using temporary objects and query hints to optimize performance, highlighting the nuances of SQL Server’s heuristics and the importance of understanding these mechanisms for effective query tuning.
Full Transcript
Erik Darling here with Darling Data, re-recording the video that I uploaded Friday where it sounded like my throat was full of weird bugs. So we're going to try this again, and hopefully I remember to say all the right stuff, because it's been a weekend, and after a full weekend everything that was in my brain the week before disappears. We're also celebrating a very, very special day here at Darling Data: we were voted by BeerGut Magazine to be the classiest SQL Server consulting shop ever. We've got some drivers apparently Intel is telling me about. Thanks, Intel. Get right on that. And yeah, we're going to talk today about, I don't know, some batch mode stuff, some parallelism stuff, some general query tuning stuff. There's nothing overly specific about this video, which is kind of the funny thing about it. It's just the stuff that I find interesting, and you're just going to have to deal with that today. You might find value in a few different places here, but there's no overriding theme or point to this video aside from the fact that, well, you can see it here.
I find it interesting. So let’s look at a query. Let’s do that. All right. We’re going to make sure we’re doing all the right stuff there. We have query plans turned on and now let’s run this. And what we’re going to do is in about eight seconds, we are going to see the most offensive query plan ever. Highly offensive. I might have to blur it out. That’s how bad it is. Because Tipper Gore is going to be all over this channel when you see this thing. Look at this. Behold. Behold the terror, the tragedy that is this query plan. Look at this. Look at this thing.
It's so awful. We have a parallel merge join, which of course should not exist. Right? That's just ugly. And for some reason, SQL Server decided that it wanted to sort a 53 million row table to make that parallel merge join happen. And no. Just no, SQL Server. You're bad. You're awful. Why? Why? I'm downloading Postgres as we speak. Postgres would never do this to me. Actually, I'm lying. It probably would.
But any database engine can forsake you like this. None of them are beyond this level of offense. But holy cow, it's a wild one, isn't it? A real wild one. Now, whenever I'm tuning stored procedures or queries and I see stuff like this, one of the first things I try is to see what happens if I use batch mode, because oftentimes batch mode will help us avoid calamitous query plans like that. And for queries that do a lot of CPU work, batch mode can be very useful.
Because we work on batches of rows, not a single row at a time. Sorry, something is itching my forehead and I don't know what it is. It's a bit distracting. Anyway, here's what I usually try first — unless I'm on a SQL Server version with in-memory tempdb metadata turned on, where I can't use a temp table with a columnstore index because I'll get an error. Which you don't have to worry about in Managed Instance, because it's apparently 15-year-old software and that option isn't available in it.
Don’t use managed instance. But this is one of the first things that I’ll try here because this will, you know, give me at least some hope of maybe getting a batch mode plan that’ll be helpful. Right? And so what I’ll usually do is something like this, where I’ll, you know, create a temp table.
There will be a clustered columnstore index on it. And then I'll do sort of a funky, tricky, little lefty join-y thing like that, just to let the optimizer get a little whiff of a columnstore index so that it'll be a little more lenient about using some batch mode operators.
Now, one thing that I've been sort of lazy about showing in demos when I do this: I always give the column in this helper object a weird name — whether it's a temp table, or, if I can't use a temporary object because of in-memory tempdb metadata, a permanent table like dbo.t, or something more descriptive like a columnstore helper table. Because if we were being real lazy — say we were doing a select star here, or we had a CTE and were doing a select into — we could potentially get duplicate column name errors. Or, if we were working on a real shambles of a query where no one was aliasing anything, we could get weird ambiguous column name errors: if this column were just called ID, and we were selecting just ID up here, or joining to just ID down here without an alias. So giving that column a weird name kind of helps you avoid weird stuff. So there's that.
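Putting those pieces together, the setup looks roughly like this — a sketch, not the exact demo code, using Stack Overflow-style table names from the demo and an illustrative weird column name:

```sql
/* Columnstore "helper" temp table -- it exists only so the optimizer
   sees a columnstore index somewhere in the query. */
CREATE TABLE #batch_mode_helper
(
    i_am_a_weird_column_name bigint NULL,
    INDEX ccsi CLUSTERED COLUMNSTORE
);

/* The funky, tricky, little lefty join-y thing: the join predicate
   can never be true, so the helper contributes zero rows -- but the
   optimizer gets its whiff of columnstore. */
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
LEFT JOIN #batch_mode_helper AS bmh
    ON 1 = 0
GROUP BY p.OwnerUserId;
```

The inline `INDEX … CLUSTERED COLUMNSTORE` syntax keeps the helper to a single statement; the weird column name is the lazy-proofing described above.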
Again, another weird little point along the way. Something for you to marinate on later. So let’s see what we get now because we’ve got this all set up. At least I think I created this temp table.
We're going to find out. If this throws an error, it's fine. If it doesn't — well, then it would have thrown an error when I ran this query. So either way, you were probably bound to see some red text if I hadn't preemptively done that, right?
And the problem we run into here is that this trick doesn’t work here. SQL Server still chooses this godawful plan where, again, we have a parallel merge join, which is just grotesque beyond compare.
And SQL Server has once again decided to sort 53 million rows to allow for it, right? And this is just no good. All of this: bad.
If I were a better artist, I would draw like a thumbs down thing here. But it’s terribly difficult to draw with Zoomit. And, you know, I think this is bordering on obscene. So I’m going to stop that.
But anyway, so this is one of those kind of funny things that happens is we’ve actually costed ourselves out of a better execution plan. So let’s run these two back to back, right? We’re going to have our eight-second monstrosity.
And then we're going to have a slightly faster plan below it. And the reason we're going to get a slightly better plan below it is that we have told SQL Server: please, no kind of join other than a hash join.
And when you force SQL Server to use the hash join plan, it is also far more likely to engage batch mode because batch mode and hash joins, well, they go together like Forrest and Jenny, right? They make quirky little babies.
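For reference, the hint itself is just this — the same illustrative query shape as before, with the assumption that the demo tables are Posts and Votes:

```sql
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY p.OwnerUserId
OPTION (HASH JOIN); /* every join in the plan must now be a hash join */
```

Note that `OPTION (HASH JOIN)` applies to every join in the query, not just one — which is the blunt-instrument nature of the hint, and also why it pairs so nicely with batch mode here.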
And we’re going to, okay, so now let’s look at the query plans. All right. So we have our same monstrosity up here. Il monstro, we hate you.
You are awful. If I saw you in real life, I would tell whoever was running SQL Server to get a refund. And down here, we have a slightly better version of the query plan where we have these two operators, which I’ll show you in a minute.
But these two operators, we got our hash join and we have a sort here. But the important thing here is that we didn’t get a parallel merge join. And the reason we were getting parallel merge joins before is because costing is stupid and broken.
And if we look at the estimated subtree cost of this query at $3,945 query bucks, and we look at the estimated cost of this query at $4,512.88 query bucks, we can at least understand why SQL Server chose this awful monstrosity of a plan.
It thought that the cost would be much lower. But it wasn’t. Well, I mean, it was.
I mean, the estimated cost was much lower. But it was still wrong. So, like, one thing that I’ve said in a billion videos at this point is, you know, you can have queries with a very high estimated cost that are fast, and you can have queries with a very low estimated cost that are slow.
So, cost really has nothing to do with how fast or slow a query is. Right? Cost is just a bunch of estimates that the optimizer uses to assemble a query plan and hope it’s the best one.
The problem is that all those estimated costs are a lot like lottery numbers. If you knew what the actual numbers were going to be ahead of time, the results would have been much better. Like, I think that if SQL Server understood what it would take to run this query, the cost would not have been so prohibitive that it chooses this awful monstrosity up here.
So, that’s that. Now, the thing that I want to show you in this bottom query plan, actually, is that we have a sort that runs in batch mode.
And, like I said, we have the hash join that runs in batch mode. But that’s it. Right? This clustered index scan on the post table, I believe. Yeah, post.
And this clustered index scan on the go away SQL prompt on the votes table, these both run in row mode, which isn’t how it has to be. Now, if you’re…
Okay, so here’s, like, the list of, like, things that have to align perfectly for you to get batch mode on rowstore. You have to be on SQL Server 2019 and up.
You have to be in compat level 150 — the 2019 level — or better. And you have to be on Enterprise Edition or an equivalent SKU, which could be, you know, Developer Edition, which I wouldn't blame you if you wanted to use.
Because that’s how Microsoft screws you into buying Enterprise Edition in production. When, like, you’re developing stuff on developer edition and you’re like, wow, all this stuff works great and is fast.
And look at this nice thing that SQL Server does. And then you, like, release it on standard edition and everything kind of falls apart. And you’re like, wait a minute, what happened? That’s what happened.
So, you know, I sort of sympathize with people who are like, oh, well, we’re just going to use developer edition. You know? Who am I to argue?
A screw job like that? Get your revenge. So, anyway. It doesn't have to be that bad. That last plan turned out pretty all right, right? 3.6 seconds, down from — at worst, this thing was like 9-point-something seconds. So we got it down to 3.6 seconds just by getting a couple of things in batch mode, using a hash join instead of a merge join, and not sorting 53 million rows and spilling. God almighty, what's wrong with you?
But the thing that’s interesting to me is that batch mode tricks do not get you the full batch mode on rowstore experience that you could get, right?
So what you can do is use these compatibility level query hints to get higher compat levels than what your database is set to, right? This is a good way to sort of override some things and be like, hey, I got a better idea.
So, let’s run this query now. And, unfortunately, we still need the hash join hint there because if we don’t put that there, SQL Server will still cost itself into that crap hound of a merge join plan.
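Stacked together, the hints look like this — a sketch against the same illustrative query, using the documented `USE HINT` compat-level hint names:

```sql
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
LEFT JOIN #batch_mode_helper AS bmh
    ON 1 = 0
GROUP BY p.OwnerUserId
OPTION
(
    /* optimize as if the database were at compat level 160,
       even if it is actually set lower */
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'),
    /* still needed -- without it, costing picks the merge join again */
    HASH JOIN
);
```

The compat-level hint only changes optimizer behavior for this one query, which makes it a handy testing tool before you commit to changing the whole database's compatibility level.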
So, batch mode on rowstore has problems because it is all based on heuristics. And those heuristics might kick in, they might not. And even with me doing my batch mode trick there in the higher compat level, I still don’t get batch mode where I should.
I still get costed into that awful merge join plan. So, now if I look at this execution plan, this one’s a full second faster. And where it gets faster is on this clustered index scan on the votes table.
For some reason on the post table, SQL Server — oh, come on, give me the tooltip; I don't need a missing index request — so, for some reason, SQL Server on the post table is like, no, we're going to scan this in row mode still. That's whatever.
And down here, come on, go away. Zoom it, catch up. Do try to keep up, zoom it. This actually does get scanned in batch mode. So, this ends up faster. But there’s something interesting going on in this query plan.
And then, so if I go and look at the properties here, this is something that we have looked at before, you and I, all of us lovely folkses together.
And this is something kind of interesting, right? Another, again, something interesting along the way. Hmm. Neat stuff.
So we have eight threads that cooperatively scanned the post table, right? All these numbers, I promise you, will add up to this number. But only one thread — oops, I hit the wrong button there — only one thread produced any rows: 47,575 of them. So only thread two, this lucky devil right here, did any real work. Well, produced any rows, anyway.
So, you know, this thing goes into the hash join. And then down here — I kind of forget — oh, yeah. So down here, two threads end up doing some work. So this is, again, a bit of lopsided parallelism. Unfortunately, batch mode stuff does not fix these sorts of parallel skew issues that can come up. Now, one thing you can do to get around that is use a temporary object, because you can't always rewrite a query to get it to do exactly what you want and fix every single problem.
So, sometimes using a temporary object is the right choice. So, let’s dump our post table stuff into a temp table. And notice, this is a whole heck of a lot faster now, right?
Remember, there was like 1.7-something seconds. Now, that’s only 500 milliseconds. So, we save almost a second and a half just by doing this. And what’s nice about this now is that rather than just ending up with all our rows on one single thread, we actually spread things out quite a bit.
So, we don’t have that same parallel skew problem just dumping that into a temp table. So, we’re doing well here, right? I think we’re doing better, right?
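The temp table step is something like this — again a sketch with assumed column choices, not the exact demo code:

```sql
/* Materialize just the Posts columns the query actually needs... */
SELECT
    p.Id,
    p.OwnerUserId
INTO #p
FROM dbo.Posts AS p;

/* ...then join to the temp table instead of the base table. Scanning
   the small temp table distributes rows across threads far more
   evenly than the skewed scan of the base table did. */
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM #p AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY p.OwnerUserId;
```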
We’ve made this a little bit faster. But now, watch this, right? This is, you’re going to get a kick out of this. I know you’re going to get a kick out of this because you watch my videos. And if I get a kick out of something, you’ll probably get a kick out of it too.
Now, watch what happens here. Now, we get a batchy, modey plan, right? I think, yeah, batch mode, batch mode. Okay, so that turned out pretty good.
You know, our left join to the little temp table over there did us some good. And now, we’re working up this temp table. But watch what happens down here now, right? This one’s okay, right?
This one’s kind of okay. This number sticks out a little bit further than the rest. But now, look at, watch, ready for this? Ready for the filter? Actually, let me select this so this stays.
You ready for this filter operator? Bam! We just shifted the parallel skew problem over here. All right, that’s interesting.
Okay. So this thing, all in all, this takes 1.8 seconds. 1.889 seconds in full right there. All right.
All right. So now — batch mode, for all the great good that it does for queries in SQL Server, sometimes doesn't do as much as it should when you're only using the batch mode tricks. Right? And is there a reason why? I suppose if there's a reason, it's that we're using the older compat level, where batch mode on rowstore isn't available. Right?
So if we were in compat level 160 or better, like, we would get, like, a better batch mode experience, but only if batch mode on rowstore, like, chooses us. Like, just using a batch mode trick doesn’t necessarily guarantee you batch mode on rowstore. Now, let’s race the batch mode version of this query versus the row mode version of this query.
Right? Because, like, the only two operators in here that get batch mode are these two. You know, this, all this, all the rest of this stuff still happens in row mode.
And, of course, the parallel exchange operators don't have any batch mode support. They're row mode as hell anyway. Now, look down here, though.
That’s 1.9 seconds now. This one is just one second. Our fully row mode hash join plan is only one second. Pretty wild, isn’t it?
Now, it’s not because one of them gets a bitmap and the other one doesn’t. This filter operator is an optimized bitmap. There we are. Optimus bitmap-ius. And so we have a bitmap down here.
Remember, in these query plans, bitmaps, they get made up here, but then they get used down here. Right? So they go this way and get used to filter out rows here, usually. Sometimes they’ll get stuck at the partition stream.
Sometimes they’ll get stuck somewhere else. But most of the time, when you have a nice, effective bitmap, they’ll get pushed down from here to there. Thereabouts. You know, you can figure it out when you look at your query plan.
But this is a lot faster. And we don’t have the sort of same weird skew issue. At least I don’t remember having the same weird skew issue on this one. This one spreads all out.
So this one — the parallelism works better in the row mode version of this plan. That might not always be the case, but it's certainly the case here. Now, if we run this query and ask for it to happen in compat level 160, we should get a more batch mode-y version of this plan, where now we get actual batch mode on rowstore. So now we're actually able to read from the votes table in batch mode once again.
And this gets us down to about under a second. So if we were to recap this a little bit, the query with just batch mode is about 1.8 to 2 seconds. The rowstore version of that query is 1 second.
But the fully batch mode-ed on row stored version of that query is about 800 milliseconds. So that turns out a lot better. Now, one of the reasons why Enterprise Edition is so important for batch mode workloads is because Microsoft, once again, I wouldn’t blame you if you use Developer Edition, hobbles the hell out of you in Standard Edition, where batch mode stuff is limited to a DOP of 2 in a parallel plan.
So a lot of times when I’m tuning stuff on Standard Edition, if you have a pretty big aggregation query that’s running at DOP 8, and you’re like, oh, we can try batch mode, you know, you’ll be pretty lucky if you end up with it running for like an equivalent amount of time because you lose like six DOPs if you’re running at DOP 8. So there’s a pretty big chance of a downgrade.
If you're only running at, like, DOP 4 and you get downgraded to DOP 2, maybe it hurts less. But if you're running at DOP 8 or higher, the DOP downgrade on Standard Edition with batch mode can be pretty painful. That's why, when I'm tuning Standard Edition stuff that has to do big aggregations, I'm far more likely to lean toward indexed views and stuff like that, because you don't have those DOP limitations on indexed views. Granted, then you have an indexed view, so you have a different set of problems. But, you know, an indexed view that has good indexes to help it get maintained is — in the words of my wine distributor in New Zealand — no more overhead than another nonclustered index.
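If indexed views are unfamiliar, the basic shape is this — a hypothetical aggregation, not the demo query; indexed views require schemabinding, and aggregating views must carry COUNT_BIG(*):

```sql
/* The view definition: schemabound, with COUNT_BIG(*), both of
   which are hard requirements for indexing a view. */
CREATE VIEW dbo.VotesByPost
WITH SCHEMABINDING
AS
SELECT
    v.PostId,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.PostId;
GO

/* The unique clustered index is what actually materializes the
   view; SQL Server maintains it as the base table changes. */
CREATE UNIQUE CLUSTERED INDEX v_pid
    ON dbo.VotesByPost (PostId);
```

The upside on Standard Edition is that reading the materialized aggregate skips the big scan-and-aggregate work entirely, so the batch mode DOP limit never enters the picture.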
So, you know, hopefully that works out. Now, there are all sorts of things — I mean, I'd say there are all sorts of things, but there are literally, like, two things you can do to trick SQL Server into using batch mode.
One of them is what we did up above where we had a temp table with a clustered columnstore index on it. And we did a left join on 1 equals 0 to that temp table. And SQL Server smelled a columnstore index.
And it was like, ah, maybe batch mode. Another thing you can do if you’re not free to rewrite queries in that way is to create an index like this, non-clustered columnstore index that’s filtered.
And if you filter it to something that cannot possibly exist, then this index will not actually contain any data, but it will exist on the table as a columnstore object and give SQL Server some, you know, some impetus to use batchy, modey stuff.
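A sketch of that empty filtered columnstore index — the filter contradicts itself, so no row can ever qualify; index and column names are illustrative:

```sql
/* The WHERE clause can never be true, so the index holds zero rows
   and costs nothing to maintain -- but its mere existence on the
   table nudges the optimizer toward batch mode operators. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode_trick
ON dbo.Posts (Id)
WHERE Id = -1 AND Id = -2;
```

This is the less invasive of the two tricks, since it doesn't require touching the query text at all.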
You know, and if you’re, you know, tuning sort of regular things, you know, supporting indexes, you know, they’re good things. They’re there for a reason.
You know, they can help you with your joins and they can help you with your filtering and other stuff like that that are useful. And sometimes creating indexes might be, like, you know, the final answer, but the batch mode stuff is usually pretty useful as well.
So anyway, just a few notes on, you know, using batch mode, testing different stuff out when you’re trying to tune kind of bigger, crazier queries and query plans. Granted, this one wasn’t big and crazy, but that parallel merge join plan, man, that just set me off.
And I had to do the rest of this stuff because I feel like if I don’t talk about how awful parallel merge joins are, no one will. They’re truly despicable creatures.
I wish that they would be removed from query optimization, just to be honest with you there. So anyway, a little bit of a hodgepodge there, but again, stuff that interests me. Hopefully it interests you.
If it doesn’t, you don’t need to watch this. You can go do something else with 20 minutes of your life. It’s okay. Either way, my feelings won’t be hurt. So anyway, I hope you enjoyed yourselves.
I hope you did find at least some parts of this interesting. I hope you learned something, maybe a few things, because this was, again, a potpourri of knowledge. Yeah, what was I going to say?
Anyway, if you like this video, thumbs ups are good, as well as, you know, comments that don't make me cry. I mean, tears of joy are fine, but tears of sadness — we don't want sorrow, Erik, on these videos.
If you like this sort of SQL Server content, you can hit the subscribe button. It’s apparently a bell-shaped object. And you can join nearly 3,762 other data darlings.
Never going to be able to say that with a straight face. Who have subscribed to this channel and apparently find me interesting enough to keep getting notified when I’ve talked about SQL Server. So anyway, thank you for watching.
I do apologize for the audio issue on Friday’s video. There was no fixing it. Like, once it’s done, it’s done. So I just had to re-record the whole thing. I had lots of funny jokes about Justin Timberlake getting arrested in a tuxedo.
And you should have been there. It was great. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.