Efficiently Finding Aggregate Values In SQL Server: Row Mode vs Batch Mode vs Indexes!


Video Summary

In this video, Erik Darling from Darling Data explores several approaches to finding maximum values in SQL Server queries and their performance implications, comparing cross apply and derived joins against more traditional methods like CTEs, and emphasizing how a well-designed supporting index can significantly enhance query performance. The video also covers the nuances of batch mode and its impact on execution plans, providing practical examples that highlight both the benefits and potential pitfalls of different query writing strategies. Whether you’re a seasoned SQL developer or just starting out, this content offers valuable insights into optimizing your queries for better performance.

Full Transcript

Erik Darling here, with Darling Data. No longer dying. I want to say just about a full week with no antihistamines, which is wonderful, because I no longer feel like I’m dying, which is nice, right? Because, you know, you record these videos and there’s no live audience, so you have no idea if you’re dying on stage or not. It’s almost impossible to tell. Just have to wait for that one person who always comes along and downvotes my videos. There’s always one.

Just have to wait for them to pipe up someday and maybe let me know why they didn’t like the video. Maybe it’s me. Maybe they’re jealous. Maybe they’re in love with me. So there is that. But I have an exciting announcement. I have the first bit of PASS Data Summit swag. And in honor of the fact that Kendra Little and I are co-presenting two days of action-packed SQL Server performance tuning content, I’ve got these temporary tattoos made. They’re a little hard to see because they’re glossy and the lights in here are bright. And if I put them really close to the camera, it doesn’t necessarily make the situation better.

And it would also eventually make my green screen start to freak out. But they say Hot SQL Action. Got kind of like a cool Led Zeppelin vibe. Because that’s what all the kids are into these days. Cool Led Zeppelin vibes. That’s what I see on TikTok all the time. Kids just jocking that Robert Plant vibe thing. Anyway, in this video, I should probably mention what I’m going to talk about, right? In this video, we’re going to talk about the best way to find, well, all the queries are going to be looking for max values. But, you know, you could be looking for a min or an average or something else too.

You could do any of those things. And the reason I want to record this is because I often see queries where someone figured out one way to do this, put it everywhere, and it’s not necessarily the best way to do it. There may be many different ways to write this query that are faster than the one way you found and keep repeating. So we’re going to talk about different ways to do it. We’re going to talk about both using cross apply and derived joins. We are not going to be using CTEs.

If you’re the sort of reprehensible dork who white knights CTEs by saying that they make things more readable, well, you’re not going to see any here. You would find that if you did write a CTE to achieve this, you would get the exact same performance as if you wrote the derived join type query. We’re also going to look at the difference between row mode and batch mode, and how indexes make things better.

Usually. Isn’t that the line on indexes? They make things better? That’s what I keep hearing anyway.

So we got that going for us. So anyway, let’s dive right in, finally. And we’re going to look at the same query written in four different ways. We’re going to look at a very simple way.

And you’re going to see these again, both with cross-apply and derived joins, because that can change things a bit. So we have a cross-apply with a max in there, correlated on owner user ID to the ID in the user table. Makes sense, right?

We’re going to try out top one over here. We’re going to try out row number in here. And remember, this is a correlated row number. Because this row number is correlated inside of the cross-apply, we do not need to partition by owner user ID.

Which is a little bit different from this query, because this one has an extra level of nesting on it, where we are selecting from this. And we are generating a whole row number in here. And then we are correlating the row number outside.
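For reference, the four query shapes might look something like this. These are sketches against the Stack Overflow sample database; the table names, column names, and the lack of any filter on the Users table are my assumptions, not copied from the demo.

```sql
/* Sketches only: dbo.Users / dbo.Posts and their columns are
   assumed from the Stack Overflow sample database. */

/* 1) Simple MAX inside a cross apply */
SELECT u.DisplayName, mp.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS mp;

/* 2) TOP (1) ordered by score descending */
SELECT u.DisplayName, tp.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS tp;

/* 3) Correlated ROW_NUMBER: no PARTITION BY needed, because
      the apply is already correlated per user */
SELECT u.DisplayName, rp.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Score,
           ROW_NUMBER() OVER (ORDER BY p.Score DESC) AS n
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS rp
WHERE rp.n = 1;

/* 4) Uncorrelated ROW_NUMBER: the row number is generated over
      the whole table inside an extra level of nesting, and the
      correlation happens outside */
SELECT u.DisplayName, rp.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT x.OwnerUserId, x.Score
    FROM
    (
        SELECT p.OwnerUserId, p.Score,
               ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId
                                  ORDER BY p.Score DESC) AS n
        FROM dbo.Posts AS p
    ) AS x
    WHERE x.n = 1
) AS rp
WHERE rp.OwnerUserId = u.Id;
```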

The things here. It’s exciting stuff, isn’t it? Very exciting. All sorts of different ways to skin that cat. Which is probably similar to something I…

Many different ways to feed that cat. Many different ways to pet that cat. Something like that.

I’ve never actually skinned a cat. Never actually heard of anyone skinning a cat. Or needing to skin a cat. So, I don’t know who came up with that phrase. It’s a little gross, actually.

Cats are a little gross. So anyway, let’s look at query plans. The first query, just using the simple max, you know, does pretty okay, I think. SQL Server chooses a hash join.

It does… You know, a lot of the times when you use cross-apply, SQL Server will optimize that as nested loops. But SQL Server here said, nah. We got a hash join, and the whole thing finishes in about 700 milliseconds.

Which is alright. It’s alright with me. This query benefits quite greatly from a bitmap. Remember that bitmap gets used up here and applied down here.

And that bitmap goes to work. That’s why this cardinality estimate looks crappy: SQL Server came up with the cardinality estimates first, and the bitmap only got applied later in the plan. The bitmap did some good work, so we streamed out far fewer rows than estimated.

So we did not get as many rows as we thought we would, passing through the bitmap here. Remember that bitmap gets applied as a predicate right there. So bitmap did some good work, and this query ended up being really fast.

The second two queries stink on ice. They stink in winter, because SQL Server did not do what it did up there. SQL Server did not say: I should just use a hash join and throw out a missing index request.

SQL Server built an eager index spool, twice. These each take around 27, 28 seconds. These are completely uncompetitive and useless, at least without a good index in place.

The fourth query, the uncorrelated row number, lands somewhere in the middle. A little bit closer to the fast one, but not quite as fast. It’s about four and a half seconds slower.

The reason why it’s four and a half seconds slower is something I caught in probably two or three Adam Machanic sessions from back in the day: when you’re doing anything with computers, particularly sorting, it’s preferable to do many little sorts than it is to do one big sort.

And, you know, so we’re kind of in the middle on this one, where, you know, SQL Server did probably the smart thing by choosing a hash join instead of a nested loops join, but it also did this whole thing where it sorted 17 million rows.

Now, you know, part of the problem with doing the one big sort is that you might not get enough memory. You might spill to disk. This one spilled a little bit to disk. It’s not the end of the world here, right?

It still took about a second and a half to do the whole sort, but, you know, it’s a lot slower to do one big sort than lots of little sorts. And that’s what you get when you do nested loops joins.

You get a whole bunch of little sorts. SQL Server grabs a row, puts it through this side, does the sort for just the data for that row, does that, comes back, does the next one, does the little sort.

And that would be a lot faster if we, you know, had a way to seek into this index, right? Even if we still had to sort stuff in here, it would be faster to do a bunch of little sorts than one big sort.

So now let’s look at things where there’s an index involved. All right, so we created this index. This highly specialized index on owner user ID and score descending not only sets us up for success with how we are correlating the Posts table to the Users table, but also fully supports our windowing function, where we need to partition by owner user ID and order by score descending.
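The index described there might look something like this; the index name and table name are my assumptions, not taken from the demo.

```sql
/* A sketch of the supporting index: key columns match both the
   correlation (OwnerUserId) and the windowing function's
   ORDER BY Score DESC. Names are assumed. */
CREATE INDEX ix_Posts_OwnerUserId_Score
ON dbo.Posts (OwnerUserId, Score DESC);
```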

So we have done ourselves a huge favor here by creating this level 1000 IQ index. So we have the same four queries.

We have the simple max. We have the top one. We have the correlated row number cross apply, and then the uncorrelated row number cross apply. These are our four queries.

Let’s look at how these plans did. With an index in place, this one kicked butt again, right? Went from about 700 milliseconds to a little under 200 milliseconds. A very simple nested loops plan with a little top in there.

Teeny weeny little top. And then we have nearly the same query down here. That’s a teeny weeny little top too, right? So what is kind of amusing is that SQL Server transformed the max query in here and the top query in here to both just be a sort of top one query, right?

Like this, they shared the same execution plan. And they’re both fast, right? They’re within a few milliseconds of each other. If I ran these a whole bunch of times, this would probably end up just about even, maybe go back and forth a little bit.

So, you know, decently competitive. You could write this query either way with a good supporting index. The third query down, the correlated row number, does a little bit worse.

This is, I mean, just about as good as the max query from the first one with a serial plan. But this is another kind of good example of how doing a bunch of little things is faster than doing one big thing, right?

Because this one takes 800 milliseconds. There’s about 500 milliseconds spent in here. You know, of course, you know, this is one of the things that makes me nervous about missing index requests.

SQL Server is like, we can impact by almost 77%. But, I mean, that’s not where we spent our time. The costing in this plan is kind of a joke, right?

87%. And then where we spend all our time is 10%. But a couple zero percents in there. Good. I don’t know if that’s so smart. And then here’s where doing the one big thing sucks a whole lot more than doing a bunch of small things.

So the difference between these two plans, right? The bunch of small things plan, like we talked about, like I mentioned earlier, is nested loops. The one big thing plan uses a merge join.

And look how much slower this whole branch is doing the merge join than it is doing the nested loops join, right? We scan the whole index. We segment and sequence project to build the row number over the whole thing.

We filter stuff out later. And like, you know, the filter does the same thing in both places, just to different degrees. But then this whole thing takes just about three and a half.

Oh, that’s a very devilish number. 3.666 seconds. I did not do that on purpose. I did not have the power to do that. I wish I did.

Maybe someday I’ll work out how to attach and detach a debugger just fast enough to get query time to whatever devious numbers I want.

Not today. Not today. All right. So this is all those four queries with a good supporting index in place. If you ask me what my preference is, a good supporting index, you can write this query in a couple different ways and still get reasonably good results.

Supporting index will help just about, you know, anything that you do. So you should have those. You should make one.

The next set of queries, these are a little bit different. So we got rid of the supporting index for these. And what we’re going to talk about in these is just how batch mode affects things. Now, for the first query, and we do get batch mode in the first query, right?

We have batch mode on the hash match and we have batch mode on the join. But for the first query, it doesn’t get a whole lot faster. Right? At 602 milliseconds, it saved about 100 milliseconds from the row mode run.
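There are a couple of general ways to get batch mode into plans like these. The demo doesn’t say which one was used, so both of these are illustrative; the database name is made up, and batch mode on rowstore also depends on edition and version.

```sql
/* Option 1: batch mode on rowstore, available under
   compatibility level 150 (SQL Server 2019+). */
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;

/* Option 2: on earlier versions, an empty filtered nonclustered
   columnstore index makes batch mode available to the optimizer.
   The contradictory filter guarantees the index stays empty. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode
ON dbo.Posts (Id)
WHERE Id = -1 AND Id = -2;
```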

The second two queries where, I don’t know, God has forsaken us yet again and we build a couple more eager index spools like this, these are totally uncompetitive. There’s no batch mode going on in here.

Where batch mode does make a difference is with the uncorrelated row number, right? So if we come down here and look, we got batch mode in a couple different places. We have it on the sort, we have it on the window aggregate, which is brand new for all of these things we’ve talked about, and we have it on this hash join.

Now, one piece of SQL Jeopardy that you can absorb, and you can choose to do whatever you want with it. You can take it in, chew it a little bit and spit it out, or you can digest it and have this wonderful belly full of knowledge. Under normal circumstances, batch mode sorts can sort data across DOP threads, but unless they are the child operator of a window aggregate, they’ll be read from single threaded. In this case, we get the window aggregate, so we can read from the sort across our DOP 8 threads, and this thing ends up being fairly efficient.

Not quite as good as the simple max with the cross apply, but it’s still much better than it did in row mode. Without the index in place, this thing was like five seconds, so getting this one down to one second is a pretty good win for batch mode, right?

So batch mode can make a difference with these things when you write queries in somewhat strange ways. All right. So now, the only thing we have left to look at is taking cross apply out of the picture and using derived joins instead.

We’re going to look at these without the index, because with the index in place, they end up just about the same as the other ones.

And the two ways we’re going to look at is with and without batch mode. So one thing that’s really important to point out here before we get too far into things is that one of these queries is actually logically incorrect if we just use a derived join, and that’s this one right here.

And if you’ve been writing queries for, you know, 10, 15 minutes, you’ll probably see why. With derived joins, because we can only do the correlation outside of the derived table, SQL Server is going to give us an incorrect result for this.

We’re just going to get the top one post ordered by score descending in here, and this gives us an incorrect result. If we come over to the results pane, that second query only returns one row, when it should be returning 600 something rows.

So this one, out of the running. We’re not even going to spend time on that one because it’s such an idiot. All right. So this one, bleh. You didn’t do the right thing, you are gone, you are stricken from the record.
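The broken query might look something like this sketch (same assumed names as before). Because the TOP (1) can no longer be correlated per user, it runs once over the whole Posts table:

```sql
/* Logically incorrect: TOP (1) ... ORDER BY Score DESC runs once
   over all of Posts and returns a single row, so the join can only
   ever match one user, instead of the top post per user. */
SELECT u.DisplayName, tp.Score
FROM dbo.Users AS u
JOIN
(
    SELECT TOP (1) p.OwnerUserId, p.Score
    FROM dbo.Posts AS p
    ORDER BY p.Score DESC
) AS tp
  ON tp.OwnerUserId = u.Id;
```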

But the derived join actually ends up just about the same as the cross apply, within, you know, 10 or 11 milliseconds, so not a big difference here. And then for the second two queries, you’ll notice something: what happened to the uncorrelated row number in the first example that I showed you now happens to the correlated row number here too.

These both end up with very close execution plans. Right? And again, the problem here is that we are doing the one big scan and sort.

Right? So again, in row mode, this is fairly painful. Right? So we didn’t get the nested loops, we had to do the big sort for both of these, and they both end up with just about the same amount of time.
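A sketch of what the derived join row number version looks like (same assumed names as before). Written this way, the row number is necessarily built over the whole table before the join, which is exactly the one big scan and sort:

```sql
/* The row number is generated over all of Posts, filtered to the
   top row per user, then joined back to Users outside. */
SELECT u.DisplayName, rp.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, p.Score,
           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId
                              ORDER BY p.Score DESC) AS n
    FROM dbo.Posts AS p
) AS rp
  ON  rp.OwnerUserId = u.Id
  AND rp.n = 1;
```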

There’s apparently about a 300 millisecond difference, I don’t know why. Maybe I was staring at the computer funny when they ran. But using the derived join, nothing ends up remarkably better. I mean, aside from the query that returns wrong results, there’s the lack of eager index spools, which I guess is a blessing.

You know, this was one of the queries that would have gotten an eager index spool, but that one is just gone. We don’t need you. But the one that didn’t eager index spool ends up the same as the uncorrelated row number one.

So, I don’t think that’s necessarily a great arrangement, right? Because, again, we’re doing the one big sort, and we want to avoid doing the one big sort. And then finally, the same derived join scheme, but this one where batch mode is going to be in play, we’re going to get pretty decent results, I think.

Right? Again, the top query with just the simple max does the best at about 600 milliseconds. The second query which returns wrong results, we don’t care about you.

Again, that returns just the one row, and that one row is not the correct result. We need about 600 rows from all these. And then the second two queries that do benefit from batch mode end up just about identical, at just about one second, within 30 to 43 milliseconds of each other, which again is not enough of a timing difference for me to care much about.

So anyway, some key points and takeaways from this video. One, supporting indexes really do make a huge difference when you’re trying to write these kinds of queries.

Row number is very frequently not the best choice. Row number can be much more competitive when you have batch mode in place.

And if you don’t have a good index in place, you really want to avoid the top one thing and the correlated row number thing because you will most likely end up with a nasty eager index spool on the inner side of your nested loops join and you will suffer tremendously both personally and professionally for the remainder of your days on this planet.

And the other thing is that when you need to do this kind of stuff, batch mode makes a huge difference. The more you can gear your queries towards doing lots of small sorts versus one big sort, the better, and the same goes for row numbers, even when you don’t need a sort. Even if you’ve been a whopping genius and created the right index, generating a row number over a lot of rows is a lot slower than generating a bunch of row numbers over smaller sets of rows.

So if you find yourself dealing with slow queries that look like this, as much as you can, either try to get batch mode involved or try to get a good index and try to get lots of little sorts and lots of little row numbers involved because generating a row number both with a sort and without a sort over, you know, in this case 17 million rows is incredibly painful but, you know, getting batch mode for this made it a whole lot less painful.

Cool. All right. So, with that, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. If you like this video, you can give it a thumbs up, leave a nice comment.

If you’re feeling extra charitable, I’ll accept the one person who always gives me a down vote and never leaves a comment. You’re still welcome here.

I just wish you were a nicer person. That’s all. If you enjoy this sort of SQL Server performance tuning content, you can subscribe to my channel and you can join.

Hold on. Let me get the most up-to-date number here. You can join nearly 3,708 other lucky subscribers in getting a bonk on the head every time I publish one of these.

And if you need help with this sort of thing, my rates are reasonable. So, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.