Query Plan Patterns For Windowing Functions: Row Mode vs Batch Mode
Thanks for watching!
Video Summary
In this video, I delve into the intricacies of query execution plans when window functions are involved, breaking it down into three parts to ensure that the content is digestible and comprehensive. Starting off by discussing the differences between row mode and batch mode for these queries, I highlight how critical it is to use batch mode for optimal performance, especially with large datasets. I also explore the impact of having or not having a `PARTITION BY` clause in your windowing functions, noting that the absence of this clause can significantly alter the query plan’s parallelism and execution efficiency. Throughout the video, I share practical insights on how to recognize these patterns and optimize queries for better performance, emphasizing the importance of proper indexing and predicate usage.
Full Transcript
Erik Darling here with Darling Data. Feeling real happy at the moment. Extra happy. Probably the happiest I’ve been in, I don’t know, 37, 38 years? No particular reason. Feeling kind of peppy at the moment. And so this is going to be split up into three parts, not six parts, just the last part. three. One, three or the other. Only three, not double three. Because the material’s a little too dense for a single video. I don’t think anyone would stick around for the whole thing. So we’re breaking this one up into three parts. This is going to be talking about query execution plans when window functions are involved. Because it struck me when I was recording something a while back. And I was talking about how I’ve, over time, learned how to predict what will be in a query by looking at the query plan. And I think someone commented that that would be a cool thing to talk about. So I’m going to talk a little bit about query plan patterns for different queries that I can sort of recognize. There’s going to be a bunch of those videos, but there’s three on windowing functions. There’s just about enough to talk about with windowing functions that I think it is worthwhile. There are essentially three different kinds of windowing functions. There’s like the ranking ones, so like row number, rank, dense rank. There’s aggregates, like sum and count and average. And then there’s the analytical ones, like cum dist and whatever discrete something. I can’t keep track of all those things. I don’t think I’ve ever really used them. I guess Entile would be in there. Entile is the only weird one that I’ve used pretty regularly. I dig Entile. I think Entile’s a cool cat. If you don’t use Entile, I don’t know, you should. Team Entile over here with Darling Data. Anyway, what was I going to say? Oh yeah, some some crap that apparently is useful because since I started saying this stuff at the beginning of the videos, things have been picking up. So good job, everyone, for paying attention early on. If you like this content and you feel like it is worth like four bucks a month, you can sign up for a membership to my channel. That’d be cool. If you don’t have the extra four bucks a month, if that’s going into your Happy Meal fund, Lord knows inflation has really messed with the prices of Happy Meals. Maybe that’s why I’m happy. I started thinking about Happy Meals and that brought me back to my youth. Never can tell. Other ways to show your love, support, and enduring allegiance to Darling Data. Humble leader. Likes, comments, subscribes, all nice things that you can do.
If you are in a real pickle and you need SQL Server consulting, I am available to do just about anything that doesn’t involve replication. These are the things that I excel at. I don’t use Excel much, but I do excel at these things. And what do you call it? If you need anything outside of this sort of stuff, well, my rates are reasonable. Anyway, training. It’s another thing. Good to have. Good to buy and actually use. Good to purchase training and then actually go through the training.
Not just say, I’ve got the training. Because it does not through osmosis. Like Edgar Cayce, when he would say he could put a book on his stomach and absorb the knowledge. I’m not quite sure that SQL Server training works in the same way. You do have to participate in order for it to be effective. All right. Good thing to do there. If you are so smitten with me that you want to come see me do things live and in person, where you will see me in something probably other than an Adidas shirt, you can catch me in these places on these dates.
Friday, September the 6th, I will be in Dallas, Texas, doing a full day pre-con for Data Saturday Dallas. And then also on the 7th at the regular event. And then November the 4th and the 5th, I will be in Pass Summit for two days of pre-cons with Kendra Little as my charming co-host.
And that’s not co-host in like Johnny Carson, Ed McMahon sort of way. We are on equal footing. This is not like, you know, there’s like the show leader and then like the charming sidekick.
We are equal partners in these pre-cons. So I look forward to seeing you there with maybe some more ChatGPT created images. I don’t know. That seems like fun.
Anyway, let’s talk about these windowing function query plans. So what I want to go through in this one is sort of two things. One, the difference between row mode and batch mode for window function query plans.
And the other thing is difference in query plans when you have and when you do not have a partition by clause. Because that does change things a bit. Also, the type of predicate that you have on your windowing function, whether it’s equality, less than, greater than, can also change the query plan a little.
So, with all that in mind, gosh, let’s get started. All right, let’s move it on this thing. So I’ve got these two queries already run.
And what I want to show you here is without a good supportive index, both of these queries have to do a bit of work to generate the windowing function stuff. The thing is, the row mode query just plain stinks at it. When you have to do this over a lot of data, when you do a windowing function over a lot of data, you just absolutely should not be doing anything in row mode with it.
So, this query takes 9.3 seconds. This query takes 1.4 seconds. Big difference there. And it’s all because of batch mode.
Now, where things that you should know about window function query plans is that in row mode, you will always see these two operators, the segment and the sequence project. Segment is responsible for the partitioning and the ordering.
And then the sequence project is responsible for the numbering. So you put the data in the order that it needs to be in. If you don’t have an index that already has the data sorted the way it needs to be for the windowing function, then you have that sort there, right where my finger ends, right there.
That puts the data in order. Then the segment sort of does the grouping and then the sequence project sort of does the numbering. At least that’s how it was explained to me.
If someone lied to me, I will find them. Remember Boku Delta Miles over here. Diamond medallion darling data. So we have a sort that puts the data in the order that we need it to.
We have this thing that sort of does the grouping, you know, put stuff, you know, maintains the order. And then the sequence project that does the numbering. So, all well and good.
This is what you will generally see in windowing function query plans in row mode. In batch mode, it will look like this. We still have a sort, but notice that sort doesn’t quite, you know, have the problems that the other sort did. And then we will see a window aggregate.
I’ve said this in other videos, but one of the very cool things about window aggregates in SQL Server query plans is they are the only operator currently that is able to read from a batch mode sort on multiple threads. All the rows come out of the clustered index on the comments table, go into the sort on multiple threads.
The sort occurs on multiple threads, but any other operator would have to read from that sort single threaded. So, kind of a cool thing there about window aggregate operators. So, this is kind of the general physical appearance difference of query plans between row mode up top and batch mode at the bottom.
Batch mode gets the window aggregate. And then row mode has the segment and the sequence project. Because we don’t have an index in place, both of these have to sort.
So, let’s move on a little bit and let’s look at what queries look like when we don’t have a partition by clause. Notice that we are just ordering in this one and this one and this one down here. I’m going to run all three of these at once so we can compare and contrast these wacky query plans.
These wacky dacky doos. One of those things is a little slower than the others, wasn’t it? Now, the big difference between partition by windowing functions and ones that don’t have the partition by clause, they only have the order by clause, is that SQL Server eats up the parallel zone a lot earlier on in the plan.
The query plans that we looked at before were parallel the entire way through to the very end where there was a gather streams. In this one, we gather streams really early on. Both of these things gather streams right after the, well, the row mode.
Both of these things, here I go. Being a knucklehead. The row mode plan gather streams immediately after the sort and does the segment and the sequence project in a serial zone.
The batch mode plan keeps the parallelness through right after the window aggregate. And, you know, the batch mode thing is a little bit faster here anyway, but whatever. This one’s a little bit different because, you know, we really just have to, we don’t have to do as much work as we do with the partition by, especially in row mode.
Curiously enough, emitting the partition by in row mode actually gets it pretty close to on par with batch mode. But, you know, most window function queries that you’re going to use, they’re probably going to have to partition by something. So definitely use batch mode on that.
Now, one thing that I wanted to show you that sort of emphasizes the serial zone in the plan is when you use a greater than filter on a windowing function like we’re doing here. This one is uniquely designed to continue to return zero rows. We have a, what we saw in the other plans where we have, sorry, my head’s in the way.
I’m going to like Mario, boop, the clustered index scan. Then we sort, then we gather streams, then we segment, then we sequence project. And then after that, we reopen a parallel zone.
We have a distribute streams operator here that re-puts things out on parallel threads and then re-filters them. You’ll notice that this thing really slows down. This is not a good time.
This greater than predicate, whatever happens in here, it is no good, right? We are not having fun. We do a lot more work in this query. So, you know, be aware of, you know, how you do that.
Of course, if you use batch mode like any sane normal person would, you don’t have those same problems. Again, row mode query plans for windowing functions are nothing but headaches. I, you know, I will use any trick in the book to get batch mode for windowing function queries because in row mode, they just eat it so often that it’s mind-boggling.
I don’t know how we ever got away with windowing function queries without batch mode. It is truly a dismal experience a lot of the time. So, again, this very fast, even with batch mode in place.
And we don’t have the same sort of weirdness with the, you know, the serial, the parallel zone ends and the segment and sequence reject and then the parallel zone reopens. We just have a parallel plan the whole way through, which turns out a lot better for us from a performance perspective. So, in this video, we have gone over some just kind of physical appearance differences between row mode and batch mode window function plans.
Again, in the row mode plans, you will see the segment and the sequence project with an optional sort if you don’t have an index that supports the partition by order by or just order by clause. We looked at queries that only have the order by and saw that they, you know, while they did have all the same operators, the parallel zone and the only order by clause windowing functions ended a lot earlier. And we also saw that SQL Server has to do some where work when you have a greater than predicate in your windowing function filter.
Think of it like this. If you just have a where row number equals one or where row number is like, you know, equals zero or something, it’s real easy for SQL Server to figure out like where that row number would be, right? It’s going to be at the start of every, you know, either partitioned chunk or at the very start of the results for a query or for the row number, right?
Sorry. For a query without partition by only has order by, there’s only one chance for that to be the equality predicate. When you do greater than, SQL Server has to spend a lot more time putting rows through and generating those row numbers to compare them because it has to go like basically to the very end of the result to do it.
So that’s why there’s just a lot more work involved. Anyway. Hope you enjoyed yourselves as usual.
I hope you learned something as usual also. Thank you for watching. And I will see you in the next video where we are going to talk about, if I sneak down a little bit here, we’re going to talk about indexes and index usage and memory usage in windowing function query plans, which is just going to be a thrill ride.
Edgier seat. Hold on. Better hope you have your wood screws in type stuff.
So. All right. I’m going to go. I’m going to go prep for that. We’ll upload this. It’ll be a good time. All right.
Cool. Great. Good job, everyone. We barely stumbled over words we’ve been saying for all of our lives. Phenomenal on that.
All right. Great. Thank you for watching. Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.