Query Plan Patterns For Windowing Functions: Indexes And Memory
Video Summary
In this video, I dive into the world of windowing function query plans from a unique perspective: indexing and memory management. I explore how indexing can impact these queries, particularly in batch mode, and highlight some of the downsides when using row mode for windowing functions. I share practical examples to illustrate why parallel execution plans often disappear once you start indexing for windowing functions, and discuss the implications of this change on query performance. Additionally, I delve into memory grants required by sorts within these queries, emphasizing how they can significantly impact overall plan efficiency. By sharing insights from my experience as “the most SQL Server Consultant,” I aim to provide valuable tips and tricks for optimizing your own windowing function queries.
Full Transcript
Wouldn’t you like to get away? Bum bum bum. Erik Darling here with Darling Data. Eh? Eh? Look at that. Look at that logo.
Recently voted by BeerGut Magazine to be the most SQL Server Consultant. I’m not sure if that was an incomplete thought, but I’ll take it. I think being the most anything would be probably pretty good.
Maybe the most dead would kind of suck, but I’ll take being the most SQL Server Consultant. We’re going to continue talking about windowing function query plans, this time from the perspective of indexing and memorying. Because both of those things are important to windowing function query plans.
Without them you would be in quite a bit of trouble. You would not have a good time. So we’re going to do that today.
Important stuff. Alright. Cool. If you like me to the tune of like four or five bucks a month, you can sign up for a membership here. It’s kind of a nice way to say thank you for the countless hours of entertainment and knowledge about SQL Server that I pass along.
If you are somehow short five bucks at the end of every month, I don’t know, maybe you have a gambling problem. Maybe you have a shopping problem. Maybe you just booked too many vacations.
I don’t know. If you just don’t have the five bucks, you can interact with my channel in other ways. You can like. You can subscribe.
You can comment if you’re feeling real crazy. If you’re feeling real engagey, you can do that. And I will engage back with you. Probably.
Sometimes it will just be a thumbs up because, you know, there’s not a lot to respond to. And other times I will personally write you a message. It’s not like I have interns working for me writing these messages.
So, you have that to look forward to always. If you’ve got SQL Server problems, I’ve got SQL Server answers, questions, answers, solutions. Yeah, I guess I get a little bit of all that.
Health checks, performance tuning, performance analysis. If you’ve got something immediately wrong with your SQL Server or if your developers just need training because they keep putting worse and worse stuff into SQL Server. I may not look like it, but I’m just the guy to help.
And with recent accolades, like being the most SQL Server consultant, it’s hard to say no to that sort of level of confidence that you can have. So, these things, my rates are reasonable. If you need cheap SQL Server training, you can get 24 hours of it for the rest of your life for 75% off.
Think about all of the dumb things you have spent 150 bucks on that you’ve never used. Think about all the hobbies you’ve picked up. All the things that you pretended to get into.
The things you bought that are broken immediately and there was no return policy. They were like, nah, it’s yours. You broke it. You already bought it. You keep it.
Think about all those things. Then think about this. You can actually get better at something in your life by buying this training and using it. Once again, you do have to participate in the training for it to be effective.
It’s like medicine. If you don’t take it, it doesn’t work. So, yeah. If you are so enamored with my physical being, the specimen that I am, you can catch me live and in person.
This Friday. Not this Friday. But the Friday that will be September 6th in Dallas as part of Data Saturday Dallas. I am putting on a full day pre-con.
All about SQL Server performance stuff. Naturally. Because what else would you pay to hear me talk about? So, Monday and Tuesday, November 4th and 5th, I will be at PASS Data Summit in Seattle, Washington, with the lovely and talented Kendra Little.
We will be co-hosting two days of kick-ass pre-cons. All about SQL Server performance stuff. Because again, what else would you pay to hear us talk about? It’s a short list.
I don’t really know what else I have going on for me that you would be interested in. If you let me know in the comments, I will do my best to oblige. That’s as much as I can promise.
And I have officially changed “on with the show” to “let’s party.” I think it’s a far better lead-in, especially to the wild and crazy material we talk about in SQL Server. Okay.
Anyway. In the last video, we talked about sort of the physical appearance differences between row mode and batch mode plans. In this one, I want to talk a little bit about, well, indexing for batch mode, for windowing function queries. And some of the downsides of indexing for batch mode, for windowing function queries.
I don’t know why. I’m so excited to talk about batch mode that it just keeps jumping out of my mouth. But really, it’s just windowing functions.
So I believe this index should already be created. Good, good, good. We’re set there. Now we need to run these two queries. Now, when you start indexing for windowing functions, something kind of disappointing is very likely to happen.
Your beautiful, magnificent parallel execution plans processing those millions and millions of rows cooperatively go away. So before, we didn’t have an index to support the windowing function. I’m going to freeze frame this about right here, because we created this index on the comments table on user ID and creation date, which matches exactly the partition by order by that we’re using in the windowing function.
If we were selecting more columns from the comments table, it is very likely that we would need to have includes for this index with those columns in them. Otherwise, SQL Server wouldn’t choose it. It would think that the effort to do lookups for all that stuff would be far too much.
And it would probably be right. And we want our indexes to be used. Otherwise, they are useless. So now that we have that index in place running these two queries, we are stuck with these sort of annoying serial queries.
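For reference, here’s a sketch of the kind of index and query being demoed. This assumes the Stack Overflow-style dbo.Comments table these demos typically use; the index name and the final predicate are illustrative, not the exact demo script. The key columns line up exactly with the window’s PARTITION BY and ORDER BY.

```sql
/* Assumed schema: Stack Overflow-style dbo.Comments (Id, UserId, CreationDate, Text).
   Key order matches PARTITION BY UserId ORDER BY CreationDate. */
CREATE INDEX ix_comments_userid_creationdate
    ON dbo.Comments (UserId, CreationDate);

/* A representative windowing function query shape: */
SELECT
    x.Id
FROM
(
    SELECT
        c.Id,
        ROW_NUMBER() OVER
        (
            PARTITION BY c.UserId
            ORDER BY c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
) AS x
WHERE x.n = 0; /* placeholder predicate that still forces the full window computation */
```

With the index keys in this order, the window operators can read data already sorted, which is why the Sort operator disappears from the plans discussed below.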
All right. We, come on, do the grabby thing. There we are.
If we look at these two things, they’re not terribly fast. So I know I’ve said this in other videos, but we’re sitting here looking at it. An important difference between the row mode plan up here and the batch mode plan down here is the way operator times are accounted for. In row mode plans, parent operators include the time spent in the child operators.
So they sort of just add up going from right to left, starting with the index scan and going to the segment and the sequence project in the filter. Notice we no longer have to sort data for this one. We also don’t have to sort data for this one.
Sorry, my hand is getting lost in the void. Which is, you know, I mean, I guess why we created the index. So we don’t have to sort data. We don’t have to worry about memory or spilling or any of that stuff.
These slow down a bit though. In the batch mode plan, each operator accounts for its own time. So there’s the index scan that takes 3.5 seconds, the window aggregate that takes about 100 milliseconds, and the filter that takes about 40 milliseconds.
Add those up and that’s the total time for the plan. A quicker way of seeing that is to go into the properties and look at query time stats. That will tell you just about the total CPU and elapsed time.
And since this is a serial plan, those numbers are equal for this one. Alright. Cool. Except it’s not cool.
And what’s doubly not cool is what I’m about to show you. We rerun these two queries. Remember the first one took about 5.5 seconds.
The second one took about 3.5 seconds. We’re going to rerun these. And of course we’re going to look at the query plans, because that’s the whole point here, isn’t it? Looking at query plans.
This one ends up being a full second slower. Weird, right? Parallelism is supposed to make stuff faster.
Why did it go a full second slower? Because once again, windowing function stuff in row mode sucks. It’s awful.
Don’t do it. Use batch mode as much as humanly possible. I beg of you. It’s a bad time in row mode.
Something kind of interesting happens with the batch mode query though. No, this wasn’t there before. For some reason, the parallel version of this plan needs to sort stuff.
Perhaps it’s a side effect of parallelism in batch mode. I’m actually not entirely sure at the moment. I’m going to have to dig into that a little bit more and get back to you.
I’m happy saying I don’t know right now. It’s just something that I’ve noticed and I haven’t had time to dig in further into. So, it’s weird, right? Not knowing everything the second you look at it.
It’s strange. Strange stuff, right? But we still have to sort user ID and creation date for that one for some reason. Perhaps it’s a function of reading from the table in batch mode that disorders things.
I’m not sure. Perhaps this is one of those crazy things like when people create columnstore indexes with a MAXDOP 1 hint to get better compression and ordering and segment stuff.
Maybe, perhaps it’s related to this sort of thing. I need to find out. I need to educate myself here. I need to learn that so I can teach you.
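The trick being referenced, sketched here as an assumption about the general technique rather than the exact demo:

```sql
/* Building a columnstore index serially (MAXDOP 1) keeps a parallel build
   from shuffling rows across threads, which can produce better segment
   ordering and compression. The index name is illustrative. */
CREATE CLUSTERED COLUMNSTORE INDEX ccx_comments
    ON dbo.Comments
    WITH (MAXDOP = 1);
```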
And then someday we’ll all be smart together. Now, what I think is interesting, specifically focusing on the batch mode version, is that the reason why…
Let’s comment that out. Let’s run this. Doop-a-doop-a-doop-a-doop-a-doop-a-doop. And it ran.
And it’s still 3.5 seconds with a cost of… Oh, why’d you leave me? I was hovering. 77.3616 query bucks. If we…
Which beats my cost threshold for parallelism. My cost threshold for parallelism for this is 50. So, 77, definitely higher than 50. If we rerun this with the parallel hint back in there… Again, unsupported, undocumented, not safe for use in production, blah, blah, blah, blah, blah.
I use it in production, because it works and it makes things faster. And I know what I’m doing. I’m qualified.
I am the most SQL Server consultant. Don’t you forget it. This thing has a cost of 124, almost 125 query bucks. So, the addition of the sort in the parallel version of the plan is why SQL Server doesn’t choose the parallel plan for this.
Even though the parallel plan for this does a whole heck of a lot better time-wise, the serial version is 3.5 seconds. This is 1.1 seconds.
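The unsupported hint being toggled here is, I’d assume, the undocumented ENABLE_PARALLEL_PLAN_PREFERENCE hint; treat that name as a guess at the demo script, and heed the warnings above about production use.

```sql
/* Undocumented, unsupported: nudges the optimizer toward a parallel plan
   even when the serial plan costs out cheaper. Query body is illustrative. */
SELECT
    c.Id,
    ROW_NUMBER() OVER
    (
        PARTITION BY c.UserId
        ORDER BY c.CreationDate
    ) AS n
FROM dbo.Comments AS c
OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```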
I will take any improvement I can. Now, again, batch mode for windowing functions is what you should be aiming for. You should not be relying on row mode windowing functions, except for maybe very small amounts of data.
So, there’s that. Realistically, though, if we’re doing this sort of stuff, we want to have a columnstore index available in order for… Did that throw an error? No, it’s running.
I was like, wait a minute. I thought I broke something. Realistically, if we’re doing this sort of work, we want a columnstore index to help our windowing function queries. The reason for that: columnstore, great compression.
Minimized I/O. Native batch mode stuff. We don’t have to worry about batch mode on rowstore deciding whether we’re special. We get the full beauty of batch mode and columnstore together.
They’re wonderful interactive things. They interoperate quite well together. The thing is that, you know, like when we looked at the parallel batch mode, windowing function query up above, we had to sort data.
You know, columnstore indexes don’t really have data sorted the way that rowstore indexes do. So, things are a little bit different here. If we run both of these, and chug, chug, chug, chug, chug, there we go.
So, what I have here is one query running in compat level 140, where batch mode on rowstore is not available. And one query running in compat level 160, where batch mode on rowstore is available. What’s nice here is that these both get fully batch mode plans out the gate.
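One way to set up that comparison, sketched as an assumption since the demo setup isn’t shown on screen (batch mode on rowstore requires database compatibility level 150 or higher):

```sql
/* Run the query once at each setting to compare plans.
   Database name is hypothetical. */
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140; /* no batch mode on rowstore */

ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 160; /* batch mode on rowstore available */
```

A documented per-query alternative for disabling it is `OPTION (USE HINT ('DISALLOW_BATCH_MODE'))`, if you don’t want to flip the whole database.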
Of course, they both have to sort data, which isn’t awesome. But the presence of a columnstore index gives us enough batch mode goodness that we don’t have to worry about all the performance issues that come from rowstore queries. Everything in these two queries is close enough to the same time for me that I’m not going to sit there and worry about it.
If you can, if you’re able, add a nonclustered columnstore index to the table. In real life, you know, you’d probably want to have way more columns in your columnstore index. Because if I know anything about your queries, they are not predictable as to what people want to select and filter on and row number by and sort by and all that other stuff. So, you really want to have more columns in there.
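A minimal sketch of that kind of index, assuming the Stack Overflow-style Comments schema; as noted, in real life you’d list many more columns:

```sql
/* A narrow nonclustered columnstore index, enough to unlock native
   batch mode for the demo queries. Index name is illustrative. */
CREATE NONCLUSTERED COLUMNSTORE INDEX nccx_comments
    ON dbo.Comments (Id, UserId, CreationDate);
```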
That way, SQL Server’s choice to use that columnstore index is far more apparent. It’s a far easier decision to make from a costing perspective. You don’t have to worry about awful lookups involving columnstore indexes.
Those are a good thing to avoid. So, as much as possible, you want to pair columnstore indexes with your windowing function queries. Because that’s where you get the biggest gains perf-wise.
If you take a look at these two things, look at the time that it took to read from the columnstore table versus the time that it took to read from the rowstore table, even in batch mode. Right? Far, far less time. Almost nothing. So, these two queries end up pretty good, at least as far as I can see.
I do apologize for the more frequent coughs today. It is incredibly dry in here for some weird reason. Another thing that you need to be very careful about, and this is, we talked about the indexing, now we’re going to talk about the memorying.
Another thing you want to be very careful about with windowing function queries is which columns you’re selecting. Because, you know, a big topic of stuff that we’ve talked about in this video and in the video before it is sorts. Sorts require memory.
Sorts require an amount of memory based on the number of columns going into this sort and the data types of those columns. So, the width of the rows and the length of the results. So, if we run this query, and we don’t need the legacy cardinality estimator hint for this one.
We will use it for the other one. By the way, if you are query tuning and you’re using compat level 120 or higher, one of the first things you should always try, aside from a recompile hint, is forcing the legacy cardinality estimator.
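The documented way to do that per query is the FORCE_LEGACY_CARDINALITY_ESTIMATION hint; the query body here is illustrative:

```sql
/* Forces the pre-2014 (legacy) cardinality estimator for this query only: */
SELECT
    c.Id,
    ROW_NUMBER() OVER
    (
        PARTITION BY c.UserId
        ORDER BY c.CreationDate
    ) AS n
FROM dbo.Comments AS c
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

It’s worth trying early whenever you’re tuning on a newer compat level.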
Because, it works, it truly works wonders on a lot of queries. So, if we look at this thing, it takes, it takes almost 16 seconds, right? That’s about as close to 16 seconds as you can get.
If you blinked, this would have been 16 seconds. This thing asks for a whopping 16 gigs of memory. Now, remember, memory grants for SQL Server can be up to 25% of your server’s max server memory setting.
Unless you use resource governor to nerf that a little bit. So, these queries with these sorts can ask for lots and lots of memory. And, of course, the reason why this sort asks for a lot of memory is if we come over here.
Come on, give me the tool tip. There we are. SQL Server thinks the estimated data size is 17 gigabytes. So, it asks for 16 gigs of memory.
It is probably not correct. The reason for the inflated memory grant is that, while we’re only ordering by user ID and creation date, one of the columns that we have to sort is the text column in the comments table.
String columns will make memory grants huge. One way to get around this sort of stuff is to… Actually, we’re going to throw this on the pile for this one.
I like to show off this hint for this query. We’re going to structure this query a little bit differently. In our common table expression, we’re going to do all our row number stuff, only selecting the ID column. And then down here, we’re going to join the comments table to itself.
And we’re going to select everything from the joined copy, C2. Notice this is dbo.Comments as C2. And this is just comments.
The common table expression is C. And if we run this, I mean, not only is this going to be faster, I think twice as fast if I’m remembering correctly, but it’s also going to use a whole lot less memory. Because we don’t have to pass the text column through the sort.
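A sketch of that rewrite, with column names assuming the Stack Overflow-style Comments schema and the WHERE predicate standing in for whatever filter the demo query actually uses:

```sql
/* Number rows over just the narrow Id key in the CTE, then join back to
   dbo.Comments to fetch the wide columns (like the Text column), so they
   never pass through the Sort and don't inflate the memory grant. */
WITH c AS
(
    SELECT
        c.Id,
        ROW_NUMBER() OVER
        (
            PARTITION BY c.UserId
            ORDER BY c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT
    c2.* /* wide columns come from the join, not the sort */
FROM c
JOIN dbo.Comments AS c2
    ON c2.Id = c.Id
WHERE c.n = 0; /* placeholder predicate */
```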
Now, God bless Microsoft for putting operator times in query plans. Really, thank you. I appreciate it. But can you fix it?
Can you please fix it? Like, can I beg you to fix this code? Because if you look at what happens in here, and again, this is all row mode. There’s no batch mode operators in here.
None of these operators are only timing themselves. We start out at 478 milliseconds. We’re going to ignore the repartition streams because it makes things too weird. Then we go to 8.2 seconds, and then 8.7 seconds, and then 8.9 seconds, and then 9 seconds.
And then repartition streams makes things weird. 7.8 seconds. I don’t know where we lost 1.2 seconds in the shuffle.
But we did. And then we remain at 7 seconds here, and 7.8 seconds here, and 7.8 seconds here. Alright. Okie dokie.
And then somehow this ends up at 16.9 seconds. Again, parallelism stuff making things weird. If we go to the properties of this thing, and we go to query time, the elapsed time is only about 9 seconds.
The CPU time is 29 seconds. But the elapsed time is 9 seconds. Alright. So the elapsed time really closely matches what we had here. The rest of the query plan completely screws that up.
Like it drops by a second where it shouldn’t, and then it goes up by like 8 seconds. Or 7 seconds. Something like that.
Where it shouldn’t. Microsoft, please fix it. We want to be able to tune these queries and know what things mean. This doesn’t help us one bit. But this query only asks for 2200 megs of memory.
Not 16 gigs of memory. Again, because the sort operator here doesn’t have as much to manage. Right?
We’re still ordering by the same two columns, user ID and creation date. But we only have to deal with selecting the ID column. So that asks for far less memory. And the size of this data is far smaller.
Alright? So 538 megs is the estimated data size. There is some additional memory requirement, of course, because we have a hash join back to this thing. If we really wanted to experiment, and this truly is an experiment, because I didn’t try this one before.
If we put a loop join hint on this thing… I don’t know, maybe we’ll ask for less memory. Because the hash join was asking for something. I don’t know.
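The experiment, sketched under the same assumptions as the rewrite above, with a query-level join hint forcing nested loops instead of the hash join:

```sql
WITH c AS
(
    SELECT
        c.Id,
        ROW_NUMBER() OVER
        (
            PARTITION BY c.UserId
            ORDER BY c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT c2.*
FROM c
JOIN dbo.Comments AS c2
    ON c2.Id = c.Id
WHERE c.n = 0 /* placeholder predicate */
OPTION (LOOP JOIN); /* forces nested loops for every join in the query */
```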
But, but, but, but, but… Oh, the time is about the same. Oh my goodness! Look! It’s right this time!
The parallelism gathers streams did not screw us up! Great! We can actually see how long that took. Oh, but our sort spilled! Boo! What happened? How much memory did we ask for?
A little bit less. So we went from 2200 to 1800 megs. Um, oh, I don’t know. The loop join asked for a little bit less memory. Kind of a mixed bag there. Maybe it needed that other memory so it wouldn’t spill.
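If you want to watch these grant numbers outside the query plan, one option is the sys.dm_exec_query_memory_grants DMV, which shows requested versus granted versus actually used memory for currently running queries:

```sql
/* Memory grant details for queries that are executing right now: */
SELECT
    session_id,
    requested_memory_kb,
    granted_memory_kb,
    used_memory_kb,
    max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```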
Either way, I’m kinda happy to have shown you that. It was a fun little diversion for me. So anyway. You can see the little hint on the bottom there that we’re gonna talk about aggregate windowing functions next.
So, uh, we’re gonna stop this video here. Cause we’re over 20 minutes! Good lord! What do we do with ourselves in these 20 minutes? What do we even talk about?
I don’t know. Anyway. In this video, we looked at indexing for windowing functions, and how indexing for windowing functions can lead to serial plan disappointment. And we continued the effort to get you to only use batch mode for windowing functions.
And of course, be careful which columns you’re selecting, because the columns you select directly affect the amount of memory that a sort will ask for, which is capped based on max server memory.
So, sometimes splitting queries up into what you need to order and what you need to show people is a good way to tune memory grants. We also learned, or continued to push, that the Legacy Cardinality Estimator is usually the best one.
It’s my favorite anyway. I don’t know. Maybe I’m sentimental. Nostalgic old man, but that Legacy Cardinality Estimator gets a lot of things right. Anyway, uh, thank you for watching.
Hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk about aggregate windowing functions. Big excitement, big party, big party energy. We’re going to do that as soon as this one is uploaded.
And I have had some water because I’m a little raspy. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.