Lost In Costs: Part 3
Video Summary
In this video, I delve into the complexities of query plans and execution costs, addressing common misconceptions about cost percentages and highlighting their limitations in accurately representing where time is spent. I explain how operator times can vary significantly between row mode and batch mode execution plans, providing practical examples to illustrate these differences. Additionally, I explore the concept of trivial plans and how SQL Server handles them, showcasing a demo that demonstrates the impact of simple parameterization on query optimization and plan caching. By breaking down these intricate aspects of SQL Server’s cost-based optimizer, I aim to provide viewers with a deeper understanding of what goes into query execution planning and help them make more informed decisions when tuning their queries.
Full Transcript
All right, Erik Darling here with Darling Data, back once again for what is going to be the third and final video, regardless of length, in the Lost in Cost series, because I have to start something new, right? And so this is going to be the last one here. We’re going to finish it. And the next thing we’re going to do is stuff from my Row Goals presentation. You might see that hanging out right here. That’s going to be the next one that I go through. That should also be three or four videos. But anyway, what was I going to say? Very important. So the Row Goals presentation is a little embarrassing because I actually lifted a lot of the content for that, at least in the beginning, from this one. So there’s going to be sort of a double dose of Darling Data Delights when we start that one. It’s going to feel a little bit repetitive, but I promise it will be quite entertaining. If you didn’t tire of those jokes the first time, you’ll be tired of them by the second time. Anyway, when I’m talking to people about query plans, they will persistently look at query costs and percentages and things, and I will have to keep whacking them on the head and telling them to stop. And the reason for that is because costs will shift. Certain things will look free in one context, but very expensive in another context. So here’s an example. If we run this query and we look at the execution plan, we’ll see that SQL Server says 96% of the cost is here, 0% of the cost is here, 1% of the cost is here, and 3% of the cost is here. At least they add up to 100 this time, at least I think they do. But it’s interesting, because we spend 192 milliseconds in the thing that’s supposedly 96% of the cost, and this query runs for 522 milliseconds, so it can’t be 96% of that. And this thing that costs 0% adds a significant number of milliseconds, so it can’t really be free.
We go from 192 to 334. This thing that costs 1%, we go from 334 to 453. This thing that costs 3%, we go from 453 to 522. So these percentages don’t precisely line up with what takes time in a query plan. And then if we look at this query, which is slightly different. So in this one, I should probably point this out, is that we are partitioning by and ordering by the clustered primary key. So it’s already in order for us, we don’t need to sort that data. When we change our windowing function to use now the account ID column, which has no supporting index on it currently, oh, I should highlight the query, shouldn’t I? And we run this, well, all of a sudden, I mean, it takes a little bit longer, because we had more work to do, but that’s okay.
All right. So we scan the clustered indexes, but now this is free, right? This is 0% of the cost, but 636 milliseconds of the query execution time. This sort is 100% of the cost. And you know, I give it, it’s a good chunk of the execution time. But then this query runs, well, at the sort operator, we were at 2.1 seconds minus that. So like 1.5 seconds. And then we go to 2.182 here, but the query ends up being almost 2.4 seconds. And all these other things, thanks, Zoom, all these other things cost zero, right? They have zero costs, but time is still spent in them.
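The two windowing queries aren’t shown in the transcript, but the shape is roughly this. This is a sketch, assuming the Stack Overflow demo database’s Users table, where Id is the clustered primary key and AccountId has no supporting index:

```sql
/* Windowed by the clustered primary key: rows already arrive in order,
   so the plan needs no Sort operator */
SELECT u.Id,
       ROW_NUMBER() OVER (PARTITION BY u.Id ORDER BY u.Id) AS n
FROM dbo.Users AS u;

/* Windowed by AccountId instead: no supporting index, so the plan has
   to add a Sort, and the cost percentages shift around it */
SELECT u.Id,
       ROW_NUMBER() OVER (PARTITION BY u.AccountId ORDER BY u.AccountId) AS n
FROM dbo.Users AS u;
```

Same query shape twice; only the windowing column changes, and that alone moves where both the cost and the actual time land in the plan.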
So cost percentages do not equate to exactly where the effort is spent. And the cost percentages are based on the operator costs. So you can’t follow those and get a good sense of where time was spent in a query plan. Right now, it’s worth talking about how to read operator times in different execution plans, because they are slightly different between all row mode plans and all batch mode plans, or at least mostly batch mode plans. So if we run this query, it’s going to do some stuff. And we are going to get some operator times, and we’re going to examine those.
So in a row mode plan, operator times are cumulative reading from right to left. I know it’s funny because this is my left hand. I know that because my watch is on it. But this one, I point this way because I’m a mirror image. This is left in the room, which is very strange because I’m pointing right to my body. Makes everything very strange.
But getting back to the point, since this is the first operator at the end of the query plan, we call it the caboose. The time spent here is just itself. Now, since this plan is all executing in row mode, then the 2.698 seconds here includes the 179 milliseconds here. Likewise, the 3.066 seconds here includes the 2.698 seconds here. And that will be true of all the operators going throughout the rest of the plan.
Right. So this thing didn’t execute for 3.1 seconds or 3.114. That includes the 3.112 from here. So that was like 2 milliseconds, really. And then the 3.112 to 3.100, that was really like 12 milliseconds. So it’s cumulative reading from right to left. There is a trace flag that makes row mode plans read a little bit more like batch mode plans, but we don’t have time for that now.
This query plan will execute mostly in batch mode. When I say mostly, it’s because the operator at the end here, this gather streams operator, this is a parallel exchange. There’s gather streams, distribute streams, and repartition streams that are the three parallel exchanges available in SQL Server.
And these do not support batch mode, at least currently. Who knows if maybe one day they will. Personally, I think they’re a waste of time in batch mode plans. So hopefully they never do. But batch mode plans are a little bit different.
So this gather streams did not run for 393 milliseconds. No siree, Bob. We already know that, though, because we just talked about that. The difference is that in batch mode plans, each one of these operators just times itself.
So this one really did run for 241 milliseconds. This one really did run for 147 milliseconds. This one really did run for 1 millisecond.
And this one really did run for 0 milliseconds. So all of the time that was spent here and here gets reflected here at the end, because this is, again, row mode. Right? Whereas all the other stuff, well, actually, that’s row mode too, but that’s okay.
This is batch mode. The stuff, the individual operators after the caboose operator, are all in batch mode. So even this filter is in batch mode. So batch mode operators time themselves; row mode operators also include the time of their child operators.
At least that’s how SQL Server works today, unless you’re using a trace flag to make that different. So let’s get back to costing and stuff because it’s terribly interesting to me. One part of costing is the concept of a trivial plan.
Trivial plans are low cost, low memory, well below the cost threshold for parallelism. And SQL Server does not explore optimization opportunities for them. Often they are just so, like, there’s just such an obvious thing that it doesn’t go beyond any, like, just, oh, yeah, here’s a query plan, get out of my face.
So what we’re going to do is create a constraint on the Users table. And that constraint is going to tell us, or rather, it’s going to tell the optimizer, that for the reputation column, the only valid values are greater than zero and less than 2 million, okay?
Once you hit 2 million reputation, they say, like, hey, get a life. Go outside, go do something. Do some push-ups and some sit-ups.
Get out of here, right? You’ve been in the house too long. Get out. So if we look at this check constraints view, we will see that this constraint is not not trusted. Whoever named this column should be fired immediately into the sun.
Is it trusted or not? An is_not_trusted value of zero is a little hard mentally for most people to grasp, but I can assure you that it is not not trusted. It is trusted.
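A sketch of the setup, assuming the Stack Overflow demo database’s Users table; the constraint name here is made up:

```sql
/* Tell the optimizer the only valid Reputation values */
ALTER TABLE dbo.Users
    ADD CONSTRAINT ck_users_reputation
    CHECK (Reputation > 0 AND Reputation < 2000000);

/* is_not_trusted = 0 means the constraint IS trusted */
SELECT cc.name,
       cc.is_not_trusted
FROM sys.check_constraints AS cc
WHERE cc.name = N'ck_users_reputation';
```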
When we run these two queries right here, we might expect SQL Server to use the knowledge that is passed along via that constraint to maybe say, hey, there’s no data there for that because we don’t allow anything less than zero.
Everything has to be greater than zero. So if we tell SQL Server this, and let’s just run these two queries together, we’re going to get back, of course, this query returns zero rows up here, and this query returns 1,090,000 rows.
If we look at the two query plans together, something kind of funny happens, right? This query scans the entire Users table, takes 212 milliseconds and finds nothing. This query does the same thing but finds 1,000,000 somethings.
So if we look at the properties of this, right, we will see that we got a trivial plan, and we’ll also see that this query has been simple parameterized. And because we got a trivial plan and because it was eligible for simple parameterization, SQL Server is going to cache this plan and maybe reuse it for other queries that do the exact same thing.
And other queries that do the exact same thing might have different literal values that we substituted a parameter for. So guess what?
SQL Server can’t cache a plan specific to zero, or less than zero, rather. That’s because we would reuse that plan for reputation is less than two, and that would find a million rows, and that wouldn’t be safe at all.
If we add a silly little thing to the end of our query, 1 = (SELECT 1), all of a sudden SQL Server will go beyond the trivial plan stage of things. This will still return a count of zero, but it returns a count of zero in a much different way.
We just have a constant scan here. We didn’t touch the table at all. We didn’t do any I/O to find zero rows. Also, our query was not simple parameterized, right? And now this query down here is like, hey, maybe a missing index would be…
Maybe if you added an index, that’d be great, right? We could figure some stuff out if we had an index, right? This one up here with the trivial plan, no missing index, right? Look at that execution plan, zero missing indexes, right?
Do you want that? No, I don’t want that. But trivial plans, the trivial plan giveth and the trivial plan taketh away. Anyway, let’s move right along here.
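A sketch of the trick, assuming the Users table and the check constraint from before:

```sql
/* Eligible for a trivial plan and simple parameterization: the 0 becomes
   a parameter, so the contradiction with the constraint can't be used,
   and the whole table gets scanned to find nothing */
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation < 0;

/* The pointless subquery blocks simple parameterization and pushes the
   query past the trivial plan stage, so full optimization notices the
   contradiction and hands back a Constant Scan with no I/O */
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation < 0
AND   1 = (SELECT 1);
```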
So let’s look at some other costing decisions that SQL Server might make. So this is a very fickle demo. So I need to run this once and actually look at the execution.
Okay, this turned out fine. Great. You know, it’s fun to do these fickle demos live and in person. I guess they’re good for a joke. But something like even choosing an aggregate, like choosing how to aggregate data, is a costing decision.
If we’re saying select the top 4,277 distinct reputations from this table, SQL Server can use a hash match flow distinct because it is less than or equal to the number of rows that it thinks it can aggregate things down to.
So we get this hash match flow distinct operator, right? If we do, well, we have 4304 here, right? We could do 4278 for this one, I think, right? Just do this and say 4278, right?
We run this. SQL Server will choose a hash match aggregate, not a flow distinct, right? So there is a costing decision based on the number of, even the number of rows that you’re selecting and aggregating, right?
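A sketch of the two queries, using the row counts from the video; the exact tipping point depends on the optimizer’s distinct-value estimate for your copy of the data:

```sql
/* At or below the estimated number of distinct Reputation values:
   Hash Match (Flow Distinct), which streams rows out as it finds them */
SELECT DISTINCT TOP (4277) u.Reputation
FROM dbo.Users AS u;

/* One row past the estimate: a regular, blocking Hash Match (Aggregate) */
SELECT DISTINCT TOP (4278) u.Reputation
FROM dbo.Users AS u;
```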
So that’s another fun costing thing. SQL Server has another choice too. SQL Server could have gone and, oh, I mean, we could just do this, right? Let’s just make this a little nice looking. But we can actually force SQL Server to use a different type of aggregate.
But it had this choice, right? It had this choice available to it. It just costed it away, right? So we could have used a stream aggregate or actually two stream aggregates to do that. But we didn’t, right?
Because we would have had to sort data, because stream aggregates require sorted data. So SQL Server has many different things that it thinks about when it’s costing even something as simple as what kind of aggregate to use. There are three different search phases, but they’re not numbered 1, 2, 3.
They are numbered 0, 1, 2, like European elevators. And these search phases are based on sort of an initial cost given to the query based on various heuristics, looking at things like joins and GROUP BYs and how complicated things are.
But the three search phases are 0, which is OLTP. You get some basic decisions about stuff, and SQL Server might choose between nested loops and hash joins. Sounds kind of like an adaptive join thing, right?
Then you have search 1. And search 1 is where SQL Server starts exploring parallel plans and different transformations and some join reordering. And then you have search 2, which is really like everything. SQL Server throws the works at search 2.
And this is where some queries might have long compilation times and stuff. But when you write a query, very boring stuff happens that I don’t talk about ever, like syntax validity, object reference existence, maybe even some security stuff.
But then some cool stuff happens, right? After all the boring stuff. Yeah.
Wait, I went to draw a line there. That was supposed to be like this. There we go. Let’s cross that out. Boring stuff, right? Thumbs down. If I could draw a thumb. I can’t even draw a thumb in real life. Never mind with a mouse. But this is the boring stuff.
Then the fun stuff happens, right? Like simplification. Like SQL Server removing unnecessary joins and things. Figuring out cardinality estimation and join ordering. And then figuring out if we’re going to do trivial plan, full optimization, stuff like that.
The optimizer will look at your query and do good things for you, like discarding unnecessary joins, collapsing duplicative expressions, and pushing predicates as deep as it can into the query plan.
They do sometimes end up in filters, which is unfortunate. It will also do things like match expressions. So if an expression in your query matches an indexed view or a computed column or a filtered index, it will start matching those things up and saying, oh, I could use this indexed view.
Or I could use one of the other things that I just said out loud. Great. We can also do something called contradiction detection. So if you have something in your query that obviously can’t be true, like where something is greater than five and something is less than five, there is no such thing that is both greater than and less than five.
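For instance, against the Users table, a contradiction looks like this:

```sql
/* No value is both greater than 5 and less than 5, so simplification
   replaces the table access with an empty Constant Scan */
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 5
AND   u.Reputation < 5;
```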
SQL Server will throw a constant scan at you and say, go away, you ding dong. You have made something. You have made a terrible mistake. You can view, much like you can view the transformation rules, you can view the different optimizer things that the optimizer has done.
Optimizing things? I don’t know. Whatever. So if we run this query and we look at this, we will see all sorts of things that SQL Server has done, right, at various points to look at stuff.
We’ll even see the search zero, one, and two stuff that I just talked about, right? So search zero with the tasks and the time it spent in those tasks in search one and search two. So these things are all available to you to look at and maybe use to figure out, like, what the optimizer did when it got your query and went, oh, God, not again.
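The view in question is most likely sys.dm_exec_query_optimizer_info, which exposes cumulative optimizer counters, including the search phase ones:

```sql
/* Instance-wide, cumulative optimizer counters; the search 0/1/2 rows
   show how many optimizations finished in each phase and the average
   time spent there */
SELECT deqoi.counter,
       deqoi.occurrence,
       deqoi.value
FROM sys.dm_exec_query_optimizer_info AS deqoi
WHERE deqoi.counter LIKE N'search%';
```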
Wow. Why do they keep doing this to me? But one of those things might be discarding an unnecessary join, right? Like this, right?
SQL Server’s like, we don’t need that left join for anything. Why? Because we joined the PostTypes table to the Posts table on two unique, well, I mean, they’re clustered primary keys on both of these.
This join is not technically correct for these two tables, but we’re doing a left join and we are not selecting any columns from the table we are left joining to. We are only selecting from this. So SQL Server says Posts table, no way, right?
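A sketch of the join elimination setup, assuming Posts and PostTypes tables keyed on Id:

```sql
/* Left join to Posts on its unique clustered primary key, with no Posts
   columns referenced anywhere: the join can't change the result, so the
   optimizer throws it away and never touches Posts */
SELECT pt.Type
FROM dbo.PostTypes AS pt
LEFT JOIN dbo.Posts AS p
    ON p.Id = pt.Id;
```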
So I’m not bothering with that. And it’ll also do stuff like I talked about with matching expressions, right? So if I say, if I create an index on reputation with a filter that says where reputation is greater than equal to, I think, is that a million or a hundred?
That’s a hundred thousand. And then I say, SQL Server, get a count from the Users table, where reputation is greater than one million, that number, whatever’s going on in there.
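A sketch of the filtered index match; the index name here is made up:

```sql
/* A filtered index covering only high-reputation users */
CREATE INDEX ix_users_high_reputation
ON dbo.Users (Reputation)
WHERE Reputation >= 100000;

/* The predicate is a strict subset of the index filter, so the optimizer
   can match and use the much smaller filtered index */
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 1000000;
```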
SQL Server will match to our filtered index. And we will, SQL Server will say, that’s a good idea. All right, great. One thing that’s important to know is that missing index requests are often quite junky.
And I’m going to show you an example of that here, where we’re going to run this query, which has a scalar UDF in it. And we are going to, you know, obviously the scalar UDF makes things a bit funky, but we’re selecting the top three rows from the Users table, where reputation is greater than 100,000.
I should really learn my zeros better. And order by last access date. And if we look at the execution plan for this, SQL Server said, well, guess what? We could reduce the impact of this query by 83.684% by creating an index on the Users table.
The thing is, this query ran for five and a half seconds, but only 175 milliseconds was spent scanning the Users table. Would I add an index to fix this?
No, but it’s very, very misleading. Why? Because this costs 97% and this costs 0%. You might look at this query plan out in the wild and say, golly and gosh, we need that index. We’re not going to survive without it.
You’d sign off and ship it, not knowing the whole time that all the problem was in the scalar UDF and this compute scalar operator, right? Very dingy times here.
So if you see missing index requests in the wild, please don’t just add them all, right? It’s not a terribly good practice. They’re not terribly smart. They’re not terribly thoughtful. There’s all sorts of stuff in here that you can read when you diligently download the demo script that I’m going to provide to you and all that other stuff.
So please just don’t do it, right? There are things to pay attention to in query plans, right? So the stuff that we care about, operator times, wait stats.
There are wait stats in query execution plans. There are session level wait stats in sys.dm_exec_session_wait_stats. Estimated versus actuals of all variety, rows and things like that.
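For attributing waits to a single query you just ran, the session-level view looks like this:

```sql
/* Waits accumulated by the current session only; far less noisy than
   the instance-wide sys.dm_os_wait_stats */
SELECT desws.wait_type,
       desws.waiting_tasks_count,
       desws.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS desws
WHERE desws.session_id = @@SPID
ORDER BY desws.wait_time_ms DESC;
```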
Spills. Another good thing to pay attention to, right? Especially batch mode spills. Batch mode spills are the worst. They take forever. Batch mode sorts are the devil.
And of course, parallel thread distribution. So we’ve looked at operator times a bit so we can look at the other things a little too. So this is a good example of wait stats in a query plan, right?
So we’re going to run these two queries, right? We’re going to run them back to back or neck and neck or something like that. And they both do the same thing, right?
They both do the exact same thing. But, you know, one of them takes a little bit longer than the other. When we go look at the execution plans for these, I wish there was a way to just make this not have to be fussed with. All right.
We look at this first query. This first query takes 6.3 seconds. The second one takes 695 milliseconds. Why? Well, for this one, we can go figure it out with the wait stats, right? Like, why did this clustered index scan take 6.287 seconds and this one take 694 milliseconds, right?
Why? These mysteries, reveal them to me. Well, if we right-click on the root operator of the query plan and we expand the wait stats down here and we look, this query spent almost 39 seconds waiting on PAGEIOLATCH_SH, or reading pages from disk into memory.
This query, well, its top wait was 698 milliseconds on CXSYNC_PORT. So some kind of parallelism stuff. First query took a long time reading from disk.
Second query was fast reading from memory. Shocking, I know. But you can start to figure this stuff out by looking at wait stats instead of puzzling over costs and percentages and other goofy things. And then there’s this query, right?
This query right here. This query is a funny one, right? So we’re going to declare a couple local variables up here. Most obviously one called top, which is a big int, which is equal to 100 times 358.
And this one down here called dummy, which is also an integer equal to zero, at least until we do something down here. And then we’re going to tell SQL Server to get the top number, this number of rows. And then we are going to say to optimize this query for top equals one, right?
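A sketch of the devilry; the exact demo query isn’t in the transcript, so the aggregate here is a stand-in:

```sql
DECLARE @top BIGINT = 100 * 358, /* 35,800 rows actually requested */
        @dummy INT = 0;

/* Compile as if only one row were needed, then run for 35,800: the
   memory grant is sized for the lie, and spills are the consequence */
SELECT TOP (@top)
       @dummy = u.AccountId
FROM dbo.Users AS u
GROUP BY u.AccountId
OPTION (OPTIMIZE FOR (@top = 1));
```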
So now we’ve done something quite devilish to SQL Server. And this is all quite funny, right? What happened in here?
Well, we had a big spill, didn’t we? Look at this big spill, right? All right. Hashmatch flow distinct, our friend. Hashmatch flow distinct, it spilled on us.
All right. And we can see that this thing took 7.983 seconds. And this thing took 26.8 seconds minus 7.9 seconds. And if we hover over this, we can see that this thing did quite a bit of spilling.
All right. Spill level four, spilled one thread. So that took quite a while. So pay attention to things like spills in query plans. One time when wait stats aren’t useful, at least the wait stats that you see in the query plan (there are other useful places to get them), is when you have parallel thread distribution issues.
So let’s put this stuff into a new window, I guess, because it’s telling me to. All right. We all do that.
Oh, you can see I’m almost done here. Ah, nuts. Not thank you yet. So let’s put these into a new window. And oh, the index finished.
It’s our lucky day. Let’s turn on query plans, right? So I’m going to run this query. And then I’m going to use this session wait stats view that I’ve been talking about so much, right?
Because this is a very useful view. We’re going to run this query. It’s going to take a little bit to run. But you might infer from the name of the thing we’re selecting from, parallel but serial, that we’re going to get a parallel plan.
But the parallelism is not going to be very effective or efficient at all. All right. It’s going to be terrible.
So we go look at our execution plan. We will. Oh, come on. Give me the grabby thing. Come on. Give it to me. Give it to me. Why are you being shy?
If we look at this execution plan, we’ll see that there is a serial zone in the plan over here. And then we distribute streams over here. And then we go into a nested loops join here. And we go into an index seek here.
And we go into a key look up here. Okay. All fine. All well and good. The problem is when we distribute streams, right? We use something called, what is it?
Round robin. Sounds good. You know, nice round robin. Everyone loves a round robin. We wouldn’t want a square robin. But the problem that we run into is that the rows, ah, that’s the wrong one. There it is.
All these rows end up on thread one this time, right? They could end up on a different thread another time, right? So that’s not fantastic for us. And then this pattern will continue throughout the query plan too. Oh dear.
I’ve lost all sorts of context here, right? So we come over to this side and I should have highlighted one of these not to lose it. There we go. We still have all our rows on thread one. And we still have all our, well, now we have 27,000 rows here. And now we go to here and now we go to here.
And now, but now something magical happens. After all this stuff, right? We’re like, like coming up on this one. Oh gosh, the compute scalar is ruining my life. If we look at this one and the actual number of rows here.
So all 27,000 are still on one thread here. But then we have this repartition streams and repartition streams does, I mean, almost exactly what it sounds like.
And it repartitions our rows around, right? So if we look on this side, at least if the demo gods are working with me here today, now we have a little bit of thread distribution throughout, right? Not great.
That one’s got zero, a couple with ones and some fives and a six. But at least it’s not all on one thread. Granted, it’s not that many rows at that point. But the repartition streams helps to redistribute rows on parallel threads so that we don’t end up with all of them in one place.
Getting back to my point, though, if we look at the wait stats in this execution plan, right? If we right click on this and we hit properties and we go to wait stats, we are not going to see anything terribly useful to us, right?
We’re going to say, well, I mean, what is CXSYNC_PORT? What is, I mean, we know what SOS_SCHEDULER_YIELD is, right? Query scheduling, all that.
Why are we waiting on RESERVED_MEMORY_ALLOCATION_EXT? I don’t know, right? Like nothing in here gives us a good reason for why things are slow. But when you have a parallel query where parallelism is very lopsided, something that the query plans won’t show you is that you waited a long time on something called CXCONSUMER, right?
This thing right here, CXCONSUMER, the wait time, 138.966 milliseconds. And this is a very, very good sign that you have unbalanced parallelism in your query plans. So if you have a singular query and it’s parallel and you look at the wait stats for it, not in the query plan, because Microsoft screens CXCONSUMER out.
Why? I don’t know. Ask Salesforce. This is going to be why, right? Because this is the CXCONSUMER. We were not properly consumed.
So you can see that there, but you can’t see that in the query plan, right? So if you see CXCONSUMER waits when looking at the wait stats for a single query from something like the session wait stats view, then you know you have unbalanced parallelism, and you should be right-clicking around looking to figure out where the rows are, where things are unbalanced.
But now we will do the actual thank you, right? So this is where I normally would finish things out and say, hey, good job. You made it through. I’m happy for you. I’m proud of you.
You can get in touch with me here if you want to hire me or make fun of me or call me names or something. But this is the end. This is the, again, the final installment of Lost in Cost. So we’ve done it.
High five. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you most likely in an office hours video after this. All right. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.