Video Summary
In this video, I delve into how parallelism can lead to execution plan issues that are challenging to diagnose, specifically focusing on Repartition Streams operators. I walk through a query that runs for about 42 seconds and analyze the execution plan, highlighting the significant time spent in these operators despite no data spilling. By examining the distribution of rows across threads, I illustrate how skew can cause certain parallelism exchanges to become bottlenecks, even when not all operators show this issue. Additionally, I explain why cached plans often hide crucial details about such issues and discuss the limitations of tools like sp_BlitzCache in detecting these problems post-execution.
Full Transcript
Yip, yip, yip, yip, yip, yip, yip, yip, yip. As my dear friend Hamer says, yip, yip, yip, yip, yip, yip, yip, yip, yip, yip. I’d like to finish out talking about where parallelism can cause plan issues and where it is difficult to figure out why an execution plan might be slow. I’m going to look at that by talking about Repartition Streams. Now, we looked in the last video at where several exchange operators spilled and things were rather dire. In this one, we’re going to look at where things don’t spill, but perhaps we have some reasons why things might have gone poorly. Now, again, this query runs for about 42 seconds, so that’s a pretty good chunk of change there that I don’t want you to sit through, so I ran the query ahead of time.
And when we look at the plan, let’s see, go over here, we have an index seek into the Votes table, and then we have this Repartition Streams operator. And the first thing that I want you to notice is, let’s see, again, these are row mode plans, so the operator times are cumulative, reading from right to left. So 19.6 minus 1.6 is, let’s see, 18. So we spent just about, well, I mean, like, what, 18 seconds in this Repartition Streams operator?
That’s not a good sign. This kind of happens again, too. If we look up a little bit further in the plan, we’ll have this hash match operator, which is about 24 and a half seconds, and then this repartitioned streams operator, which is about 33 and a half seconds. So another good chunk of time, almost 10 seconds spent in there. Now, in this case, I do believe it’s because there is quite a bit of skew in the parallelism, which I’ll show you in a minute.
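If you want to reproduce this kind of analysis yourself, here is a hedged sketch: the query and table below are placeholders, not the demo query from the video. `SET STATISTICS XML ON` makes SQL Server return the actual plan with per-operator run times, and in row-mode plans those times are cumulative from right to left.

```sql
-- Hedged sketch: capture an actual plan so you get per-operator times.
-- The query below is a stand-in, not the demo query from the video.
SET STATISTICS XML ON;

SELECT COUNT_BIG(*)
FROM dbo.Votes AS v      -- hypothetical table name
WHERE v.PostId > 0;      -- hypothetical predicate

SET STATISTICS XML OFF;

-- In the returned row-mode plan, operator times include their children:
-- if Repartition Streams shows 19.6s and the Index Seek beneath it shows
-- 1.6s, the exchange itself accounts for roughly 18 seconds.
```

The subtraction step matters: reading the raw numbers without it makes the child operator look guilty when the exchange above it is the real sink.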
But I’ve definitely seen cases where this happens when there isn’t as profound skew. I’ve definitely seen cases where it happens when there is no skew. In this case, this demo just happens to work out really well where there’s skew and there’s slow parallelism exchanges. So if you look at what happened here, all 37,332,131, ooh, that’s 37 million, rows ended up on a single thread.
That is not a good time, apparently. Not much got repartitioned here. If we look at this operator, if we look at the index seek, things started off kind of okay. Like, kind of okay, right? A lot of 1.2s. Thread 3 was an outlier. And then when we went to rebalance the streams, we ended up in a bad spot.
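The per-thread row counts I’m reading off here live in the actual plan XML, under each operator’s runtime counters. A hedged sketch of how you might shred them out yourself (assumes you’ve saved the actual plan into `@plan`; the column picks are illustrative):

```sql
-- Hedged sketch: shred per-thread row counts from a saved actual plan
-- to spot parallel skew. @plan is assumed to hold your plan XML.
DECLARE @plan xml = N'<ShowPlanXML/>';  -- paste or load your actual plan here

;WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    op.value('@NodeId', 'int')               AS node_id,
    op.value('@PhysicalOp', 'nvarchar(60)')  AS physical_op,
    ctr.value('@Thread', 'int')              AS thread_id,
    ctr.value('@ActualRows', 'bigint')       AS actual_rows
FROM @plan.nodes('//RelOp') AS o(op)
CROSS APPLY op.nodes('./RunTimeInformation/RunTimeCountersPerThread') AS c(ctr)
ORDER BY node_id, thread_id;
```

One row per thread per operator: a healthy exchange shows roughly even `actual_rows` across threads, while the skewed one here would show nearly everything piled onto a single thread.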
And this sort of happens again. I should probably keep this in focus here. This repartitioned streams actually does its job. Look how nice and even that is. And then this hash match is nice and even, too. But then this repartitioned streams is back to Sucksville.
And that’s no good at all. And that’s all going into this thing here, which only ends up with one row. Just a bit absurd. And I’m not exactly sure why this gather streams takes another seven seconds. I’m not that handy with a debugger. But this is where a lot of the clogging happens in the execution plan.
Now, one could make all sorts of reasonable efforts to tune this query plan or to tune this query. One may see things one may want to do. But we’re not going to do that here because we are not tuning queries here. We are just talking about how cached plans can hide things from you.
Now, this is one thing where sp_BlitzCache can’t help. There is no way to detect parallel skew after the fact. And there is no way to get per-operator run times from a cached plan to give you a warning about what went wrong when we ran this.
But what I can show you is that… Excuse you. Excuse you again. What I can show you is that we lose all of that interesting information in the cached plan.
Right? We don’t see how long this ran for. It’s like, oh, it cost 9%. What? No big deal. This cost 24%. We should do something about that. And then, like, you know, we’ll see stuff here.
But none of the parallelism operators where there was really a big holdup in the execution plan are showing that they were the slow points. We do see that there were spills in this. But the spills weren’t the problem. Right? The spills were pretty small.
You know, we just didn’t spill so much that, like, you know, fixing the spill would be the big fix for this. Right? And this doesn’t even have the big problem that the query plan with the exchange spills, or rather, the metadata about the query plan with the exchange spills, had.
In this case, we have total CPU at about 110 seconds and duration at about 40 seconds. So we don’t even have that forensic helper of, like, you know, CPU being even with or lower than duration in a parallel plan to look at. So that can be rather misleading as well.
So, again, you know, stuff to keep in mind when you’re looking at cached plans is, you know, that a lot of stuff’s going to be missing. Looking at the metadata can help sometimes. Other times it’s a mystery.
You know, there are… Like, a lot of times you will definitely need to see an actual execution plan in order to make any sort of, like, reasonable guess at what part of the query or query plan to focus on. Anyway, that’s it for this one.
The next video is going to start a whole new topic. We’re going to start fresh. Who knows what clothing I’ll take off for the next video. Anyway.
Thanks for watching and see you over there. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.