Video Summary
In this video, I delve into how parallelism can lead to execution plan issues that are challenging to diagnose, specifically focusing on Repartition Streams operators. I walk through a query that runs for about 42 seconds and analyze the execution plan, highlighting the significant time spent in these operators despite no data spilling. By examining the distribution of rows across threads, I illustrate how skew can cause certain parallelism exchanges to become bottlenecks, even when not all operators show this issue. Additionally, I explain why cached plans often hide crucial details about such issues and discuss the limitations of tools like sp_BlitzCache in detecting these problems post-execution.
Full Transcript
Yip, yip, yip, yip, yip, yip, yip, yip, yip. As my dear friend Hamer says, yip, yip, yip, yip, yip, yip, yip, yip, yip, yip. I’d like to finish out talking about where parallelism can cause plan issues and where it is difficult to figure out why an execution plan might be slow. I’m going to look at that by talking about Repartition Streams. Now, we looked in the last video at where several exchange operators spilled and things were rather dire. In this one, we’re going to look at where things don’t spill, but perhaps we have some reasons why things might have gone poorly. Now, again, this query runs for about 42 seconds, so that’s a pretty good chunk of change there that I don’t want you to sit through, so I ran the query ahead of time.
And when we look at the plan, let’s see, go over here, we have an index seek into the Votes table, and then we have this Repartition Streams operator. And the first thing that I want you to notice is, let’s see, again, these are row mode plans, so the operator times are cumulative, reading from right to left. So 19.6 minus 1.6 is, let’s see, 18. So we spent just about, well, I mean, like, what, 18 seconds in this Repartition Streams operator?
That’s not a good sign. This kind of happens again, too. If we look up a little bit further in the plan, we’ll have this hash match operator, which is about 24 and a half seconds, and then this repartitioned streams operator, which is about 33 and a half seconds. So another good chunk of time, almost 10 seconds spent in there. Now, in this case, I do believe it’s because there is quite a bit of skew in the parallelism, which I’ll show you in a minute.
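If you want to reproduce this kind of analysis yourself, here is a hedged sketch: the query and table below are placeholders, not the demo query from the video. `SET STATISTICS XML ON` makes SQL Server return the actual plan with per-operator run times, and in row-mode plans those times are cumulative from right to left.

```sql
-- Hedged sketch: capture an actual plan so you get per-operator times.
-- The query below is a stand-in, not the demo query from the video.
SET STATISTICS XML ON;

SELECT COUNT_BIG(*)
FROM dbo.Votes AS v      -- hypothetical table name
WHERE v.PostId > 0;      -- hypothetical predicate

SET STATISTICS XML OFF;

-- In the returned row-mode plan, operator times include their children:
-- if Repartition Streams shows 19.6s and the Index Seek beneath it shows
-- 1.6s, the exchange itself accounts for roughly 18 seconds.
```

The subtraction step matters: reading the raw numbers without it makes the child operator look guilty when the exchange above it is the real sink.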
But I’ve definitely seen cases where this happens when there isn’t as profound skew. I’ve definitely seen cases where it happens when there is no skew. In this case, this demo just happens to work out really well where there’s skew and there’s slow parallelism exchanges. So if you look at what happened here, all 37,332,131, ooh, that’s 37 million, rows ended up on a single thread.
That is not a good time, apparently. Not much got repartitioned here. If we look at this operator, if we look at the index seek, things started off kind of okay. Like, kind of okay, right? A lot of 1.2s. Thread 3 was an outlier. And then when we went to rebalance the streams, we ended up in a bad spot.
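The per-thread row counts I’m reading off here live in the actual plan XML, under each operator’s runtime counters. A hedged sketch of how you might shred them out yourself (assumes you’ve saved the actual plan into `@plan`; the column picks are illustrative):

```sql
-- Hedged sketch: shred per-thread row counts from a saved actual plan
-- to spot parallel skew. @plan is assumed to hold your plan XML.
DECLARE @plan xml = N'<ShowPlanXML/>';  -- paste or load your actual plan here

;WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    op.value('@NodeId', 'int')               AS node_id,
    op.value('@PhysicalOp', 'nvarchar(60)')  AS physical_op,
    ctr.value('@Thread', 'int')              AS thread_id,
    ctr.value('@ActualRows', 'bigint')       AS actual_rows
FROM @plan.nodes('//RelOp') AS o(op)
CROSS APPLY op.nodes('./RunTimeInformation/RunTimeCountersPerThread') AS c(ctr)
ORDER BY node_id, thread_id;
```

One row per thread per operator: a healthy exchange shows roughly even `actual_rows` across threads, while the skewed one here would show nearly everything piled onto a single thread.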
And this sort of happens again. I should probably keep this in focus here. This repartitioned streams actually does its job. Look how nice and even that is. And then this hash match is nice and even, too. But then this repartitioned streams is back to Sucksville.
And that’s no good at all. And that’s all going into this thing here, which only ends up with one row. Just a bit absurd. And I’m not exactly sure why this gather streams takes another seven seconds. I’m not that handy with a debugger. But this is where a lot of the clogging happens in the execution plan.
Now, one could make all sorts of reasonable efforts to tune this query plan or to tune this query. One may see things one may want to do. But we’re not going to do that here because we are not tuning queries here. We are just talking about how cached plans can hide things from you.
Now, this is one thing where sp_BlitzCache can’t help. There is no way to detect parallel skew after the fact. And there is no way to get per-operator run times from a cached plan to give you a warning about what went wrong when we ran this.
But what I can show you is that… Excuse you. Excuse you again. What I can show you is that we lose all of that interesting information in the cached plan.
Right? We don’t see how long this ran for. It’s like, oh, it cost 9%. What? No big deal. This cost 24%. We should do something about that. And then, like, you know, we’ll see stuff here.
But none of the parallelism operators where there was really a big holdup in the execution plan are showing that they were the slow points. We do see that there were spills in this. But the spills weren’t the problem. Right? The spills were pretty small.
You know, we just didn’t spill so much that, like, you know, fixing the spill would be the big fix for this. Right? And this doesn’t even have the big problem that the query plan with the exchange spills, or rather, the metadata about the query plan with the exchange spills, had.
In this case, we have total CPU at about 110 seconds and duration at about 40 seconds. So we don’t even have that forensic helper of, like, you know, CPU being even with or lower than duration in a parallel plan to look at. So that can be rather misleading as well.
So, again, you know, stuff to keep in mind when you’re looking at cached plans is, you know, that a lot of stuff’s going to be missing. Looking at the metadata can help sometimes. Other times it’s a mystery.
You know, there are… Like, a lot of times you will definitely need to see an actual execution plan in order to make any sort of, like, reasonable guess at what part of the query or query plan to focus on. Anyway, that’s it for this one.
The next video is going to start a whole new topic. We’re going to start fresh. Who knows what clothing I’ll take off for the next video. Anyway.
Thanks for watching and see you over there. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.