A Little About Parallel Exchange Spills In SQL Server

A Little About Parallel Exchange Spills In SQL Server



Thanks for watching!

Video Summary

In this video, I delve into exchange spills in SQL Server, specifically focusing on parallel exchange operator spills. After addressing a question from one of my viewers about how to fix these spills, I explain that the solution depends heavily on understanding why they occur—whether due to poor cardinality estimates, overly complex queries, or missing indexes. I then analyze a query plan with all three types of parallel exchange operators spilling data, highlighting the issues and potential fixes such as using hash joins or setting the query to run at a lower degree of parallelism (dop1). The video also covers weight stats, emphasizing their importance in diagnosing performance issues related to spills.

Full Transcript

Erik Darling here with Darling Data. Another day in paradise. Today’s video we’re going to round out. I’m going to roundhouse kick. If I had the office space, I would roundhouse kick, but I don’t have the office space for roundhouse kicks. I would knock over my camera and then we wouldn’t get to talk to each other anymore. It’d be sad. It’d be so sad. We’re going to round out our series of videos. I’m going to round out our series of videos on spills in SQL Server by talking about exchange spills or parallel exchange operator spills if you want. If you want to say, add a few extra words on that. You can see the query plan over here for it. Now I did get a question on another spills video. It was actually a fair question. It was, how do you fix spills? Well, the answer to that is a little bit longer than just… …than it would seem, isn’t it? So, I mean, first, you know, you want to make sure that you are getting the actual execution plan and that you are making sure that you…the spill that is occurring in the plan is running for an amount of time where if you fix it, people will say, wow, that’s much faster. Thank you. You’re the best.

So those are the first two things. And then how you fix it, of course, does depend on why it’s happening, right? There are all sorts of reasons why spills might happen. Probably most common, you might have just a really poor cardinality estimate. And that can happen because your query is very complicated or because you have done something that intentionally stifles SQL Server’s ability to make a good cardinality estimate. Local variables, table variables, things like that. You know, you could have an overly complex query because you stacked all those super readable CTE together and screwed the whole joint up.

You also might have a very obviously missing index somewhere. I don’t know. But, you know, how you fix them really does depend on, like, what got you into that situation. You know, it’s like fixing anything else. You want to make sure that you understand how you got there and what needs fixing and why it needs fixing. It’s pretty important stuff, right? Otherwise, I don’t know. You’re layering everything with duct tape and hoping that it stays together, much like my life.

Anyway, we’re going to look at this parallel spill plan. Now, I want you to just, you know, take a moment to ponder the majesty of this magnificent pagan beast. Look at this thing. This is the trifecta, right?

Because I have spills on all three types of parallel exchange. There is a gather streams that spills right there. There is a repartition streams that spills right there.

And there is a distribute streams that spills right there. That is all of the available parallel exchange operators in SQL Server. And they have all spilled for me because I’m pretty good at writing demos.

Now, if you have been watching my videos for any length of time, you will have heard me say how much I hate parallel merge joins. And this is why I hate parallel merge joins. All right. They are awful when it comes to this stuff.

And the reason why they’re awful is because parallel merge joins, or rather merge joins in general, much like stream aggregates, expect ordered input. Input has to be sorted so that everything can be done in a nice orderly fashion and flow right through all the merging and the blah, blah, blah, blah, blah. And so what you end up with is not necessarily – there’s no sort in this query plan.

There’s like no explicit sorting, right? Nothing in this query plan says sort or sort distinct or anything like that. But there are things that preserve order in this query plan that are a little tricky to spot.

So not this one, but if you look at this parallel gather streams operator, it has an order by at the bottom of it. And this order by means that it is preserving the order of the user – the ID column in the user’s table as it passes through here. Preserving order across parallel threads is what leads to things getting all gummed up and slow.

So even if this query weren’t spilling out to disk or the exchange buffers weren’t spilling out to disk, I would still be pretty concerned that – like when I see this stuff, especially if it has to deal with a lot of rows. Because you’re dealing with a lot of rows and if there’s any skew or if there’s any – just anything weird about the way rows are arranged across threads, that – those intra-thread dependencies, keeping those – keeping all those rows in order across exchanges and buffers and threads and all that stuff gets really, really nasty.

So that affects this one, right? This one only has – this one does not have an order by, but it does have a partition column right here. Now, this is the first one that spills.

And you can see operator used tempdb to spill data during execution with spill level 0 and one spilled thread. What is spill level 0? Weird, right?

What is one spilled thread? I don’t know. Which thread spilled? Couldn’t tell you. It’s all a mystery to me. But then these two other – these two other parallel operators, these have much bigger problems. So this one only runs for about 15 seconds because you’ve got 17 there.

Well, 14 and a half. 2.5 there, 17 there. So about 14 and a half seconds in this one. These ones have it a lot worse. This one – well, let’s see.

Why did you change colors? No, you don’t change colors on me. This one is really – I mean, assuming that all of these times are honest and that nothing is screwy about the operator timing code and SQL Server, and we all know what it is, then this one only runs about 8 seconds, right?

Because we’ve got 124 here and 132 there. This one here is really the problem, right? That’s like a full minute and 12 seconds.

Most of the effort is between these two. This one only gets it a little bit, but we still spill on this one right there because I’m good at writing demos. Something.

So this one does have an order by, right? This is the especially long-running one. Oops. I did not frame that correctly. We have an order by at the bottom here doing the same thing, keeping the users.id column in order. And then we have our merge join here.

And then we have our gather streams here, which also has an order by. And so I do want to recount some of the spill level stuff. This is spill level 0 and 6 spilled threads, so 6 out of 8.

And this one is spill level 0 and 8 spilled threads, right? So all of these order-preserving operators and order-requiring operators like these merge joins, these are all why I hate parallel merge joins because it makes queries very susceptible to issues like this.

All right? So the sort of unfortunate thing is that when you run across this stuff, there are certainly ways to fix it. And they range from just adding a trivial hint like option hash join.

So hash joins, they don’t require anything to be in order. And they’re sort of like the unordered equivalent of a merge join in that they support two reasonable-sized inputs and big scans of stuff, right? So that would be like I would try a hash join there, option hash join.

If the hash join wasn’t giving me what I wanted, I might even try just setting this query to run at dop1 because dop1 might still be slow, but it’s probably not going to be slower than a query that spills on every single exchange operator because that’s pretty painful, right? It might also be a case where, you know, you might want to add some indexes that do not keep your join keys in primary sorting order.

So that you might, like this, like if you can’t add hints, sometimes if you add indexes that would in other circumstances be considered suboptimal, you could get a hash join plan naturally because SQL Server wouldn’t have nice ordered input for things as it is. So like when we’re looking at this query specifically, I have indexes on the users table, nonclustered indexes on the users table and the post table, and these both lead with the column that’s being joined on, right?

So this one leads with the ID column. This one leads with the owner user ID column. We merge join those here. Those inputs are already sorted, so the merge join works without having to sort. And then up here, we’re just using the clustered index of the users table, and the clustered index of the primary clustered index of the users table is also on the ID column.

So we just get, like, all this stuff is ordered in SQL Server. It’s like, woohoo, merge joins everywhere. And that sucks. One other thing that’s important to sort of pick out here is what the weight stats look like.

So what I am not doing, I’m not looking at the query level weight stats here, because the query level weight stats here are going to be disappointed, because our dear friend Sam, that’s someone at Microsoft, decided that some of these weights that are important would not be important to show you, right?

So when we look through these, we quickly exhaust any useful amount of waiting time on these. Milliseconds down here, tiny milliseconds. The only thing that we have that looks meaningful at all to me, anyway, is CX packet, which we have a whole bunch of milliseconds of, right?

Look at all those CX packets flying around. Let’s see, 1, 2, 6, 5, 8, 5, 7. That is a seven-digit number of milliseconds in CX packet. So we are eight threads CX packeting around, right?

Doing all sorts of crazy CX packet stuff. But the weight stats for the actual weight stats, the session-level weight stats for the query itself, have a lot more interesting stuff to tell us, right?

So when we look at the session-level weight stats, we have some more interesting things in here. We see our dear friend CX consumer. Look at all the CX consumer time we had in there, right?

That’s a lot of time. And the max wait time on both of these is quite nauseating. And, I mean, if you want to talk about something else that’s kind of interesting, the signal weight time on CX consumer and sleep task is also quite interesting for these parallel spills.

But sleep task is another one. We’ve seen this with the hash spills. With the exchange spills, we also see a lot of sleep task weight time pileups.

About 15 seconds of sleep task weight time in total. And almost 14 seconds of that 15 seconds is signal weight time. So waiting for CPUs to say, yes, you can do something now.

And that’s also very interesting with the CX consumer weights. That’s nearly 29 seconds of signal weight time out of the, let’s see, what is that? Oh, sorry, I messed that all up.

Let’s see, 452959. It’s a six-digit number. So that’s 452 seconds of weight time on CX consumer. The query plan doesn’t show you.

Someone at Microsoft, man, why? And like 30 seconds of that 452 seconds was just waiting on CPU signal. It’s just bonkers.

The CX packet doesn’t have that signal weight time thing up there, but it just has a whole lot of actual wait time. And so more signs that your queries might be having problems. So we talked about, over the course of these videos, we talked about sort spills, hash spills, and now exchange spills.

Hash spills in row mode will show you a lot of the sleep task weight. Apparently, exchange spills will too. Apparently, there’s a lot of sleep tasks involved with exchange spills.

The thing is that the sleep task weight, like I’ve said before, is associated with a lot of other stuff. So you can’t necessarily look at a server and say, oh, sleep task bad. But if you’re looking at individual queries and you’re seeing this weight crop up a lot, maybe you’re just running who is active and you’re seeing lots of sleep task weights or something, or IO completion if it’s a sort spill.

That might be a pretty good sign that whatever is happening in there, unless you’re getting an actual execution plan, you’re not going to be able to see what operators are spilling in a query. So you might have to run SP who is active, get plans, look at whatever plan SQL Server has running for that query, and then assume that one operator that requires memory, like a sort or a hash or an exchange buffer, is spilling. And then that’s what the either sleep task weight or the IO completion weight is on about.

If it’s batch mode, you’ll see the BP sort weight. If it’s batch mode hash something, you’ll see a whole bunch of these different HT, memo, delete, repartition, all sorts of crazy HT weights in there. So just stuff to keep an eye on.

Like, you know, when you’re looking at queries running, the weight stats aren’t always going to tell you exactly what’s wrong. But just, like, being able to understand that sometimes they can and what to look for in the query plan based on the weight stats that you see is pretty important. You know, it’s sort of like what I’ve talked about in videos about eager index spools where you see exec sync weights pile up in a parallel execution plan while an eager index pool is being built.

Like, if you’re looking at queries running and you see that exec sync weight in the same way that you might see any of these weights up here, up at the top here. Like, if you see those piling up, you know, just knowing where to look in the query plan based on the high weights you’re seeing is a really valuable thing for performance tuners. So at least something to, like, you know, take away from this video is, like, all the videos that I’ve done on spools so far is just, like, either looking at, you know, weight stats as a whole for a server or looking at queries that are currently running with SP who is active or, like, looking at the weight stats at the query level, you know, anything you’re doing there.

Like, knowing what the weight stats that you’re seeing can mean in the query plan as far as the bottlenecks goes, really, really good stuff for you to know. So, with that, I’m going to jump out a window. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. If you like this sort of SQL Server content and you would like to express your undying gratitude to me for publishing it, I do enjoy thumbs-ups and I do enjoy helpful commentary. Not hurtful commentary.

I like helpful commentary. That’s the best kind. If you like this sort of SQL Server content in general and assuming I survive my jump out the window, you can subscribe to my channel and you can get notified along with, let me drumroll, please, as I update my subscriber numbers here, nearly 3,809 other data darlings. You can join them unanimously and with perfect synchronization getting notified when I post these videos.

So, that’s about it. What was I going to say? Oh, yeah, nothing.

Thank you for watching and that’s all for today. Okay. Sticking the landing on this one pretty good. All right. Cheers. Some seltzer for your troubles. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



4 thoughts on “A Little About Parallel Exchange Spills In SQL Server

  1. Dear GOD/GODS and/or anyone else who can HELP ME (e.g. TIME TRAVELERS or MEMBERS OF SUPER-INTELLIGENT ALIEN CIVILIZATIONS):

    The next time I wake up, please change my physical form to that of FINN MCMILLAN formerly of SOUTH NEW BRIGHTON at 8 YEARS OLD and keep it that way FOREVER.

    I am so sick of this chubby, balding Asian man body!

    Thank you!

    – CHAUL JHIN KIM (a.k.a. A DESPERATE SOUL)

Comments are closed.