A Little About The CXSYNC_PORT Wait In SQL Server 2022

Video Summary

In this video, I delve into the fascinating world of parallelism in SQL Server queries and introduce a crucial aspect that can significantly impact query performance: CXCONSUMER waits. Parallel execution plans are designed to distribute workloads across multiple threads for faster processing, but sometimes those threads don’t utilize resources evenly, leading to inefficient use of parallelism. By examining CXCONSUMER and CXPACKET waits, you can identify when queries might be underperforming due to skewed thread distribution. I walk through a practical example using a view that goes parallel but is effectively serial to demonstrate how to analyze execution plans for signs of inefficiency, including looking at query time stats and wait types like CXSYNC_PORT. Additionally, I highlight important metrics such as DOP (Degree of Parallelism) and thread usage, which can provide valuable insights into whether your parallel queries are truly benefiting from multiple threads or if they’re just wasting resources.

Full Transcript

Erik Darling here with Darling Data, of course, still to this very day, always and forever, faithfully yours. In this video, we’re going to talk about, well, we’ve got a couple things to cover. It’s an admittedly interesting subject that has to do with parallelism, and which waits you can use to indicate that a parallel query may have had inefficient usage of parallel threads. Now, what that means is you might see some queries that use parallel threads to process rows, right? You have the coordinator, thread zero, and then some number of worker threads, however many your DOP is, that have rows given to them to do work. Sometimes, the distribution of rows across those threads is nice and tidy and near perfect. Other times, it’s very skewed towards a single thread, or just like, you know, one thread has like 90% of the rows and the other threads have a smattering of rows. There are all sorts of different weird reasons this can happen. But the important thing is that you need to be able to figure out if it’s happening on your server.

Now, a while back, Microsoft introduced a wait called CXCONSUMER to split up CXPACKET, well, to split up how parallelism is measured within a query. So, while a consumer thread is waiting on parallel threads, you get CXCONSUMER, and while parallel threads are doing stuff, you get CXPACKET. Now, because some guy who works at Salesforce was like, CXCONSUMER is totally benign, a lot of scripts that you might find out there on the internet will automatically screen out CXCONSUMER.
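For reference, here’s a minimal sketch of surveying server-wide parallelism waits without screening CXCONSUMER out the way many community scripts do. The wait type names are real; the CXSYNC_PORT and CXSYNC_CONSUMER types will only show up on SQL Server 2022, Azure SQL DB, and Azure Managed Instance.

```sql
/* Server-wide parallelism waits since the last restart or stats clear.
   CXCONSUMER is deliberately kept in the results instead of being
   filtered out as "benign". */
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms,
    dows.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type IN
      (N'CXPACKET', N'CXCONSUMER', N'CXSYNC_PORT', N'CXSYNC_CONSUMER')
ORDER BY dows.wait_time_ms DESC;
```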

I won’t do that to you, because it can be really important. But SQL Server, well, I guess the really bad part is that CXCONSUMER waits get screened out of actual execution plans. So you’ll see CXPACKET waits, but you won’t see the CXCONSUMER waits in there. And the CXCONSUMER waits can really help you figure out, or really be an indicator of, why your parallel query was quite slow. The kind of good news is that SQL Server 2022 has inherited a wait type that both Azure Managed Instance and Azure SQL DB use.

Thankfully, no guy who works at Salesforce is around to tell you that this wait is negligible and benign and that you don’t have to worry about it. So you can find in your query plans when parallelism might be inefficient. So what I’m going to do is run this first part of the script, and this first part of the script, well, it’s going to capture a little bit more than I would normally care to.

I don’t usually want the query plans for those temp table inserts, but in this case, we’re just going to live with it because it makes more sense to do it all in one shot. So right now we’re selecting data from this view called parallel but serial, which, as you can probably infer from the name, is a query that generates a parallel execution plan, but the rows all end up on a single thread.

So in effect, it’s actually just a serial execution plan. So let’s dig a little bit into what we have here. We have an execution plan right here.

And if we zoom in a little bit on the execution plan, we can see all this lovely stuff. See all this lovely data, all these lovely parallel operators. And we might be wondering, well, why did this parallel query take 24 seconds?

The reason, if we dig in a little bit more closely, now we’re going to do some advanced stuff here, right? We’re going to, well, actually zoom in. Kind of jinxed myself on that one. So this is how many rows our query is dealing with.

The problem is that when we look at the properties over here, and this is a little wonky at the moment. Let me fix that for you. There we go.

Let me go and slide this on over. Very professional presentation over here. You can see that all the rows end up on this thread one. So it’s not very good.

And if we carry on, we actually go and look at this one. See that all the rows end up, still end up on thread one here as well. So that problem persists throughout the query plan.
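If you’d rather not click through every operator’s properties, the same per-thread row counts can be shredded out of the actual plan XML. A sketch, assuming you’ve loaded a saved actual plan into an xml variable yourself:

```sql
/* Per-thread actual row counts for every operator in an actual plan.
   @plan is assumed to hold the XML of a saved .sqlplan file. */
DECLARE @plan xml; /* load a saved actual execution plan's XML here */

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    r.op.value('@PhysicalOp', 'nvarchar(60)') AS operator,
    rt.t.value('@Thread', 'int') AS thread_id,
    rt.t.value('@ActualRows', 'bigint') AS actual_rows
FROM @plan.nodes('//RelOp') AS r(op)
CROSS APPLY r.op.nodes('./RunTimeInformation/RunTimeCountersPerThread') AS rt(t)
ORDER BY operator, thread_id;
```

Rows piling up under a single thread_id, the way they do in this demo, is exactly the skew you’re looking for.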

And part of it is the lack of a repartition streams operation anywhere here. We have a distribute streams operation, which attempted to divide rows evenly out to different threads. But the round robin partitioning type kind of messed that up a little bit.

Anyway. So digging into what went on here. If we look at the properties and we look at, well, there’s two important things. One of them is the query time stats.

So one thing that you want to be on the lookout, if you have a parallel execution plan that runs for a while and you’re like, what the hell took so long? What were all those threads doing? A very important thing to look at is in the query time stats section of the property.

If you go to the SELECT operator and you get the properties there, pretty much doing what I just did in the video, you’ll see this. CPU time and elapsed time are just about even.

And that’s not good because the whole point of a parallel execution plan is that you give more CPUs work so that you spend less time on the wall clock. Right. You have multiple CPUs.

They’ll all do a bunch of work, and you assemble that work later. Ideally, those threads all do an equal amount of work. Let’s say your query runs at DOP 8: in the best case, your query’s elapsed time should be about eight times lower than its CPU time, assuming that parallelism scales linearly. It may not always, but in that neighborhood, you should see some improvement there.
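That CPU-versus-elapsed comparison can be done in bulk, too. A hedged sketch against Query Store, where durations are stored in microseconds, looking for parallel plans whose CPU time is suspiciously close to their elapsed time:

```sql
/* Parallel plans where CPU time roughly equals elapsed time,
   meaning the extra threads bought very little concurrency.
   The 1.1 multiplier is an arbitrary threshold, not gospel. */
SELECT
    qsrs.plan_id,
    qsrs.last_dop,
    qsrs.avg_cpu_time / 1000. AS avg_cpu_ms,
    qsrs.avg_duration / 1000. AS avg_duration_ms
FROM sys.query_store_runtime_stats AS qsrs
WHERE qsrs.last_dop > 1
AND   qsrs.avg_cpu_time <= qsrs.avg_duration * 1.1
ORDER BY qsrs.avg_duration DESC;
```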

If we go into the wait stats, and I’m going to bring this back out. If we go into the wait stats for this query, we’ll see a whole bunch of CXSYNC_PORT. This is that new wait type that I was talking about.

It’s in SQL Server 2022, Azure Managed Instance, and Azure SQL DB. And this can kind of help you figure out when queries have skewed parallelism. Now, something that eagle-eyed watchers might notice is that the wait time on this wait is nearly even with CPU and elapsed time for the query itself.
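On those versions, you can also watch CXSYNC_PORT pile up for your own session right after running the query, without opening the plan at all. A small sketch:

```sql
/* Waits accumulated by the current session, parallelism types only.
   CXSYNC_PORT dominating here points at skewed parallel threads. */
SELECT
    desws.wait_type,
    desws.waiting_tasks_count,
    desws.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS desws
WHERE desws.session_id = @@SPID
AND   desws.wait_type LIKE N'CX%'
ORDER BY desws.wait_time_ms DESC;
```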

So this is a pretty good indicator that the whole time this query was running, the 23 or so seconds this thing ran for, most of the time we were waiting on parallel thread stuff. And then if we dig, again, dig a little bit further into the execution plan, let’s get rid of this. We can look at almost anything that shows where rows end up.

We can look at this index seek operator, and we can look at the actual number of rows, and that will show us all the rows ending up on a single thread. Right there. We could look at any one of the lines that shows where rows were distributed among threads and see it there.

Remember, initially, we looked at this line and this line. You would also see it if you look at this key lookup, where all of the rows, just like before, ended up on a single thread. And likewise in the query plan, those sections where the rows are terribly skewed.

Oops, didn’t need that tooltip. Thanks anyway, though, tooltip. Very useful otherwise.

Those rows are where things in the query plan really did accumulate the most time. So basically all 23 seconds of the runtime for the plan was just in this chunk right here. So again, always be looking at your actual execution plans.

Always be paying attention to operator times. And if you have a parallel plan that you think is running very slowly, it’s always worth investigating if one or a couple of the threads ended up with the majority of the work, and you had a bunch of threads waiting around, sitting there doing nothing.

Now, another sort of interesting point in here: if we look at the thread stats for this query, we will see that this query was allowed to have three active branches. So three parallel branches were allowed to be running simultaneously.

We reserved 24 threads for this whole query to run, but we only used 17 of them. So at certain points, what’s that, 24 minus 17, seven threads.

Remember, because it’s DOP 8. At some point, seven threads are just like, I got nothing to do. Not doing much over here.

So that’s another good indicator that perhaps your parallel query did not do the work that you thought it did. Now, what I have to do over here is figure out why I have a bunch of negative numbers in here, and why I have a bunch of duplicates in here, because all I did was insert rows into this temp table for my SPID and then join that temp table over here.

I think I found a bug in SQL Server where parallel queries don’t clear out when you open new windows and you use the sys.dm_exec_session_wait_stats view. Something is amiss in there.

So that’s fun. It’s a lot of fun, actually. Really, really happy to find these things. One last thing that I will point out is that a lot of this information is available to you in the plan cache and query store. So if you go digging around the plan cache or query store, you might see times when you have a parallel query.

A last_dop of 8, a last_elapsed_time of 23 seconds, and a last_worker_time of 23 seconds, where things are quite even despite a high DOP, and that could be a pretty big warning to you. There’s also these.

Well, I mean, we already looked at the DOP column, but let’s give it a pink hug anyway, along with last_reserved_threads at 24 and last_used_threads at 17. So it kind of brings you that information that we saw in the actual execution plan. And, of course, that is our query here.
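Pulled together, the plan cache version of those warning signs looks something like this sketch: a high DOP, worker time close to elapsed time, and fewer threads used than reserved.

```sql
/* Cached parallel queries whose thread usage looks suspicious.
   Worker and elapsed times come back in microseconds. */
SELECT TOP (10)
    deqs.last_dop,
    deqs.last_reserved_threads,
    deqs.last_used_threads,
    deqs.last_worker_time / 1000. AS last_worker_ms,
    deqs.last_elapsed_time / 1000. AS last_elapsed_ms,
    dest.text AS query_text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.last_dop > 1
ORDER BY deqs.last_elapsed_time DESC;
```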

So, moral of the story: you will not see this in cached plans. Cached plans don’t save the row stuff for you, right? You don’t have all that cool information about which rows ended up on which threads in cached plans.

So that’s the plan cache, query store. Estimated plans won’t show you that, but we can go digging because that’s what we are. We are query plan archaeologists.

We can go digging more into actual execution plans and figure this stuff out. So anyway, I hope you learned something. I hope you enjoyed yourselves.

What was the other thing? If you like this video at all, please consider throwing it a thumbs up. If you enjoy this sort of hard-hitting investigative SQL Server content, then feel free to like and subscribe to my channel.

Like the video, subscribe to the channel, in that order. And you will get helpful notifications every time I post this sort of hard-hitting investigative SQL Server content. Anyway, thank you for watching.

I will see you in another video some other time.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.