YouTube Days: Improving The Parallel Query Processing Documentation For SQL Server


I’m working on ramping up my recording and streaming setup now that life has settled down a little bit, and publishing the results to YouTube while I work out the wrinkles. Enjoy some free video content!

Video Summary

In this video, I delve into the parallel query processing documentation in SQL Server, pointing out where it’s incorrect or misleading, and highlighting some dead-end links that don’t provide additional information. I walk through several examples to demonstrate how certain constructs, such as recursive common table expressions (CTEs), scalar user-defined functions (UDFs), multi-statement table-valued functions, and the `TOP` keyword, can actually support parallel execution in ways not fully captured by the documentation. By creating detailed demos using the Stack Overflow 2013 database, I show that some parts of these queries can indeed run in a parallel zone despite what the docs suggest, while other parts remain single-threaded. This video aims to provide clarity and practical insights into how SQL Server handles parallelism for these constructs, helping developers make more informed decisions about query optimization.

Full Transcript

Erik Darling here with Darling Data. And today’s episode is going to focus on the parallel query processing documentation, some places where it’s wrong and some places where it’s misleading, and some places where the links to get more information sort of lead to a dead end. There isn’t actually any more information at some of these links, which is kind of funny.

Now, the docs start by talking about constructs that inhibit parallelism, which is this list here. You have scalar UDFs, which is mostly right. You have remote queries, which is like half right.

You have dynamic cursors, which is right but incomplete. You have recursive queries, which is half right. You have multi-statement table-valued functions, which are also half right.

You have the top keyword, which I’m actually going to call half wrong, because it is, again, not really accurate. And what’s funny is that for a lot of these things, there’s all these things that say for more information, you go over to whatever link. And for a lot of them, there’s not actually any more information about parallelism at the other docs.

And that’s fine. Maybe it’s not the right place for it. But it’s also not very helpful to folks who are trying to find things if there is nothing additional at those other links about parallelism. Now, if you scroll down a little bit further, there’s this really exciting table of non-parallel plan reasons.

The non-parallel plan reasons are right for the most part. But the thing that kind of gets me is that there’s actually better detail in this table than there is up there at the top, in the list of things that might inhibit parallelism. And for some of them, it’s kind of funny, because you’ll see things in that list up top that don’t actually show up in this table at all.

Like, there’s nothing about `TOP` in here, there’s nothing about multi-statement table-valued functions in here. There is, of course, something about table variables not supporting parallelism. But there’s nothing in the documentation up top that says the problem with a multi-statement table-valued function is the table variable, not the function itself.

So I’ve written a series of demos that go through and either complete or debunk some of the information in the docs. So let’s start over in the lovely Stack Overflow 2013 database. And the first thing that we’re going to look at is a recursive CTE.

Now, what I’m going to do is not actually run this whole thing because we don’t actually need to run this whole thing. And for a lot of these queries, I have some hints on there to force parallelism because it wasn’t naturally occurring. But what I want to show is that some part of the plan can indeed go parallel if need be.

So we’re just going to get an estimated plan for this recursive query. And just to make sure everyone understands a recursive CTE, a CTE is when you start off with a CTE. And then inside of that CTE, you execute some recursive doodads.

I love the doodads. So here we’re just selecting from post where the ID is equal to some post value. And then you union all and you join the recursive portion of the CTE.

You join that sort of back to itself, right? Because the name of the CTE up here is P. And we’re selecting from that part of the CTE.

This is the anchor. And this is the recursive portion. So if we look at the plan for this: now, if we remember what the docs said, one thing that would happen if we had a recursive query is that parallelism would be inhibited for that query.

And we can clearly see in the execution plan that a large chunk of the query plan, after the recursive portion, does execute in parallel. We have a Parallelism (Distribute Streams) operator.

And that’s where we join off to the Users table outside of the recursive portion. So after we select from the recursive CTE, we join to the Users table.

That whole portion of the query is free to run in a parallel zone in the plan. But the recursive CTE does have a restriction where that part has to run in a serial zone in the plan.
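The exact demo script isn’t shown in the transcript, but a minimal sketch along these lines captures the shape, assuming the Stack Overflow 2013 `Posts` and `Users` tables (the post ID and column choices are placeholders):

```sql
WITH p AS
(
    /* anchor: start from one post */
    SELECT ps.Id, ps.ParentId, ps.OwnerUserId
    FROM dbo.Posts AS ps
    WHERE ps.Id = 1 /* some post value */

    UNION ALL

    /* recursive portion: joins back to the CTE name, p */
    SELECT ps.Id, ps.ParentId, ps.OwnerUserId
    FROM dbo.Posts AS ps
    JOIN p
        ON p.Id = ps.ParentId
)
SELECT
    u.DisplayName,
    p.Id
FROM p
JOIN dbo.Users AS u /* this join is free to run in a parallel zone */
    ON u.Id = p.OwnerUserId;
```

The recursive scan of `Posts` is pinned to a serial zone, but the join out to `Users` afterward is not.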

So the recursive CTE documentation? About half right. Just about half right. The recursive CTE portion can’t go parallel, but other parts can. The scalar UDF version is, again, right, but incomplete.

So I’m going to create this scalar UDF here. And even though I’m on SQL Server 2022, and I am in compat level 160, where some functions could potentially be inlined. Scalar UDF inlining, pretty cool feature.

It started with, you know, SQL Server 2019. But there are a lot of limitations to it. And one of the limitations is that if your scalar UDF has a CTE in it, then it is disallowed from scalar UDF inlining for some reason.
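The function body isn’t reproduced here, but a hypothetical example of the restriction looks like this; the name and logic are made up for illustration, and the CTE in the body is what disqualifies it from inlining:

```sql
CREATE OR ALTER FUNCTION dbo.PostCount (@UserId integer)
RETURNS bigint
AS
BEGIN
    DECLARE @c bigint;

    /* the CTE here disqualifies this function from scalar UDF inlining */
    WITH x AS
    (
        SELECT COUNT_BIG(*) AS c
        FROM dbo.Posts AS ps
        WHERE ps.OwnerUserId = @UserId
    )
    SELECT @c = c
    FROM x;

    RETURN @c;
END;
```

You can check whether a given function qualifies via the `is_inlineable` column in `sys.sql_modules`.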

Why? I don’t know. It just is. So if we run a query that calls that function, we’re going to run the whole thing once. And then we’re going to run, actually, you know what?

We’re not going to run this whole thing, because I didn’t create indexes to support it, and it would run for four hours. That’s a different demo. But what I do want to show here is why the documentation is only half right. The calling query up here is forced to run single-threaded.

If we grab the properties of the select operator and we look at the details in the properties pane, we are going to see the infamous non-parallel plan reason. And since we are on SQL Server 2022, we actually have the filled-in reason here, `TSQLUserDefinedFunctionsNotParallelizable`, which is right.

They’re not, when they can’t be inlined. Even inlined functions might not go parallel. But what you do see in the part of the plan that calls the scalar UDF, right in here, where the UDF is executed, is a parallel plan.

If we look over here, we can see that the body of the function was indeed able to engage a parallel execution plan. And part of the reason why this occurs naturally is because I didn’t create indexes.

So good on me for remembering that at the last minute and not sitting here for four hours waiting for results to come back. And the scalar UDF thing can be expanded out to two places where I see UDFs pop up quite a bit, where I really wish they wouldn’t.

One is in computed columns and the other is in check constraints. So if we create this function, actually I already created this, so doing this is really just for show. Create this function, oh, what did I do?

Oh, oh yeah, okay, I can’t alter it because it’s already referenced by the table. Good, good, good. So we don’t actually have to do that for show. But what I have here is a table called serial. And the definition of the table has a computed column in it (persisted, even) that calls that function.

I’ve already stuck some data in the table and I’m just going to run this and get the estimated plan. And this will, because we have a scalar UDF in the computed column on this table, helpfully called serial, we are going to have our non-parallel plan reason yet again.

All right, so even SQL Server’s scalar UDF inlining abilities cannot defeat a scalar UDF in a computed column, even if it’s persisted. It would be the same deal even if we indexed the computed column; that doesn’t make it parallelizable either.
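As a sketch of that setup, with hypothetical names: note that a `PERSISTED` computed column requires the scalar UDF to be schema-bound and deterministic, so the function here is deliberately trivial.

```sql
CREATE OR ALTER FUNCTION dbo.Doubler (@i integer)
RETURNS integer
WITH SCHEMABINDING /* required for the PERSISTED computed column below */
AS
BEGIN
    RETURN @i * 2;
END;
GO

CREATE TABLE dbo.serial
(
    id integer NOT NULL,
    /* queries touching this table now get a non-parallel plan reason */
    doubled AS dbo.Doubler(id) PERSISTED
);
```

With a computed column, any query against the table picks up the restriction, whether or not the column is referenced.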

Check constraints are a little bit more complicated in the restriction. So I’ve got another table where I have this as a check constraint and where I already have some data in there. And the main difference in what’s parallelizable when you have a check constraint is in which columns are referenced.

So in the first query, I’m basically cross-joining these tables together. But when I do a cross join, the plan’s a little weird, so I’m just doing a left join on 1 equals 1. I’m self-joining this table to itself, basically.

And I’ve got the QUERYTRACEON 8649 hint here to force a parallel execution plan. And when you look at these two plans, the first one... actually, let’s go back to the queries real quick. The first one just does a count big.

We’re just doing a count of everything everywhere. Which means that the check constraint column doesn’t have to be projected out anywhere. It’s not involved in the query.

So the first query plan actually is fully parallel. This is all parallel up until the very end of the execution plan. And the second one, where we select the column b, which is what the check constraint is on, and where we also group by b (because we have to group by b if we’re getting a count per value), that query plan is forced to run fully single-threaded.

And if we look at the non-parallel plan reason, again, we’re going to have the non-parallelizable function in the details there in the properties. So a little bit more complicated of a story with the check constraints.

And so it really is the column that’s involved in the check constraint that causes the problem. If I created this table differently and put the function on the id column, then doing anything with the id column (joining on it, selecting it, any of that stuff) would cause the plan to run single-threaded.
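A hedged sketch of the check constraint demo; the table, constraint, and function names are all hypothetical (`dbo.Doubler` standing in for any deterministic scalar UDF):

```sql
CREATE TABLE dbo.check_serial
(
    a integer NOT NULL,
    b integer NOT NULL,
    /* the scalar UDF only bites when column b is referenced by a query */
    CONSTRAINT ck_b CHECK (dbo.Doubler(b) >= 0)
);

/* can be fully parallel: b is never projected or grouped */
SELECT COUNT_BIG(*) AS records
FROM dbo.check_serial AS cs1
LEFT JOIN dbo.check_serial AS cs2
    ON 1 = 1
OPTION (QUERYTRACEON 8649);

/* forced single-threaded: b is selected and grouped on */
SELECT cs1.b, COUNT_BIG(*) AS records
FROM dbo.check_serial AS cs1
LEFT JOIN dbo.check_serial AS cs2
    ON 1 = 1
GROUP BY cs1.b
OPTION (QUERYTRACEON 8649);
```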

Now, multi-statement table-valued functions are also somewhat restricted in parallelism, but not fully restricted the way that the documentation kind of indicates. So if we create this function here, it returns a table variable, and we do all this work in here.

The part of the query plan that is initially forced to run single threaded is going to be the modification to the table variable. Modifying table variables can’t run in a parallel zone.

The other part of the query plan that’s going to be single-threaded is reading from the table variable that the multi-statement table-valued function returns. There are all sorts of intricacies here, because multi-statement table-valued functions support multiple statements, IF branching, all that crazy stuff that inline table-valued functions don’t.

So if you had other table variables declared, inserted into, and read from internally inside of the function, those table variables could theoretically be read from in a parallel zone. But the return table variable, the one that you return data from, cannot do that.
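The function body isn’t shown on screen; a minimal hypothetical multi-statement table-valued function looks like this, with the defining feature being the table variable it declares and returns:

```sql
CREATE OR ALTER FUNCTION dbo.TopUsers (@MinReputation integer)
RETURNS @out TABLE
(
    Id integer NOT NULL,
    Reputation integer NOT NULL
)
AS
BEGIN
    /* this insert modifies a table variable, so it runs single-threaded */
    INSERT @out (Id, Reputation)
    SELECT u.Id, u.Reputation
    FROM dbo.Users AS u
    WHERE u.Reputation >= @MinReputation;

    RETURN;
END;
```

Reading `@out` from the calling query is likewise serial, even when the rest of the calling plan goes parallel.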

So if we run this query, or just get the estimated plan for it, we’re going to see the estimated plan both for the query that calls the multi-statement table-valued function and for the multi-statement table-valued function itself.

So when we looked at the scalar UDF, the body of the scalar UDF was allowed to run in a parallel zone. That’s not the case here.

If we look at the properties of the Table Insert for the multi-statement table-valued function (sorry, this one’s a mouthful), we’re going to have a non-parallel plan reason that table variable transactions do not support parallel nested transaction. I’m not sure if that should be “transactions” in there. It seems a little weird that it’s singular, but whatever.

I’m not the grammar police. Don’t engage in that sort of fascism. But up here, in the query that calls the multi-statement table-valued function, we do indeed have an entire parallel zone in that query plan.

The part of the query that is not allowed to run in parallel is reading from the multi-statement table-valued function, from the table variable it returns. That is a restriction across the board.

Another place where the documentation is quite short on detail: if we come back over here and look, we see the `TOP` keyword mentioned here, right?

Trying to give tops a bad name. Big mistake. And then we look down in the non-parallel plan reasons. There’s no mention of top anywhere in here, right?

`TOP` isn’t mentioned as a non-parallel plan reason. And this is where it’s a little misleading, because the docs say `TOP` will restrict parallelism, but then if you look in the details down here, it doesn’t actually appear. There’s no mention of it at all.

And it’s only even partially true, because it only happens sometimes. So if we look at this query, where we have two select top ones, one here and one here, and we just sort of join those together.

And we look at the estimated execution plan for this. We are going to see two branches of the parallel query plan, right? Because we have the two top ones.

They can’t share whatever they were doing. And we do have SQL Server coming into a serial zone after the initial parallel zones, right? We have the Gather Streams operators here, which cinch our parallel zones together.

These two things happen completely in parallel. But then as we get into the Top operator, this is where the serial zone is. Everything basically from here on over happens single-threaded, going into the Top.
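The demo query might look roughly like this sketch, with placeholder column choices for the two `TOP (1)` branches:

```sql
/* two TOP (1) derived tables joined together */
SELECT p1.Id, p2.Id
FROM
(
    SELECT TOP (1) ps.Id
    FROM dbo.Posts AS ps
    ORDER BY ps.Score DESC
) AS p1
CROSS JOIN
(
    SELECT TOP (1) ps.Id
    FROM dbo.Posts AS ps
    ORDER BY ps.CreationDate DESC
) AS p2
OPTION (QUERYTRACEON 8649); /* undocumented flag to force a parallel plan */
```

Each branch can scan and sort in parallel, but the Top operators themselves sit in serial zones after the Gather Streams.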

Where that diverges from being the God’s honest truth is when we use apply. That’s either when we use the CROSS APPLY language element here, or when SQL Server optimizes a nested loops join to use apply nested loops rather than regular nested loops, where the apply pushes predicates down to the table on the inner side of the join.

Then we can get a fully parallel zone there. I’m just going to get an estimated plan for this. And you can see that in this case, we have a fully parallel zone.
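A sketch of the apply version, again with hypothetical column choices:

```sql
SELECT u.Id, ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    /* the correlated predicate is pushed to the inner side,
       so this TOP can stay inside the parallel zone */
    SELECT TOP (1) ps.Score
    FROM dbo.Posts AS ps
    WHERE ps.OwnerUserId = u.Id
    ORDER BY ps.Score DESC
) AS ca
OPTION (QUERYTRACEON 8649);
```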

Well, sorry, let’s frame that a little bit better. There we go. We have the end of the parallel zone gathering streams going into the final top, but we don’t have any restriction on this part of the query.

If you can even see it, this Top N Sort is fully inside of a parallel zone. There’s no cinching of the parallel threads there. Another thing that can cause a serial zone in a query is a global aggregate.

So I’ve got quite a bit of fancy query hints in here to get exactly the plan that I want. Sometimes it’s kind of tough to get exactly the plan you want just writing the query with some of the tables that you have available.

But if we look at this estimated execution plan, we have two parallel branches of the query. We have this part, which is fully parallel. And then going into both of those... I have to remind myself what those were.

There are two counts in this case. Going into each one of those counts produces a serial zone in the plan. We can see the Gather Streams happen there.

And the Stream Aggregates that do the count here. And then the global sum, right? Adding those two counts together is also in a serial zone. This would also qualify as a serial zone in the plan.

If our query looked a little bit different, you could kind of see how that works. If you... well, actually, that doesn’t even go parallel. So forget it.

And that’s not true of queries that have a grouping element, right? A global aggregate is just select count, select sum, average, min, max, all that stuff. For queries that have a grouping element to them, those aggregates are allowed to happen fully inside of a parallel zone.
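The contrast between the two shapes can be sketched like this:

```sql
/* global aggregate: partial counts can run in parallel,
   but the final Stream Aggregate sits in a serial zone */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS ps;

/* grouped aggregate: the aggregation itself can happen
   fully inside a parallel zone */
SELECT ps.OwnerUserId, COUNT_BIG(*) AS records
FROM dbo.Posts AS ps
GROUP BY ps.OwnerUserId;
```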

All right. So this one, this whole query, runs in parallel. That’s fine. Another thing that the documentation is only half right about is linked servers. And I realize that this is cheating a little bit, but it’s the best that I’m going to be able to do quickly.

So I have a linked server here that I create. It’s a loopback linked server, meaning the server basically points to itself. And what I’m going to do is clear out wait stats and latch stats.

I’m going to query wait stats for the server. Then I’m going to run a query against the linked server here using OPENQUERY. And then I’m going to select from the wait stats DMV again.
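The loopback setup might look something like this; the linked server name, provider, and remote query are assumptions for illustration:

```sql
/* loopback linked server: points back at this same instance */
DECLARE @me sysname = @@SERVERNAME;

EXEC sys.sp_addlinkedserver
    @server = N'loopback',
    @srvproduct = N'',
    @provider = N'MSOLEDBSQL',
    @datasrc = @me;
GO

/* the calling query can't go parallel locally,
   but the remote side of the OPENQUERY can */
SELECT oq.records
FROM OPENQUERY
(
    loopback,
    N'SELECT COUNT_BIG(*) AS records FROM StackOverflow2013.dbo.Posts'
) AS oq;
```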

So if I clear everything out, and I’m the only person on the server (it’s just me, nothing else is going on), and this happens quickly enough, I’m confident that it’s my query doing the parallelism. The first run-through of wait stats after I clear everything out is all zeros.

And then on the second run, after I execute my query and re-query the wait stats, I do have some parallel query waits. And that’s pretty consistent; every execution of this seems to end up with just about the same amount of CXSYNC_PORT, CXPACKET, all that stuff.

If I keep executing this, these waits stay close enough that I can confidently say that’s my query doing it, right? It stays pretty close to the same numbers.

If we go back up a little bit here, to where it’s quoted out: the reason I have this is just to show that the query I’m executing does go parallel. Well, actually, you know what, we’re not going to bother running... oh, actually, it finished.

What the hell? So if we look at these two query plans, the version of the query that just touches the base table on the server itself, without using the remote query stuff, does go fully parallel.

So this is about 238 milliseconds of execution time. I do believe that the 300 or so milliseconds of parallel query time is close enough to this that it would indicate my query was responsible for it.

But if we look at the query plan for the remote server query, we can see that locally, the query that calls the remote server can’t go parallel. That restriction is local to where you call the linked server; on the linked server itself, the query over there can run in parallel.

So again, only about half right on that one. Now, I think the final thing that I want to talk about is fast forward cursors. This is one part where, again, the table is more helpful than the documentation source, because we have... where is it?

No parallel fast forward cursor is a reason in the documentation here, but it’s a little weird that the documentation up top only mentions dynamic cursors. That’s a little strange, especially because when you go to the documentation about cursors, there’s no mention of whether a fast forward cursor is a dynamic cursor, which might be something helpful to mention.

So if I run this, everyone can be impressed by how blazing fast my cursor was. Look at that fast cursor execution. And if we look at the properties here, we will see the warning that I highlighted before, the no parallel fast forward cursor, which does indeed restrict a parallel plan. But that is not really documented well over on the Learn site.

So that’s pretty much the end of it there. There’s a written version of this post with all the demo scripts in it coming out around the same time that this video does. If you want to dig in further on any of the scripts and see how things work, you can follow along with that blog post. This is just the video version, which walks through stuff.
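The cursor demo might look roughly like this; the query inside it is a placeholder:

```sql
DECLARE @Id integer;

DECLARE c CURSOR LOCAL FAST_FORWARD
FOR
    SELECT TOP (10) u.Id
    FROM dbo.Users AS u
    ORDER BY u.Reputation DESC;

OPEN c;
FETCH NEXT FROM c INTO @Id;

WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM c INTO @Id;
END;

CLOSE c;
DEALLOCATE c;
```

The select operator for the cursor’s query is where the NoParallelFastForwardCursor non-parallel plan reason shows up in the properties.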

I don’t know. Some people like videos better. Me, I’m mostly just trying to get the video setup exactly how I want it, and also get back into being used to recording things, because I let that slide for a really long time for various reasons that I’ll hopefully get to talk about at some point. But anyway, that’s it for today. I’m going to go drink some water. Parched. Dry in here. You can hear it. Anyway, thanks for watching. Hope you learned something. Hope you enjoyed yourselves. And I will see you in another video. I don’t know.

Maybe tomorrow, maybe later today. We’ll see what happens. We’ll see how motivated I’m feeling after some nice New York tap water. All right. Thanks for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.


