It’s Time to Get Rid of the Cost Threshold for Parallelism Setting.


Video Summary

In this video, I delve into the outdated and largely unnecessary `cost threshold for parallelism` setting in SQL Server. This setting has been around long enough to have outlived its usefulness and relevance in modern database management practices. I explain why relying on a fixed number for this setting is not only ineffective but also misleading, as it can lead to suboptimal query execution plans and resource overutilization. I argue that the focus should be on optimizing queries to run faster rather than obsessing over cost estimates, which are merely pre-execution metrics with no actual meaning in today’s computing landscape.

Full Transcript

It’s time to get rid of the cost threshold for parallelism setting. It has long outlived its usefulness, and even longer outlived its meaningfulness as far as how SQL Server should weigh the benefits of using parallelism when executing a query. Far too many people out there think there’s some hidden perfect setting for their workload, when there isn’t. They refuse to acknowledge that settings like this are just general guardrails, not a bulletproof way of making sure your workload always does the right thing. You’ll see people say you should start at 30, or 50, or 150, or 500, and figure out what to set it to based on the type of workload that’s running. But it’s all magic number thinking, and it’s really annoying to watch people chase their tails on this. Cost only ever meant seconds on one computer, and it is not your computer. It is a unitless metric. If anyone ever called me a unitless metric, we’d probably end up in a fight. You shouldn’t be looking at unitless metrics to tell you how your modern workload should execute.
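For reference, the knob itself is just a server-wide sp_configure option (the default is 5), so whichever magic number someone sells you goes in the same way:

```sql
-- cost threshold for parallelism is an "advanced" server option; default is 5.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Plug in whatever magic number you've been sold; it's still a unitless guess.
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```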

You might think there’s some magic number, but then find that really tragic optimizer costing issues leave important queries with a serial execution plan, and the options for forcing a parallel execution plan, whether trace flags or USE hints, are all unsupported. That desperately needs to change, but it’s a topic for another day. Remember that MAXDOP is not MINDOP: MAXDOP limits how many CPUs a query can use; it does not instruct SQL Server to use that many. So setting MAXDOP will not force a parallel query. Likewise, you might set cost threshold for parallelism in a way that prevents some very important queries from getting a parallel execution plan, because they no longer qualify for exploring the parallel plan space, or in a way where too many queries end up getting one.
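To be concrete about “unsupported”: the usual tricks people reach for look like the sketch below (dbo.SomeTable is a placeholder name). Both are undocumented, neither is guaranteed to produce a parallel plan, and QUERYTRACEON requires elevated permissions:

```sql
-- Undocumented USE hint that biases plan choice toward parallelism:
SELECT COUNT_BIG(*) AS row_count
FROM dbo.SomeTable AS st
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));

-- Undocumented trace flag that does roughly the same thing:
SELECT COUNT_BIG(*) AS row_count
FROM dbo.SomeTable AS st
OPTION (QUERYTRACEON 8649);
```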

You may also set it “too low” (scare quotes very much intended) and find that you’ve run out of worker threads, because now everything is going parallel, and that six core server you threw all the hardware in the world at is completely overwhelmed by parallel queries all running at some DOP or another. One of the most frustrating things about the setting is that it puts way too much emphasis on a very flimsy data point. There’s something that a lot of people either refuse to acknowledge, or that just never occurs to them, or they’ve never read or listened to anything I’ve said. And this is maybe the most important thing in here.

Cost is an estimated pre-execution metric. When you look at your execution plans, there are no actual costs. There are estimated costs for everything.

There are estimates for a lot of things. And for some things, there is even an actual component that gets added after you execute a query. When you look at the execution plan, there will be things like actual rows and actual executions and other stuff like that.

There are no actual costs. So people start paying way, way more attention to operator and subtree cost when they’re trying to tune things. And it makes them think that a meaningful goal is reducing cost when it is not.

Reducing cost is not the goal of tuning a query. If it happens to be a byproduct of tuning a query, that’s fine. But it’s not a query tuning goal.

Your query tuning goal should be to have that query run much faster. Reducing cost is not a way to do that. You might have very high cost queries that run very quickly. And you might have very low cost queries that run very slowly.

There are a variety of reasons for that, of course. But cost is not an indicator of execution time. And in today’s world, where compute is at a premium due to licensing, people pay attention to these cost numbers like they’re going to save them money, as if driving estimated cost down is what’s going to drive their bills down.

You might as well be stuck in 2008 looking at logical reads or some other dinosaur metric. You might as well look at PLE, or buffer cache hit ratio, or, I don’t know, context switches and disk queue length and other stuff that has just gone the way of obsolescence.
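If you want to see the disconnect for yourself, here’s a sketch that pulls estimated subtree cost out of cached plan XML and puts it next to actual average duration. Sort it either way and you’ll find high-cost queries that run fast and low-cost queries that run slow. (Shredding plan XML like this can be expensive on a busy server.)

```sql
-- Estimated cost lives only in the plan XML; runtime lives in the DMV.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qs.total_elapsed_time / qs.execution_count / 1000. AS avg_elapsed_ms,
    qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') AS estimated_cost,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY avg_elapsed_ms DESC;
```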

One thing I dislike about it, maybe not the biggest deal in the world, is that it tends to give people a very false sense of control. Sure, you can change this number whenever you want, right? It’s a knob that you have control over.

You can change it to 49, 50, 51, 62, 75. You can go as high or as low with it as you want. But you can’t change how the optimizer costs things internally, at least not in any supported way.

There are some DBCC commands you can run, but most people aren’t going to start running those on their production servers just to see what happens. The setting can also be very confusing to the same people who are confused by a lot of things in SQL Server: folks who aren’t experts, or who have never spent much time with it. Or maybe they’ve spent a lot of time with it, but they’ve been taking the backups and doing the index rebuilds for years, and that’s their single point of experience.

They’ll see parallel plans with a cost that’s lower than their cost threshold for parallelism setting and think that SQL Server is broken. But that’s not true.

What no one gets is that every plan starts out as a serial execution plan. If the estimated cost of that serial plan is higher than your cost threshold for parallelism setting, and there are no natural inhibitors to a parallel execution plan (like a non-inlineable scalar UDF, or an insert into a table variable, or something), then SQL Server will start looking at candidate parallel execution plans. And if it finds a cheaper one, after factoring in CPU reductions and whatnot, it just might go with that.

So let’s say your cost threshold for parallelism is 10. You might see a parallel execution plan with a cost of 8 or 9 or 5 or 1 or 0, because the parallel plan was cheaper than the serial plan.

While we’re on the topic of parallel plan costing: nested loops queries get an absolute screw job here, because the costs for anything that happens on the inner side of the nested loops join don’t get any parallelism reduction applied. But nested loops queries often benefit quite a bit from parallelism, right? Serial nested loops queries at a certain point just fry your brain.

We’ve come to the point with SQL Server as, let’s call it, a mature software product where we have plenty of stuff going on that falls under the intelligent query processing umbrella of features. There are many things that will happen, or not happen, based on other heuristics. Probably the easiest one to recognize in that category is batch mode on rowstore. SQL Server uses heuristics about the type of query, the joins, the size of the tables, and things like that to figure out if batch mode on rowstore might be a good fit for you. And batch mode on rowstore leads to all sorts of other neat things, like adaptive joins. We’re at the point now where there is almost no sense in taking user input on what query cost should be before a query goes parallel.
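Batch mode on rowstore is a useful model here: the decision is heuristic-driven, but the overrides are documented and supported, rather than a single server-wide cost knob. A sketch (dbo.SomeTable is a placeholder name):

```sql
-- Supported override at the database level:
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

-- Supported override at the query level:
SELECT COUNT_BIG(*) AS row_count
FROM dbo.SomeTable AS st
OPTION (USE HINT('DISALLOW_BATCH_MODE'));
```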

There might be a good candidate setting to replace it with, but at this point it might just be completely replaceable by some intelligent query processing feature that uses heuristics, similar to what batch mode on rowstore does, to decide whether to explore the parallel query plan space. There are also feedback mechanisms to consider: the optimizer could produce the plan, the engine could execute it, and then we could look at things after the fact and say, well, we didn’t think parallelism would be good there, but we ended up with a lot of rows on a single thread, and boy howdy, maybe more threads would help.

We actually already have a setting called DOP feedback, which is, I mean, plumb useless the way it was designed. But, you know, I wasn’t the PM on that. So don’t blame me.
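For the record, DOP feedback shipped in SQL Server 2022 as a database scoped configuration. It requires Query Store, and as designed it only ever lowers a query’s DOP based on observed runtime stats; it never raises it, which is why it doesn’t help the serial-plan problem described above:

```sql
-- DOP feedback (SQL Server 2022+): needs Query Store enabled,
-- and only ever reduces a query's degree of parallelism.
ALTER DATABASE SCOPED CONFIGURATION SET DOP_FEEDBACK = ON;
```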

So there are things that would be better to do at this point, right? Like, A, not leaving this cost threshold for parallelism setting out of the installer. There are so many other things in the installer now: you can set MAXDOP in the installer, you configure tempdb in the installer, you can even turn on Perform Volume Maintenance Tasks, the instant file initialization thing, in the installer now.

But they leave cost threshold for parallelism out of the installer. And by they, I mean Microsoft. There’s nothing in the documentation that gives people real guidance on it, either. And you can’t change it at all in Azure SQL Database, right? So who knows what Azure is doing there. I mean, you can change it in Managed Instance, and you can change it if you have a VM. But in Azure SQL Database, you can’t change it at all.

So perhaps there is some scientific exploration going on about a better way to gauge the relative benefits or drawbacks of parallelism for queries there. I don’t know. I don’t have any information on that.

But I just have to hope that at this point we can finally either drop cost threshold for parallelism as a setting completely, or start giving people something meaningful to do with it. The thing is, I just don’t think there is a routinely meaningful thing you could tell people to do with it that would solve their problems.

Anyway, I’m done here. It’s Friday.

I’m going to go think about this and, I don’t know, stare at some red stuff in a fancy glass. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.