Tweaking SQL Server Queries To Induce Parallel Execution Plans

Tweaking SQL Server Queries To Induce Parallel Execution Plans



Thanks for watching!

Video Summary

In this video, I delve into the nuances of parallel execution plans in SQL Server and why sometimes a hint or trace flag isn’t enough. Erik Darling from Darling Data shares his insights on using an `ORDER BY` clause to trick SQL Server into generating a parallel plan when needed. He explains how adding unnecessary sorting can make SQL Server perceive the query as more costly, leading it to choose a parallel execution path. The video also covers other methods and considerations for inducing parallelism, emphasizing the importance of validating whether such plans are truly beneficial or just adding overhead.

Full Transcript

Erik Darling here with Darling Data. And I have sort of a short video today because sometimes a short video is what I have time for and sometimes a short video is what you have time for. So this is going to be a good and happy time for when I only have time for a short video and for when you only have time for a short video. It’s just amazing how that’s going to work out. Well, one thing that comes up a lot for me in my consulting and in, well, I mean, I guess sort of like a general query tuning methodology is, you know, there are definitely times when a parallel execution plan is preferred. And faster. The problem is that Microsoft doesn’t give us a min-dop hint. There’s no way to tell SQL Server via a hint called min-dop that you want to set a minimum degree of parallelism for a query. It would be nice if you could. It would save everyone a lot of trouble. Unfortunately, Microsoft doesn’t care about you that much. Or apparently, it doesn’t care about you that much.

Or just wants to make you suffer. And that sucks. What we do have, we have a trace flag called 8649. The trouble with trace flags is that, you know, you do need admin permissions to use them. And, you know, you can’t put trace flags in some modules like views, things like that. Like, you know, you can’t put, like, you know, option query trace on whatever at the end of a view query. And we also have a use hint called enable parallel plan preference. That one has fewer drawbacks, but the problem with both of them is that Microsoft does not document nor support them.

So, you know, we as query tuners are left with this unfortunate sort of, you know, do we want to use undocumented, unsupported hints and trace flags in our queries? Yes. Yes, yes, we do. It’s fun. Or do we want to find other sort of more natural ways of getting SQL Server to think that a query will do so much work that we need a parallel plan for it? Now, there are a lot of circumstances where you just don’t need to encourage SQL Server all that much.

Most of the time, SQL Server will just be like, you get a parallel plan and you get a parallel plan and everyone gets a parallel plan. Go crazy. So, you know, that’s fun. But sometimes you got to play some tricks. There have been various attempts over the years that are more successful in some compatibility levels and in some contexts than others.

I often find that, you know, you can usually coerce SQL Server into using a parallel plan by making it do something extra that doesn’t logically change the query that it wasn’t doing before. I think probably the easiest one to talk about is like sorting, like making SQL Server order data. A lot of the times when you make SQL Server order data, it will freak out and be like, whoa, expensiveo.

Parallel plans for everyone. So just to give you like, I realize these queries, you know. Okay, first off, like if I was yelling at someone about this query, top without order by is complete garbage nonsense.

If you’re writing queries like this and you don’t have like the, like, if God did not hand you a rock that said this query can only ever return one row anyway, and you do this, I think, I kind of think you’re an idiot and you should stop. This is not a good practice.

But I just want to show you really quickly that using this top one query and then using a top one with an order by, how that, how SQL Server doing more work makes it be like, oh, we need a parallel plan now. So let’s look at these two things.

All right. Top one, same query, top one with an order by. And if you look at these query plans, you might be shocked. You might be horrified.

You might be mortified to see that the first query we ran with no order by runs for nearly 11 seconds. All right. So we spend about almost nine seconds in the votes table and then doing some hash flow distinct hippie stuff right here.

And this thing runs for quite a while. And what’s really amusing about both of these queries is neither one of them return any rows. I did that on purpose.

Kind of a funny, kind of a funny query. I know the stack overflow data apparently well enough to get queries to do dumb things like this. So the second query does have an order by.

But here’s what’s tricky about it. It has an order by on the ID column. And how did that end up over there? That is not where, that is not correct indenting.

One, two, three, four, five. Oh, we need one more. There we go. Ah, no, come back. There we go. That is proper indenting. Four spaces in for select list columns. I don’t know why SQL prompt sometimes screws me and aligns this thing with the top or distinct.

It’s awful looking. But this query, those were ordering by u.id. And the funny thing about u.id is that it is the clustered primary key of the users table, meaning this thing already has an index on it.

It is well indexed. It should be, SQL Server should be able to return this in index order very easily. But that’s not what SQL Server has a problem with.

What SQL Server has a problem with is the votes table. Look at our mighty votes table. Look at all these rows.

These rows coming out of the votes table. Look what we do to those rows. We sort them. We have to sort these rows.

Because we ask SQL Server to return this data in order. Ah, curse you. To return data from the users table in order. SQL Server thinks that from an I.O. perspective, it would be a whole lot smarter if we had data in order from the votes table.

All right. So to make the loop join a lot faster. All right.

So SQL Server does us a solid here. And sometimes this kind of stuff, this general approach works with like all sorts of queries. But especially, you know, when you have exists and not exist queries where you’re dealing with queries that have these sort of funky little row goal things in them.

You really need to, you know, make sure that the plan that you’re getting is not one that is susceptible to row goal problems. Row goal problems are painful problems. Because this query up here, quite honestly, has a row goal problem.

If you look at this stuff, that’s a sassy row goal problem. The SQL Server is like, ah, I think there’s only one. Mm-mm.

Mm-mm-mm, SQL Server. You’re wrong. We’re all wrong. Well, I think being wrong about row goals is the worst kind of wrong you can be. Because when you’re wrong about row goals as a query optimizer, you really barf all over everyone’s day.

It is a sick and sad thing to see in the world. But this query, these numbers look a lot smarter to me. And this parallel plan turns out a lot better, especially when we’re dealing with lots and lots of rows.

So, and if you’re out there trying to tune a query and, you know, let’s just say on the off chance that maybe you’re like, I think a parallel plan will be better here. And there might be all sorts of reasons you might think that. You might, you know, look at one of your operators and be like, holy cow, 50 million rows on one thread?

That seems like a bad idea to me. We should have those rows on dop threads because dop threads would be a lot faster. So, if you come across a situation where you’re pretty sure that a parallel plan would be a lot faster.

And, you know, like it’s okay if you want to test queries using trace flag 8649 or the enable parallel plan preference use hint. But if you need to find a more natural way of inducing parallelism, often, you know, adding things that don’t really change much about the query but do change the work SQL Server thinks it has to do are pretty valid ways of getting a parallel plan where you otherwise just wouldn’t be able to get one. So, order by is one of those funny things.

And there are all sorts of places where you can stick order by, especially with big tops, like top two bajillions, in order to get a parallel plan where it just, it doesn’t change logically or semantically what the query is doing, what the query is returning. But it does fool the optimizer into thinking that a query is going to be a lot more expensive from a costing perspective. Not necessarily like, oh, it’s going to make the query slow.

Remember, costing and slowness are not the same thing. But it’s just a pretty good way of getting SQL Server to think, to like operate under the impression that it has to do a lot more work than it actually has to do. Get yourself a parallel plan and just make sure that, you know, like you can validate pretty quickly if the parallel plan is faster and if the parallel plan is efficient.

Remember, one of the key things about parallel plans is that you should see a fairly good difference between CPU time and wall clock time. If you have a DOP 8 query, ideally, you would see CPU time be 8x whatever wall clock time is, but that’s not always going to be the case. But what I want you to keep an eye on is something like the closer CPU time and wall clock time get to each other in a parallel plan, the less efficient the parallelism was.

That’s where you’re going to see like a lot of rows ending up on like one or two threads or something within the parallel plan. And that’s going to just just going to show you that the parallelism was pretty inefficient. So I’ve covered that in many videos, many, many videos, thousands of videos at this point, maybe even billions.

Hard to keep track. Only. Only there was some sort of dashboard told me how many videos I had.

All right. Anyway. Thank you for watching. Hope you enjoyed yourselves. At least a little bit. I hope you learned something.

And if you like this video, if you are into playing tricks on the optimizer, like I am, throw this video a like or throw me a comment that says, Cool dude bro job.

I think you did a cool job, bro. Something like that maybe. Your English might be better than mine. Mine’s terrible.

And also if you like SQL Server content, messing with the optimizer, giving it noogies and wedgies and, you know, all that good stuff. Bullying it a little bit. It’s okay to bully the optimizer.

It’s a computer program. It doesn’t have feelings. So if you like that kind of content, subscribe to the channel. Like almost 4,000 other people have. I mean, 4,000 people can’t be wrong.

If I’ve learned anything from elections the world over, 4,000 people cannot be wrong. All right. Cool.

With that sort of crushing optimism, we’re going to sign off here. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



One thought on “Tweaking SQL Server Queries To Induce Parallel Execution Plans

Comments are closed.