Sancerrely Yours
Video Summary
In this video, I delve into the fascinating world of parallelism in SQL Server queries, specifically focusing on how to identify and address issues related to skewed parallelism. By running two queries—one at MAXDOP 8 and another at MAXDOP 1—I demonstrate how to analyze query performance metrics such as duration versus CPU time. When these values are nearly equal but the query runs in a parallel fashion, it often indicates that something is amiss with the parallel execution plan. I walk through an example where a parallel query uses only a fraction of its allocated threads, resulting in inefficient use of resources and potentially slower performance compared to its serial counterpart. To illustrate this point, I also show how changing a nested loops join to a hash join can significantly improve query efficiency by distributing rows more evenly across available threads. This video is packed with insights for anyone looking to optimize their SQL Server queries and understand the nuances of parallel execution plans.
Full Transcript
This is the last demo that I have right now in this series. And it is, I think, one of my favorites, because it deals with interesting problems in SQL Server regarding parallelism. So what I’m going to do is run two queries. The first one is going to run at MAXDOP 8. The second one is going to run at MAXDOP 1. And I’m going to kick this off, and then I’m going to start babbling for a little bit, because these things don’t finish terribly quickly. So one thing that I always look for when I’m tuning queries is the duration of the query versus the CPU that the query used, which can be instructive in a few different ways. If duration is much longer than CPU time, it can indicate that the query was blocked or had something else weird going on with it, because we did not burn CPU the entire time the query was running.
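The two test queries aren’t spelled out in the transcript, but the setup is just the same query run twice with different degree-of-parallelism hints. A minimal sketch, assuming hypothetical tables in the style of the Stack Overflow sample database (`dbo.Posts`, `dbo.Votes` are illustrative, not the exact demo query):

```sql
/* Hypothetical repro: the same join run at two degrees of parallelism.
   Table and column names are assumptions, not the video's actual query. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 8); /* parallel plan */

SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 1); /* forced serial plan */
```

Comparing the CPU time and elapsed time of these two runs is the whole diagnostic trick the rest of the demo builds on.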
So what’s very interesting is if CPU time is equal to duration, but the query went parallel. Well, that just shouldn’t be, because under normal circumstances, wall clock time should be some divisor of CPU time. Or some product. I suck at math. Remainder? I have no idea. But let’s just say that we had a query run at MAXDOP 4 for 20 seconds, and it burned 20, 40, 60, 80 seconds of CPU across those threads. That would make a lot of sense to me. But if we have a query run at MAXDOP 4 for 20 seconds, and then at MAXDOP 1 for 20 seconds? Well, that tells me something is amiss with the parallelism. So what I’m going to do is go over to the execution plans and show you a little something. We have the parallel query, which ran for 19.6 seconds. And we can validate that, we can make sure that our eyes are not deceiving us, by going to the query time stats. And we see CPU time of 19,603 milliseconds, which is 19.6 seconds, and elapsed time of 19,601.
Not so good. The serial version of that query runs for 18.9 seconds. And we can again validate that by going to the properties and looking at the elapsed time: CPU time of 19,606 and elapsed time of 18,967. So that adds up. What does not add up is that this query ran at DOP 8, had three parallel branches, reserved 24 threads, and only used 17 of them. So we had 24 threads reserved, but we only ended up using 17 threads. And I don’t think we really went all that parallel. Let’s start over here.
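You can hunt for this pattern across the plan cache, too. As a rough sketch (not shown in the video), `sys.dm_exec_query_stats` reports worker time and elapsed time in microseconds, so CPU-heavy queries where the two are nearly equal are candidates for a closer look at the plan:

```sql
/* Rough plan-cache sweep for queries whose CPU time roughly equals
   their elapsed time despite burning a lot of CPU, which can hint at
   skewed parallelism. Thresholds are arbitrary; tune to taste. Note
   that these columns are totals across executions, so divide by
   execution_count if you want per-run numbers. */
SELECT TOP (20)
    qs.total_worker_time / 1000  AS cpu_ms,
    qs.total_elapsed_time / 1000 AS elapsed_ms,
    qs.execution_count,
    st.text                      AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.total_worker_time >= 5000000               /* at least 5s of CPU */
  AND qs.total_worker_time <= qs.total_elapsed_time * 1.1 /* CPU ~ duration */
ORDER BY qs.total_worker_time DESC;
```

This doesn’t prove a plan is parallel or skewed, it just flags the suspects; you still need to open the actual plan and look at per-thread row counts, like the rest of this walkthrough does.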
All the way to the right: this is serial, and this is serial, and this is serial. Now we finally go and distribute streams, but we distribute streams for one row. So if we look at what went on here: in round robin fashion, we took one row and put it on thread one, and we had threads two, three, four, five, six, seven, eight entirely empty. And we sent that into a nested loops join. And we still have the same thing. I’m going to keep this open. Actually, we went into a nested loops join where thread one had an eight-digit row count. That’s almost 11 million rows on one thread. And we stayed on one thread there, and it’s one thread there, and, oh, 27,000 rows, but still on one thread, and 27,000 rows still on one thread. And over here, oh, we’re down to 21 rows on one thread, and 21 rows on one thread. And, oh, okay, we repartitioned streams and things got a little better: one, five, six, 11, 12, 14, 20.
Maybe I added that up wrong somewhere, but it should be 21, right? Six, eight, nine, 14, 15, 20, 21. Okay, I was off by one. Most common error in all of computing: off by one. But yeah, so we had this massively, I mean, not massively massively, but we had a good chunk of parallelism for this query.
And we used none of it. Oh, it just makes you want to quit. This makes you want to quit. And so the serial version of the query actually finishes a little bit faster, because we don’t have all the overhead of going parallel.
We don’t have to distribute streams, or repartition streams, or gather streams. We don’t have to do any of that. We just use one thread the whole time, and it kind of doesn’t matter, right? Use one thread here; there’s only one thread to be used. Who cares?
Like this. And we get a much more reasonable thread reservation of one. There’s no need to have threads sitting around here. We have a serial query; we’re only going to use one thread.
Ain’t that something? So that’s a bummer. That’s a big, big bummer. Now, this problem does get exacerbated quite a bit by the nested loopiness of the joins.
So what I want to show you is what happens if we change this to be a hash, hag, hag join. I wish we had a hag join. I would love to join a hag.
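Forcing the join algorithm is just a hint. A sketch using the same hypothetical tables as before:

```sql
/* Same illustrative query, but forcing a hash join so rows get
   hash-partitioned across threads instead of piling up on one
   thread behind a nested loops join. Note that a join-level hint
   like INNER HASH JOIN also forces the join order as written. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
INNER HASH JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 8);

/* A query-level hint leaves join order to the optimizer instead:
   ... OPTION (HASH JOIN, MAXDOP 8); */
```

Either form is a blunt instrument; it’s used here as a diagnostic demonstration that the skew came from the loop-shaped plan, not as a blanket recommendation.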
If we run this, it finishes very quickly. That finishes in one second. And we don’t have all of the same weird loopy issues that we had with the other plan. If you look at the properties here, the actual number of rows is a pretty even spread.
Not perfect, but I’m willing to accept that over having 11 million rows on a single thread. And, I don’t know, this looks a lot better to me. So, if you have parallel plans, and you’re looking at the execution plan and you notice that CPU time and duration are just about even, you might want to go digging through your execution plan a little bit and start looking for skewed parallelism.
It’s not always going to be that every single row ends up on a single thread, but quite often it does end up that way. That problem will get exacerbated tremendously by looping constructs: nested loops joins, key lookups, things like that. Because you’re basically sending one thread to do a whole lot of work over and over and over again.
And the more rows get involved, the worse it is. For queries with smaller row counts, that may not flare up so terribly. But, gosh darn it, for queries that have to process a lot of rows, you can certainly run into some pretty nasty performance issues.
So, that’s my advice there. If you have a parallel query, and CPU time and duration are neck and neck, start digging.
Again, you might find that hinting the query to MAXDOP 1 gives you just about the same execution time. And if you see a lot of looping constructs in the plan, try saying: I want a hash join instead. Because sometimes, SQL Server just chooses poorly.
And when it does, we get yelled at. This is the sort of stuff that we have to resort to: looking at query plans.
Hash joins. Hints. Oh. Anyway, thank you for watching. And I’ll see you some other time. I’m going to take a little break from recording things.
It’s been a long two days of recording stuff. And, yeah, that’s all. Anyway, thank you for watching.
I hope you enjoyed most, if not all, of these videos. And sorry if some of them only come out of one ear. I apologize for my unruly children.
I’m going to go start drinking, because it’s daylight. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.