A Little About DOP and Bitmaps In SQL Server

A Little About DOP and Bitmaps In SQL Server



Thanks for watching!

Video Summary

In this video, I delve into the effectiveness of bitmap operators in parallel queries and how they can impact query performance. By analyzing a specific example from StackOverflow’s database, I explore the relationship between the degree of parallelism (DOP) and the efficiency of these operators. Through various DOP tests, I demonstrate that while higher DOPs may not always double execution speed, they can significantly reduce processing time, especially in larger datasets where row counts are more substantial. I also highlight how adjusting DOP to optimize bitmap performance can lead to better overall query efficiency by reducing the amount of data processed and joined, making SQL Server’s workload management more effective.

Full Transcript

Erik Darling here with Darling Data, the one and only, as far as I know. I did recently get an email that someone was trying to file a patent on Darling Data, and that if I paid this random lawyer a bunch of money, he would prevent it from happening. The crap you get from LinkedIn, you sign up for LinkedIn, and you put anything other than, like, I’m just an employee, man, leave me alone, in your title or experience, or like, you put a business on there. This is the kind of garbage that you have to look forward to. I also get lots of spam emails about VOIP phone lines from my office. I’m like, you’re gonna be real disappointed. I don’t know, some guy keeps trying to sell me wires, but mostly it’s just a bunch of emails like, hey Erik, call tomorrow. And I’m like, no, call never at any time. So, yeah, life on the internet sucks. We all knew that. Cool. In today’s video, we’re gonna talk about DOP, degree of parallelism, in bitmaps. I don’t think bitmaps is an acronym for anything. If it is, I’m way out of line. So, oops, I hit the wrong button. That was escape button. Hey, there we go. Pretend that didn’t happen.

But before we talk about the exciting world of DOP and bitmaps, the rich tapestry of DOP and bitmaps, we’re gonna talk about how you can keep me alive. You can sign up for a low-cost membership for the channel. It’s like four bucks a month. It’s a nice way to say thank you for getting, like, five free videos a week. I mean, I realize that at four dollars a month, they are no longer free, but the average cost per video is still really low. If you do that math. If you are unable to fork over four dollars a month and you would like to say thank you or show some appreciation in a slightly different way, you may subscribe to the channel exactly once, unless you have a, like, a bot farm or something, but, which would be cool, but wouldn’t really help me. Wouldn’t really help me with this part. You can also like the videos and comment on the videos in case you weren’t aware that on the internet there’s thumbs up buttons and ways to type into things and say words and have them there permanently.

If you need help with SQL Server, these things, they’re fun for me. Doing health checks, performance analysis, hands-on tuning. I mean, the emergencies are less fun. They’re more stressful, but mostly for you, because, I mean, I enjoy myself, even during an emergency. It puts me in a calm and soothed state. Or if you want developer training so that you have fewer emergencies and you freak out less, that would probably be great for you. I can do all of those things at a reasonable rate. If you would like some low cost, high quality, the highest quality, I mean, golly, golly, I can’t, I can’t even begin to stress how high quality this training is. You can get 24 hours of it for about $150 for life with that, that coupon code. There is also a link for that in the, in the show description.

If you, there’s a link for the membership, in case I forgot to say it, and a link for the training in there. So there’s just links galore in those descriptions, in case you’re unaware that on the internet you can link web pages from another place. So there’s that. If you want to see me live and in person, you have two options right now. You can come to Seattle November 4th and 5th and see me and Kendra do two days of performance wondering on, not wondering, wondrous performance tuning on SQL Server. You can, you can do that. Or you can tell me about an event near you that you might actually go to. And I can say, hey, event, do you need a pre-con speaker? And would you like it to be me? And they might say yes.

And then I might show up and do it. So that’s what you got right now. With that out of the way, let’s go do some SQL. Let’s do a query, my friends. You know, query real hard here. So the purpose of today’s video is to tell you, tell you a little bit about, uh, DOP and bitmaps. So the first thing you should know, uh, and this is, this is outlined, um, in a, in a fairly old Paul White article, uh, where even in a serial plan, uh, that gets a hash join, you, there’s like some invisible bitmap. Uh, I can’t imagine that it does much of anything because, um, well, you’re, you’re going to see why I imagined that here, but, uh, he’s way better at explaining it than I am.

Uh, what I’m going to try to explain to you is how the degree of parallelism of your query can make bitmaps more or less effective. And basically the, the tall and short of it is that when you have a tall DOP like eight or 16, bitmaps are more effective than if you have a short DOP like two or four. Uh, I skipped six on this one because six didn’t really change much of anything.

So I’ve got the same query four times. I got max DOP two. Count them off. Do pushups. I want you to do DOP pushups right there. All right. So you owe me two pushups and here we have max DOP four. And this is where you owe me four sit-ups. All right. And here’s DOP eight. And this is where you owe me eight jumping jacks.

All right. And then we have DOP 16 and this is where you owe me $16. Remember those numbers because we’re going to look at them again when we look at the query plans too. All right. So way up at the top here, we have a parallel execution plan.

And, uh, there is a bitmap in this plan. And remember, if you’ve watched other videos of mine about bitmaps, you know that the bitmap, well, it gets created here. Where it gets used is typically down somewhere in here. Some bitmaps can get stuck at the repartition streams. Some, some bitmaps get pushed right down to the, uh, the table that we’re hitting.

Which is the case for this one. We have this predicate. We have an in-row bitmap, which means that SQL Server can start filtering out rows way down when it starts reading pages that don’t, that, like, obviously don’t match the bitmap. Even with, you know, bitmap in place under certain conditions, you still have to have a residual thing here. We don’t though.

Because, uh, the ID column in users, uh, is a, uh, not nullable integer. And the, uh, user ID column in the badges table is also a not nullable integer. So the not null integerness of those two columns, it would also work for not nullable big ints, uh, and some few other data types.

It’s not strings, uh, but numbers generally. Yeah. Uh, you could get similar behavior where you don’t need the residual predicate there because SQL Server’s like, I got it. So, with that out of the way, uh, let’s look at how effective that bitmap was.

And the answer is, not very. Uh, that bitmap was not able to filter out most of these rows. We go from 2465710 to 2465701.

So that bitmap at DOP2, where we, where we built exactly, like, two hash buckets, uh, well, uh, that got rid of nine rows. So, not very good there. Um, I don’t know, maybe SQL Server had, like, more, maybe SQL Server thought that would do better.

I don’t know. But that, that clearly stinks, right? Not, not a good time there. If we go look at the DOP4 query, well, we got a few more out.

At least we got, I don’t know, about four, 39 rows that time, right? 2, 4, 6, 7, 5, 1, 0 to 2, 4, 6, 2, 4, 6, 5, 6, 7, 1. So, we, we got down by about 39 rows there, I think.

If I’m, if I got, if I got my finger maths right, um, that, that sounds good. 39. Maybe I’m off by 10, one way or the other. And the high, high school dropout thing makes on-the-fly math a little tough.

So, DOP2, right here, this one, right? Remember, you owe me two push-ups once again. Uh, DOP4, again, not terribly effective.

You owe me four sit-ups here. And then if we scroll down a little bit, we have this query at DOP8, where you owe me eight jumping jacks. This one does a little bit better.

Not a lot better. Slightly better. Uh, this one at least changes the second number, right? We went from 2, 4 to 2, 3. So, that, that one actually got us down a little bit.

So, the DOP8 bitmap so far has done the most work. Good job, DOP8. Now, in real life, most queries, uh, I would be happy to stop at DOP8 and be like, well, you know what? Uh, you know, there’s usually kind of diminishing returns after DOP8.

Sometimes it’s just, it’s just not as good, right? Like, you just, like, you’re not, you don’t keep scaling linearly as you add threads after DOP8. The stuff like that totally happens all the time.

I know, because I, I, I test stuff like this all the time. Uh, it’s part of my job, figuring out what the best set of query stuff to do is so that they go as fast as they can. Sometimes part of that is testing higher DOPs to see if anything remarkable in the plan changes.

Now, for this plan, and I, look, for this plan, right? And I, I’m, I’m totally with you on this. Uh, it, it, it finishes and, well, I mean, so let’s start back up at the top a little bit.

But this one at DOP2, right, because it only uses two threads, this takes about three seconds. So DOP2, obviously not a very effective use of parallelism or bitmap. This one down here at DOP4, that, that did get quite a bit better than the one at DOP2.

Just still not a very effective bitmap. This one’s 1.18 seconds, 1.181 seconds. At DOP8, we do still just about twice as good.

We go from 1.1 to 673. That’s close enough to twice as fast for me. But that’s still without even getting rid of twice as many rows with the bitmap, right? So usually when I’m dealing with queries that do this sort of thing, the differences are far more profound.

There’s a lot more rows flowing around. This is just a StackOver 2013 database with a little bit of stuff in it. The data that I deal with is typically much, much larger, which makes things like testing higher DOPs, like, a lot more attractive in a lot of scenarios.

Because I want more, I want to take those, like, you know, it’s like from, you know, if you’re at DOP8 and you have an 800 million row table, you’re still looking at, like, 100 million rows per thread. At DOP16, you’re at 50 million rows per thread. And that you can do a lot more work across that, you know, it’s a lot more effective spreading those rows out further.

So, like, that was, again, that was high school math, right? That was straight division, baby. Mmm.

Felt good. Felt real good. This one down here, it gets faster, but not twice as fast. We go from about 700 to about 4.5, which, you know, give or take 100 milliseconds, that’s still a pretty good reduction. And, you know, if in a much, much bigger query, you would find, like, that that difference might be more profound.

It might be, like, you know, 4.4 and a half minutes versus almost 7 minutes, right? So, like, that would be noticeable if you were dealing with, like, a big process that did a lot of ETL and moved a lot of stuff around. You’d be, like, you know, even bigger time span.

Let’s say that was almost 7 hours versus 4 and a half hours, right? There’s, like, a lot of timescales where it would make a difference that just doesn’t make a difference in milliseconds. But look at this one, right?

I don’t know how much the bitmap made a difference here. It’s hard to tell. But if you look at this part, oh, gosh darn it, tooltip, why do you do that to me? Why do you have to bury it to the hilt?

Ugh. Look at this. Look how much more effective that bitmap got. That’s a little over 700,000 rows, I think.

Right? A little over 700,000 rows. So, going, like, from here where we got, like, I don’t know, maybe, oh, that’s the wrong button. There we go.

Control key. Now we got it. From here where we go from, like, 2.4 million to 2.3 million, it’s not that great. But this one down here where we go from 2.4 million to 1.75 million, or 2.4 to 1.7, that’s probably a much more fair comparison. Because that is 2.46 and that’s 1.75.

So, you know, decent baseball player money, I guess. If you look at that, then, like, that higher dot contributed to a much more effective bitmap operator. So, what is the takeaway here?

If you have parallel queries like this, and you’re able to, you know, sort of fit this basic plan shape where, you know, you probably scan and index because you’re not going to do a lot of seeking with hash joins usually. You can if there’s other indexing stuff involved and there’s other where clause stuff involved. But just, you know, let’s just say you have a big parallel seeker scan over here, and you have a big parallel seeker scan down here, and somewhere in between them, you have a bitmap operator, which is going to look a little bit different if your queries are running in batch mode.

If you’re running in batch mode, you’re not going to see a bitmap operator in the query plan. You’re going to have to, like, right-click on the hash join operator and go to the properties, and it’ll say, like, is bitmap creator equals true if it did create a bitmap. So, this is like a row mode plan.

If you’re running stuff in batch mode, it’s going to be a little bit different. But let’s say that you have a seeker scan and then a bitmap and then another seeker scan, and there’s a hash join involved. It’s something to look at how effective that bitmap is.

Usually, the lower the number of rows that come out compared to – so, like, bitmaps don’t always do cardinality estimation. Or, rather, bitmaps can make cardinality estimation look like it’s just really wrong here. But really, it’s the bitmap being really effective that makes it look really wrong.

It’s just, like, the better a bitmap gets, the worse this estimate looks, right? Because this thing is basically just, like, table cardinality, right? Because there’s, like – we’re not, like, a seeker scan or a predicate down here.

We’re not – there’s no, like, where clause that’s saying, like, you know, where users’ reputation is greater than 100,000. There’s nothing to, like, you know, say it might be less than table cardinality, but we’re reading out of there. The only thing that you get is the bitmap, and Seawil server may not know how effective that bitmap’s going to be, obviously, because it uses it at DOP2 where it sucks.

And down here – so, like, the worse this estimate looks, the better the bitmap was usually, right? As long as there’s no other predicates involved that might have been right or wrong or somewhere in between. So, when you’re messing with queries that have this particular pattern in them, sometimes it is worth trying to adjust the DOP up to a higher number to see if, A, the query gets meaningfully faster, right?

Because adding more CPU threads in, spreading the rows out on those CPU threads generally can get more efficient up to a certain point. The other thing is that sometimes it can make that bitmap way more effective, which means you’re reading far less out of this table, and you’re putting far less into the join up there. So, you know, there’s a lot you can do that, you know, makes SQL Server’s job a lot easier as bitmaps become more effective.

And again, going back to the example of, like, you know, bigger ETL processes, larger overnight batch stuff, even, like, big reporting queries that may run for a long time, being able to get this sort of impact on both the overall time and the effectiveness of the bitmap, which probably does contribute to it a bit, at least a little bit, can really help with performance.

So, keep all that stuff in mind as your tuning queries, because, gosh, if I have to do it, you have to do it, too. It’s not fair if only I have to do all this stuff, remember all these things. I have to write stuff down and record videos, and I have to, God, I have to read my own writing sometimes, and I have to watch my own videos sometimes, and the one thing I hate doing is hearing my own voice.

Not fun. So, anyway, that’s about enough for this one. Thank you for watching.

I do so very much hope you enjoyed yourselves, and I pray to whatever might be out there that you learned something. If you didn’t, I don’t know what you did for the last 16 minutes. You don’t have to tell me.

Personal choice. Just sit here and watch this if you’re not learning anything. And so, enjoy yourselves. Learn something. Hire me to do stuff. My rates are reasonable.

I am going to trademark that phrase, because I feel like it is now synonymous with the Darling Data, Erik Darling brand. And I will see you soon in another video where I will talk about something with equal effervescence. So, there we go.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.