Fun Query Plan Friday

Fun Query Plan Friday



 

Video Summary

In this video, I dive into an absolutely fascinating query plan that Erik Darling from Darling Data explores on a Friday afternoon, perfect for winding down and enjoying some SQL Server shenanigans. We delve deep into the intricacies of a segment top operator within a cross-apply, revealing how it handles ties in user IDs and dates, which is both amusing and enlightening. The query plan itself is a delightful mix of expected and unexpected elements—like the batch mode hash join running alongside row mode operations, leading to some very peculiar bitmap handling that challenges our understanding of SQL Server’s execution plans. It’s a must-watch for anyone interested in the nitty-gritty details of how SQL Server processes complex queries, especially those involving multiple aggregations and joins.

Full Transcript

Erik Darling here with Darling Data. And today we are going to have an astoundingly obscene amount of fun, looking at what I think is a fun query plan. And since I have all the power here and I can do whatever I want, we’re going to be publishing this on a Friday. So everyone has a fun Friday and gets off to the start of a great weekend. Down in the video description, all sorts of useful links for you and me. You, so you can have SQL Server help or training, and me, so I can get money. It’s a wonderful trade-off. You know, there’s all sorts of labor theories involved here. If you want to hire me for consulting, buy my training, or support this YouTube channel, you are, I mean, I would say you are free to do all those things. You are free to do all those things, but none of those things are free. If you want to ask me office hours questions, the price of office hours questions is going to go up to $10,000 a question at the stroke of midnight, 2026. So get the free ones in while you can. And then, of course, if you like this channel and all the shenanigans therein, please do like, subscribe, tell a friend, all that good stuff.

After a long winter of solitude, I will be springing to life first at the Data Tune conference in Nashville, and that’s March 6th and 7th. And then Data Saturday Chicago, March 13th and 14th. Tickets are on sale for my Advanced T-SQL pre-cons at both of those. So if you’re in the area or just planning on attending, you don’t even have to actually attend. You can just buy a ticket and then have yourself a little Ferris Bueller day, and I won’t know the difference.

Just how many people showed up. I don’t know how many people bought tickets. I guess there’s an extra lunch. I don’t know. Now everyone gets an extra taco. Donate a lunch. You did a good deed. Look at you. What a sweetheart you are. Anyway, it is December. We have all our arms. We’ve got a ghost. Let’s go look at what I think is an interesting query plan.

So there’s an old Paul White blog post about the segment top query plan. And I want to explain a little bit about that before I get into the other parts of why this query plan is interesting. So we’ve got this sort of thing inside the cross-apply here, right?

This whole thing is inside a cross-apply. And we are selecting everything from the badges table where the date column in the badges table equals the max date column for each user, right? So we might expect to see the badges table referenced twice in the query plan because we have to get the max from the badges table here.

And then when we do all this stuff, we have to get the max per user ID, right? Because we’re correlating this here. So we might expect to see a join in this query plan, but whoa, we do not, right?

We have a scan of the one scan of the badges table. There is not a hidden scan anywhere else. I promise you there is nothing hidden anywhere else in here, which is great.

And then we have this segment operator, which I’ll show you the details of that in a moment. And then we have a top operator. So this is all fun stuff.

Now, if we look at the segment operator, we’ll see the segment operator is grouping by user ID. So this is getting the max. This is sort of getting the max date per user ID. We don’t have that aggregation in here just yet, but that’s what the top is for.

And the top is a very interesting top because the top is running as top with ties, right? And there’s a good reason for that, right? User ID and date.

And I’m going to show you the good reason for that now. Now, we have to digress from this query for a moment so I can show you. So if we were to run this query, this is… So, like, the thing whenever you’re finding…

Looking for data like this, what you might expect to see is there is some chance, because user ID is not unique and date is not unique, that there might be some users who have, like, a max date that ties, right?

So, like, you might have multiple badges that you got at the exact same time. And that does actually occur in the data, right? So if we run that same query, but we group by user ID and date, and I’m going to filter out down here and say only things where the count is greater than one, and we run this, you see that there are a whole bunch of users who have gotten a whole bunch of badges at the exact same time.

It’s like they were just awarded all these badges in one go. I think this is because the process in Stack Overflow that awards badges runs at some interval and looks for people deserving of badges and says, hey, you get a badge.

So some of these people got, like, a whole mess of badges all in one shot, right? Back when Stack Overflow was a popular website that people used and, like, you know, like, did stuff on, you know, you get all these badges for doing stuff.

So there are a lot of those. And even if we go a step further, right, and we look at this grouped by an additional element, now we’re also going to group by the badge name, right?

If we run all this, right, we still find that there are people who got a whole bunch of badges awarded at the same time, right? Just look at these top sort of numbers here.

Someone got four popular question badges at once. A bunch of people did. Look at this. Then a bunch of people got these revival badges.

I don’t know what the revival badges mean. I don’t know. Honestly, it’s been a long time since I cared about what any of the badges mean. But, you know, they got a whole bunch at the same time. Good for you, right?

You revived and populared a whole bunch of stuff. But let’s come back to this query because this is the one I think there’s interesting things in. And so we’re going to highlight this and run it with the whole thing this time, right? We’re sort of done with that initial middle section, the stuff going on in the cross apply.

And where this one gets interesting to me is we’ve already sort of examined this part, right? We have the index scan. We have our segment top section over here.

But where things get kind of neat to me is that over here we have a batch mode sort, right? You know, that’s maybe not the biggest deal. No, that’s not very exciting.

But over here we have a batch mode hash join, right? So we can see this hash match running in batch mode. But where things get kind of weird and funny to me, and this goes back to, I don’t know. I don’t know how far this went back, but so SQL Server has had bitmaps forever, right?

We have row mode bitmaps and we have batch mode bitmaps. And if we right click on the hash join operator, we’ll see that the hash join is a bitmap creator and that we created optimized bitmap 1025, right?

So this is all like running and created in batch mode. And you would normally expect to see a bitmap applied like way down over here. But apparently batch mode bitmaps do not agree with rowstore indexes, right?

There’s just like they just can’t sort of line up on that stuff. That’s just sort of, I think it’s just a bit of a technical limitation with things here. So because this runs in row mode, we can’t apply the batch mode bitmap over here.

If this executed in batch mode, then I think we could. But we got row mode execution on the reads. So we don’t get the batch mode bitmap pushed all the way down here.

Instead, the batch mode bitmap runs in this kind of funny row mode filter, right? So we have this filter operator over here. And that is where the bitmap gets applied, which is very strange, right?

It’s just a weird thing to think. Like, what happened to this bitmap? Like, did you get converted to row mode? Like, are you still batch mode? Can you be created in batch mode and you can only run in row mode?

Like, what is happening in here? It is a very, very strange, very silly query plan to me. So, of course, like when you see a lot of these plans, right? And you’re like maybe troubleshooting performance.

This might stick out to you as looking very strange. Because I always tell people that you should always be suspicious of filters and query plans. Because like almost any time you see one, it signifies some like non-relational thing that had to be expressed via a filter. It could be a very long, complex, almost non-sargable where clause.

It could be something where like you need to filter on a row number or filter on like a filter on a count. Like if we go back to the count query down here, we’ll see there’s a filter operator. But that makes total sense for there to be a filter operator here.

Because we don’t know how many rows grouped, right? We don’t know how many of the grouped rows will have a count of more than one. So, we have to generate that whole result and then filter out after we have like done the counting.

So, like sometimes you get a filter operator and you expect this when you would like expect it. Like things like this. But other times you get these weird, you get these filter operators.

And I always tell people to be very, very suspicious of filter operators. In this case, this filter operator is not that big a deal. I mean, granted it would be nice if, you know, any of this could run in batch mode.

Or this did run in batch mode and we could push the bitmap down further. But instead we get this row mode, batch mode bitmap thing over here. And we do a whole bunch of filtering.

You’ll see that this reduces rows from 1, 333, 900 to 10,000 rows. But coming back to what I was talking about earlier with the top that runs with the ties, right? Top with ties over here.

What I think is pretty amusing is that this being an inner join and all, right? We get 10,038 rows from this, right? Just from seeking into the user’s table.

But then we get 10,054 rows down here. So some of the people in here would have that same, like, tied max creation date per user ID thing going on. So I thought that was very interesting as well.

Because we end up with 10,054 rows from down here. Rather than eliminating any rows because we only got 10,038 from here. So I just thought this plan was adorable and interesting and had a lot of funny things going on.

And I wanted to talk about it. And I chose to talk about it with you. Because you’re the only people who would care if I tried to tell my wife and kids about this. They would just leave the room.

They might even leave the state, right? Country even. Who knows? I would never see them again. Anyway, happy Friday.

Thank you for watching. I hope you enjoyed yourselves. I hope you had… I hope you… I don’t know. What else? That’s it. Anyway. Get back to drinking, you lazy bums.

Alright. Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.