Signs Your SQL Server Execution Plan Is Lying To You: Merge Joins

Mergin’ Around

Video Summary

In this video, I delve into a fascinating aspect of SQL Server execution plans—merge joins that can be misleading when observed in the plan cache. By examining an example where a query took over a minute to execute, I demonstrate how adding indexes can reduce overall runtime but still leave significant time spent on merge join operations. I explain why many-to-many merge joins are particularly problematic and show you how to identify them using SQL Server’s execution plans and tools like SP_BlitzCache. This video aims to equip you with the knowledge to recognize and address these tricky scenarios, ensuring your queries run as efficiently as possible. Stay tuned for more updates on the latest SQL Server performance tips and tricks in my next video!

Full Transcript

Hello! We are here in this video to talk about how merge joins can be a little weird and misleading when you see them in the plan cache. Now, I’ve jumped ahead a little bit here, and the reason is because there’s really not a whole lot of reason to have you sit around staring at queries that run for 30 seconds. So what I’m going to show you is is the execution plan prior to adding these two indexes. If we look over at this execution plan, it runs for about a minute total. If we zoom in over here, about a minute and two seconds. There is a fairly, I mean, this would make an okay slow repartition streams demo too, right, like 13 seconds there. But we can see that we spent a fairly significant amount of time in these two indexes. And these two sorts, and they spill off the disk. Things generally go poorly here, you know, going from 13 seconds to 26 seconds, and from 5 seconds to, geez, almost 10 seconds there. That sucks, and that takes a long time. But if we add up 26 and 10, we are about 10. Let’s just call that 36 seconds. We still spend 20 seconds in the merge join. And the reason that we do is that we spend 20 seconds in the merge join.

So we do that. Now this is where the plan cache doesn’t lie to you. If we hover over the merge join, and we scroll down here, and we look at this, we can see that many to many is set to true. And what many to many means is that there are duplicates on the outer side of the merge join. This is where many to many matters. Now, SQL Server has choices. It could have stuck some sort of aggregation on one side or the other, and it could have eliminated duplicate values to make it a non-many. Many to many to many join. You won’t see this if you join a unique index to a non-unique index because this won’t be many to many. It’ll be one to many. But anyway, for this merge join, we can see that it is many to many. And if we go into the properties of the merge join, and we go into actual IO statistics, we can see normally if you were to turn set statistics IO on, you would see in the output a work table for the merge join. Newer versions of SQL Server have that IO information in the execution plan. So I find myself less and less using the set statistics time and IO stuff on newer versions of SQL Server because so much stuff is in the execution plans.

But if we go and we look at the logical reads that the merge join is doing, you can see just how much work went into, was done on the work table. Right? Like a lot of work went into that work table. That work table did work. Right? Work table did work. Now, to sort of screen out some of the noise here, I’ve, like, if you look at the merge join, when we look at our where columns, we can see it’s where the post ID column on the comments table is equal to the votes, the post ID column in the votes table. If we go back and look here, I created indexes on comments and votes that lead with post ID. So these columns will now be in the order that the merge join wants them in. We don’t have to sort them.

Now, if I go look at the execution plan after that happened, we can still see we improved the speed a bit. Right? It’s now down to about 40 seconds from about a minute. So we shaved off about 20 seconds. Unfortunately, that 20 seconds was all over here. We didn’t have to sort data. We know those two sorts didn’t spill. We didn’t have to, you know, because we had the data in order, we didn’t have to sort it and spill a bunch of stuff to disk. But we still have a lot of time spent in this merge join. And this merge join is still of the many to many variety. And if we look in the properties again of the merge join, we go here properties.

And we go to IO statistics, we do the exact same amount of work in that hidden work table that gets spun up to deal with the duplicate values in there. So we do quite a bit of work in that merge join regardless of indexing because it is the many to many variety. SP blitz cache will warn you about many to many joins, but because we don’t have information about the per operator breakdown the way the execution plan is, we don’t have that information in the cache plan.

I can’t tell you how big, bad, ugly, or consequential the work table for the many to many merge is. So if we look at the query plan here, we’ll still see that the many to many merge join is true. But if we go to the properties like we did in the actual plan, we won’t have all that good information about the IO work that it did, which is, you know, a bummer, right?

It’s not fun stuff there. But at least Blitzcache will warn you about many to many merges. It’s up to you to figure out if you have a plan with lots of merge joins in it, you will have to do some hovering. It could be one, it could be multiple. And there’s not really like a whole lot of good hint in the metadata about how many, about, you know, how much work the merge join did. There’s just nothing in here that, nothing in there that tells us really.

It’d be nice. I mean, I say it would be nice, but it’s kind of a niche thing anyway. I don’t think it’s worth the engineering time that it would take to add a column to like work table IO or something, work table reads or something in the DMVs. I don’t know. Maybe it would be.

Maybe, maybe work table reads would be the most fantastic piece of troubleshooting telemetry that SQL Server performance students have ever seen. I kind of doubt it though. I have my doubts there. I have some doubts. Anyway, that wraps up how many to many, or how merge joins can be tricky. Remember to keep an eye out for that many to many thing.

Blitzcache will show you under expert mode if your merge joins are of the many to many variety. And sometimes, you know, you know, sometimes indexing can help other times, you know, making sure that, you know, one input, you know, it gets aggregated in some way first can, can take away the many to many thing. You might have, you know, a unique column that maybe you didn’t tell SQL Server was unique, right? Like you have a regular index on it, not a unique index on it or something. Or maybe SQL Server is just choosing the wrong join type and you need to dissuade it from that in some way. Maybe use one of those awful query hints that people get all worked up about. Me, I love query hints. I love them to death. Put them everywhere. Sprinkle them everywhere. Do what I want. Goodbye. I don’t have time for this. Anyway, that’s it for that. In the next video, we will start talking about, gosh, more new stuff. I have so much new stuff to tell you about.

It’s going to be fantastic. It’s going to be the newest stuff you have ever seen in your life. Anyway, I will see you in the next video. Thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Mergin’ Around

Video Summary

Full Transcript

Going Further

Related Posts