The SQL Server Performance Tasting Menu: How Multi-Statement Table Valued Functions Hurt Performance

You’re A Drag


Video Summary

In this video, I delve into the common pitfalls of writing bad code in SQL queries, specifically focusing on joins and how shortcuts can lead to suboptimal performance. I walk through a complex query that uses OR conditions within joins, highlighting its inefficiencies and explaining why such constructions often result in poor execution plans. By dissecting the query plan, I demonstrate the impact of constant scans, hash joins, and nested loops, ultimately showing how breaking down the problematic join with UNION ALL can significantly improve performance. This video serves as a reminder to always be cautious when using shortcuts in SQL queries, especially those involving joins, and encourages viewers to consider alternative approaches for better query optimization.

Full Transcript

You all write bad code. Every last one of you. That’s why you’re here watching these. Joins are a part of queries that are particularly sensitive to bad code. And when I say bad code, I mean shortcuts. Shortcut code. It’s like taking a shortcut and getting stuck in traffic. Everything that’s a shortcut to you, anything that makes writing a query easier for you, often makes it far more difficult for the optimizer.

Things like, you know, functions or case expressions. When you use those relationally, I don’t mean like in a select list. I mean like, you know, if you use those in joins and where clauses, you can severely mess things up. It’s not pretty. on the or tiny.

something really funny? a pretty short. And he that petit a. It this is态. at. or build and joins. And this is a somewhat complicated query. Not like very complicated.

It’s just a lot of stuff. There’s a lot of things going on in it. There’s this table which is going to get us information about questions, and then this CTE which is going to get us information about questions, and then this CTE which is going to get us information about answers, and then this CTE which is going to make sure that we did not self-answer anything.

We’re going to join those together. And then this CTE which is going to join to the votes table, and this is where our problem begins. We’re going to join votes on the answer ID being equal to the post ID, or the question ID being equal to the post ID.

Whew. Yeah, let’s run this query. Let’s see just how awful this is. Because I’m willing to bet that on the scale of awfulness, I mean it’s not going to take an hour or else I wouldn’t be here. I’d be doing something else. I’d be taking a bath letting Calgon take me away.

Now, in query plans there exists a chunk of XML which can help identify when stuff like this is happening. Unfortunately, it shows up for a lot of things. I was going to write a rule in it for SP Blitzcache, but unfortunately the XML parsing that it would take to correctly identify when this happens in a join would have been pretty bulky and unwieldy, and there’s enough going on in there already.

I mean like you can just look at the query text and it’s that you see a join with an OR in it. Just smack whoever wrote it as hard as you can. All right. So that runs for 14 seconds. All right. We will go look at the query plan. It is not a fast query.

14 and a half seconds. 14.573 if we’re going to pretend to be exact here. And where does the problem start? Boy oh boy. Where does it start?

Well, we have a lot of stuff going on over here. But by the time we get anywhere interesting, we get up over here, we hit about 4 seconds.

Hmm. Hmm. Hmm. Hmm. Hmm. Hmm. And then when we get to here, we are now at 14 and a half seconds. And if we scroll all the way down and all the way over, we can start to see where our problems arise. So let’s take a quick step back.

Notice this hash join right here. The number of rows that come out of this hash join is 205 4907. If we boogie on back down to where we just were, we have two constant scans that each produce 205 4907 rows out of one.

A slight misestimate. Itty bitty misestimate. So that happens.

We get a constant scan here. And SQL Server is like, ah, well I can turn that into fewer rows. And I can take that 2 million rows.

205 4907. That’s about 2,500,000. And I would say, and I bet I can, I can squish that down into less, but we, we, we never actually do. We, we concatenate that and we, we continue to make a poor guess.

So 4, 4, 4.1 million out of, out of two. And we, we have that again here where we spend about six seconds sorting 4.1 million rows. All right.

You can see, you can see that happening. Those order buys, right? Fun stuff. And then we spend another 200 or so milliseconds trying to merge things together. And we do, we do a pitiful job there. And then we do with this index seek right here.

This doesn’t just happen once. Of course, this happens 4.1 million times. So the reason why we spend about 10, about 14, about what was it?

Let me make sure that I’m being accurate here. The reason why we spend about 14.5 seconds in this branch of code down here is mostly trying to, um, create a unique set of, uh, rows from the results of that hash join. So that hash join is actually what’s feeding into these constant scans, right?

That’s join right here. And then all those rows emit from over here. We do this nested loops join 205, 4907. And what happens over here?

We execute this index seek. Uh, come on. Where are you, Jimmy? Number of executions, 4109814. So lots of good fun times there, right? Lots of awesome stuff.

And we can avoid all of this awful, awful work if we just break that join up. So we’re not going to touch any other part of the query. This is going to stay the exact same.

This is going to stay the exact same. This is going to stay the exact same. What we’re going to do is take that or and have one side of it represented here. Union all.

Have another side of it represented here. And then we’re going to do the exact same thing selecting out of it. So let’s do that. Let’s highlight all this. And now let’s run this query. And I’m going to bet that this is faster.

Not only because I’ve done this demo a bunch of times, but because it just usually happens when you take an or out of a join that things get better. So let’s go find where that was happening or let’s go look at the plan to see what happened instead of all that malarkey.

Well, pretty nice. We have one scan of the votes table over here rather than 4.1 million Sikhs of the votes table. So for all you people out there who get on your obviously high horses and talk about how Sikhs are better than scans.

Well, not always. Not always. Not always.

If Sikhs were always better than scans, then why would they even build scans into the product? What would be the point? If Sikhs were always faster, why would there be scans?

Like why would there be an intentionally slow thing? Why? Just to mess with you?

No. I don’t think so. Anyway. Whenever I’m tuning queries, whenever I’m looking for obvious anti-patterns, things to start picking apart, or conditions and joins are one of the first places that I stop and look because they will almost always be a pretty big tripping point and a great way to identify when they are, you know, in a query plan when they’re causing issues.

I’m not going to rerun this thing and make you sit there for 14 seconds again. But when you see that pattern in query plans where you see constant scans and all this other stuff and nested loops in a Sikh, that’s usually when I start getting nervous and when I start wanting to rewrite things either as union or union all, kind of depending on, you know, what kind of mood I’m in.

Well, you know, union makes things distinct and union all does not. So there’s generally a penalty with union distinctifying things. We’ve spent a lot of time in some other videos talking about the pain that comes from distinctifying things, but there are times when distinctness can be useful.

So if you’re going to choose between union and union all, I usually start with union all, see how it goes. Then if that goes successfully, I might also try union just to see if eliminating duplicates buys me anything. And if not, then switch back to union all.

But always test between one and the other. Make sure that you realize that there is a logical difference between the two and that there are times when you might need duplicates to get correct results. Anyway, that’s all I have to say here.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.