A Little About FORCESEEK Hints In SQL Server

Thanks for watching@

Video Summary

In this video, I delve into the world of tuning queries that use OR clauses in joins, specifically focusing on a scenario where SQL Server’s natural query plan choice is far from optimal. I explore why SQL Server might choose such a suboptimal plan and demonstrate how forcing index seeks can significantly improve performance. By walking through detailed examples and explaining the reasoning behind each step, I aim to provide practical insights into when and how to use hints effectively in your queries. This video is part of my ongoing series on query optimization and performance tuning, designed to help you navigate the complexities of SQL Server and ensure that your database operations run as smoothly as possible.

Full Transcript

Erik Darling here with Darling Data. In case you’re wondering why I am recording at night, well, it is because it is Wednesday, September the 4th? 4th. 4th. And tomorrow morning I fly out for Data Saturday, Dallas, because I have to be there to teach a pre-con and stuff. And I just want to make sure that I have sort of a clean slate of stuff when I come back from that, because I’ve just got a few videos that I want to get nailed down before I leave, both to make sure that I have the blog queue and the video queue amply done, because whenever I’m gone for a few days and I don’t have time to record and do stuff, I need to make sure that my fiduciary duty to you is complete. And tonight’s video, I think this is the 10,000th video I’ve recorded tonight, we’re going to be talking about tuning or clause joins.

And the reason why we’re going to do that is because I have a few videos about joins with or clauses in them, and some of the pain they cause, and some different ways of approaching them. But this one’s a little bit different, because in this one, rather than doing this sort of, what I would consider the Darling Data standard rewrite, where I do a union all of things and work off that rather than working off of, and really doing anything else, it’s pretty much just that. We’re going to look at things in a slightly different way.

The slightly different way is going to assume that we have pretty good indexes in place, because without them, using what we’re going to use in this video is going to be pretty fruitless. But before we do that, as usual, if you like this video, if you like this channel, or this video, or whatever enough to start a membership, you can do that for the low, low cost of $4 a month. Absent $4 a month, you can contribute other parts of your body and soul and mind by liking, commenting, and subscribing. All very noble things to do.

If you need help with SQL Server in any of these ways, or I don’t know, maybe any other way, just don’t call me about like a problem with replication. My rates are reasonable. If you would like some high quality, low cost SQL Server training that can take you from your obvious beginner level to your next intermediate level, and then to expert level and beyond, you can get all 24 or so hours of mine for about $150 for your entire life. Or until I stop paying the bills, because no one gets a membership to the channel. We’ll see what happens.

But I’m kidding. It’s prepaid for like a decade or something. And in 10 years, if you still haven’t watched them, I’m not the one who’s messing up there. That’s you. So, yeah, there’s that.

And of course, this November 4th and 5th, I will be at Past Data Summit with the amazing Kendra Little. And we will be tag teaming two days of performance tuning. What’s a good T word? I don’t have one off the top of my head.

Performance tuning toughies. That didn’t go well. And of course, if there is an event near you where you would like to see me live and in person, tell me what it is so that I can submit to it.

Because if they choose me to do a pre-con, I will probably show up relatively sober. With that out of the way, let’s begin our voyage into performance tuning these ridiculous queries. Now, the reason why I care about this stuff is because these queries are notoriously slow on their own.

This is a join with an OR clause where I have no hints in this query. And SQL Server will every single time naturally choose the most god-awful query plan. And I’m going to break this down a little bit, but not this one, because this one only runs for about 30 seconds.

So, like, depending on the size of the tables, like, this is joining users to posts. The users table is fairly small. So nothing too, too awful happens here.

30 seconds is, you know… Oh gosh, that’s a slow query. I mean, it does just about reach the, like, usual application timeout threshold of 30 seconds. So, you know, it has that going for it.

I couldn’t put that in its dating profile. But the one that I would much rather focus on is down here a touch. And this is where I have joined… I don’t know why you decided to refocus, Management Studio.

This is where I have joined comments to posts. And these are two much larger tables. And things get much, much worse here. All right.

So what I want to point out a little bit about this query plan pattern that I find so noxious is… Look at… We do, like, what I would consider one too many joins.

All right. Not that I have, like, a number of joins in my head where I’m like, Oh, there’s too many now. It’s more that, like, for the particular query that we wrote, There’s a stupid join in here that we just shouldn’t have.

Right? Because we have a nested loops join here that goes to another nested loops join here that goes to the comments table. Okay.

So how did we get from… Oh, sorry. Oh, over here, where I can’t quite reach with my arm because of screen limitations, all the way down here to a seek into the comments table that took six minutes? Good question.

Well, we started by taking all 17 million rows from the post table. Right? That’s this number over here. 17, 1, 4, 2, 2, 0, 0.

17 plus some. And we took all 17 million of those rows and we broke them up into two parts. There’s a constant scan here with 17 million rows and there’s a constant scan here with 17 million rows.

And each of these constant scans represents the… What I’m going to say is the different join criteria that came out of that. So if we look at the query that got written, it’s going to be this one.

So it’s where the p.ownerUserId equals the userId in the comments table or the last editorUserId equals the userId in the comments table. Okay?

So if either one of those columns from the post table matches that one column from the comments table, we need to figure that out. So one constant scan is all of the ownerUserIds. The other constant scan is all of the last editorUserIds.

All right? So coming back over to the plan, you will see 17 million rows come out of this one. And 17 million rows come out of this one. And SQL Server slaps them all together.

So we have 17 million times two right here. And then SQL Server decides to spend 20… Let’s just say about 20 seconds sorting.

Both of those inputs. Why did it spend 20 seconds sorting all of those inputs? Well, because it was trying to remove some rows from those inputs.

So we start out with 3, 4, 2, 8, 4, 3, 3, 8. That is an 8-digit number of rows. And we end up with 3, 0, 7, 6, 6, 4, 2, 9.

That is still an 8-digit number of rows. Granted, we got from an 8-digit number that started with 3, 4 to an 8-digit number that started with 3, 0. But I don’t know if that was quite worth the now about 25 seconds of time that we spent doing that.

And, you know, perhaps, right? Because, you know, that would have an impact on this, right? We spend 8 minutes.

So, like, this goes from 25 seconds to 8 minutes. And we know we spent 6 of those minutes seeking into the comments table down here. Right?

So all of that nested loops join, that’s an 8-digit number of nested loops. That’s 3, 0, 7, 6, 6, blah, blah, blah, blah, blah. And if we look at this thing over here, and we look at the number of rows that ended up per thread, that is just a ghastly amount of work.

In this case, the parallel nested loops join was a pretty rough trick on us. So in all, this whole query, once we finish, you know, with this other nested loops join, we add about another 2 minutes on there.

And we end up taking almost 11 minutes to complete this whole query. The reason why this is so absolutely frustrating is because SQL Server could naturally choose a much better plan, but it just doesn’t.

So one thing that we can always try, assuming that we have adequate indexes in here, and by adequate indexes, I mean stuff like around this makes about sense, where we have the, you know, the tables and columns that we’re joining on all properly indexed for stuff.

We can run these two queries. And the only difference between these two queries and the two that I ran up front, where I stuck a force secant on the post table here and on the post table here.

So remember that first query took about 30 seconds and the second query took almost 11 minutes. But just throwing a force secant on these two queries, you know, that does, does the Lord’s work. By the Lord’s work, I mean makes them faster.

By the Lord, I mean me. I’m the Lord of SQL. I suppose that involves some dancing. So for the first query where, again, this is a plan shape that SQL Server could have completely validly chose. Right.

But it just didn’t. Something actually, you know, I’m kind of interested in doing. Kind of, kind of want to see what’s, if something that I forgot to look at actually was the costing. Right.

So let’s look at the cost of this 30 second query was, oh, why did you go away tooltip? Uh, 2883.55 and the 30 second query was 28. So just around 2800 query bucks a pop.

If we come look at these, these plans, look at the estimated subtree cost of this. 30,000. Right. Look at the estimated subtree cost of this. 11,000.

So this is where things get really annoying. SQL Server think, like costed these two plans out of existence. Even though the first query was 30 seconds and we got that down to four seconds with a query that cost 30,000 query bucks. And then we got this from 10 minutes down to four seconds with a query plan that cost 11,000 query bucks.

But SQL Server won’t ever choose those plans naturally. SQL Server, just the costing for this thing sucks. Right.

And this is why I spend a lot of time either applying for sequence or rewriting queries with the union all technique that I’ve described in other videos to make these queries faster. Right. So just throwing a force seek on it on this saves.

Oh, I don’t know. Uh, let’s just call it about 10 minutes and 38 seconds here. Right. So that’s, that’s pretty good.

And, uh, it, you know, saves about 25 seconds here. So don’t be afraid to play with hints when you’re tuning queries. If a query plan looks stupid to you, ask SQL Server why it didn’t do that.

That’s what the point of these hints are. Uh, if you, especially if you have queries that are joining with an or clause and you, you know, you have good indexes in place. Stick a force seek hint on one of those tables and see where it gets you.

A lot of the time you will get a much, much better plan than SQL Server is willing to come up with naturally. Um, again, it’s the costing mechanism behind this, which is so totally boned that you like, you really have to like, uh, you really have to intervene here. The other thing that I want to bring up with this is that, you know, something that I’ve, I’ve tried to get across to people who watch this channel and people, anyone who will listen to me or read what I say or anything.

Uh, is that, um, never look at query costs to figure out if a query is fast or slow, or if one query is better than another. Never look at operator costs to figure out what the slowest part of your plan is. Always get the actual execution plans and validate what you see.

Because remember for all of those costing mechanisms, there’s no actual counterpart. All the costs are pre-execution estimates and none of those, some of those estimates might make no sense. They can be the entire plan cost.

Like for these two query plans, they cost way, way, way, way more than the crappy slow plans that we got that were costed much lower. Right? Granted 2800 query bucks is perhaps not low to most people, but of 30,000 and 11,000, that’s definitely not low to most people. But it doesn’t matter here because those much higher costed plans are much faster.

SQL Server just estimated the work wrong. Right? And this isn’t, this isn’t necessarily an intelligent query processing thing. And beyond that, this isn’t necessarily something that would have been helped with like batch mode or anything.

Because nested loops doesn’t support batch mode. Even in an adaptive join plan, nested loops happens in row mode. Only the hash, only the, only the adaptive hash join could happen in batch mode.

So this is really something that I think Microsoft needs to work on. But until then, until they stop stapling clown nose features like dot feedback and stuff on SQL Server, it’s up to you and me to make sure that we have good indexes, that we try good appropriate hints in our queries when we detect BS from the query optimizer. Remember, query costs are useless.

Query costs are how we got to these plans. Right? Operator costs are how we got to these plans. And us saying, no, those costs are wrong. I just watched what happened.

That, that’s how, that’s how we, that’s how we get good at tuning queries. That’s how, that’s how we make real impacts on our SQL Server workloads. So anyway, hoo-wee.

Sometimes you just get, get talking and you can’t stop. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video. I haven’t decided, I think I might get these, might get some of this function stuff out of the way first.

I don’t know yet. We’ll see what happens. It’s going to be, going to be fun. I think I’m going to save the sort spills one for last because I need SQL query stress for that. And I’m a little lazy right now.

So I’m going to, I’m going to, I’m going to do these other ones and then we’ll get there. But anyway, thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.