Row Goals: Part 4
Video Summary
In this video, I delve into the world of query tuning for Rogals—those pesky tables that can sometimes cause performance issues. We explore how to identify when and if using a top hint or row goal is appropriate to optimize queries. By analyzing a specific example where SQL Server scans 11 million rows unnecessarily, I demonstrate how adding a seek hint with the top operator can significantly improve query performance. Additionally, I discuss the nuances of using row goals in conjunction with cross apply and for hints, showing how they can help SQL Server make better decisions about execution plans without resorting to overly aggressive hints that might not always be allowed by strict DBAs.
Full Transcript
All right, welcome back to the fourth and final installment of the Rogals talk that I’ve been recording and hopefully presenting to the adoring masses. We’ve looked at a number of different things about Rogals, like sort of some like what they are, some good stuff, some bad stuff. We’re going to look at some cool query tuning stuff that you can do with them now, and ways that you can sort of figure out if Rogals are appropriate for you.
All right, so we’ve got this query right here, and this query is kind of funny because it only returns one row. Again, going back to the rows per second ratio thing, and it takes about 900 milliseconds to run. It’s not disastrous, but like, I don’t know, maybe we’re just unhappy with it for whatever reason.
The reason why I would be unhappy with this query is because I’ve got a perfectly good index on the post table that we could seek into and life could be grand with. Okay, but no, we start off by scanning the post table, and it takes about 900 milliseconds.
If we look at the work that SQL Server did to, on that scan, it was 11 million rows. All right. Why would SQL Server scan 11 million rows when it could seek to them?
Well, if we add in the for seek hint here, we will get a different execution plan, arguably better and faster execution plan back, right? So we say, oh, well, there’s 15 milliseconds now.
We started seeking into the users table. We found some rows. We did a seek into the post table. We did a seek into the comments table. Note here, there’s a top above a seek, and this is not a disaster the way a top above a scan was. I mean, this one’s, oh, this is okay for us.
So maybe for whatever reason, we were like, ah, we’re better than for seek. We can, we can get away without that. Maybe, maybe we have terrible DBAs who would yell at us for, for using query hints.
Cause some DBAs are like that. Some DBAs who couldn’t, who couldn’t actually tune the query themselves. It’d be like, you never use a hint.
Uh, so a way that we could get around our DBAs prying, hinting, hating eyes, uh, would be to, uh, use a top. Right? So just like we’ve done in other queries, we’re going to use a top with a big number here to tell SQL Server that we want to do a particular part of this query together.
And then, uh, we want to make sure that, um, SQL Server sort of like puts a little like hug around this and does this stuff and then does the other stuff or not, not like then, but it does this stuff on it in its own sort of world. So if we run this, uh, with the top in there, we get back a, another, another query plan. Again, this like, just like using the four secant and the last one, this one also takes about 15 milliseconds.
Now we don’t have the outer, we don’t have an outer top out here, right? We just have a top down here, right? There’s no root level, uh, row goal, but there is one down here in the top.
And the top puts this part of the plan together. What, what this does that’s kind of cool is it, again, it starts off with the users table, right? So we seek into the users table, we find the stuff we care about there, and then we go into our nested loops join.
And then for each row that, uh, comes through here, we go into the, we go into this part of the plan and we seek into the post table without having that four secant. So the combination of sort of cross apply in the top, um, with the cross apply with the top, and we have the, the correlation within the cross apply here. If we take out the top and we rerun this, we’re basically back to the original execution plan, right?
This one’s back to like 900, 900 milliseconds. SQL Server makes the same kind of boneheaded choice to scan the post table out here first, and then do a hash join to the users table, which is not what we want, right? But having the top in here is what kind of gets SQL Server on board with doing what we want within the cross apply.
Uh, one thing that’s actually, uh, kind of worth noting here, because, uh, someone actually asked me about this, uh, at, uh, the, the first time I gave this talk, which is in New York, I think. God, it all feels like 10 years ago now. I’ve slept like five minutes in the last three months.
Uh, but it was about the, the, the star in here, right? So we have a select P dot star from this, but obviously, uh, the only thing that we’re getting in the outer part of the query is the ID column, right? So SQL Server is smart enough during sort of like, uh, like some, some of the early optimization stages where like, I don’t know, parsing, binding, algebraizing, simplifying stuff, all that stuff that we usually skip over.
Well, I mean, simplification is cool, but the rest of it, uh, SQL Server figures out pretty early on that we only need the ID column in the, like the outermost project of this. And so it throws away all the star that we don’t need in here. So kind of a nice thing.
Uh, anyway, um, I’ve spent a lot of time in my videos, making fun of, uh, the optimize for unknown hint. Uh, because people use optimize for unknown, uh, I mean, or some, some equivalent of it to fix parameter sniffing, right? Uh, they’ll either use optimize for unknown or they’ll set, uh, some local variable equal to a formal parameter passed into the store procedure.
And they will put a quote in there that, Hey, I fixed this whole thing. Optimize for is cool for a lot of reasons. Uh, but not that.
So optimize for is neat because you don’t have to optimize for unknown. You can optimize for specific values. And one place that you can optimize for specific values is with top. Now this of course deserves, uh, the biggest shout out in the world to Adam Mechanic.
Because if you look through SP who is active or some of his other code and stuff, you’ll see this pattern all over the place. And it’s a very cool one. So thank you, Mr. Mechanic for, uh, introducing the world to this one.
Uh, so all the, all the credit goes there. So let’s say we have this query. And we run this and we have sort of a similar situation to the last one.
SQL Server, once again, chooses violence. It starts off by scanning the post table. And again, this is reading 11 million rows and it takes 900 milliseconds, right?
And we are, well, just about 900 milliseconds, 863 milliseconds, which we’re rounding up, uh, just to keep things in order, right? 900 milliseconds. This sucks.
We hate it. We’re mad at it. We don’t want this thing to keep doing what it’s doing. No, we could, of course, like in the previous example, we could throw a force seek hint on the post table. But again, we have mean, crabby DBAs who do not allow us to use force seek hints. They only allow us to use, uh, code from crazy people.
So let’s say we want, uh, in a different way for SQL Server to use, uh, the, to start with the users, kind of start with the users table and the filtering that we’re doing on the users table and then move on from there. All right. Let’s say that that’s just our goal for this.
All right. It’s our, it’s our, our goal. And we’re going to use a row goal to do it. So, uh, what we’re going to do is, uh, we’re going to use some again, top magic, right? We’re going to say, we’re going to declare the top and we’re going to set it to the nine quintillion number, right?
The big, big number, the hugest number in SQL Server. I guess, I guess some decimals and stuff could be bigger, but who has time for that? One, one could die thinking about these things.
Some, some probably have, you know, if you think, if you think carefully about it, but here we’re going to use top. Uh, we’re going to use this top, well, local variable, uh, in this part of the query. So we’re going to cordon off this select at this top at top from users where creation date, blah, blah.
Uh, and reputation is greater than blah, blah, right? And we want SQL Server to give us the top, uh, nine quintillion rows for this. But maybe we don’t want SQL Server to come up with a query plan for nine quintillion rows.
Maybe, maybe if we gave to see if we gave SQL Server that big number, maybe SQL Server would do something terrible. Maybe it wouldn’t use our index. Maybe it would freak out and just be like, uh, I can’t, I can’t wrap my head around it.
I’m going to give you the worst query plan I can. Ha ha ha. This is the most expensive one sucker. Right? Like you don’t know, like it just freaks you out when you think about it, keeps you up at night.
So let’s say we wanted SQL Server to, uh, do this, but, uh, Oh, sorry about that. Do this. Right.
We want, we want to get the top big number, but we want SQL Server to act like this is a small number. And maybe we don’t want to add a fast end hint to the outer query because again, the crabby DBAs, the discipline crabby DBAs will come and pinch us at night. So what we can do is something that’s very cool.
And we can say, we can tell SQL Server to optimize that for that top being equal to one. So even though we have the top with the big number, right? We’re going to keep, we’re going to get all the rows in the world back.
We’re going to opt. We’re going to ask SQL Server to optimize for a query plan that only asks for one row. So it’s sort of like saying fast one, just with the top and the row goal. And when we run this, we get back a much snappier execution plan.
Right. And we indeed have a top operator here and we have a seek into our index on the users table and then a key lookup to get some other stuff. And then we go and get the rest of the query plan done, going into a nested loops, seeking into the post table and then doing the not exist portion of the post table thing.
So row goals, fun, fun, fun. Uh, row goals are sort of like promises from the optimizer about how many rows will be produced. Um, you can set them, uh, or rather you may set them explicitly using top offset, fetch fast.
And those are the most common. Uh, there’s also like the set row count thing, but we don’t need to talk about set row count. That’s you use that.
I know, no mercy and no pity. Uh, and the optimizer may set them implicitly. Uh, if you use, uh, things like exists, not exists in, not in, or equals any equals all or equals any or equals some equals all is much different. Uh, and they can act at times like optimization fences to help you reshape query plans.
Why they matter is because they change how the optimizer costs and build plans. Uh, they can make some pretty impossible queries very, very fast by forcing, uh, better join orders or just sort of better, like things happening together that, you know, especially if you can get it. So they reduce rows very, very early on.
That’s a very powerful thing that you can do, um, to, uh, start getting better query performance without having to throw hints everywhere. Cause hints often break down over time. Um, exists will, uh, generally already have a role goal of one.
You don’t need to say select top one distinct order by group by all the other stuff. Cause SQL Server just throws it right in the garbage. Uh, top 100%.
Like we talked about, if you put this in a view, the optimizer completely throws it in the garbage, slam dunks on your head, gives you a wedgie. It’s a mean time. Um, if you are going to use top and you are expecting data back in a specific order, make sure you have use deterministic ordering. That is like, you know, the order by elements that you have, there is at least one unique tiebreaker within them.
Otherwise you have non-deterministic ordering and you may have silent bugs and stuff in your query plan. There are some very cool things that you can do, uh, especially with cross supply and top. Um, you can get nice, easy nested loops joins.
You can get SQL Server to seek into stuff when, uh, you thought that you would never convince the optimizer to do the right thing. Uh, if you were hitting problems with row goals and stuff like if exists, or, uh, you see that top above a scan, um, for the, if exists, we used count big to fix stuff. Uh, for the top above a scan, it’s usually creating an index. So you have a top above a seek.
That will make life a lot easier. Uh, and if you wanted to, uh, maybe ask SQL Server for a large number of rows, but only come up with an execution plan, like it’s going to hit a small number of rows, then you can use parameterize top and, uh, tell SQL Server to optimize for top equals some small number, right? It doesn’t have to be one, just has to be a smaller number than nine quintillion.
So, uh, that wraps things up. Uh, thank you for watching. Uh, again, we use row goals wisely and as always, I’m Erik Darling and my rates are reasonable. All right.
Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.