Row Goals: Part 1

Row Goals: Part 1


Video Summary

In this video, I dive into the fascinating world of row goals in SQL Server, a topic that might not get as much attention these days but is crucial for understanding query optimization. I share insights from my presentation at the PASS On Tour events and co-present with Kendra Little on two days of SQL Server performance tuning pre-cons, promising to be the best T-SQL pre-cons ever. Alongside explaining row goals in detail—both those introduced by us and those set by the optimizer—I also discuss the difference between row goals and row limits, emphasizing why understanding these concepts is vital for effective query optimization. The video includes helpful links for consulting, training, and becoming a channel member, as well as my usual humorous advice on how to contact me correctly. Enjoy exploring this essential aspect of SQL Server with me!

Full Transcript

Erik Darling here with Darling Data. And today’s video we’re going to get started going over the material that I presented at the Pass On Tour events, plural, about Rogols. Why? Because they’re fun. And it was kind of amusing and no one really talks about them anymore. So I decided I’ll give it a shot. I don’t know. You tell me how I did. If you look down in the video description, there are all sorts of helpful links. You can hire me for consulting. You can buy my training. You can become a paying member of the channel. You can ask me questions for free on my Office Hours episodes. And of course, if you enjoy this content, you can like, subscribe, and tell a friend. In the honored tradition of liking, subscribing, and telling a friend, I’m going to tell you about Sport Drink. Not because they paid me to and not because they asked me to, just because I like it. I’ve got the orange, you glad it’s not chemicals flavor. It’s quite nice. There we go. This November, I don’t even, I don’t know when this video is even going to come out. That’s how much fun life is. But this November, this might be after November. I don’t know. Who can tell? Time is a strange thing, right? Time is a flat circle, as OI’s man once said. I’m co-presenting with Kendra Little. Two days of SQL Server performance tuning pre-cons. Sorry, my back just did something strange. You get old and laughing hurts. Two days of performance tuning pre-cons with Kendra Little. They will be the best two days of T-SQL pre-cons that have ever existed. And you should come, or else. Right? Sorry. I got a text from Joe Sack. I can’t show you what it is, though.

Shh. Secret. Anyway. All right. Anyway, let’s go talk about row goals. So, row goals. Very interesting thing. Now, when we talk about query optimization, there are two ways that query optimization can be talked about. Right? There’s the query optimizer that goes and does stuff. Right? Makes a bunch of cost-based decisions. Thinks about your query. Says, hmm. I think this is a good way to go get that. And then there’s query optimization, which is kind of like what you and me do. Well, I don’t know what you do. I know what I do. And when we optimize queries, we think, hmm. I don’t like the way that worked the first time. I need to go fix it.

But row goals are one thing that sometimes the optimizer will do that will change your query and your query plan. And row goals are also something that we can do to change our queries and our query plans and hopefully get better ones. So, normally, this is where I would introduce myself to the audience because I’m a very professional presenter. I would tell them about all the great ways that they can hire me. I will tell them that I am a consultant with reasonable rates. And if they ever want to hire me or follow me or any of that stuff, these are all the ways that they can do that.

I usually also make a joke about making careful note of the fact that when you want to contact me, my name is Eric with a K. If you go look for Eric with a C, you might find the love of your life. You might, I don’t know, you might get married, you might have kids, you might have the greatest future that you could dream of, but it will not be with me. So, Eric with a K. Also, be very careful because whoever you meet, if their name is Eric with a C, they will likely be a sociopath because their parents spelled their name wrong.

So, be careful. Anyway, moving right along. I’m very funny in person. I told a very funny joke at the Pass on Tour event in the Netherlands. I told the Europeans that I reformatted all the material for a European audience by taking out the jokes.

They all laughed. The first time they’ve laughed. Crazy. Anyway, row goals come in two forms. There are ones that you may introduce commonly with top offset fetch or fast enhance.

And there are ones that the optimizer may introduce commonly with not exists, exists in, not in, stuff like that. You can think of a row goal as a bit of a short circuit for the optimizer. I don’t mean short circuit the way that it’s glitching.

I mean, sometimes you might think it is, but it’s not. But it’s sort of like the optimizer can say, hey, well, wait a minute. Normally, I would have to go get like 3 million rows and 10 million rows.

And I get to figure out a plan to get that many rows. But the optimizer can then be like, hey, well, wait a minute. That nice person at the end said, I only need to get the first 10 rows.

I don’t need to think of a way to get 3 million times 10 million. I just need to think of a way to get 10 from 3 million times 10 million. So we can kind of think about strategies a little bit differently about like there’s a quick, like probably maybe a faster, different way or different approach to getting 10 rows.

And there is some getting like 30 million rows, right? So that’s the first thing. You can also think of a row goal is sort of like a promise.

It’s like when you sell, right, that only some number of rows will be produced or that you’ll run out of rows to produce, right? So you’ll either say, well, I want the top thousand rows from this expression. And SQL Server will be like, well, you look like a trustworthy person.

If you asked for a thousand rows, I’m going to go find a thousand rows. I’m not going to stop until I find a thousand rows or I can’t find any more rows. So you can kind of use them to fence things off.

If you use them in a table expression that be derived or common. And you can often use that to force the optimizer towards a plan shape that you prefer. Of course, it does take some work to figure out what plan shape you prefer.

But once you find that plan shape, you’re golden. Now, one thing that it’s sort of important to establish up front is that there is a difference between a row goal. Say, I need you to hit this number of rows, right?

These rookie numbers, you hit this row goal and a row limit, right? So the terms may feel interchangeable. But think about this.

Just about every other SQL dialect that you will come across uses the word limit instead of the word top. T-SQL is the only one that uses the word top as a row limiting sort of device. Everyone else says like limit whatever, right?

It goes to the end of the query, right? It’s not like select limit something. It’s like select your stuff limit 10 or 100 or something. So there’s a difference between like saying I only want to get 100 rows from this and something like the optimizer saying, well, I only need to find one matching row. An example of that would be like a semi-join, right?

Because semi-joins don’t need to get all the rows. We’ll talk about that more later though. But some examples. This is a row limit, right? You’re saying I want the top 100 columns from some table ordered by some column, right?

Top without order by is usually frowned upon, but we’ll talk more about that later. So this is limiting the rows. This is limiting the rows and the results to 100 rows or maybe, I don’t know, if the table only has 99 rows in it, you’re kind of naturally limited anyway.

But we don’t often run into those tables. This is an example of a row goal. So when you say option fast one, you are not limiting the number of rows in the output, but you are telling the optimizer to devise a query plan as if it only needed to find one row.

This is set at the very root of the query plan. This and that specific wording, that pedantic wording will come in handy later. This is an example of, again, potential row goaling by the optimizer.

So when you say select something from table where exists something else, well, exists and not exists and in and not in, we often see them expressed in the query plan as a semi-join. The reason it’s called a semi-join is because it’s not a full join. And I don’t mean full join in the sense of like full outer join.

I mean that when you have tables that have one-to-many or many-to-many relationships and you join them together, the many’s need to be respected. When you have a semi-join, you don’t need the many. You just need to know if one thing is there or if one thing is not there.

Right? Either that row exists or that row doesn’t exist and you don’t need to find all the other matching rows if you’ve already acknowledged that a row is there or not. Right?

So that is an example of a row goal that the optimizer may set. Query optimization by the optimizer. Right? The query optimizer’s query optimization is largely driven by how many rows are expected to come from tables, survive, where and join clauses, get past group by and having clauses. And setting a row goal is a way to influence the optimization choices by the optimizer without having to use query or table hints and all that.

Because the number of estimated rows that SQL Server gins up during query plan exploration and all that stuff or cardinality estimation and all that stuff has a pretty huge impact on how things get costed. And of course, as we discussed during the lost in cost videos, the lost in cost videos from last week, if you have not watched that playlist, you can if you want. It is not required material for this.

Completely separate material. Except for some of the things I’m going to say here. Query costs are estimates and nothing but estimates. You’ll hear a lot of very smart and knowledgeable people prove that they’re in the cool kids club by calling them query bucks. Right?

But the important thing to understand, again, if you watch lost in cost, this will sound terribly derivative, but cost does not equal time. Cost does not equal, does not measure speed or efficiency or anything else useful. They are, costs are all unitless metrics.

They are not durable performance tuning metrics. All right? Even in an actual execution plan, costs are all estimates. There are no actual equivalents for costs derived after queries execute like other metrics.

Make sure query plans are turned on here because wouldn’t it be nice if there were a setting or a button you could press to just keep query plans turned on? Like maybe if in presenter mode, there were an option for like enable actual execution plans and just have them on for all your SSMS tabs. Sure would be great.

If only someone opened an Azure feedback issue on SSMS about that. That’d be cool. Anyway, if you, if we look at this query plan, we pause for a moment with a, I mean, you know, technically this is on topic because there’s a top 10 in here. So, you know, we do, we do set a row goal and a row limit here, but if we run this query and we look at this index scan, we will see all sorts of things that have the word actual next to them.

And we will see one thing that should have the word actual next to it, but does not. And then we will see some things that have the word estimate next to them. These things that have estimated next to them do not have an actual equivalent, right?

There is no counterpart to these where the actual costs are shown anywhere because they don’t exist. They are, they are not a thing. But these ones down here, well, you’ll, you will see actual stuff for those.

Now, what it’s, what is important to understand is that costs are only how we got the plan that we’re looking at, right? All the costing stuff is how SQL Server figured out what plan it wanted to execute for our query, right? The optimizer did all its costing and compared choices and you got the cheapest combination of choices or the optimizer timed out and say, you get this combination of choices because I’m sick of thinking about it.

Not, not an uncommon thing. I’ve seen your queries. So you tell SQL Server which tables you want via the from and join clause.

You tell it which rows you want via things like the where and on clause. I guess even having would, would make sense in there too. I gotta, I gotta fix that in post, I guess.

You might tell, you tell SQL Server which columns you want via the select, you tell it which columns you want to summarize in the group by and you tell it which columns you want in what order you want your results via the order by clause. So when I go to conferences and then I talk to people and then I go home, I will, I usually tell the conference organizers, hey, I’d like to talk to some people who wanted to talk to me. Maybe, maybe you could give me this information, right?

Because my rates are reasonable, I just say, hey, just give me anyone whose consulting budget is over zero, right? If you have zero or negative consulting budget, we’re probably not a good fit. But if you have more than zero consulting budget, you could maybe work something out.

Could do a little shuffle for you. And of course, I would want to get this information. I don’t want to get all the information because too many, too many is too crazy.

But I just want like the top hundred people and I want them by consulting budget descending so that I make sure I maximize my input. But queries are just descriptions of what we want to see. And the optimizer goes and figures out how to best do it.

It’s impossible to escape people teaching you about SQL or databases without telling you that SQL is a declarative language. And other programming, whereas other programming languages are largely imperative or procedural programming languages. Meaning that you tell the computer exactly how, when, where, and why you want things done.

And it goes in and follows your instructions. Whereas queries are a completely different world. Queries, we have to describe all these things to the optimizer.

And the optimizer has to do a good job of going and getting it for us. There are some corollaries in databases that sort of match that a bit. Indexes contain data.

Statistics describe data. Databases contain data. And we describe what data we want to see from that using our queries. Costs are really just a bunch of internal algorithms that SQL Server uses to shape and choose execution plans for us. And the hope, the grand hope, the big hope that we have every day is that those costing mechanisms are correct enough.

And that we have provided our optimizer with good enough information about the data contained in our databases in order for it to come up with a good plan to answer our question. Costing will consider, and of course, the important one is first. Costing considers rows, right?

Cardinality estimation. How many rows are going to come out of this table? How many rows are going to qualify after this where and join clause? What’s the data distribution that we have from our statistics?

Costing will also consider the type of I.O. Because, you know, different types of I.O. cost different things to the optimizer. It’s random I.O. versus sequential I.O.

It will consider how much CPU effort it thinks will go into something. Of course, parallelism may reduce the cost of CPU effort because you’ll have multiple CPUs working on something. And, of course, memory requirements will also figure into things.

Costing is a highly generalized computation. Again, it’s based on one very old, very specific piece of hardware. And the point of that hardware was, well, I mean, the point of that hardware was for someone to do their job, obviously.

But the point of the costing algorithms was to come up with a good execution plan on any piece of hardware. So if you have two cores and 128 gigs of memory, or if you have 128 cores and two gigs of memory, the optimizer will come up with the same plan for either one because it doesn’t think about much of that.

If you ever want to see the old piece of hardware that costing was done on, it is here. It is this thing, right? It is this cantankerous contraption.

There might be like a where in the world is Carmen Sandiego CD stuck in there. There’s probably like an Encarta thing in here. Like it’s an ancient, ancient thing. Your servers at home, I mean, not your servers at home, but your servers like wherever they live.

And I would say at work, but they’re probably in some lousy cloud somewhere. But they do not have a lot computationally in common with this old hunk of junk, right? This old plastic thing here.

Thank you for showing that. That was very helpful. So now you know all the questions that I’ve answered. Lucky you. Anyway, all plan decisions are based on these costs.

Some costs are fixed per unit like CPU and IO. And there are other costs are based on things like statistics and metadata, things like table size, histograms, either system created or attached to indexes. Or I guess user created statistics would be another one.

And of course, uniqueness. Describing uniqueness to the optimizer can be a very helpful thing. Some things can really help the optimizer make better enough plan choices. So useful indexes, targetable predicates, unique constraints, enforced foreign keys, value constraints, up-to-date statistics, limited query complexity, a goal we should all strive for.

We yearn for non-complex queries, don’t we? And avoiding things that don’t have terribly good costing support. You know, XML, JSON, string splitting, built-in functions.

That means like left, right, substring, replace, up, down, reverse, that stuff. User-defined functions, local variables, table variables. And of course, the list could go on.

But we do have to go home someday. Okay. Row goals. Oh, come on back here. There you go.

Zoom it. Row goals are just another thing that affect costing and the query optimization process generally in SQL Server. So we’re going to stop here because we’re about at the 20-minute mark. And the next kind of section gets into the live demos and stuff.

So we have talked for long enough, right? We have gone through enough green text. Our eyes are now burning green and we need to go look at some execution plans. So I’m going to get this one uploaded and then I’m going to carry on from right here.

And I’m going to highlight that so I don’t forget because I’m probably going to go do something and then come back. All right. Thank you for watching.

Hope you enjoyed yourselves. Hope you learned something. And I will see you in the next video, which will be Row goals part two. Electric Goaloo.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.