Lost In Costs: Part 1

Video Summary

In this video, I delve into the concept of “Lost in Costs,” a term that highlights how query costs can sometimes mislead us when tuning SQL Server queries. I start by explaining what query costs are and why they should be treated as estimates rather than definitive performance metrics. Using my favorite demonstration, I illustrate how even simple changes like adding a hint can drastically affect execution plans and performance. By walking through this example, I aim to show that while high-cost queries aren’t necessarily slow, understanding the underlying costing mechanisms is crucial for effective query optimization. Throughout the video, I also touch on various factors that influence costs, such as row projections, I/O operations, CPU usage, and memory requirements, emphasizing that these are all part of a highly generalized set of algorithms designed to generate good-enough plans across different hardware configurations.

Full Transcript

Erik Darling here with Darling Data. And it occurred to me while I was traveling overseas, while I was being the European correspondent for Darling Data, that there was some session material that I had presented that I had not yet recorded. And that includes this session that I’m going to do now, which I presented a few different places. And the Rogel session that I presented at the Pass On Tour events. So the next few videos, aside from office hours, are going to focus on those. So that’s a good time there. So this is the lost in cost, lost, actually, I don’t know if that’s supposed to be plural. Let’s skip that. Anyway, as usual, down in the video description, all sorts of helpful links. If you want to hire me for consulting, purchase my wonderful top-notch, best-in-class training materials at a reasonable price. Become a subscribing channel member. Ask me office hours questions. You can do all that down yonder. And, of course, if you enjoy this content, please do like, subscribe, tell a friend, all that good stuff. Since I just got back from Utrecht, I trekked to Utrecht, and I trekked all the way home.

The only thing that I have left through the end of 2025 as far as speaking engagements will be the Pass Data Community Summit taking place in lovely Seattle, Washington, November 17th to 21st, where I have not one, not two, actually exactly two, days of T-SQL pre-cons with Kendra Little. So that’s great for everybody. Except you if you’re not going. So that’s your fault. Anyway, it is still October, so we can still have this lovely Halloween database party going on, can’t we?

All right. So, oh, that’s a web browser. So if you ever wonder, this is how I generate my ASCII art, and this is the last thing I had to generate ASCII art for. So we’re off to a good start. Anyway, oh, see, it is plural. I didn’t mess it up. Anyway, this is going to be part one of Lost in Costs, and we’ll talk all about what this means coming up.

But, you know, so since I’m a professional presenter, I always have my contact info up so people can bother me, but we don’t need to talk too much about that. Because you are already at this one, and you might have gotten a link from this one or this one to get here. So there’s not a lot of point in that, is there? So the first question that we must ask ourselves when we are looking at query plans and we are trying to figure out exactly what we are trying to tune about a query, we must ask ourselves: what are query costs? And of course, query costs, no matter where you are looking, are estimates.

Right? So cost does not equal time or anything else. They are unitless measures. Right? They are not durable performance metrics in any way, shape, or form. Even in actual execution plans, costs are all an estimate. There are no actual costs derived anywhere during query execution.
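To see for yourself that cost is just an estimate that gets carried along with the plan, you can dig it out of the plan XML, where it lives as the StatementSubTreeCost attribute, even for so-called actual plans. Here’s a rough sketch against the plan cache, assuming you have VIEW SERVER STATE permission; the column aliases are my own:

```sql
/* Cost is stored in the plan XML as an estimate (StatementSubTreeCost).
   Nothing in here is measured at runtime. */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    query_cost =
        s.n.value('@StatementSubTreeCost', 'float'),
    statement_text =
        s.n.value('@StatementText', 'nvarchar(4000)')
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') AS s(n)
ORDER BY query_cost DESC;
```

Note that nothing about this number changes after the query runs, no matter how long it actually took.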

SQL Server does not change its mind and say, oh, I thought that was going to cost 25 cents and it costs $70,000. My bad. It just doesn’t happen. Costs are merely how we got to the plan that we are currently looking at. All right? So when you think about how SQL as a language works, you tell SQL Server what rows you want, what columns you want, which order you want.

Maybe you might even say, summarize these columns with a GROUP BY or something. But you basically tell the database, I’m looking for this, and the database has to go find it. Boy, I should probably fix that at some point. Right? It’s not 2024 anymore. Let’s pretend that’s a five.

All right. I obviously didn’t go through this with too fine of a comb before deciding to record it. Anyway. So queries are really just descriptions of what you want to see. Right? They are just a description of the data that you require from the database.

And then the job of the query optimizer is to figure out how to find it well enough within a reasonable amount of time. It’s a bit like how indexes contain data and statistics describe data. Databases contain a whole bunch of data, and we describe what data we want to see from them.

Costs, as we know and love them, are just a bunch of internal algorithms that SQL Server uses to shape and choose the execution plans that it ends up executing. Right? The hope is that all those costing mechanisms are correct enough and the optimizer has correct enough information about the data underneath.

Right? The statistics, histograms, and things like that, to get you a, well, let’s just call it a good enough plan. Right? Good enough plan found here. But all of this is to answer the question your query is asking as efficiently as possible.

Costing considers all sorts of fun things. Right? So like the number of rows that are going to like project out from various things. It might be a seek or a scan.

It might be like a where clause. It might be a filter. It might be a join. Like how many rows are going to come through stuff? What kind of I/O is being done? Because there are different costs involved for random I/O versus sequential I/O.

CPU effort is also a factor in this. Parallelism is considered a CPU effort reducer because you have multiple CPUs working on the same set of data. Right?

Rather than having one CPU deal with all of the giant amounts of data that you are reading through. And of course, memory requirements are part of that as well. Cost is all just a very highly generalized set of algorithms. The thing that I always have to tell people is that cost has nothing to do with you.

Nothing to do with your hardware. It doesn’t matter how much your hardware costs because the costing algorithms are not looking at your hardware and thinking, Oh, that’s a nice computer.

I bet this would be easy. It just doesn’t do that. It’s meant to come up with good enough plans on any set of hardware. Right? That is the whole point of a generalized cost-based system. So what I always like to show people is this image, which is, of course, fondly referred to as Nick’s computer.

There’s a note there about SP2, I think, of something. I don’t know SP2 of what. But there might be like an original SQL Server activation code on there.

But this is the computer that all of the costing stuff was originally done on. Now, aside from some very generic terms like CPU or motherboard or RAM or disk or something like that, that has very, very little in common with the type of hardware that SQL Server runs on today.

At least we hope it does. We’re not sure what’s actually going on up in Azure. It could be a whole farm of those things. We don’t know.

They won’t tell us. But anyway, all plan decisions are based on these costs. Some costs are fixed per unit, like a CPU cost and an I/O cost. Other costs are based on statistical information that our databases contain, which describes the data in them, like the size of tables, the histograms that are attached to columns or indexes. And, of course, uniqueness is a very good descriptor of data that we might have.

There are lots of things that we can add to our database in order to better describe it or better contain data for it, sometimes more efficiently. Useful indexes. We can write our queries with sargable predicates that do not require extra work to locate rows that we care about. We might want to think about adding unique constraints or even value constraints or, you know, unique indexes would be something else. We might want to consider having up-to-date statistics.

At least some of us might. And we might want to even do something as crazy as limit query complexity so that cardinality estimation is easier for the optimizer. We may want to even avoid things that do not have good costing support.

Maybe something like XML, JSON, string splitting, built-in functions, user-defined functions, local variables, and table variables, just to name a few off the top of, well, not my head anymore. It’s just stuff that I wrote down quickly.
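Sargability is worth a quick illustration. The table and column here come from the Stack Overflow database mentioned later in this video, but any table works the same way; the point is what happens to the column in the predicate:

```sql
/* Non-sargable: wrapping the column in a function means SQL Server
   can't seek into an index on CreationDate -- it has to test every row. */
SELECT c.Id
FROM dbo.Comments AS c
WHERE ISNULL(c.CreationDate, '19000101') >= '20240101';

/* Sargable: the column is left alone on one side of the comparison,
   so an index on CreationDate can be seeked into. */
SELECT c.Id
FROM dbo.Comments AS c
WHERE c.CreationDate >= '20240101';
```

Same rows out, but the optimizer can cost the second one against an index seek, and the first one only against a scan.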

So I’m going to start this off by showing you one of my favorite mis-costing demos in the world. Since I am smart, and I’ve made a note saying if I’m smart, but I am smart. So I already ran this.

This is one of my grand “the optimizer is not always right” type of demos, where I have two queries that do essentially the same thing, except one of them has a hint. Not this one. This one does not have a hint, right?

There’s no like with or, you know, hey, do this thing here. And then this one down here, which does have a hint, it uses the force seek hint to tell SQL Server it does not always know better. But if we look at the query plans for this, we are going to see two, well, not very different, but different enough to make a big performance difference.

For example, zooming in over here, we can see that the first query, the one without the forceseek hint on it, ran for 25 seconds. And the second query, which did have the forceseek hint on it, ran for about one and a half seconds.

Now, what I want to point out before I go too much further is that what’s going on in the outer apply is a correlation. And the correlation is from the Users table, which has a unique, sorry, a clustered primary key on the column called Id. It’s an integer, it’s an identity.

And I am correlating that to the clustered primary key of the Posts table, which is also an integer and an identity column. They are very, very similar columns in most ways, aside from which table they belong to. If you’ve been poking around the Stack Overflow database long enough, you’ll realize that this is not how you join Users to Posts.

This is not how the Users table correlates to the Posts table. It would correlate either to the OwnerUserId column or the LastEditorUserId column. But I wrote this demo specifically to show you a failing of the optimizer, where essentially joining two tables together on two unique clustered primary keys results in a really strange execution plan unless you supply a hint to it.
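Since the demo script itself isn’t shown on screen, here’s a hedged reconstruction based on the description above, two correlated queries against the Stack Overflow Users and Posts tables, identical except for the hint. The exact shape of the real demo may differ:

```sql
/* No hint: joining two clustered primary keys (Users.Id to Posts.Id)
   on purpose, which tempts the optimizer into an eager index spool. */
SELECT c = COUNT_BIG(*)
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT TOP (1)
        p.Id
    FROM dbo.Posts AS p
    WHERE p.Id = u.Id
) AS p;

/* FORCESEEK: tell SQL Server to seek into the clustered primary key
   it already has, instead of building a new index in tempdb. */
SELECT c = COUNT_BIG(*)
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT TOP (1)
        p.Id
    FROM dbo.Posts AS p WITH (FORCESEEK)
    WHERE p.Id = u.Id
) AS p;
```

The interesting part is that the hinted query is the fast one, which is the opposite of what most hint folklore would have you believe.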

So what really slows this first query down is SQL Server looks at this and says, hey, you know what? Man, what I’d really love to do is build an index off the index that I already have. SQL Server says, nah, you know, that clustered primary key wasn’t good enough for me.

I need a brand new index and I’m going to store that index in tempDB and we’re just going to hope that this goes well. Right. And of course, it doesn’t go well.

You can see that this like most of the time in this plan is spent here. Right. There’s about five seconds scanning the clustered primary key on the post table. And then since this is a row mode plan, right, there’s an additional 20 seconds here adding up to 25 seconds total.

Right. So we spent a lot of time building the spool. It’s a 17 million row spool.

Part of the reason why the spool takes so long to build is because of what happens when eager index spools are doing their thing. Oops, that’s the wrong one. That’s the right one. What happens is all of the rows end up on a single thread. Right.

You can see the rest of these. These are all zeros for the rest of the threads involved here. And this isn’t just a quirk of this demo. This is every single eager index spool that gets built. Not a good choice, SQL Server. Right.

For the fast plan, SQL Server said, oh, yeah, I guess I could just use that index. Right. I mean, it’s a little strange, because SSMS does this thing where it cuts off certain operator text. I know Plan Explorer used to do this better, but I can’t use Plan Explorer anymore.

Functionally worthless for this type of activity. But this is a clustered index scan. Right.

And this is a clustered index seek because we said, hey, SQL Server, please force seek. Please force a seek into this index. And so it does. And we can when we tell SQL Server what to do, we end up with a much faster plan. Right.

We don’t spend 25 seconds in this sort of zone of the world. We spend about one second in this zone of the world. All right. And this is because when we ran this query, SQL Server applied a bunch of costing algorithms and said, I can make this faster by creating another index. No, you can’t.

No, you can’t. SQL Server. People always want to do ridiculous things with the various places that SQL Server stores query plans. Right.

So, you know, like back in the plan cache days when I used to do a lot of work on sp_BlitzCache. And when I first started working on sp_QuickieStore, which uses Query Store, people would say all the time, hey, I would like to order this result set by query cost. And I always say, why?

They say, well, I want all my expensive queries first so I can make them less expensive. Like, OK, well, you can have expensive queries that run very fast. You can have inexpensive queries that run very slow.

Query cost has nothing to do with how fast your query actually runs. Right. It’s just the amount of effort that SQL Server expects to use to run it. But what it expects and what actually happens are two very different things.

Right. That’s the old everyone has a plan until they get punched in the mouth thing. So I’m not saying that you can’t take a high cost query and make it faster. That’s certainly untrue.
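If you do want to rank queries by something that correlates with pain, sort Query Store’s runtime stats by CPU or duration instead of cost. Here’s a rough sketch against the Query Store catalog views, assuming Query Store is enabled on the database; this is not sp_QuickieStore itself, just the idea behind it:

```sql
/* Rank queries by average CPU, not by estimated cost.
   avg_cpu_time and avg_duration are stored in microseconds. */
SELECT TOP (10)
    qsqt.query_sql_text,
    avg_cpu_ms = AVG(qsrs.avg_cpu_time) / 1000.,
    avg_duration_ms = AVG(qsrs.avg_duration) / 1000.
FROM sys.query_store_runtime_stats AS qsrs
JOIN sys.query_store_plan AS qsp
    ON qsp.plan_id = qsrs.plan_id
JOIN sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
GROUP BY
    qsqt.query_sql_text
ORDER BY
    AVG(qsrs.avg_cpu_time) DESC;
```

The point is the ORDER BY: runtime metrics, not optimizer guesses.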

Right. Tune things all the time. High cost, low cost doesn’t matter here at Darling Data. But it does bring up an interesting question. If costs are stupid, what should we look at in our query plans? Well, the first thing that you should find are maybe some things that aren’t even the query plans themselves.

Right. Maybe you should look for things that run at times that you care about. Right.

But once you start finding those things, the stuff that I usually go after are things that have a high average or maximum CPU or duration. You know, there are a lot of metrics that you could look at to figure out various things about a query. Right.

You could look at reads if you were that type of person, but you would be reading the wrong thing. Because reads don’t often tell the story of how long a query executed for. If you do a bunch of logical reads, it’s stuff that’s already in memory.

It’s not going to be particularly slow. You might coincidentally reduce the CPU or duration of a query and also reduce reads. But that’s not necessarily guaranteed.

Not really something that correlates incredibly highly. But anyway, the things that I generally look for, because the thing that people appreciate when you tune a query is making it faster. It’s a great way to figure out if something is slow.

Look how long it runs for. Right. Look how much CPU it uses. These are the things that you can use to tell people, like, hey, this query took 30 minutes. Now it takes three minutes.

Right. I reduced 27 minutes from this query. You might not reduce a single read from that query, but you might greatly impact the total duration. You might want to maybe even focus in on stuff that executes a lot.

I don’t know if you’re into that sort of thing. If you’re very OLTP-ish. So once you figure out what you should tune. Right.

Not what you want to tune. Because there is often a pretty big gap between what people want to tune and what they should be working on. Right. It’s, you know, there’s lots of stuff that I should be doing right now, but I wanted to do this. So we’re here.

Right. But generally, you know, and I’m not saying that there’s no such thing as a red flag in a query plan. Because there are certainly some query plan patterns where, when I see them, I’m like, oh, that’s probably it. But getting an actual execution plan is the most valuable thing that you can do, because that will tell you where the query is spending time.

No one is going to complain about the number of logical reads that a query is doing. People are going to complain about how slow a query is. What you can figure out when you get an actual execution plan is where things are slow.

If there were any wait stats that help you figure out why things were slow. If there are any big differences between estimates and actuals, then maybe, because those estimates were incorrect, SQL Server chose a not-so-great execution plan. You might look at whether the query had any spills.

And you might even look at, like what we did earlier with the eager index spool plan, parallel thread distribution. Right. If all your rows are ending up on a single thread in a parallel plan, not a good sign, not a good sign at all. Logical reads are largely a vanity metric these days.

I say it’s like SQL Server 2008 mentality. It’s not what you want to focus on in the sort of modern world, especially where people put such a high emphasis on reducing cloud spend, because reducing CPU is generally the best way to reduce cloud spend. Physical reads, maybe; logical reads, not so much.
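If you want to eyeball that parallel thread distribution without clicking through operators in SSMS, the per-thread row counts live in the actual plan XML as RunTimeCountersPerThread elements. A rough sketch, assuming SQL Server 2019 or later with the LAST_QUERY_PLAN_STATS database-scoped configuration enabled, so cached plans carry last-known actuals:

```sql
/* Pull per-thread actual row counts out of the last known actual plans.
   Lots of zeros next to one huge number means a skewed parallel plan. */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    thread_id = t.c.value('@Thread', 'integer'),
    actual_rows = t.c.value('@ActualRows', 'bigint')
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS qps
CROSS APPLY qps.query_plan.nodes
    ('//RunTimeCountersPerThread') AS t(c)
ORDER BY
    actual_rows DESC;
```

In the eager index spool demo earlier, a query like this would show every row on thread one and zeros everywhere else.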

So we’re going to pause here before we move on and, well, stop, because I’m going to do this in three parts. I’m trying to limit the parts to about 20 minutes apiece. So we’re going to stop here, and we’re going to pick up in the next video.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.