A Little About Trivial Plans In SQL Server

Video Summary

In this video, I dive into the fascinating world of trivial execution plans in SQL Server and how they can sometimes lead to unexpected query behaviors. I explore why certain queries might be considered “trivial” by the optimizer and what that means for performance tuning. By walking through specific examples, I demonstrate how even simple changes in query structure or indexing can significantly impact whether a plan is marked as trivial or fully optimized, revealing the intricate decision-making process behind SQL Server’s cost-based optimization.

Full Transcript

Erik Darling here with Darling Data. Here bright and early on what I presume is a Friday morning, but could be temporally inaccurate depending on how much you actually think time is real. Might not be. Might not be. I’ve heard various thoughts on the subject recently that really made me question the whole of existence. Alright, so I’m going to switch things up. a little bit. I’ve been recording like 10 videos recently about stored procedures that I write, how to use them, how to get information from them, stuff like that. And I wanted to do a few more videos just purely about SQL Server interestingness. Things that, I mean, I’ve generally have found interesting over the course of my long and storied career with SQL Server. And things that I like sharing with people. This is a material that I do during, you know, conferences and stuff like that. But, oh, you know, just trying to democratize things a little bit, make sure, make sure everyone can see it. You know, it shouldn’t just be, no, it shouldn’t just be limited to people who can take a week off work and spend five grand showing up somewhere. I don’t know. I guess I’m like Jesus in that way. Leave any further jokes about that alone. Alright, so let’s actually read the script.

Let’s turn reading comprehension up to 11. Read the script and turn on query plans here. Oh, this video is going to be about trivial execution plans. Trivial execution plans are a thing that have been not solely, but certainly partially responsible for a lot of strange things that I’ve seen over the course of my query tuning life. And not only query plans, but sort of by extension, simple parameterization, which can occur when a plan is marked as trivial. So first I’m going to show you exactly where you can locate information about a plan being trivialized by the optimizer. This is sort of like the optimizer just saying, yes, dear. I don’t, I don’t, I don’t really have time. This is, you know, agreeing to agree. Yes, I’ve got your query. It’s very important. Super important. I just, yeah, we’re going to take care of that.

So if we look at the properties of this right here, floating above my head is the important part. We’re going to try and move this thing around a little bit so that we can get to where it should be. Look at that. Oh, beautiful. All right. It’s going to put that right next to me. There we go. So this plan’s optimization level was trivial, not full, just trivial. Just, we don’t really have any choices to make. SQL Server is driven by a cost-based optimizer. And if there are no cost-based choices to make, like which index to choose, which operators would be best to use in the query plan, right?

There’s a whole bunch of different operators that are possible of, they’re capable of doing the same job. And sometimes SQL Server has to think about how much each one would cost and say, oh, well, I choose any, me, me, me, you’re the cheapest. There are no, there are no choices like that for this execution plan. This, it’s the user’s table. All it has is the clustered primary key on there. And because that is the only source of data and because we are only asking for 100 rows unordered, all SQL Server has to do is say, okay, give me those first 100 rows.

You can do some pretty funny looking things that are still considered to be trivial. Now, if we look at this query, one might think that generating a row number would make SQL Server have to think long and hard about the best way to generate that row number. But it doesn’t because in this case, at least, and let’s just make this super clear here, do some formatting and live formatting. Make sure our query is nice and pretty for the camera.

In this case, we are partitioning by and ordering by, in all cases, the ID column. And the ID column is the clustered primary key of the user’s table. So, it is unique.

So, partitioning by this will not really do a whole lot. Ordering by it is free because it is the leading key column of an index. Right?

So, like, just without a where clause or anything else to pollute this thing, we have everything that we want in order in both of these cases. So, if we run this query, and again, we come back to the execution plan, even though it looks like SQL Server, well, I mean, it doesn’t just look like SQL Server, even though there are a couple more operators between here, there was no cost-based choice in generating these operators.

This is how row numbers and many other windowing functions are produced in row mode. In batch mode, you might see a window aggregate. Sometimes you might even see, well, not in batch, but in general, you may also see a window spool for some windowing, windowed, windowing, window, whatever, some windowing functions.

And so, these two operators are responsible for generating that row number, but since we have an index that completely supports the partitioning and ordering of our windowing function, SQL Server has no cost-based choice to make when it comes down to this.

Likewise, since we have a clustered primary key in the table, and that’s going to be part of the select star here, you can actually see it right there up, oh, it disappeared on me, you bum.

There we go. And I don’t know why, but sometimes zoom in is a little bit laggy when I’m recording. It’s kind of strange, but whatever. Can’t have everything, so they say.

But since the ID column is our primary key, if we had another column that was marked in unique some way, like a unique nonclustered index, constraint, something like that, we might see something similar happen.

But if we select the distinct top 1,000 everything from the users table, because we have that ID column in there, SQL Server has a pretty good guarantee that absolutely everything will be unique because of that index, and we don’t have to do any work to try and produce a unique result set.

And so if we do a little flip here, and this is something that annoys me quite a bit about SSMS, is that this context is just not this context, and I can’t click on this context to get this context.

I have to click over here to get this context, and then I have to click back to get this context. But in case my beautiful hair, which is long nearly to the point of a pompadour at this point, I’m getting a haircut in a couple of days, don’t worry.

If we look over here, and let’s put that right next to me again, because I want to be the star of the show. Trivial Eric, they call me. In the pursuit of all things trivial.

So this is, oh, wrong hand. So this is, again, a trivial plan. We didn’t have to do any work to find a plan for that. Where this changes is when SQL Server has choices to make.

So I’m going to run all three of these queries in a row. And while those run, which, I mean, they’re already done, but I just want to explain a little bit up front.

And the reason that I’m forcing compat level 140 here and here is to avoid the automatic batch mode on rowstore feature from kicking in. I have that in developer edition.

So lucky me. And the reason why I’m using this hint down here is because I want to show you what a stream aggregate plan would look like for this query. Because these are the three choices that SQL Server gets to make when it runs this query.

After this, I’ll show you what happens with batch mode on rowstore. So if we look at these execution plans, the first execution plan chooses a hash match flow distinct.

Why did it choose that? Good question. Because the top that we chose completely met SQL Server’s cardinality estimate for how many unique rows we might get out of here.

This next one, or we were asking for a number of rows less than or equal to the number of rows that SQL Server thinks will be distinct in the table is the more correct way of putting it. Every time I say something, I picture people tutting at me for saying it incorrectly.

So I do try to correct myself when I catch myself. Since this query is asking for more than the cardinality estimate of the grouped or the distinct tuples in this index, SQL Server chooses a full-on hash match here.

See the little difference with the flow distinct and the aggregate here. Now, good job us. If we look down at the bottom query where I told SQL Server, I really, really want you to use a stream aggregate, this order group hint will do that.

SQL Server had to sort all that data because stream aggregates expect data to show up in order. So we had to sort the whole table, do a stream aggregate, do more parallelism, do some more stream aggregating, and then produce those rows.

So we had sort of a double aggregate on that one. It’s a bit strange, but we got the result back. But this was a cost.

So like this is the stuff that the optimizer thought about while it was coming up with execution plans. If we just compare the cost of the top 5,000 plan right above it without the flow distinct choice being in there, we can see the estimated subtree cost for this query is 47.5-ish query bucks.

And we can see the estimated subtree cost for this query was 94.7 some odd query bucks. Not that I care so much about how much a query costs.

This is just how SQL Server chose the plan it was going to use. If you’re the kind of person who cares about query costs, you should really stop.

They are not good things to measure a query by. So if we had an index on reputation that put reputation in order, then all of this stream aggregating would have been a whole lot cheaper to the optimizer because we wouldn’t have had to sort the data.

It would have been sorted for us in the index. Now I know I promised you, I’d show you what happens and why I have this hint in here. So just to remind you, if I run this query, we get the hash match flow distinct operator up there.

But if I run that query without the hint, then I get a different query plan. I get a hash match aggregate right there, right?

That’s the full hash match aggregate. The real reason that I get that is because SQL Server has chosen to execute this operator using batch mode. So if we highlight that, this happens in batch mode.

Batch mode on rowstore kicks in, thinks that this query is going to be, is heuristically expensive enough to require batch mode to be quick. And so it kicks in and we get it. And does it make a huge difference?

No, not for that, not for that query. Makes a big difference in some other queries. This one, not so much. So, there are some perhaps obvious things that will get you full optimization.

In this case, we’re going to try, we’re going to, if we order by a column that has nothing to do with any index currently on the user’s table, if we ask SQL Server to give us the top one row ordered by account ID, this will be a fully optimized plan.

The reason that we can tell it’s a fully optimized plan without even digging into the details of it is that it’s parallel. If a query’s cost to the optimizer is greater than the cost threshold for parallelism setting, the plan will not be trivial.

There are a whole bunch of heuristics and other things that kick in. You know, SQL Server looking at like the logical trees and stuff for a query to figure out like an initial cost so it can figure out some other stuff.

It’s a little bit beyond the scope of what I can cover in this video, but it’s, it’s definitely, there’s definitely a lot going on in there. A lot of math. I don’t know math. There are also some not very obvious things that might get a full optimization.

So let’s just run these two queries. They’re both logically identical. The only difference that we have is this little set of parentheses around a select for the ID column here.

But this little nudge is apparently just enough to make SQL Server think real long and hard about how it wants to run this query. So if we look at the details of this query plan, this optimization level is trivial there.

And if we look at the details of this query plan, this optimization level, which I believe is the first time we’ve seen, oh, actually no, because it was in, uh, it was in those distinct queries.

This optimization level will be full. So SQL Server had a, um, a cost-based decision to make apparently, like maybe where to put the compute scale are.

I don’t know. Um, couldn’t tell you on that one. So let’s try an even simpler subquery here. Let’s just, uh, let’s run these two queries.

We’re just getting a count. And then this count where one equals select one, one will always equal select one, but SQL Server is kind of, I don’t know, uh, kind of picky about these things.

And if it’ll let me grab it, no, that just moved that query plan. There we go. All right. So if we look at these two query plans, they are absolute, well, they’re not absolutely identical.

Um, this is another one of those funny things with batch mode on rowstore. Uh, and, uh, and prior presentations of this, where I wasn’t using, uh, the, the newer compat levels, uh, these plans would be exactly the same.

If I, if I set this back to, uh, the, what do you call it there? The one 40 compat level, we would not get batch mode here, but, um, this is, uh, perhaps, a cost-based decision that is of some, uh, interest to you because, uh, one thing that cost-based optimization will do is figure out if batch mode on rowstore would be a good choice for your query.

Uh, in this query, we do not get that, uh, this query, if we look at the details over here, you can probably see it right over my head.

It is trivial. Uh, and this query down here, if we look, if I duck down a little bit, it has full optimization. So, cost-based optimization hides a lot of stuff, right?

Like, a lot of stuff is just, like, on the other side of that door. If you get a trivial plan, you don’t go through that door, you just, you know, get whatever gets handed out that door. Uh, there’s a lot of cool stuff behind it. It’s like the, the backstage area, right?

All the good, that’s where all the good stuff happens. So, the brown M&Ms and pharmaceutical grade cocaine are. So, one other thing, and this is, this is going to lead into a minor tangent, uh, is if we run these two queries, and we look at the query plans for them, whenever my, RDP decides to become responsive, maybe my RDP needs some pharmaceutical grade cocaine.

Uh, so if we look at these two queries, this one up here, scans the clustered index, this one down here, scans the clustered index, but only the query that will get full optimization decides that an index might be useful.

Now, I bring this up, uh, because I want to show you other things that are on the other side of, in the backstage of SQL servers, I’m trying to, the full optimization party, uh, and SQL servers optimizer, but also because, uh, I, I, I need y’all to chill with adding every missing index request that you see, or that you might find via scripting, because a lot of the times, uh, they are not quite, uh, go, they’re not quite going to do as much, uh, good for your query as they, you might think they’re going to do.

You might see this show up in front of you someday. You might say, wow, this thing has a cost of 36 query bucks. That sure sounds pricey. Uh, and you might look at this and say, wow, I can make this query cost 99.2491% less if I had this index.

The thing is, uh, do we really need to improve a 288 millisecond query? Is that really important to you?

Is that worth it? I don’t think so. Don’t believe so. But anyway, back to, uh, why, and this is something that I, I, I, I foreshadowed ominously earlier in the video, back to sort of why this query, uh, does not have a missing index request.

Is, part of, part of it is the, uh, simple parametization. Part of it is the trivial plan. And part of it is, uh, that SQL Server just didn’t care all that much about our query.

It did not, did not break, not, not, did not get full optimization, not break any boundaries.

So when trivial plans, uh, can be a real letdown, is when we have a constraint like this, where we are telling SQL Server, exactly the upper and lower boundaries of the reputation column.

And, uh, this constraint is trusted, right? If we look at, if we look at the, this, this data, right?

This is our, this is our index. This is our constraint rather. And we can see that this can, this constraint is trusted because there’s a zero for is not trusted. If it were one, then it wouldn’t be trusted.

But zero for is not trusted means it is trusted. All right. Cool. So if we run this query and we look at what happens, and, uh, I got ahead of myself a little bit when I was talking about the last thing, because it’s really more pertinent to when I talk about this thing, uh, is the reason why, uh, this query, despite knowing the upper and lower boundaries of, uh, the, the reputation column, and despite us looking for a value in the reputation column that, uh, falls within, falls, uh, out of those boundaries and would not possibly exist in the table.

Because SQL Server gives this thing a trivial plan. And because SQL Server chooses to, uh, simple, simple parameterize this plan, the simple parameter plan has to be safe for any other, uh, value that gets plopped into, uh, the where clause and gets also simple parameterized.

We could search for a value that’s within the boundary and we would have to scan the clustered index. So this is one of those funny things about, you know, like, you know, all the moving parts in SQL Server that can certainly make things, uh, tough to, um, tough to deal with a little bit.

But if we, so actually one thing I want to look at real quick, we should go over here and we should actually validate that SQL Server read every single row, uh, in the, in the user’s table when it did that work.

And it did, it read all of them. Good for us. We figured it out, cracked the case. I’m going to make, make detective sergeant major, whatever.

I don’t, I’m not good at ranks like that, but, uh, this query, if we look at this, if we run this query and we get it to full optimization, this query will, will bail out, uh, because all we get, uh, we can see the full optimization again, floating over my head right there.

Full. Uh, the reason why we get a constant scan rather than a clustered index scan is because rather than, um, being simple parameterized and SQL Server having to come up with a plan that’s safe for any other value that might go in there. Uh, we have a literal value here and SQL Server is able to evaluate that literal value, uh, not have to come up with a plan that’s safe to reuse for that literal value and say, no, we don’t have any rows for that.

Thanks for asking though. So anyway, uh, trivial plans are kind of weird. Um, you know, I don’t, I don’t like them very much.

In, in a lot of cases, uh, I find that they pop up at very inopportune times. I find that they are not, um, they are used when they shouldn’t be.

They are misused, uh, quite a bit. And, um, the only thing that you can do is, is look at your query plans. Uh, look at the, uh, look at the, uh, the operator properties of the root operator.

See if your plan is getting full or trivial, uh, optimization and then, uh, work from there. Uh, the one equals select one thing is a really cheap and easy way out.

You know, I’ve, and if you are able to modify any queries that, um, that, uh, that are not running well because of the, the trivial optimization level, then we might, you might want to do that.

Anyway, uh, I’ve been talking long enough. I got stuff to do. Uh, I’m going to try to upload this video. This is my longest video yet. I’m surprised the camera is still, still operating.

Uh, and, uh, yeah, uh, I’m going to, I’ll record another one in a little, in a little bit. I got, I got some, I got some actual work to do after this. Anyway, thank you for watching. I got some quite strong Раз soloing rhythm. I got some good heart.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.