YouTube Days: Cardinality Estimation For Local Variables And Parameters

Cardinality Estimation For Local Variables And Parameters

I’m working on ramping up my recording and streaming setup now that life has settled down a little bit, and publishing the results to YouTube while I work out the wrinkles. Enjoy some free video content!

Video Summary

In this video, I delve into the nuances of local variables in SQL Server stored procedures, offering a fresh perspective on their usage and potential pitfalls. Typically, my discussions focus on how local variables can affect cardinality estimation when used instead of parameters or literal values for filtering queries. However, this time around, I explore scenarios where local variables are mixed with parameters, highlighting the impact on query plans and performance. By running through several examples using parameterized dynamic SQL, I demonstrate how local variables can lead to poor cardinality estimates, especially when dealing with large result sets. The video also touches on a new optimizer feature in SQL Server 2022 called parameter-sensitive plan optimization, explaining why local variables can break this functionality and suggesting that they should not be used as a workaround for parameter sniffing issues.

Full Transcript

Erik Darling here with Darling Data. I’ve been busy the last few months fending off various hostile takeover attempts by BeerGut Magazine. Despite their best efforts, I don’t think that a six-pack is quite enough compensation for the glorious Darling Data brand, despite what GoDaddy tells me my domain is worth, which may be about the price of the six-pack, depending on local factors. Erik Darling here with me, I’m recording this video today to talk a little bit, well, talk about local variables in a slightly different way than I usually talk about them. Usually when I talk about them, it’s in the context of, like, there just being one local variable, how SQL Server deals with cardinality estimation when local variables are being used in place of parameters or literal values for filtering queries. Erik Darling here with me, I see this a lot when helping clients tune store procedures where, and again, the clients are nice people who keep my lawyers afloat so that I can fend off BeerGut Magazine hostile takeover attempts. But I see them a lot in store procedures to do something like this, like, where someone just didn’t feel like typing data at day minus one over and over again because they’re going to use it in a bunch of different queries to filter stuff out. So the query, having a good time. The query is end up looking something like this, where on one side you have a local variable like this being used and on the other side you have a parameter that gets passed into the store procedure being used. So I’m going to use a slightly different example to demonstrate what happens sometimes when you mix local variables and parameters in your where clause. Now, for these queries, I’m going to use the nice, safe, happy parameterized dynamic SQL that will keep your SQL injection attempts at bay. So everyone can rest safely and soundly knowing that. And I’m going to start off using values for my filtering that only get one row back from the user’s table. I’ve got two parameters that I’m going to pass into my dynamic SQL here. I’ve got one for upvotes and one for downvotes. And I’ve got a few different iterations of the dynamic SQL that I’m going to run to sort of use as examples here. Now every time I run this, the code in this, I’m going to clear out the plan cache. So they’re going to get a brand new query plan for each time this group of queries executes. But we’ve got this first one here. And this one is going to use parameters for both filtering on upvotes and downvotes. So the values that get passed in from here are what get get get get uses parameters in here.

And then I’m going to switch off having one use upvotes as a local variable as a filtering parameter. And then another one where I do the same thing with downvotes. So I get that there and we can see the local variable being used there. And then I’ve got a final one where both of them use local variables. So these both get the local variable doodad cardinality estimation used. Doodad. Professional, right? So I’m going to run this. And we’re going to look at the query plans together. And what did I do? Oh, did I have something highlighted? I did probably. There we go. Second time’s a charm. Much like me.

So let’s look at these query plans. And hopefully my zoom it skills are still fairly sharp. So we’ve got not too bad. So we’ve got this first query up here that uses parameters for both upvotes and downvotes. And we see that we get a guess of one for both of these. We also get a guess of one for both of the two, both of the queries where one of the parameters was substituted with a local variable. So we’ve got one here and we’ve got one here, but we’ve also got a regular parameter passed in on the flip side. So like there’s downvotes as a parameter there and there’s upvotes as a parameter there.

Only the ones that end in underscore v are the locally declared variables to the batch. Cool. So these both also get a correct guess of one because both of, because SQL Server is able to use the upvotes and downvotes. So it’s a cardinality estimation for the parameters to kind of guess one row and everything goes okay because of that. There’s not too much room for error when you have a single row that gets evaluated. But the last one, all the way down at the bottom where local variables get used for both upvotes and downvotes predicates here, you get a guess of 45.

So SQL Server where it no longer uses like the good part of the histogram to make a cardinality estimate. It uses the density vector stuff that the density vector math for both local variables. Things get thrown off a little bit. In this case, it’s just not the end of the world. Getting one row back when you expect 45 rows, it’s not that big a deal. Go away, zoom it. I’ve got a couple posts that talk about some of these things. They’re in this script file, but they’re here to remind me to put them in the show notes.

So those will be links in the YouTube video blog post wherever this ends up. We’ll see. See how wild I feel later. But those will be available there. Anyway, let’s change this a little bit because when you get when you’re just getting one row back, pretty rare for everything to like fly really badly off the rails. But what I’m going to do is I’m going to replace both of these with zero. And the reason I’m going to do that is because there are a lot more rows where upvotes and downvotes are zero for users because a lot of folks join stack, stack overflow, stack exchange, ask a question, get downvoted into oblivion, never come back, never cast votes, never do anything.

So there’s a whole lot of people qualify for these zero filters here. And this is where things get a little bit more dangerous, right? So like for low numbers of rows, almost any plan is going to be okay. For a big number of rows, it’s less okay. And I see people use this a lot and try to say, well, I did it to fix a parameter sniffing problem. Like, well, if all you’re ever getting back is a small number of rows, you probably don’t have much of a parameter sniffing problem anyway, because the parameter sniffing problem comes from when you have giant variations and how many rows might come back depending on the search argument.

So if you’re just always working with a relatively small number of rows, like you can’t just say, blanket, I have a parameter sniffing problem, because most of the time parameter sniffing is a good and fine thing. You get plan reuse, SQL Server uses the same plan, doesn’t set your CPUs on fire, generating execution plans over and over again, much like a recompile hint would or something like that. So let’s look at these queries now when we are dealing with more rows than just one.

Now, we’re going to get all four results back. And since I don’t have an order by on any of these queries, like we don’t like we don’t, we’re not going to get results back in the same order for any of them. This is actually also kind of a good lesson, like aside from, you know, the local variable thing is this one, the reputation is all scrambled here.

And this one reputation is in fairly decent order. This one reputation is a bit more scrambled here. And in this final one reputation is back to sort of being in order.

And we’ll talk about why when we look at the query plans. But let’s go look at how these performed. Now, the top query makes a startlingly good cardinality estimate only off by 10%. Not bad SQL Server, right?

Good for you SQL Server. But this all finishes pretty quickly. Since I’m using SQL Server 2022, I’m in compat level 160. I get and I’m using developer edition, which is the equivalent of enterprise edition.

I get batch mode on rowstore for some of these queries here. You can see the actual and estimated execution modes are both batch. So batch mode for rowstore kicks in and SQL is like, oh, I have to do some extra work.

Let’s batch mode this baby up. Talking more about batch modes a little bit further than I want to get in this thing. But, you know, whatever.

We can talk more about that later. But what happens is for the other queries SQL Server makes less and less of a good guess for what’s going to come out of these operations that do the filtering. So the second query, the second query plan rather, this is an astoundingly bad guess, right?

SQL Server kind of only made an okay guess for one of the parameters, not for both. And when you combine the cardinality estimates that it came back with and SQL Server was like, well, there’s two of them. I carry the two.

450. But 450 is an astoundingly bad guess. And because of this, we get a less than ideal query plan that takes round about three seconds to finish. Now, there is no mix of batch mode and rowstore.

All of this thing runs entirely in rowstore. I believe one of the seeks might be batch mode, but it’s really neither here nor there whether that happened. But the important thing is that the reason why this second query result set, which is this one.

Well, it’s identical to this one, but we’ll see they have identical query plans. The reason why this comes back in order is, of course, because SQL Server performs a sort by reputation to satisfy the stream aggregate that happens next. The hash match aggregate returns rows back whatever.

There’s no ordering, nothing like that. So you just get rows splatted back to you and you can sort them in the application. Right? That’s what all the smart people say to do.

This third query where, again, result sets were a bit scrambled. We also get a pretty lousy cardinality estimate on this one of 852 rows. But we can see very clearly from the results that we got 1782285.

So 1.78 million back from that. It’s a seven digit number, right? Yes.

That many. Seven. 10 minus three. Got nervous there. I thought maybe I was missing a finger or something. Anyway. So again, we get the hash match aggregate back. We get the hash match operator in this query plan.

So this one, again, was another one that came back with the sort of scrambled results because there’s no ordering to what comes out of a hash match. The third query, which uses both, or sorry, the fourth query plan, which uses a local variable for both upvotes and downvotes, gets, again, a pretty astoundingly bad cardinality estimate of 45 rows out of 1.78 million. Things fall apart a little bit more here.

Sorry, I don’t know why this thing keeps jumping around like that. I’m going to have to highlight. Just the magic of highlighting for context. The sort spills a little bit here, and this thing ends up taking around about three seconds. So even when we use both local variables, that’s what happens in this fourth one.

Things are obviously at their worst because we get this sort of lousy cardinality estimate for both of the local variables. But even for when we mix and match or mismatch or whatever, however you want to talk about, we still get pretty bad cardinality estimates. We don’t get very good query plans.

The top plan, even though it’s a clustered index scan, makes the most sense here because we don’t have to do a bunch of lookups to get a bunch of other rows. And especially because the cardinality estimates for all of these lookup plans are so low. We also have relatively low subtree costs.

So this one, SQL Server thinks this one’s going to cost about $1.50. SQL Server thinks this one is going to cost about $2.8 query bucks. And we get like just low costs on all of these.

So SQL Server is not even choosing like a parallel lookup plan, which probably would have helped here. But again, is that really what you want all the time? Maybe, maybe not. Probably a little bit more than we can talk about here.

Anyway. Yeah. So using local variables often results in pretty bad cardinality estimates. Sometimes you will not notice them because other parameters or other arguments might filter, either filter out enough rows so that a bad plan choice might not matter.

Or you might just get like you might just be working with a generally small set, small number of rows anyway. So you don’t really have all the problems that you run into when you start working with a larger number of rows. One small thing that I want to point out here before I sign off is that when you use local variables, you also break a new optimizer feature in SQL Server 2022 called the parameter sensitive plan optimization.

And I’ll show you real quick what I mean by that. All three of the queries that have at least one local variable involved just sort of end where the query text ends. This top query plan, return control ever comes back to me, has a little bit more stuff in it, right?

We can see all this plan per value object ID, blah, blah, query variant ID. If we click on this little ellipsis over here and we get all this stuff back, we’ll see what SQL, this is what SQL Server injects into the query text to enable multiple plans for the same query so that we get, we get a better optimized plan for different sets of parameters that have different cardinality estimates involved with them. But using a local variable, even just one of them does break the parameter sensitive plan optimization.

So just something to be aware of there. If you’re using SQL Server 2022 and you expect this magical parameter sensitive thing to kick in and fix all of your bad parameter sniffing queries, using local variables will mess that up. So please don’t do that.

Please don’t fix parameter sniffing by using local variables and then expect the parameter sensitive plan optimization to kick in and do any more work. For you, it just won’t do that. It does not, does not work that way. Anyway, I think that’s about good for today.

It’s about 15 minutes on this thing, which is about five minutes more than I was aiming for. So, yeah, I don’t know. I’m just gonna, I’m gonna go, I’m gonna go now and think about my life. Think about why I talked for five minutes more than I should have.

Hopefully you didn’t find it to be a waste of time. But anyway, thank you for watching. I hope you learned something. And I will see you in the next video. I’ve got a bunch of stuff lined up to record. So, yeehaw.

Hopefully that’s… H

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

3 thoughts on “YouTube Days: Cardinality Estimation For Local Variables And Parameters”

Jason says:

April 14, 2023 at 9:50 am

Erik, I want to start off with saying that I love your stuff and visit your blog frequently. I don’t feel good about “complaining” about free content, and maybe it’s on my side, but when I listen to your videos, I have my laptop speakers all the way up and the video volume maxed out as well and it’s still very hard for me to hear many of your videos (this one in particular). I don’t have an issue listening to other youtube (sql bits) or Brent O videos. Seeing that you’re posting more great videos , I just wanted to share some feedback in case it’s a minor issue with your setup. Thanks for all you do for the sql server community.
1. Erik Darling says:
  
  April 14, 2023 at 9:56 am
  
  Hi Jason — yeah, this was an early test video.
  
  If you check out my YouTube channel, there are a bunch of newer ones with vastly improved quality.
  
  Like 👍 and subscribe 🛎️ etc.
  
  Thanks!
  1. Jason says:
    
    April 14, 2023 at 10:04 am
    
    Will do, thanks Erik

Comments are closed.

Cardinality Estimation For Local Variables And Parameters

Video Summary

Full Transcript

Going Further

Related Posts

3 thoughts on “YouTube Days: Cardinality Estimation For Local Variables And Parameters”