Why Expressions Are Better Than Local Variables For Performance In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into a common issue many developers face: the preference for using local variables over expressions in SQL Server queries. I explain why expressions are often superior for performance and cardinality estimation, providing practical examples to illustrate the differences between using expressions versus local variables. The discussion includes how these choices can affect query plans and execution estimates, especially when dealing with date columns. Additionally, I address common pushback from developers who worry about repeating themselves or maintaining consistent values across their code. I also touch on membership options for my channel, offering ways to support the content and upcoming events, while emphasizing that even without financial contributions, viewers can still engage by subscribing, liking, and commenting on the videos.
Full Transcript
Your best friend, Erik Darling here with Darling Data. Magnificent, the wonderful, the wondrous, omnipresent Darling Data. Up in all your data business. In today’s video, we are going to talk about expressions versus local variables. Because this is one that comes up a lot with clients where, you know, in, you know, programming in general, people have this fear of repeating themselves. Deep fear of repeating themselves. And, apparently, they’re still under the weird impression that typing something once and assigning it to a variable is going to do something magical and magnificent for them. But it’s not going to. Right? And I know that I’ve recorded a lot of stuff about local variables, but this is a very specific case to comparing date, date, date, date, date, time, whatever columns. And so we’re going to talk through all of that good stuff. Before we do that, if you feel so obliged, so kind, if you feel just so beneficent in your ways, you can get a membership to my channel. There’s a link for that in the channel description. It is the one that is the name of my channel with slash join at the end.
Then you can choose just how generous you’re feeling every month when it comes to a little bit of channel support here. At some point, there will be member exclusive stuff. I’m just a little bit crunched right now. I’m preparing for, well, we’re going to talk about it in a few slides, but I’m preparing for a few events in the near future that just have a lot of my attention right now on top of the normal client work, on top of having a wife and kids and all that other stuff. So I haven’t had a lot of time to put thought into how I want to structure the membership stuff here, but I will probably be getting to that probably around the holidays here. So if you get in now, you’ll probably get some sort of bonus multiplier thing. I don’t know yet. I got to figure it all out. So if you have no money, if your pockets are empty, if you cannot pay for a cheeseburger today or next Tuesday, you can do other things that let me know you care and that you love me and that you’ll never leave me.
Like you can like and comment on, you can like my videos. You can comment on my videos. You can subscribe to my channel and join an excess of 4,300 other data darlings out there in the YouTube-averse. God, I feel awful saying that. I don’t know why it came out of my mouth. So, and subscribe to my channel for free and just watch the stuff and maybe, you know, enjoy yourselves, learn something, that song and dance. If you need consulting, I am pretty good at this stuff. I can also do other stuff, SQL Server related. If you need something not SQL Server related, I would wonder what it is.
But we can always work on that. And as always, my rates are reasonable. If you want some cheap, well, you know, not cheap like quality, but you know, if you want some cheap money-wise SQL Server training, you can get about 24 hours of it from me and me alone for about 150 US dollars with those discount codes and stuff. There is beginner, intermediate and expert level content in there. So no matter where you are, you can learn something and enjoy yourself.
Thank you for watching. Let me just end it right here, right? Like I said, this is a chat GPT image. Deal with it. I do have upcoming live events. Friday, September 6th, I will be at Data Saturday in Dallas giving away pokey treats and, oh, where are they? Sticky treats.
So I have all sorts of treats for people who show up. November 4th and 5th, I will be at Past Data Summit in Seattle co-hosting two. Oh, those banger days of performance tuning content with Kendra Little. And you can spend both of those days with us because it’ll be the best two days of your SQL Server life.
All right. And that’s all there is to it. And with that out of the way, let’s begin the festivities. Let’s get to partying here because that’s what we do best.
All right. So what we’re going to talk about, again, is why expressions are better than local variables in SQL Server for performance and cardinality and all sorts of other important things. So what I’m going to do is I’m going to create a table called Express Yourself. Sort of a little bit of a joke there, a little play on words, you know.
Express yourself. But because we’re talking about expressions. That wasn’t obvious. I don’t know what is.
And that table is pretty simple. It’s just two columns. And really the important thing is that we have this thing in there. Right. Great. We did it.
We rule the world. We rule the seas and the skies and the dirts and all that stuff. And if we query that table and we look at this query plan, because that’s largely what we do around here at Darling Data is look at query plans and go, oh, not going to make it. When we look at the execution plan, you will see that we got a rather good estimate when using expressions in our query.
I actually framed this probably badly for that particular zoom in. When we use expressions in our aware clause, we get pretty good estimates about what will happen for things. And I know I’ve talked about it before.
If we use local variables rather than expressions, we get actually just do both of these at the same time because they’re both equally annoying. If we use local variables rather than expressions. And also I have two slightly different expressions in here for the later local variable.
One adds one year and the other adds, oh boy, Zoomit is struggling today. One adds one day or maybe it’s just my mouse that’s struggling. It’s tough to tell right now.
Struggles abounds. We get the same kind of goofy estimate for both of them of 171024, regardless of how many rows are actually liable to come out. Now we saw when we use the expressions that this, at least the first one, I didn’t try the second one, obviously.
I was just trying to get to this demo. But we got a good estimate before. Now for this one, we are off by 606%.
And by this one, well, this one is just way lower compared to what actually comes, or way higher compared to what actually comes out of the table. But notice we get the exact same number for both of them. The same thing is true if we use a stored procedure.
Now this cardinality estimate is going to be a little bit different because the start date in this one is a parameter, whereas before they were both local variables. So we’re going to get a slightly different estimate on this one, but it’s still going to be pretty wrong compared to when we just put the expression in there. So if I run this, right, and if I just go get the minimum start date in the table, we get the same result back for both of these.
But notice that the one that we use the expression in, right, is a lot closer to reality than the one that we use the local variable in. Now, before we had that guess of 171,000 some odd something there. This change is because we have a parameter for start date.
SQL Server is able to do good cardinality estimation for start date, but it does bad cardinality estimation for end date because it’s a local variable. So having the expression up here yields better results than storing something in a local variable like we did down here. Where I run into pushback sometimes with these things is someone will inevitably say, well, yeah, but I need everything to be the same date going across all of them.
The good news is if you operate with your start date parameter and you add time to that, it’ll always be the same no matter when a query runs. But where it won’t be is if you use, of course, one of the non-deterministic built-in functions like get date, right? If you use get date over and over again, it’ll be blah.
If you build off the start date, it’ll be blah. That was a good blah, not a bad blah. I thought the smile kind of would help with that.
Maybe it did. But now if we run this, we can sort of illustrate the problem generally. We’ve got to wait like five seconds because there’s a five-second wait for in there.
So let’s put these a little bit closer to each other and let’s zoom in. So the start date parameter for both of these is before and after the five-second wait for is the same, which means that if we use an expression to calculate later using the start date parameter, that’ll be the same for both of these two.
If we set later to a value based on start date, that will also be the same for both of these. Notice that these two blocks in here line up across both of these columns. Where that doesn’t work is when you build something off get date over and over again, right?
So if we just add five seconds to start date, that’s a lot different than adding five seconds to get date before and after the wait for. There is indeed a five-second difference here between 41 and 46. So that’s where people get kind of weird and screwed up.
Now, another thing that people will generally do in these procedures is have some sort of handling in here for start date, right? They’ll say, oh, if start date is null, set it to get date. I don’t know.
I guess that’s reasonable. But you’ll get the same results from this, right? Like this won’t change anything. This will make for a weird cardinality estimation, though. And the reason why this is going to be weird is because if you pass in start date up here with a null value, SQL Server doesn’t do cardinality estimation based on you fixing it here.
SQL Server does cardinality estimation based on start date being null. So if this is the kind of thing that you have to do, well, we’re going to talk about your options in a few minutes. I promise.
But you would end up with the exact same situation here as you did before, where if instead of passing in the start date with a value at, you know, when you execute the store procedure, you pass it in and you correct it within the store procedure, everything turns out the same, you know, including this column being five seconds different, right?
So, okay. Let’s say, for some reason, you want to use that local variable to insert data into a column. Let’s say it’s like the run date for a batch process and you, for all the rows that you put into a table for a particular run of that batch process, you want all the times to be the same.
It’s totally cool to use a local variable for that. It’s also cool to have a parameter for that. If you need to update column, like you need to do like a bunch of updates across a bunch of tables to say, hey, like these, you know, tables all changed at the same time, it’s totally cool to use a local variable for that.
You’re not going to mess with performance there, right? That’s not where local variables cause problems. If your local variable is getting used as a filter somewhere, that’s kind of where you should sweat it, but only if you’re having a performance issue.
You know, while it is nice to go around and correct all of these problems so that your code base is just, you know, a golden god where there’s nothing wrong with it and everything runs perfectly, most of the time when you’re trying, when you’re like me and you’re a consultant trying to make something go faster, you have to kind of set a threshold with things.
And a lot of the times if, you know, queries just aren’t going slow, I’m not going to spend a lot of time typing to fix them. Excuse me. Because it’s just not worth it because I have other slow queries to go fix.
I’ve got to fix the slow queries before I can circle back and start, you know, sort of fine-tuning everything. Your options when you have this problem and if there’s a performance issue are things that I’ve talked about a whole bunch on this channel before. You can use option recompile at the statement level.
You can use parameterize dynamic SQL and pass your local variables into that. Or you can pass those local variables to another store procedure because that other store procedure or the parameterized dynamic SQL will, of course, accept your local variable as a parameter, right?
Because those are different modules and they sort of change the context of things enough for you to get the better cardinality estimates from that. You can also just use a recompile hint and not bother with all the extra typing and testing that dynamic SQL involves. If you do go the route of turning local variables into parameters and passing them into store procedure or to dynamic SQL, this is where you have to be careful about, especially with time-based ones, right?
With, you know, if you have an integer or a string and you’re looking for a quality predicates, you know, it’s where you have to worry about skewed data within the column as a problem. When you’re dealing with date, date time, stuff like that, date time to zero through seven, when you’re dealing with that, you need to worry about the skew being in the range that people look for.
And that’s not something that the parameter-sensitive plan optimization that came up in SQL Server 2022 addresses. Parameter-sensitive plan optimization only works for one parameter and only if it’s an equality predicate and only if the heuristics kick in to say, yes, you should do this.
You have to meet those three things. And then you get, I think not coincidentally, three different execution plans based on the small, medium, and large buckets of values. But range values don’t get that.
So equality predicates, yes, greater than, no, less than, no, greater than, equal to, no, less than, equal to, no. If you are worried about parameter-sensitive plan stuff and you have date range predicates, one thing that you can do is build your procedure with dynamic SQL sort of like this.
And I’m going to highlight this and run it and then I’m going to talk to you about what I did. So normally, when you have a date range, you’re worried that someone will be searching for a small range and then suddenly start searching for a large range or be searching for a large range and then start searching for a small range.
And either way, in either direction, sharing the query plan between those can be pretty bad. One thing that you can do with your dynamic SQL is set it up so that you execute a slightly different query based on how many, like, you know, based on some spectrum of things like this.
So for this, I’m just saying if it’s less than one month, less than or equal to one month, add one equals select one onto the end of the query. If the difference is between two and six months, add two equals select two.
And this is just to keep things simple. These aren’t necessarily the date ranges that I would always use or the month differences that I would always use. This is just to make things kind of simple when we’re looking at stuff and testing.
And then if it’s more than six months, then we want to attack and three equals select three on. And what that gives us when we execute the procedure is this. Let’s start with one in here.
And let’s highlight this. Remember to highlight this whole thing and run it. What I want you to pay attention to is the cardinality estimate in here. We’re just doing a simple single table query.
We already have a good index in place. It’s not like SQL Server has a big variety of plan choices to choose from here. SQL Server is just going to seek into this index no matter what I do. I’m just getting a count.
There’s no need to think about lookups or anything else. What I want you to pay attention to here is the cardinality estimates. Because under normal circumstances, if we weren’t adding that and one equals or and something equals select something onto the end of the query, SQL Server would reuse the cardinality estimate for all three of these examples that I’m going to show you.
Because we’re adding those slightly different strings on with slightly different literal values, those are going to hash out to different queries. And SQL Server is going to come up with different cardinality estimates for each one. Within that, if we were to execute the same one multiple times, SQL Server would totally reuse those query plans for each individual one.
But it’s not going to reuse the plan for one month and the one for six months. You’re not going to have that parameter sensitivity in there. So now if we change this to do three months and we rerun this.
Actually, before I do that, let me come over here and show you the messages tab. Because we’re printing out our dynamic SQL. And there’s our one equals select one.
And then if we change that to be three months, which falls into our two and six bucket, then in the messages tab, we’ll see our two equals select two in here. And in our execution plan, we’ll have our accurate cardinality estimate for that.
Right? So good news across the board there. And now let’s change this to nine months. Because that will get us outside of the final bucket of six there.
And if we rerun this and we look in the messages tab, we will see the three equals select three at the end here. And if we look at the execution plan, we’ll have an accurate cardinality estimate for that. And this is generally, like the name of the game here is to just find a generally good number for cardinality that makes sense across whatever, however many buckets you want to create with your different things in here.
Are there more complicated schemes for this? Yes. Could you build your own histogram for this? Yes. Could you figure out all sorts of other things to do with, like, optimize for? Absolutely.
You could totally do that. But this is one thing that I use quite a bit when working with clients because this helps me come up with a fairly quick and dirty solution to generate different but reusable execution plans based on how big of a date range we’re dealing with. So anyway, in general, for most things that you will do in SQL Server, using expressions is better than using local variables for performance because of cardinality estimation.
You can get very wrong estimates, and depending on, like, where those estimates are used, you could have all sorts of trickle down bad plan choices, bad, you know, bad choices in, you know, how many rows or pages we decide to start locking, all sorts of things. And you can have bizarro performance issues because of that. I can’t say it enough that you should just avoid that situation altogether.
Most of the time, what I tell people is if you’re using local variables and you’re having trouble with a query that’s using local variables and a join or where clause, slap a recompile hint on it and see if SQL Server comes up with a better execution plan. If it does and you’re cool with the recompile hint, leave the recompile hint on. If it does, but you want plan reuse, then turn that into dynamic SQL that’s parameterized, and you will be able to do all sorts of neat tricks to get the good execution plans across the board.
So, with that out of the way, from your best friend here at Darling Data, me, Erik Darling, thank you for watching. I hope you enjoyed yourselves. I hope you learned something new.
I hope that you will continue to watch all of these, all of this wonderful free content, regardless of if you sign up for a membership or hire me for consulting or buy my training. Come to see me perform live at Jocko’s Comedy Hut, Paramus, something, I don’t know. Anyway, these lights are getting hot and my brain’s starting to fry, so I’m going to call this one and I’m going to figure out exactly which video I’m going to record next.
So, as always, thank you for watching. You’re too kind.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.