IF Branching In SQL Server Stored Procedures: Fixing Parameters, Recompiling, And A Good Enough Plan




Thanks for watching!

Video Summary

In this video, I delve into the complexities of if branching within SQL Server stored procedures and explore various strategies to mitigate issues related to parameter sniffing. We discuss the differences between `WITH RECOMPILE` and `OPTION (RECOMPILE)`—how they can be used to address performance problems but also highlight their limitations. By sharing practical examples, I illustrate how these techniques might not always provide a comprehensive solution and may even introduce new challenges. Additionally, I touch on the importance of maintaining a balance between optimizing query plans and ensuring efficient execution, especially when dealing with null parameters or complex search patterns.

Full Transcript

Erik Darling, Darling Data. So many Ds, so little time. In this video, we’re going to continue on with talking about if branching. In the first video, we talked about some patterns and some problems that can arise when you use if branching in SQL Server stored procedures. In this one, I want to talk a little bit more about the different ways I see people try to intervene with if branches in stored procedures, a little bit about WITH RECOMPILE versus OPTION (RECOMPILE), and one potential solution to the problem of if branching in stored procedures. It may work for you, it may not work for you, but we’re going to talk about it, and we’ll let you make up your mind if that’s the right thing for you. Before we do that, of course, we need to talk a little more about me, and me wanting to buy Christmas presents for my kids, or just keep recording videos so that you don’t have to pay exorbitant amounts of money to get training, which is probably nice too. You can sign up for a membership to this channel, and for as little as four American dollars a month, you can say thank you for all the hard work that I put into these things. If you don’t have four dollars a month for whatever reason, I don’t know, maybe you have credit card debt, maybe you spend too much money on clothes and booze and vacations, or, I don’t know, maybe a bad divorce settlement, whatever it might be.

I can’t tell you why you don’t have 4 bucks a month. I wish I could. 4 bucks. Pretty small amount of money, even on a monthly basis. You can do all sorts of things to say thank you, I love you. You can like the videos, you can comment on the videos, and you can subscribe to the channel and join the over 5,150, 60, I forget, data darlings who love this channel enough to get notifications when I post new things.

So that’s all cool. It’s all fine with me. If you are the type of person who has a SQL Server and needs help with that SQL Server, I am available for all sorts of consulting work. I do all of these things for a very reasonable rate, and I’m the best in the world at them. So if you’re thinking about hiring someone else, you’re screwing up. If you would like some very high quality, very low cost SQL Server performance tuning training, I’ve got over 24 hours of it.

You can get it for life for about 150 USD with that discount code right over yonder. So that’s a pretty good deal too. Now, I do enjoy speaking live to people, traveling, going places, talking to people, pressing the flesh as it were, but I don’t have anything right now. I finished up PASS Data Summit, and we’re getting towards the holidays, and I got nothing.

So if you’re planning events for 2025, and you want someone to come and talk about SQL Server performance tuning, I’m pretty good at it, just like consulting. And I don’t know. I tend to sell a lot of tickets if that’s any help. So we might be able to benefit each other this way.

But with all that out of the way, let’s talk a little bit more about this whole if branching thing. Now, if you watched the last video, or if you’ve been working with SQL Server long enough, or I don’t know, maybe you read a couple blog posts and now you’re an expert, you probably think parameters have a really, really bad name.

They get sniffed. They get snorted. They are sensitive. All sorts of bad things happen when parameters get involved, it seems like. But that’s not really true. Parameters are often a great thing.

They’re often wonderful things to help you have stable query plans and performance and not have to compile all the time. And not just, you know, I don’t know. Look, it’s usually a pretty good thing. I don’t understand why you’re freaking out.

Now, almost everyone who I work with has a bunch of stored procedures, written either by them, by someone else, or by that person who hasn’t worked there in 10 years that they’ve been meaning to fix. And they all do something to try to beat parameter sniffing.

They’ll use OPTION (OPTIMIZE FOR UNKNOWN). They’ll declare variables inside of the stored procedure and set those variables equal to the parameters. And some people will even have the gall to tell me that it’s a best practice.

They’re like, oh, we had some parameter sniffing, so we just turned off parameter sniffing, and we got a crappy plan for everybody, and it was better. Well, I refuse to accept that.

If you’re calling me to talk about performance problems, and we see a performance problem with a stored procedure doing this, it’s not fixing the problem. It’s not. You have not fixed it.

You have disabled parameter sniffing. And it’s probably not a good thing. Anyway, one thing that I see a lot of people do is try to intervene inside of a stored procedure and do something with it to right a wrong.

Now, I want you to notice that I’m saying if creation date is null, we’re going to set creation date equal to 2008-01-01. We’re just going to give it a canary value so that we find something.

Up here in the stored procedure definition, I’m even giving creation date a default value. The problem is that neither one of these things overrides an explicitly passed-in value when the stored procedure executes.
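The pattern being described looks roughly like this. This is a minimal sketch, not the exact demo code: the procedure, table, and column names are assumptions based on the StackOverflow-style demo database used throughout the video.

```sql
/* Sketch of the in-procedure "fix up the null" anti-pattern.
   Names (dbo.FindPosts, dbo.Posts, @CreationDate) are assumed. */
CREATE OR ALTER PROCEDURE dbo.FindPosts
    @CreationDate datetime = '20080101' /* default only applies when nothing is passed in */
AS
BEGIN
    /* This reassignment happens AFTER the plan compiles, so cardinality
       estimation still uses the sniffed NULL, not the canary value. */
    IF @CreationDate IS NULL
        SET @CreationDate = '20080101';

    SELECT COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= @CreationDate;
END;
GO

/* Explicitly passing NULL overrides the default,
   and the plan compiles for @CreationDate = NULL. */
EXEC dbo.FindPosts @CreationDate = NULL;
```

Neither the parameter default nor the IF block changes what the optimizer sniffs: the compiled parameter value stays NULL, which is exactly what shows up in the select operator’s properties.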

Right? So what I’m going to do is show you exactly what happens via this magic, and this magic, oh, this is a rather slow kind of magic.

This is that Sunday driver kind of magic. If you’re watching the little clock tick away over here by this armpit, you’ll see that we have been ticking away for almost 15 seconds.

Exactly 15 seconds on the nose. Not bad. I forget if I predicted that or not. Someone predicted that.

I don’t know who. But if we look at this stored procedure, we’re going to see a very bad estimate here. A whole bunch of rows out of one row. Not a good estimate. And if we look at the properties of the select operator, we’re going to see our creation date has a parameter compiled value of null and a runtime value of 2008-01-01.

So despite the fact that we had an if branch that intervened and changed our parameter value to something else, the stored procedure still compiled and still did cardinality estimation for creation date being null. All right?

We did our best, but we were just like, hey, creation date, you are null, sucker. All right? And so we got a really bad cardinality estimate based on that. You can substitute this with, well, I mean, it’s not a direct substitution.

It’s somewhat different. But imagine if you were the type of person who declares a bunch of local variables in your stored procedure. Or maybe you are the type of person who declares a bunch of local variables and sets them equal to values and then uses those.

Or you’re the type of person who declares local variables and sets them equal to parameter values. You can end up with similarly bad performance, because you are getting that awful density vector guess rather than an actual cardinality estimate based on the values that get passed in.
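The local-variable variant being described can be sketched like this. Again, the names are assumptions, not the video’s exact code.

```sql
/* Sketch of the local-variable anti-pattern (names assumed). */
CREATE OR ALTER PROCEDURE dbo.FindPosts_LocalVariable
    @CreationDate datetime
AS
BEGIN
    /* Copying the parameter into a local variable hides its value
       from the optimizer at compile time... */
    DECLARE @cd datetime = @CreationDate;

    /* ...so instead of using the statistics histogram for the sniffed
       value, the optimizer falls back to the density vector guess,
       which assumes a uniform data distribution. */
    SELECT COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= @cd;
END;
```

That density vector guess is the same estimate for every execution, no matter what gets passed in, which is why it isn’t a real fix for skewed data.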

If we recompile this and we run it with the actual value 2008-01-01 passed in, we get a much different execution plan that makes a lot more sense for this query, right? We get a good cardinality estimate.

SQL Server is able to figure a whole bunch more stuff out. And we get just generally better performance. Now, I do understand the sort of reticence to let SQL Server sniff parameters and do its thing with those because you can run into problems.

We’ll talk more about that in the next video. But one thing that seems to escape a lot of people is that making SQL Server use the density vector guess that you get from a local variable is not a direct fix for parameter sniffing.

You might get okay plans for some values. You might get bad plans for other values. But the thing with density vector guesses is that they assume sort of uniform data distributions in the table.

So if you have the type of wildly skewed data that leads to parameter sensitivity issues, then you assuming that the data is uniform is not going to fix that problem. It’s not going to get you what you want.

Now, there are all sorts of similarly bad search patterns that I see employed that attempt to do something like this, right? Essentially intervene with the optimizer and say, hey, if that’s not a thing, then do something else.

You might see people use ISNULL like this and say, hey, if that parameter is null, substitute this. Or you might even see people pass in the column over here too.

So rather than passing in a literal value, you might say, oh, well, if the parameter is null, then where it’s greater than or equal to the creation date column. Bad.

Same problem with COALESCE. Same problem with doing this and saying the column is greater than the parameter, or the parameter is null, or the creation date is greater than or equal to the creation date column.

And especially noxious is using a case expression like this, where the creation date column is greater than or equal to: case when the creation date parameter is null, then p.creation date, else the creation date parameter, right?
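The catch-all predicate shapes being called out look roughly like the following. These are sketches against an assumed Posts table, not the video’s exact queries, and they all hurt in the same basic way: the optimizer can’t turn any of them into a clean seek predicate.

```sql
/* ISNULL with a substitute value when the parameter is null */
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.CreationDate >= ISNULL(@CreationDate, '19000101');

/* ISNULL with the column itself passed in as the fallback */
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.CreationDate >= ISNULL(@CreationDate, p.CreationDate);

/* COALESCE, same idea */
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.CreationDate >= COALESCE(@CreationDate, '19000101');

/* the OR variant */
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.CreationDate >= @CreationDate
   OR @CreationDate IS NULL;

/* the especially noxious CASE expression */
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.CreationDate >=
      CASE WHEN @CreationDate IS NULL
           THEN p.CreationDate
           ELSE @CreationDate
      END;
```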

It’s like, there’s a whole bunch of dumb things you can do in there. Now, one thing that I want to pause and talk about, in a slight sort of digression on similar topics, is the use of recompile.

There are two different ways to add recompile to a stored procedure. And using recompile can solve parameter sensitivity issues because, within the context of a stored procedure, not an ad hoc query, if you create the stored procedure like this, WITH RECOMPILE, SQL Server will not cache any plans for the stored procedure at all.

It will evaluate parameters on each execution, right? So every time this runs, SQL Server will take whatever parameter values you pass in and do cardinality estimation based on those parameter values.

What this does not do is parameter embedding. And I’m going to show you an example of what I mean by that in a minute. If you use OPTION (RECOMPILE), SQL Server will only cache the most recent plan for any statements that have the OPTION (RECOMPILE) hint.

It does not reuse those plans, but you will have the most recent version in the plan cache. Just like using WITH RECOMPILE at the stored procedure level, it will evaluate parameters on each execution.

And OPTION (RECOMPILE) also does allow for parameter embedding. So let’s just use the example that we have above. What I want to show you is what that means when you have a WITH RECOMPILE hint and you do something like this, okay?

Now, this is the stored procedure level WITH RECOMPILE, and we have one parameter, and we are telling SQL Server to say where the column is greater than that parameter, or the parameter is null.
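The two recompile placements being compared look roughly like this. This is a sketch with assumed names, using the same `OR @p IS NULL` catch-all predicate from the demo.

```sql
/* Procedure-level WITH RECOMPILE: fresh estimates on every execution,
   no cached plan, but NO parameter embedding. */
CREATE OR ALTER PROCEDURE dbo.FindPosts_WithRecompile
    @CreationDate datetime
WITH RECOMPILE
AS
BEGIN
    SELECT COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= @CreationDate
       OR @CreationDate IS NULL; /* stays a residual predicate: index scan */
END;
GO

/* Statement-level OPTION (RECOMPILE): fresh estimates AND parameter
   embedding, so the OR branch can be simplified away into a seek. */
CREATE OR ALTER PROCEDURE dbo.FindPosts_OptionRecompile
    @CreationDate datetime
AS
BEGIN
    SELECT COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= @CreationDate
       OR @CreationDate IS NULL
    OPTION (RECOMPILE);
END;
```

Both forms get accurate cardinality estimates from the runtime parameter values; only the statement-level hint lets the optimizer embed the literal value and prune the impossible half of the OR.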

Now, before I run the stored procedure, I want to show you that I do in fact have two indexes on this table that lead with the creation date column. So SQL Server should be perfectly capable of seeking into either one of those indexes to find the data that we care about.

But when we look at the execution plan, we do use one of those indexes, but we don’t seek into it. We scan that index.

If we hover over that index, look what happens. We have a predicate instead of a seek predicate. So we have this residual predicate where SQL Server is like, well, either it’s null or it’s not. Now that’s a crappy problem.

Note that we do get a good cardinality estimate from it, right? SQL Server does make a good accurate cardinality estimate from it. And if the tooltip will get out of my damn way, we’ll see in the parameter list, we have the compile and runtime value are exactly equivalent.

That’s because we used WITH RECOMPILE at the stored procedure level. If we change that parameter value and re-execute, SQL Server will compile this plan again with whatever parameter value we pass in, but it won’t embed the parameter the way that it can with a statement-level OPTION (RECOMPILE).

Let’s take a look at what that means, right? So now we’ve taken the WITH RECOMPILE out of this part of the stored procedure, and we’ve stuck an OPTION (RECOMPILE) hint at the very end, right?

So let’s do this and execute it in the exact same way. Now we still get the same cardinality estimate, right? About 99%.

That’s right. But note that we don’t have an index scan anymore. Now we have an index seek. This is the difference between the stored procedure level WITH RECOMPILE and the statement-level OPTION (RECOMPILE). SQL Server is able to do parameter embedding with this one and change that OR clause into a seek with the literal value that we pass in.

So there are some differences in how SQL Server treats OPTION (RECOMPILE) versus the stored procedure level WITH RECOMPILE. If the problem that you’re trying to solve for an entire stored procedure is just parameter sensitivity, and you want SQL Server to come up with a cardinality estimate based on whatever parameter values you pass in, you can probably do that and get away with it.

If the problem that you’re having is a lack of parameter embedding, because maybe you have catch-all stored procedures that do stuff like this, or repeat the search patterns that I showed you up here with ISNULL, with COALESCE, with a case expression, SQL Server does not do parameter embedding in those instances, right?

So the stored procedure level WITH RECOMPILE won’t fix that stuff. It will fix cardinality estimation issues, but not parameter embedding issues.

So be very careful with how you use WITH RECOMPILE at the stored procedure level versus how you use OPTION (RECOMPILE) at the statement level. All right.

So one thing that you can do to short-circuit a lot of these problems, whether it’s null parameters and attempted fixes like, hey, if this parameter is null, change it to this,

or the compilation problems you get from putting regular queries inside if branches, is to create stored procedures that do the thing you want to do within those if branches, right?

So if you have a fairly minimal amount of code and you’re not too worried about it, creating a stored procedure that does the stuff in those if branches can be tremendously useful.

So let’s create a stored procedure to do the post query. That’s going to be this one. And let’s create a stored procedure to do the votes query. That’s going to be this one.

And now we’re going to create a wrapper stored procedure that runs one or the other depending on which of post type ID or vote type ID is null or not.

Now, I realize that there’s a lot of handling in here that should be done that’s not done, just for the sake of brevity. I’m skipping over a lot of stuff that I would normally do if I was writing production-ready code.

This is just to make the example a little bit tidier. So this can be a pretty good way of not compiling a plan for an if branch unless that if branch is explored.
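The wrapper pattern can be sketched like this. The procedure names and the exact queries are assumptions, and, as in the video, error handling and parameter validation are skipped for brevity.

```sql
/* One sub-procedure per branch (names and queries assumed). */
CREATE OR ALTER PROCEDURE dbo.PostsByType
    @PostTypeId int
AS
BEGIN
    SELECT COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = @PostTypeId;
END;
GO

CREATE OR ALTER PROCEDURE dbo.VotesByType
    @VoteTypeId int
AS
BEGIN
    SELECT COUNT_BIG(*)
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId;
END;
GO

/* The wrapper only branches; each inner procedure's plan
   compiles when that branch actually executes. */
CREATE OR ALTER PROCEDURE dbo.Wrapper
    @PostTypeId int = NULL,
    @VoteTypeId int = NULL
AS
BEGIN
    IF @PostTypeId IS NOT NULL
        EXEC dbo.PostsByType @PostTypeId = @PostTypeId;

    IF @VoteTypeId IS NOT NULL
        EXEC dbo.VotesByType @VoteTypeId = @VoteTypeId;
END;
```

Because EXEC of a stored procedure is its own compilation boundary, an unexplored branch never gets a plan compiled for it, which is the whole point of the pattern.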

SQL Server will not compile a plan for a branch that doesn’t do a thing. The problem is that this does introduce parameter sniffing for each of those stored procedures.

So if we were to run a little version of the stored procedure, we would explore that if branch and we would do a thing in here. And then if we run a bigger version of the stored procedure, right?

Like post type ID 8 doesn’t have nearly as many values as post type ID 2. Post type ID 8 is like a weird wiki thing that hardly anyone ever uses. Post type ID 2 is questions.

So obviously there are a lot of them, like 11 million questions in the posts table. So reusing the plan from one to the other does not work out as well as we would hope. We reuse the estimates from the first execution.

And we have some problems with spills for the hash join. And this takes about 11 seconds rather than finishing very quickly like it does for post type ID 8 here, right? This finishes in just a couple of seconds overall, right?

So one and a half seconds there versus like 10, 11, 12 seconds for the other one. So you will still potentially have parameter sensitivity per stored procedure that you have in there.

But at least you don’t have it globally at the stored procedure level like you do when you just put queries into if branches. In the next video, we’re going to look at different ways of solving parameter sensitivity issues in here.

And actually, that might be it. I forget. We’ll, you know what? We’ll see when we get there. You have to watch it to see what’s in it. But I promise, it’ll be fun.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the third video in this series, the final video in this series. We’ll talk about all that good stuff.

So thank you. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

IF Branching In SQL Server Stored Procedures: Patterns And Problems




Thanks for watching!

Video Summary

In this video, I dive into the complexities of if branching within SQL Server stored procedures and how it can significantly impact performance. Erik Darling from Darling Data shares his insights on why these constructs are problematic, especially when dealing with parameter sniffing issues. He highlights that while simple if branches might not cause immediate problems, they can multiply the effects of parameter sniffing across multiple queries, leading to suboptimal execution plans and slower query performance. By walking through various scenarios and providing detailed examples, Erik illustrates how these branching conditions can introduce unexpected cardinality estimations, making it crucial for developers to be aware of these pitfalls when designing stored procedures.

Full Transcript

Erik Darling here with Darling Data. Boy, we’re just full of energy today. I’m not sure where it came from. Maybe it’s the Lord’s energy. It’s hard to tell. I’ve never had that before, so it would be brand new to me if that were the case. But in today’s video, actually for the next three videos, we have a three-part series coming up about if branching in SQL Server stored procedures. And in today’s video, I’m going to teach you about the problems that come along with if branching in stored procedures.

It is definitely not the Lord’s work to do this sort of thing, unless you write your code very, very carefully. But before we get into that, I do have to make a few sales pitches because I guess I’m a used SQL Server car salesman and that’s what we’re going to do. If you like the work that I do and you want to support the work that I do with money instead of just with like watching the video for a few minutes and then going to do something else, you can click the link in the video description, the very first one next to the words like become a member or whatever.

And you can do that for as little as four bucks a month. You can join the 30 other people who are contributing to my nearly 119 pre-tax dollars monthly that I make from this channel. It’s amazing stuff. Life changing. If you would like to support this channel, but the four bucks a month is too much for you. You need that for your lotto tickets or whatever else. Formula for your infant, whatever.

You can do other stuff. You can like the videos. You can comment on the videos. And of course, you can subscribe to the channel. The subscription thing is a one-time effort on your part, but it goes a long way towards you getting notifications, along with over 5,150 other data darlings out there in the world who all keep up to date on SQL Server stuff by subscribing.

Oh, I skipped one. Don’t want to skip this one. This is the important one because if you need help with SQL Server in any way, shape or form, I am the best consultant in the entire world. I do all of this stuff very well. And as always, my rates are reasonable. If, oh geez, this thing is real skippy today. I don’t know what’s going on with. PowerPoint is like rebelling against me.

Usually it’s Zoomit. Today it’s PowerPoint. If you would like some very high quality, very low cost training about SQL Server, all about SQL Server performance tuning, you can get mine for life for about 150 USD. Good luck beating that other Black Friday sales. I don’t know. Maybe they’ll probably beat me in other ways, but it’s a hell of a deal. Check it out.

Upcoming events. Yeah. PowerPoint’s on a weird lag today. All right. Well, we’re not fixing that. We may be able to performance tune SQL Server. We cannot performance tune PowerPoint. That is beyond our ken. Upcoming events. I got none. And quite frankly, I’m not doing none until after the new year.

But if you are hosting any events in the year 2025 that, you know, involve in any way, shape or form SQL Server stuff, and you would like me, Erik Darling, as a pre-con speaker to show up and do this sort of training there. I can do that. You just have to tell me where and when those events are so I can make arrangements to show up there. But I think that about does it. And we’re going to wait. We’re going to click and wait 15 seconds for PowerPoint to progress.

And now I’m going to say, let’s party. All right. So, if branches in stored procedures. Bad news for performance. They really cause a lot of problems. The thing that no one understands about this is what happens when you put if branches in a stored procedure. Let’s say the stored procedure looks something like this. And of course, these tables don’t exist.

That’s why there is a cadre of red squiggles in this code. But if you write your stored procedure like this, the very first time that this stored procedure compiles, and on any subsequent compilations after a reason for recompilation (a plan gets cleared out of the cache, you restart SQL Server, the plan gets evicted from the cache under memory pressure, any one of those settings changes, anything that clears out the plan cache), SQL Server will compile query plans for both of the queries that you see in these if blocks, based on the first set of parameters that it sees. It is classic parameter sniffing behavior. Not parameter sensitivity, not yet, but parameter sniffing behavior.

So if the first time we run this stored procedure, we say we want to select the data from this table where parameter one equals this, well, that might be okay for that first query. But if you run the procedure again, without any recompilation happening, then SQL Server is going to reuse the plan that it compiled for the second query, the one that was most likely compiled with the parameter value for p2 as null.

And that can cause a lot of problems. Oh, geez, mouse, calm down, buddy. So, yeah, that’s essentially what happens in there. You’ve really just multiplied your parameter sniffing or parameter sensitivity problems for each query that you put into an if branch like that. So the first time this runs, we say p1 equals 100 and p2 is null. And then in future executions, we decide to start looking at the other table.
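The shape being described looks roughly like this. The table and column names are placeholders, since the demo tables intentionally don’t exist.

```sql
/* Sketch of the basic if-branching shape (names assumed). */
CREATE OR ALTER PROCEDURE dbo.IfElse
    @p1 int = NULL,
    @p2 int = NULL
AS
BEGIN
    IF @p1 IS NOT NULL
        SELECT t1.* FROM dbo.t1 WHERE t1.c1 = @p1;

    IF @p2 IS NOT NULL
        SELECT t2.* FROM dbo.t2 WHERE t2.c1 = @p2;
END;
GO

/* The first execution compiles plans for BOTH queries using these
   sniffed values: the t2 query gets a plan built for @p2 = NULL,
   even though its branch never runs. */
EXEC dbo.IfElse @p1 = 100, @p2 = NULL;
```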

You might get a really, really bad execution plan for the second query. Now, these are just two very simple queries to get you used to the idea of what happens when stored procedures like this run and compile plans. In this video, we’re going to look at times when that matters and when that doesn’t matter. Okay.

There’s really one time when it doesn’t matter, and it’s probably pretty rare. So this is the one case where it would make absolutely no difference if you were doing if branching. So let’s create this procedure and let’s look at what it does. Now, the important thing here is that the only thing happening in this procedure is we are using if logic to figure out which branch of this procedure to run.

There are no parameter values in any of these queries. We are not filtering on any parameters. I mean, if there were literal values, it wouldn’t make a difference either, but we’re just doing straight joins. There are no parameters to sniff. Basically, we are out of parameters to sniff in this one. So every time this runs, the only thing we’re going to get is the full join cardinality for any of these queries.

But it still functions the same way, where SQL Server figures out join cardinality based on that initial compilation. But in this case, it just doesn’t matter, because the cardinality estimation is just the full joins, right? But let’s look at an estimated plan for just one of these.

So you can see what I mean by all of the plans getting compiled when this executes. This is what it looks like to SQL Server when the stored procedure compiles. You have the stored procedure itself. This is where we set NOCOUNT and XACT_ABORT on, right? So that’s our T-SQL on and off.

And then you have this conditional, right? This little question mark. What’s going to happen? We don’t know. Is it going to be you? Do you get picked? Who gets picked? But SQL Server pretty much just comes up and compiles a plan for every single one of these joins.

When the stored procedure compiles, right? When you actually execute it, this is what the cached plan will look like. It’s going to look exactly like this. And you’re not going to really know which one of these ran. And it’s probably going to be confusing, and you’re not going to like it too much.

So that’s what the plan looks like to SQL Server when it compiles something and when it caches something for this stored procedure over here, right? Because we just have an if, and then just a query that runs, this is what it’s going to look like. A somewhat more realistic example is something like this. And I don’t know, maybe this wouldn’t be realistic to you.

I’m going to cover a few different scenarios kind of like this, where we have just a single date filter parameter, right? And we’re going to use just about the same set of queries up there, except now each one of them has a table where we’re using that date filter parameter as a predicate across a bunch of these queries, right? So if we create this thing and we run it, we’re going to see that not only do we have parameter sensitivity for one query, right?

Like if I run this for badges, this is going to run quickly because we don’t find too many rows. But then we run these two, where we’re going to search for badges again, but go back a little bit further, and then we’re going to search for comments and go back a little bit further.

Now we hit a parameter sensitivity issue, and we’re going to have a parameter sensitivity issue for every single query in the procedure, because we are reusing the cardinality estimation for the sniffed value here, right? So for both of these further executions, we’re going to get really bad cardinality estimates, because this is the parameter sensitivity thing. We get a bad cardinality estimate here, and that translates into a much slower plan all around.
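That single-shared-parameter setup can be sketched like this. The video doesn’t show the exact branching condition, so the table-switch parameter here is an assumption, as are the StackOverflow-style table and column names.

```sql
/* Sketch: one date filter parameter reused as a predicate in every
   branch (branch condition and names assumed). */
CREATE OR ALTER PROCEDURE dbo.DateFiltered
    @DateFilter datetime,
    @Table sysname
AS
BEGIN
    /* All of these plans compile on the first execution, all sniffing
       the same @DateFilter value, whichever branch runs first. */
    IF @Table = N'Badges'
        SELECT COUNT_BIG(*)
        FROM dbo.Badges AS b
        WHERE b.Date >= @DateFilter;

    IF @Table = N'Comments'
        SELECT COUNT_BIG(*)
        FROM dbo.Comments AS c
        WHERE c.CreationDate >= @DateFilter;
END;
```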

And then we get a bad cardinality estimate here, and that translates into a slower plan all around. This plan is different, but this would have been the initially compiled plan for this one anyway. Like if we go and we look at the estimated plan for this with the date filter in there, like that first query, this is the serial plan for that.

Right. And then when we hit the comments table, this is the parallel plan that we got for the parameter value compiled as 2013-01-01. And then we’re going to see the same thing basically repeat for the other queries. Now, you can see that when you get the actual plan, if we do this one again, this only takes a few seconds, but we run this and this finishes.

And then we right-click on the select operator and we go to the parameter list. For date filter, we have the compile value down here, right? 2013-12-31, and then the runtime value of 2013-01-01.

So SQL Server didn’t wait to compile this plan until the if branch got explored. SQL Server compiled this plan the first time the stored procedure compiled everything, compiled all four plans, and it reused the cardinality estimate there.

So reusing the same parameter across a bunch of queries, regardless of whether they touch different tables, now you have parameter sensitivity for everything. A really common thing that I see with these sorts of branches is maybe doing different sets of joins. But we’re going to have essentially the same problem with these two, where, let’s create this and let’s run these three just like we did last time.

And we get the first result back very quickly, right? We get the execution plan, and we get the parameter cardinality for this, which is pretty close. And then for the next two, we’re going to get slightly different estimates because of where the parameter kind of gets hit.

But for this one, we get the same cardinality estimate of 19,797 for both. But for the parameter that goes back further in time, we get more rows out of it, right? So this one gets 21,000 rows.

This one gets 61,000 rows, but we stay with the initial estimate of 19,797. This isn’t that big a deal. The difference isn’t tremendous, but you can see where the initial cardinality estimation stuff gets reused across all of these. Now, what some people will do to try and safeguard against this is create different parameters for each different table they might filter on.

So here’s an example where that happens, but this is where things actually get a lot worse than sometimes reusing parameters. Because we’re going to run into the problem that I talked about at first, with sniffing nulls. So the first time that we run this for badges, again, this is going to compile all four plans at the same time.

So if we run this, it comes back really quickly, and we get pretty good cardinality estimation for this one. This is fine, right? This is typical parameter sensitivity stuff.

Good cardinality estimate for the first one, probably a good query plan for the first one, and then awful after that. But now look what happens when we run this for comments and we use the C filter rather than the B filter. This is going to run a bit longer.

And this is another one of the issues that you run into when you just IF branch your queries and do this sort of stuff. This one's still running. I forget.

Okay, 12 seconds. Usually it runs for like 15, but close enough. For this one, we get a plan with a hopelessly bad estimate, and a hopelessly bad query plan to go with it. And the reason for that is what I talked about early on: this plan compiled and sniffed a null for the compile value.

And then the runtime value was this. So the difference here, versus the other examples where I showed you reusing the same parameter, which gets you parameter sensitivity across a bunch of different IF branches, is that this does too, but in a different way.

Sniffed nulls are probably worse than sniffed actual parameter values. For this one, where we sniffed a null, SQL Server just guessed one row, but we got a lot more rows back than that. So, that's IF branching in stored procedures.
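A sketch of the separate-parameters-per-table pattern that causes that null sniffing. Again, names are illustrative, not the demo's exact code:

```sql
CREATE OR ALTER PROCEDURE dbo.SearchSplit
    @BadgeDate   datetime = NULL, /* one parameter per table */
    @CommentDate datetime = NULL
AS
BEGIN
    IF @BadgeDate IS NOT NULL
        SELECT COUNT_BIG(*) FROM dbo.Badges   AS b WHERE b.Date         >= @BadgeDate;

    IF @CommentDate IS NOT NULL
        SELECT COUNT_BIG(*) FROM dbo.Comments AS c WHERE c.CreationDate >= @CommentDate;
END;
GO

/* The first call only passes the badge filter... */
EXEC dbo.SearchSplit @BadgeDate = '20130101';

/* ...but the Comments statement still compiles on that first call,
   sniffing @CommentDate = NULL and guessing one row. Later calls that
   do pass @CommentDate inherit that hopeless estimate. */
EXEC dbo.SearchSplit @CommentDate = '20110101';
```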

Well. This is one of those "easy for you, hard for the optimizer" things that I talk about a lot. IF branching might be very easy for you to write this way, but for the optimizer, it is a very, very tough situation.

And we’re going to talk in the next couple of videos about different ways of approaching this and fixing this and then even fixing problems with performance across multiple executions of these things. So stick around for those. As always, I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next couple of videos where we will dig more into this topic. Alright, cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Why Does Erik Talk So Much About Scalar UDFs?




Thanks for watching!

Video Summary

In this video, I delve into why scalar User Defined Functions (UDFs) are a frequent topic in my content. I explain how they can severely impact query performance by preventing parallel execution plans and causing excessive row-by-row processing overhead. By walking through simple examples, including a demo of a UDF that simulates waiting for milliseconds, I illustrate the non-trivial time costs associated with function executions, especially when called repeatedly. The video also explores scenarios where scalar UDFs are used in WHERE clauses or joins, highlighting how these can dramatically increase query execution times due to their lack of sargability. Through this exploration, viewers gain a deeper understanding of the performance implications and learn strategies to avoid unnecessary UDF usage in their SQL queries.

Full Transcript

Erik Darling here with Darling Data. And in today’s rock’em, sock’em episode? Is it an episode? Am I just having an episode? You may never know. I’m going to answer an email question that came in asking me, Erik, why do you talk so much about scalar UDFs? And well, boy, as a consultant, I could go on for ages about the problems I’ve seen UDFs cause. But, you know, and this is something that’s in all the videos where I talk about scalar UDFs: the problems that you run into are that they disallow a parallel execution plan for the query that calls them. So whatever query invokes the UDF cannot go parallel. That’s not a big deal for every query, but for a lot of the queries that I see, that is a big deal. Another very, very big problem is that scalar UDFs do not run once per query. They run once per row that the scalar UDF needs to process to return a result. This is, of course, for any UDFs that are not inlineable. By inlineable, I mean: if you are on SQL Server 2019 and you are in compat level 150 or better, SQL Server has a feature called UDF inlining that will make an attempt to basically reassemble your scalar UDF as inline code in the query, which can have awesome results when it works, but there are a lot of limitations and restrictions on it. Going a little bit further than that, one thing that I see quite consistently in client code is not only a scalar UDF that does something, but that scalar UDF might reference other scalar UDFs.

That scalar UDF might reference multi-statement table-valued functions, and that UDF might actually be recursive to some degree, so the UDF might actually invoke itself. All of these things pile up and cause very, very real, very, very awful performance issues for queries, especially when those queries need to return a lot of rows. So in this video, I’m going to go over a very simple way to understand just how much overhead there can be in scalar UDFs, even relatively simple ones. I’m going to use a bit of a verbal shortcut here to explain what I mean. Let’s just say that the scalar UDF executes in memory. I don’t mean using Hekaton or an in-memory feature. I just mean, let’s say the scalar UDF accepts an input and just does something to that input.

Like, let’s say it accepts a date from a datetime column or something, and it reformats that datetime to fit some description. Or it chops the time off the date and just returns a date. See, before SQL Server had a date data type, it was very common to see functions that did that, like convert the date to a varchar(10) or (11) or something to remove the time portion of it. There’s a lot of stuff that scalar UDFs do that might be “in memory,” but even entering that UDF code and performing whatever in-memory processing you have to do on whatever data gets fed into the function has overhead to it.

That will get worse when your scalar UDFs start reaching out to user tables, objects in the database, and accessing those to either fetch some data or set variable values inside of the UDF. Or even worse. I saw this not as recently as you might think; well, it’s November now, so it was October when I saw it: a 3,500-line UDF with four different cursors in it. And people couldn’t figure out why this query was slow. As soon as you commented out the UDF call in the select, that query finished instantly.

If you left that in there, it took 35, 40 minutes for this query to finish. This thing was awful. So a lot of people think that there is a freeness to writing and invoking a UDF that just is not there. There are real, big problems when it comes to query performance.
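Incidentally, you can check which of your scalar UDFs SQL Server even considers eligible for the UDF inlining feature mentioned above. This is a sketch against real catalog columns (`is_inlineable` and `inline_type` in `sys.sql_modules`, SQL Server 2019 and up); the column picks and filtering are my own:

```sql
/* List scalar functions and whether SQL Server thinks it can inline them.
   o.type = 'FN' narrows the results to scalar user-defined functions. */
SELECT
    function_name = OBJECT_NAME(m.object_id),
    m.is_inlineable, /* 1 = meets the inlining requirements */
    m.inline_type    /* 1 = inlining is currently turned on for it */
FROM sys.sql_modules AS m
JOIN sys.objects AS o
  ON o.object_id = m.object_id
WHERE o.type = 'FN';
```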

But here I’ve rambled on long enough. Here I go just getting off on my pretty pony, babbling, babbling, babbling. So let’s talk a little bit about this channel and your interaction with it, either emotionally or financially. If you click the link down in the show description, you can get a very low cost membership to support my work doing these videos, sharing my knowledge, you know, for free.

It’s not free if you chip in, but it’s like four bucks a month at the low end. And we’ve got like 30 people on board for that so far. So that’s pretty good.

If you are not the type of person who is keen on, you know, supporting people who try to help you for free, all of this content can be liked and commented on and subscribed to, which are equally good numbers to see on the up and up every time I hit refresh on YouTube 72 times a day. If you need help with your SQL Server, even awful scalar UDFs, boy, I have a lot of practice rewriting them.

I am very good at this stuff. Probably the best in the world at this stuff. If you want to hire someone who’s crappier at it than me, or cheaper than me, or whatever, go ahead.

Eventually our paths will cross. But this is also a good time to mention that my rates are reasonable. So suck on that.

If you would like some very high quality, very low cost SQL Server training, even cheaper than anything you will find on those other sites telling you they’re having these massive Black Friday blowout sales, boy howdy, you can get all of mine for life for about 150 US dollars, at that URL with that coupon code.

And gosh darn it, that’s a link in the show description too. I make things so easy for you. Since I am back home now from the PASS Data Summit, I’ve got nothing on my calendar, which for the winter months is pretty good.

I don’t think there’s a SQL Saturday, sorry, a Data Saturday in St. Kitts. I don’t want to upset anyone, but if there is, let me know. I’ll come down there for free.

Well, I mean, I’m not going to stow away on a cargo plane or something, or swim, that’d be hell on the laptop. But, you know, I do like going places. So anyway, with that out of the way, let’s talk about why Erik Darling of Darling Data, with his reasonable rates, hates scalar UDFs.

So, I’ve talked about this before, and this is a fully fleshed out demo that I presented during one of the PASS pre-cons that I did with Kendra Little, which went wonderfully, by the way. You should have been there. It was my birthday, after all, and kind of rude that you weren’t, but whatever.

So I started talking about how you can’t put a WAITFOR DELAY in a function, which is obviously stupid, right? But this is the function that you can write in order to simulate waiting for some duration of time. So what this function does is it accepts a delay in milliseconds, which is an integer.

And I’m not giving you a bigint for this, you sick, sick, twisted, foul thing, you. And then we go into a WHILE loop. And while SYSDATETIME() is less than the end date, which is the start time plus the delay in milliseconds up here, we spin in a loop, which is, again, sort of like in-memory processing, right?
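For reference, the function as described looks roughly like this. The name is a guess, not necessarily what's on screen in the demo:

```sql
CREATE OR ALTER FUNCTION dbo.waitfor_ms (@delay_ms integer)
RETURNS integer
AS
BEGIN
    DECLARE @spins bigint = 0,
            @stop  datetime2(7) = DATEADD(MILLISECOND, @delay_ms, SYSDATETIME());

    /* Busy-wait: WAITFOR DELAY isn't allowed inside a function,
       so we burn CPU until the target time passes. */
    WHILE SYSDATETIME() < @stop
    BEGIN
        SET @spins += 1;
    END;

    RETURN 1; /* per the demo, the function always returns the number one */
END;
```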

We’re just doing this stuff without touching physical database objects. Fine. Good. So let’s turn on query plans and validate that this function works the way we expect it to. I’m going to run this, and it’s going to run for just about one second.

Notice that there’s a little bit more than one second on that, though, right? A teeny tiny bit more. And if we execute this for three separate calls of 100, 300, and 600 milliseconds, notice that we hit the one second mark accurately, right?

We did 100 milliseconds and then 300 milliseconds and then 600 milliseconds, which is about a second. But that went up a little bit higher than just entering the code once and going through one execution. So now let’s create a table and let’s put some data in it.

And basically we’re going to stick a thousand rows in the table. And in the wait for column, we’re going to put the number one, right? So that’s a one millisecond wait for all thousand rows.

Or rather, that’s a one millisecond wait for however many rows we take out of the table when we call this function. So let’s do this and make sure that we have no function executions in there. And I’ve got a little helper view that I’m going to use here just to keep the script a little bit shorter.

That will tell us how many times the function executed. The script for that looks like this up here, where I’m just going to look at the sys.dm_exec_function_stats view and do some stuff to get useful data out of it. So anyway, right now there’s nothing in there because I just cleared out the plan cache.
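The kind of query such a helper view wraps looks roughly like this, against the real DMV `sys.dm_exec_function_stats`. The column picks and formatting are mine, not necessarily what's in the demo's view; the DMV reports times in microseconds:

```sql
SELECT
    function_name = OBJECT_NAME(fs.object_id, fs.database_id),
    fs.execution_count,
    total_seconds = fs.total_elapsed_time / 1000000., /* microseconds -> seconds */
    max_worker_ms = fs.max_worker_time / 1000.        /* microseconds -> ms      */
FROM sys.dm_exec_function_stats AS fs
ORDER BY fs.total_elapsed_time DESC;
```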

But if I run this now for 100 rows, and I don’t even have an ORDER BY on here, I’m not doing anything particularly interesting, to get out a hundred rows, we end up waiting 1.5 seconds.

Note that nothing in this plan has any other time associated with it. It’s trivially easy to get a hundred rows out of this table with a thousand rows in it. But really, where all the time gets spent is in the compute scalar.

If we right-click on the compute scalar and go to properties, and then to defined values, we’ll see that the compute scalar is what’s housing that wait-for function, right? The compute scalar is responsible for invoking the scalar UDF. So all the time in the plan, 1.5 seconds for a hundred rows, is from waiting one millisecond per row.

That’s kind of weird, isn’t it? I mean, it’s a bit strange for that to take 1.5 seconds. If we look at the function DMV now, we’ll see that we spent a total of 1.5 seconds in there.

And even though we only ever passed in a one millisecond wait, look at the max worker time: 15 milliseconds. We actually spent 15 times longer in there than the one millisecond wait we passed in, right? That’s very strange, right?

And it’s very, very odd, because getting into that function code path is not free either, right? The more times you need to enter a function’s code path, the worse this gets. For a hundred rows, it was 1.5 seconds. Now I’m going to do this for all thousand rows.

I’m going to share something with you, something awful. This is going to take 15 seconds. One five, right? That’s for waiting one millisecond per row for a thousand rows.

That’s a thousand milliseconds. That’s one second. Where does 15 seconds come from? Doing all that function stuff. Look at this thing.

We spent two milliseconds scanning the clustered index, pretty quick, and in the compute scalar, 15.617 seconds. All right. A lot of extra overhead in there, isn’t there?

A lot of weirdness, a lot of strangeness. All right. And if we go and look at the function DMV now, we will have the thousand executions, because we invoked that function 1,000 times, or once per row.

We spent a total of 15 seconds in the function, again with a max CPU time of almost 16 milliseconds now. So getting into that function code path is not free.

Now, one thing that I talk about a lot is how where a function gets called matters. What we just did was invoke the UDF in the select list, so how many rows came out of the query determined how many times it ran. Right?

So for a TOP 100, it ran a hundred times; getting all thousand rows out of the table, it ran 1,000 times. Now I’m going to update the first hundred rows of the table, and I’m going to update this column called did_it with the results of the function.

Right? This function always returns the number one. Not a big deal.

We’re not doing anything terribly interesting. And if we look at the query plan for this, well, 1.5 seconds in the compute scalar. Again, just invoking that function.

Now, when I talk about different ways that functions can get called in queries, there’s obviously the select list thing that we just looked at. But sometimes people might stick functions in other places. You might have a scalar UDF in a where clause or in a join.

Now, if I just put the scalar UDF in the where clause here, this is going to produce a hundred rows: the rows where the did_it column, which for a hundred rows is set to one, is equal to the scalar UDF’s result.

Something really interesting is going to happen. Really interesting. I mean, something really slow.

You’re not going to like it. You’re not going to like it one bit. I’m also not going to like it one bit.

Not, not a good time. So what we have is, um, this thing happens, right? The clustered index scan, four milliseconds, not bad.

But now, rather than the compute scalar taking up most of the time (remember, this is a row mode plan, so every parent operator also includes its child operators’ time), the compute scalar does have 17.1-something seconds under it.

But this filter operator is now 15.6 seconds. The easiest way to think about it, if you’ve been traveling in SQL Server circles long enough, is that this scalar UDF is not SARGable. SQL Server has to run that function for every row, produce a result, and then compare it to whatever our comparator is up here.

If we go to the properties of the UDF, and we go to the predicate, and, just to make life easy, I click on this little ellipsis over here, what this shows us is the column that we searched on, that’s this thing. And then we can see where it equals the result of this wait-for function.

So we have to feed the wait_for column into the UDF 1,000 times, produce a result 1,000 times, and compare the result to the column in the predicate. Since we only set 100 rows equal to one, only 100 rows qualify and get hit by the compute scalar. So that’s why the compute scalar is only about a second.

If I said where this equals zero, that would be the other 900 rows, and this would all take a lot longer, right? And I don’t want to keep you sitting there dealing with all that. And if we look at the executions, we will see now that there were 1,100 executions.

There were 1,000 executions for the where clause: run 1,000 rows through the function, produce a result, compare it to the predicate. And then another 100 executions to produce the function result in the select list. So 1,000 on the where clause and 100 in the select list.
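The shape of that painful query, as a sketch. The table, column, and function names here are my guesses at the demo's setup, reusing the hypothetical wait-for function name from earlier:

```sql
SELECT
    t.id,
    did_it = dbo.waitfor_ms(t.wait_for) /* runs once per qualifying row: ~100 executions */
FROM dbo.udf_demo AS t
WHERE t.did_it = dbo.waitfor_ms(t.wait_for); /* not SARGable: runs once per row in the
                                                table, ~1,000 executions, just to filter */
```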

So putting scalar UDFs in the where clause can be a particularly painful thing for your queries. Of course, it’s sort of unlikely that your scalar UDF is going to be the only predicate in a query. So what I want to show you is that exact same query, except this time we’re only going to get the first hundred rows, the ones we set to one, anyway.

When we do that, we have an additional predicate for this query. So if we look at the execution plan, SQL Server is able to seek into the clustered index to find where IDs are less than or equal to a hundred. So we have an initial seek that trims down the rows that leave the clustered index, but those hundred rows still have to go into this filter.

There’s 1.5 seconds, like we’ve seen over and over again, for 100 executions of the function, and then just about another 1.5 seconds in the compute scalar, right? The filter is responsible for its own time.

And the compute scalar is its own time plus the filter time. Remember, it’s a row mode plan: essentially 1.5 on the filter, 1.5 on the compute scalar, right? So this query doubles in time, because not only do we filter the result like we did before, we also invoke the function again in the select list.

And if we look at the predicate down here, it’s just like what we saw when we looked into the properties before. This is just a really awful, painful thing. This time we only have 200 executions of the function, though, because only 100 rows left the table based on the initial seek that we did.

That would be true even if we scanned the clustered index and a hundred rows came out of it; those hundred rows would still have been processed by the filter for the scalar UDF. So that’s just in queries that read data.

Sometimes people will put scalar UDFs into computed columns or check constraints. Now, for queries that touch this stuff, SQL Server does have the UDF inlining feature available, 2019 plus, compat level 150 plus. That feature does not work with scalar UDFs in computed columns.

And that feature does not work with scalar UDFs in check constraints. At least as of the time I’m recording this, maybe SQL Server vNext will make improvements on that. I don’t know.

I don’t have any futuristic information there. So let’s create a slightly different table. And what we’re going to do here is we’re going to put that UDF into a check constraint. Right?

And we’re going to look at how this impacts queries. So let’s put a thousand rows in here. And putting a thousand rows into this table is going to go a lot slower than it did the first time.

Because for all those thousand rows, guess what we have to do? We have to run that check constraint to make sure it did something. Now, it’s a bit different in this query plan because that happens in the assert operator.

Right? So like this happened quickly and this happened quickly. But we spent a lot of time in here now.

And I forget exactly how useful this assert is in showing us what it did. But there we go. That’s actually pretty good.

So you can actually see the scalar UDF invocation by the assert right there. Okay? So that’s not good. You know, maybe if you have inserts into a table like this that run for a long time, this might help explain why.

Now, the other thing that’s kind of annoying with scalar UDFs: I’m just going to run these two queries back to back.

These happen very quickly, even selecting the wait_for column. Right? You can see that there, right?

We selected the wait_for column in this one. We did not select it in this one. For the query that does select the wait_for column, the one covered by the check constraint with the scalar UDF, we get this “T-SQL user-defined function not parallelizable” warning in the query plan properties. If you’re on a version prior to 2022, and you’re not in the cloud or on a managed instance, that will just say “could not generate valid parallel plan.”

We got some more information out of this thing recently. The query down here that doesn’t select the wait_for column does not have this problem. This is a lot different from scalar UDFs in computed columns, where, unless you have that computed column persisted and trace flag 176 enabled, this would have caused problems for both queries.

Okay? It would have been an issue for both of them. If we look at the number of executions, we only have the thousand executions from the initial insert; we didn’t get additional executions of the scalar UDF here because there’s no reason to execute it.

But its mere presence on this table inhibits, or rather disallows, a parallel execution plan. So even if, say, you’re just selecting from your table and joining it off to a bunch of other tables and doing all your fancy join-y stuff, you could end up with real performance problems. Even though this function doesn’t execute every time, it will force you into a serial execution plan.

There’s no parallelism allowed there. So you have to be really, really careful in how you write, how you use, how you call, and basically how you do anything with scalar UDFs. Again, putting them in check constraints or computed columns makes for workload-wide performance issues.

Putting them in queries usually makes for pretty bad query performance issues. And even though UDF inlining, again, SQL Server 2019 plus, compat level 150 plus, does help a pretty decent chunk of the functions that I see, the number of limitations and restrictions on it means it doesn’t help everything.

It can’t help every single function. And, you know, the amount of pain that scalar UDFs cause in the grand scheme of things really does make me wonder: if you’re unaware of how bad they are at this point, should you even be touching a database? Should you be anywhere near a SQL Server?

Should you just stick to Excel? Should you collect rocks? Is there a better opportunity for you elsewhere in the world? Because I just see them constantly, and people wonder why I make the face that I make when I’m, like, looking through a query that runs slow, and there I see it.

Some dbo.fn_this_and_that, and I’m like, ah, here we go again. It’s you. It’s you.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I’m going to hopefully record something a little bit less depressing next time for the next video. I’m not sure what it’ll be yet, but I’ll know when it comes to me.

So, anyway, goodbye for now.


STRING_AGG vs SQL Server’s Optimizer




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of using `STRING_AGG` in SQL Server and its interactions with the query optimizer. I explore how `STRING_AGG` can lead to unexpected performance issues due to its design, particularly when dealing with large concatenated strings that exceed certain size limits. I demonstrate real-world examples where queries using `STRING_AGG` with a `VARCHAR(MAX)` data type perform significantly worse than those with smaller string types, highlighting the importance of being cautious about the length of the concatenated strings and their impact on query execution plans.

Full Transcript

Erik Darling here, Darling Data. We live to fight another day, don’t we? In today’s video, we’re going to be talking about STRING_AGG versus the optimizer. Now, STRING_AGG is a string aggregation function that was designed to replace all that FOR XML PATH, .value(), varchar(max) stuff that people used to have to put into queries in order to create a list of aggregated strings, usually separated by commas or some other delimiter. I guess spaces would be equally valid there. And it’s got some funny optimizer repercussions sometimes, things that we don’t like to see. It’s actually a very unfortunate side effect of the way STRING_AGG was designed: rather than Microsoft telling STRING_AGG, hey, if this thing that we’re concatenating together is big, we should just convert it to a big thing, right? They should just put a convert inside STRING_AGG for the values being concatenated. Because if you’re concatenating integers or something together, you can’t just tack a comma onto an integer anyway. The problem is that if you put a big enough string together, SQL Server’s like, whoa! Can’t do that. You need to convert it.

And then convert it to a string size that it understands. And so you end up having to write code that looks rather silly. I didn’t mean to exit out of that, because, of course, we need to talk about this stuff before we talk about that stuff. You can become a member of my channel for four bucks a month. It’s a pretty good deal. If four bucks a month is, like, all your lunch money, well: like, comment, subscribe. It’s all good stuff in there. I am a SQL Server consultant. That is how I make my money. Right now, even with the 25 very thoughtful, very generous people who have become members of my YouTube channel, that puts my monthly income from YouTube at roughly 123 pre-tax dollars per month, which pays one of my cable bills. So that’s good. But, you know, rent is a much bigger deal. And that’s where the consulting end of my life tends to come in, because for less than the cost of one core of Enterprise Edition, we can fix lots of problems.

And then you will have to buy fewer cores of Enterprise Edition, or spend less money on Microsoft or Azure’s incredibly overpriced cloud offerings. Screw them. Why give them all your money? Give me your money, then save money. All right. It’s a nice tradeoff. If you would like some very high quality, very low cost SQL Server performance tuning training, you can get all of mine for 150 bucks for life. That beats the pants off all the Black Friday deals you’re going to see. So you should buy that. You can either use that link and this discount code to do it, or, down in the video description, click on the link and it’ll do all that for you. So, upcoming events. Well, you know, I don’t have any dates right now. So if you would like to go on a date with me, you can tell me about your event and I’ll show up. Dress nice. Smell good. At least at the beginning. Don’t know about the end. Anyway, let’s look at this STRING_AGG nightmare.

So, this is the thing that happens whenever you write queries with STRING_AGG in them. This is where you run into stuff. The ID is an integer, right? Just four bytes, whatever. But when we concatenate this thing, SQL Server does not make a big enough data type for the concatenated string, and it’s just like, whoa, whoa, whoa. We can’t do that. It throws a dumb error. And then you have to write CONVERT(varchar(max), ...) around the thing, plus the comma thing. Notice that the convert to varchar(max) is not around the comma at all. The comma-space separator stays separate; only the ID column gets converted.

So, this is where you can start running into dumb stuff, right? And really, the path of least resistance is, of course, to just say varchar(max), because who knows when you might have two-point-something gigs of IDs to put into a string. A really good use of SQL Server, right? That’s good use of licensing money there. Create a two gig string of integers. Great. Thanks. We’re good with that. You could, of course, mess with things a little bit, and you can experiment with smaller values, like, say, varchar(500).

And this will, of course, get you around the error and give you the string that you want. So, the difference between these two things, and this is where you have to be careful, is that when you say max, or the converted value is above a certain size, query execution times are way different. Sorry, not you, I’m not going to make you do any work here.
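The two shapes being compared look roughly like this. The table and column names are guesses at a Stack Overflow-style demo database; aggregating a plain integer column can blow past STRING_AGG's default 8,000-byte result and throw an error telling you to use LOB types, so the usual fix is converting the input, not the separator:

```sql
/* The path of least resistance: convert the input to varchar(max) */
SELECT
    p.OwnerUserId,
    ids = STRING_AGG(CONVERT(varchar(MAX), p.Id), ', ')
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 1;

/* Per the demo, a smaller type also dodges the error when each group's
   string actually fits, and can get a meaningfully better plan */
SELECT
    p.OwnerUserId,
    ids = STRING_AGG(CONVERT(varchar(500), p.Id), ', ')
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 1;
```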

If we zoom in here, this top query took 17 seconds and this bottom query took 7 seconds. So, the bottom query is a full 10 seconds faster. Granted, there are things about both of these queries that we could fix a bit. But the big problem with the first query is there’s an additional operator in it.

See, there’s a filter here and a filter here. And those filters are for the HAVING on the count, right? So this is going to be a filter no matter what, because SQL Server has to run the query, come up with the counts, and then filter on the count. That can only be filtered after the results are calculated at runtime.

The top query, where the string column is a varchar max, has an additional filter in it that is not down here, right? Like, there’s no thing between the sort and the compute scalar the way there is up here. And this filter is saying where post type ID equals 1. It’s kind of weird.

What’s this filter doing? Greater than 1. What’s this filter doing? Greater than 1. What’s this index scan doing? Absolutely nothing. Just scanning the whole table. There is no predicate in here. What’s this index seek doing? This is seeking to… Oh, gosh darn it.

This is going to be a tough one to frame up. I’ve got to move this over a little bit. There we go. We are seeking to where post type ID equals 1 here. The reason why this happens is because of optimizer costing.

When we do the STRING_AGG for a big string, SQL Server computes a scalar. That Compute Scalar is where we build the big string for STRING_AGG. If you don't believe me, well, that's too bad. That's exactly where it happens. For a big string, SQL Server is like, oh, I can't push a predicate down past that.

For a small string, SQL Server is like, oh, yeah, I can do an index seek for that. No problem. I’m glad you asked. That’s a great idea. The thing is that there’s like a weird level.

Like, I’ve experimented with this number quite a bit. And, like, if I went up to, like, 700 or 800 or even 600, that filter would come back. If I went down a little bit, it would go away.

If I changed compatibility levels, the number at which this changed from having that additional filter with the index scan, to having the index seek with no additional filter, would change. Sometimes statistics sampling would have something to do with it.

There were all sorts of things that would make this number weird each and every time. So be very careful when you're using STRING_AGG in queries. If you use varchar(max), or even a longer string type than is probably necessary, for the CONVERT up there, you might find that SQL Server all of a sudden starts choosing really stupid execution plans.
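For comparison, the smaller-typed version of the same hypothetical query is just:

```sql
/* Same hypothetical query, with a deliberately smaller string type.
   In the demo, smaller declared lengths changed optimizer costing
   enough to get the predicate pushed down into an index seek. The
   trade-off: STRING_AGG over non-LOB input still errors if the
   concatenated result outgrows 8,000 bytes. */
SELECT
    p.OwnerUserId,
    Ids = STRING_AGG(CONVERT(varchar(500), p.Id), ', ')
FROM dbo.Posts AS p
GROUP BY
    p.OwnerUserId
HAVING
    COUNT_BIG(*) > 1;
```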

You might find that bringing that number down helps quite a bit sometimes. And other times the optimizer is like, ah, changed my mind. Think it costs differently now.

Price has changed, right? It’s like the stocks. They go ups, they go downs. They ruin your chances of retirement. They give you a little bit of hope for retirement.

It never really says, yay, I'm going to retire. Anyway, we do have to be careful with these things. And this is honestly something that Microsoft should fix, because there is nothing about this string, or the value of this string, that should prohibit the optimizer from pushing a predicate down and seeking into an index rather than scanning it.

There is nothing going on in there that should be an optimizer limitation. And, you know, I guess somewhat thankfully, I still see most people not use STRING_AGG and just use the XML version of it. I'm particularly fond of the XML version because, I don't know, I'm kind of an XML guy.

A little bit. You know, kind of my first SQL Server frenemy was XML and XQuery. So, got a little bit of a crush on that.

Anyway, be careful out there when you're using STRING_AGG. Be prudent with this sort of syntax if you're using non-string values and all of a sudden you need to concatenate those into a list, whether they're numbers, dates, bits, I guess. I mean, one thing.

That would be a choice. But, you know, whatever you're doing in there, be very prudent with the length of the string that you convert your column data to. You want to avoid the error, and you want to avoid truncating things, but you also want to avoid situations where SQL Server no longer pushes predicates down to where they belong, and just starts filtering all your data out late, making all your queries take way longer than they should.

So, I hope you enjoyed yourselves. I hope you learned something. And I will see you, I think this video is scheduled to go out on a Friday.

So, I do hope and pray that everyone has a great weekend and enjoys themselves to the fullest. All right. We’re good.

Time to turn these lights off. Getting sweaty. Thank you. I love you. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

OPTIMIZE FOR UNKNOWN vs. OPTIMIZE FOR VALUES In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the pitfalls of using `OPTIMIZE FOR UNKNOWN` in SQL Server stored procedures and why it can often lead to suboptimal query performance. I explore how parameter sniffing issues arise when you optimize for a specific value versus optimizing for unknown values, highlighting that while `OPTIMIZE FOR UNKNOWN` might seem like an easy fix, it frequently results in less-than-ideal execution plans. Instead, I advocate for optimizing stored procedures with specific parameter values to achieve more stable and efficient query performance, even if it’s not perfect across all scenarios.

Full Transcript

Erik Darling here, president and CEO of Darling Data Enterprises. And in today's video, we're going to talk about OPTIMIZE FOR. Why? Well, because I have kind of a funny angle on it. And it's not that OPTIMIZE FOR UNKNOWN is good. OPTIMIZE FOR UNKNOWN is dumb and it stinks, and every time I see someone use it because of, uh, parameter sniffing, I just wish that there were some Zoom feature for me to send a boxing glove on a spring out of their computer and just whack them. It is internally deflating to hear those words. But we're going to talk about how sometimes optimizing for a specific value can be a better course of action than optimizing for unknown. You do have to kind of know and care about and love your data in order to figure out what values you should be optimizing for, because that can be a tricky enterprise. But once you have found that enlightenment, once you have found that Zen moment, once you have experienced that spiritual release, nothing will ever top it.

Promise. Anyway, before we talk about that, let's talk about four bucks. If you've got four bucks a month and you want to be a member of my channel, you can click the become a member link in the video description and do just that. You can cancel any time, as they say. If four bucks a month is more than you can stand to part with, if you care very deeply about every George Washington that comes into your bank account, then liking, commenting, and subscribing are just wonderful ways of becoming part of the Darling Data community of data darlings. You can join over 5,000 other people who have joined the ranks of the Darling Data data darling army. If you need SQL Server consulting, I am world class at all of these things. We don't even need Beer Gut Magazine to tell us that anymore. We just know from experience. And as always, my rates are reasonable.

If you would like some very high quality, very low cost training — and if you've looked around at the Black Friday offers that other people have out there and you're like, wow, that's still hundreds or thousands of dollars, and that's only good for a year — you can get all of mine for about 150 bucks USD for the rest of your life. I don't accept other currencies; someone else has to do that conversion and translation for me. You can either click on the link up there and use the discount code spring cleaning, or click the link in the video description.

You can get all of that. Upcoming events: I got none. Oh no, I don't have to go anywhere. Shame. I'll just stay in my underwear at home. But of course, if you would like me to show up to your event, either in my underwear or maybe in some sort of vaguely business casual wardrobe that might look a lot like this, at least from the top up, then let me know what your event is, and maybe we can figure something out.

With all that out of the way, let's get into this fun, fun thing that we have to talk about. Now, I'm going to start this stored procedure off written in a way that I personally don't like. All right. Because this will lead to all sorts of problems. This is one of those things.

When I say the words "anything that makes your job easier makes the optimizer's job harder," this is probably at the top of the list of those things. Because this is a pattern that I see in almost every single consulting engagement, somewhere in a query that someone is having performance problems with. This is never a good sign. And it's not good for a number of reasons. I'm going to say parameter sniffing is one of them, but it just ends up with some really ugly consequences. Now, I've already run all the queries for this.

And if you're able to look under my armpit there, you might see the number eight minutes and 31 seconds, right? Eight minutes, 31 seconds. And the general gist of this is that as long as you are searching for an owner user ID, as long as this is not null, you might do okay. You might have a stored procedure with a pattern somewhat like this, where you are just guaranteed to always get the equivalent of that ID passed in, and you might just never have a terribly big problem with this pattern.
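The shape of the procedure being demonstrated is roughly this. The table, procedure, and parameter names here are stand-ins for illustration, not the exact demo code:

```sql
/* A sketch of the catch-all search pattern under discussion.
   All names are hypothetical. The (column = @param OR @param IS NULL)
   predicates are what cause the trouble discussed below. */
CREATE OR ALTER PROCEDURE dbo.SearchPosts
    @OwnerUserId integer = NULL,
    @CreationDate datetime = NULL
AS
BEGIN
    SELECT TOP (1000)
        p.Id,
        p.Score,
        p.CreationDate
    FROM dbo.Posts AS p
    WHERE (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    ORDER BY
        p.Score DESC;
END;
```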

But as soon as people start doing searches for other stuff that maybe doesn't focus on a specific owner user ID, that's when you run into issues. So the first three executions of this are all looking for an owner user ID. This is all populated in here, right? This one, this one, this one. Oh, that was bad zooming etiquette on my part there. That did not go well. But for these bottom two, you'll see that we do not have owner user ID populated in there.

And that's where this query starts to run into trouble. This one does okay, right? Up at the top, where we search for 22656. And I want to point something out here that is kind of fun: these queries actually do all get the parameter sensitive plan optimization stuff. That's why the plans that you see are slightly different in the first few queries, right?

So like this plan, because going parallel has certain estimates associated with it, this plan is not parallel. It gets an estimate of 117. This plan is not parallel. It gets an estimate of 95. And the serial plans are slower, right? That's 1.7 seconds to scan that. Again, 1.7 seconds to scan that. The parallel plan is the fastest of the bunch.

But even if you look at the parallel plan, we have an index up here that leads on owner user ID, right? That one, this fantastic index right here, this single key column index that I would probably also make fun of if I saw it in real life. But we have an index scan here. We should be able to do an index seek, but we can't, because we're doing the column equals parameter, or parameter is null, thing.

You can replace this with any variation on that pattern — ISNULL around the column or the parameter, whatever. You're still going to see this same sort of thing, where you can't seek into the index. These two both have the same thing, but there's an additional thing in all of these in the key lookup, where we're evaluating additional predicates.

This is part of what makes this demo sort of sparkle, but this is also not a good sign to see in your query plans, right? You don't want to see this stuff over and over again. Now, down here is where these queries really start to have problems, because even with the parameter sensitive plan optimization kicking in, right?

If you look down here, these are the last two things that executed. Whatever query variant we're getting for this is not so hot, right? That's kind of bad in there.

So we get the query variant plan for, you know, 95 rows, which is this one. And then we get that same one again down here, but these execution plans just don't work well at all for the queries that we get. That's seven minutes and 44 seconds.

All that time, aside from the 51 seconds that you see up to the nested loops join, is in that sort spilling. And down here, we don't exactly have the sort spill, but we do have about 40 seconds of time spent between the lookup and everything else. So this pattern obviously doesn't work terribly well.

Okay. So avoid this one as much as you can. What I see a lot of people do, when they have this or other stored procedures where things sometimes act up, is stick this OPTIMIZE FOR UNKNOWN hint in there. Now, this does turn out better than the query pattern I just showed you.
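Applied to a hypothetical catch-all procedure like the one being demonstrated, the hint version would look something like this (again, all names are assumptions):

```sql
/* Hypothetical catch-all procedure with the hint a lot of people
   reach for. OPTIMIZE FOR UNKNOWN makes the optimizer cost the plan
   from the density vector ("average" cardinality) instead of the
   sniffed parameter values. */
CREATE OR ALTER PROCEDURE dbo.SearchPosts_Unknown
    @OwnerUserId integer = NULL,
    @CreationDate datetime = NULL
AS
BEGIN
    SELECT TOP (1000)
        p.Id,
        p.Score,
        p.CreationDate
    FROM dbo.Posts AS p
    WHERE (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    ORDER BY
        p.Score DESC
    OPTION (OPTIMIZE FOR UNKNOWN);
END;
```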

I admit that it does turn out better, but it's still not great. I still don't love it. And the plans change for these, right?

Where now, rather than do any sort of nonclustered index thing, we just scan the clustered index every time. Granted, we don't have anything that spills for seven minutes, but this is not exactly the plan shape that you would want to see, right?

This is not exactly fun. And this whole predicate in here is still a problem, right? So that query problem is still an issue with OPTIMIZE FOR UNKNOWN. We just take away the cardinality estimates that might've happened.

And we replace them with the OPTIMIZE FOR UNKNOWN sort of thing, and then we just get these sort of crappy plans. The two down here still have problems, right?

This one takes a minute and 11 seconds with a lot of that time spent still in the sort, right? Not a fun time to spend, not a fun amount of time to spend spilling. And this one takes 1.2 seconds.

Again, just scanning the clustered index. So the OPTIMIZE FOR UNKNOWN hint gears all these queries towards a clustered index scan, away from a lookup plan, which helps a little bit sometimes, but it's just sort of not what we want to see overall.

Over here, I've got this stored procedure set up to optimize for a specific set of parameters that work out pretty well across the board, right? So rather than say OPTIMIZE FOR UNKNOWN, or let SQL Server do a thing every single time, or even cache and reuse a plan, or use the parameter sensitive plan optimization stuff —

We're going to tell SQL Server: every time this runs, I want a plan for these parameters being these values. And this works out a bit better.
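Sketched out, that looks something like the following. The specific values here are placeholders (22656 is the user ID mentioned earlier; the date is made up) — in practice you'd pick representative values from your own data:

```sql
/* Same hypothetical catch-all query, optimized for a specific,
   known-good set of values so every execution gets the same plan. */
CREATE OR ALTER PROCEDURE dbo.SearchPosts_Optimized
    @OwnerUserId integer = NULL,
    @CreationDate datetime = NULL
AS
BEGIN
    SELECT TOP (1000)
        p.Id,
        p.Score,
        p.CreationDate
    FROM dbo.Posts AS p
    WHERE (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    ORDER BY
        p.Score DESC
    OPTION
    (
        OPTIMIZE FOR
        (
            @OwnerUserId = 22656,
            @CreationDate = '20131231'
        )
    );
END;
```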

Not perfectly. We still have the same problem with the query pattern itself, right? So ideally, fix the query pattern. But if you're kind of hamstrung, and that's too hard for you or whatever, you might be better off just saying, hey, these values work really well to get me the plan that I want.

So we still have the problems in here. We still have the predicate stuff in here that we don't want, but we get at least a sort of stable plan across the board.

And even for the two final queries in this, these end up better than the first time around. The last one down here is a bit slower than the OPTIMIZE FOR UNKNOWN version.

But when you take into account that this one no longer takes, you know, eight minutes, or almost two minutes, or whatever that was, the time you save on most of the executions is a lot more helpful.

Granted, if we're going to be ivory tower about stuff, we should really fix that query pattern instead. But if you're going to do something like this, the time that you lose on this one is made up for by the time that you gain on this one.

So, still not great down here, and still not great here, but a lot better than we saw with the original query pattern, and, with the exception of this one, with the OPTIMIZE FOR UNKNOWN pattern.

So whenever you're digging through stored procedures that have this problem, and you're using OPTIMIZE FOR UNKNOWN in places, you should probably consider figuring out a good set of — let's just call them stored procedure defaults — and optimizing for those instead, because you can generally find a really good execution plan for a set of values that'll work pretty well across a lot of other sets of values.

It might not be perfect. There might be regressions in some places, but that's better than almost the whole thing being a regression, like I often see with OPTIMIZE FOR UNKNOWN.

So anyway, hope you enjoyed yourselves. I hope you learned something. I hope you will, I don’t know, maybe optimize for specific values rather than optimize for unknown. And I will see you in another video shortly.

Maybe we’ll see. It depends on how cute I’m feeling. Anyway, goodbye.


T-SQL Shortcomings With Merge And Triggers And Stuff (In SQL Server)




Thanks for watching!

Video Summary

In this video, I delve into the nuances of merge statements, triggers, and output in T-SQL, highlighting how they interrelate but often fall short in practical application. I discuss specific issues such as the lack of an action column in triggers when using `MERGE`, which can complicate debugging and maintenance. Additionally, I explore why `MERGE` alone allows referencing source tables in its output while other T-SQL commands do not, questioning the rationale behind these design choices. The video also touches on broader themes about the need for modernization in T-SQL to better serve developers, comparing it with more developer-friendly platforms like DuckDB and Postgres.

Full Transcript

Erik Darling here with Darling Data. And, um, forget it. Let's just move on. In today's video, we're going to talk about some stuff. The three things that I want to cover are merge, and triggers, and output, and how they all sort of work almost together, but not quite. And how there are various T-SQL improvements and interoperability features that would make working with merge or triggers or output a whole lot easier on people who develop T-SQL code. Now, I've said it once, I've said it a million times: T-SQL is a language that is in dire need of improvement, modernization, just making it a little bit easier for people to work with. Because as things stand, there are just so many gotchas and caveats and weird edge cases that can crop up. And I realize, you know, it's computers. Computers are hard. Every programming language has this stuff. But there are a lot of things Microsoft could do to get rid of some of the more obvious "oh God, why doesn't that work" type things, get us to the "oh, this is a really hard problem, we need to solve it in a very specific way" type things, and leave that stuff to skilled T-SQL practitioners. But anyway, before we go on, if you want to give me four bucks a month, like the 25 or so other people who give me money every month to do these videos, there's a link in the video description for you to do that. If you're uncomfortable with losing four dollars per month, you can like and comment and subscribe, and you can join over 5,000 other data darlings in their happy voyage towards learning more about why they probably shouldn't use SQL Server.

Because it's annoying. If you need help with SQL Server, because it's annoying and hard, I am great at all these things. Best in the Northern Hemisphere, let's say. And as always, my rates are reasonable. If you would like some great training on SQL Server stuff, I have a lot of it at a very reasonable price.

About 150 USD for the rest of your life. No need to resubscribe or anything. You can get all that stuff with some combination of these things in blue, or you can click on the link in the video description and just bypass all the typing. I try to make things easy on you. Upcoming events: there are none. Tell me about them.

I'll come up. With that out of the way, let's talk about this. So, a lot of the code below is thanks to Aaron Bertrand, copied rather shamelessly.

Actually, there was absolutely zero shame involved in me copying and pasting code from Aaron Bertrand, aside from some minor reformatting, because his Canadian formatting is strange and bizarre to me. The exchange rate on American-to-Canadian formatting is just like the exchange rate on whatever money Canadians use.

So, all that is at this link. If I remember, I will copy and paste all of this stuff into the show notes. There's also a great roundup at Aaron's site on all this stuff about merge. And then there's a really awesome post by my dear friend and, I would say, my concurrency mentor, Michael J. Swart, on what to avoid if you want to use merge. I think there's some great stuff in there.

And then there's an Azure feedback item that addresses one of the things I'm talking about here, which of course is just ignored in the sea of Microsoft feedback items, like they're ignored by PMs everywhere. So, we've got a simple table, and I'm going to stick two rows in that table, with values one and four.

And then I'm going to create a trigger on the table, and the trigger is definitely from Aaron's post, right? So it's insert, update, delete, whatever. And then it'll tell you about some stuff, and it'll print some stuff, and it'll be fun.

Now, this is all okay, right? This is all just fine. Except if you look in that trigger, it's like IF EXISTS (SELECT FROM inserted), if stuff's in there.

Oh, is there stuff in deleted too? Oh, well, let’s check there. Right?

Like, we've got all this stuff to figure out, right? So this is kind of the first place where I get annoyed with things. Now, if we just run this whole block of code — and we're going to talk through it step by step — these are the results that we get. When we started, the table had IDs one and four.
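The trigger pattern being described looks roughly like this — a sketch in the spirit of Aaron's demo, not his exact code, and the table name is hypothetical:

```sql
/* The inserted/deleted detective work you're stuck with in a
   multi-action trigger, because there's no $action column here. */
CREATE OR ALTER TRIGGER dbo.WhatHappened
ON dbo.MyTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT 1/0 FROM inserted)
    BEGIN
        IF EXISTS (SELECT 1/0 FROM deleted)
            PRINT 'rows updated';   /* in both virtual tables */
        ELSE
            PRINT 'rows inserted';  /* only in inserted */
    END;
    ELSE IF EXISTS (SELECT 1/0 FROM deleted)
        PRINT 'rows deleted';       /* only in deleted */
END;
```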

And when we finished with the table, we had IDs one, two, and three. Why? Well, we had row one in there, and we added rows two and three.

We inserted those. And then we deleted row four. Why? Because it didn’t match. Cool. All right. All that stuff seems okay. Now, what annoys me is that when you output stuff from merge, you have this magical dollar sign action column.

And this magical $action column tells you what happened during the course of the merge. You don't get this for normal inserts, updates, and deletes, of course, because it should be fairly obvious what you did from writing insert, update, or delete. But if you have a trigger that fires for multiple different things — insert, update, and delete — it would be really handy to have that action column available in the trigger to figure out exactly what you need to do.

Right? Great stuff. It would be wonderful.
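For reference, the merge side of this looks something like the following sketch, against the same hypothetical table:

```sql
/* A sketch of MERGE with the $action column in its OUTPUT clause —
   the column that triggers don't get. Table name and source values
   are hypothetical. */
MERGE dbo.MyTable WITH (HOLDLOCK) AS t
USING (VALUES (1), (2), (3)) AS s (Id)
ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Id) VALUES (s.Id)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT
    $action AS DidWhat,      /* 'INSERT', 'UPDATE', or 'DELETE' */
    inserted.Id AS InsertedId,
    deleted.Id AS DeletedId;
```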

It would be fantastic. So, you get that in output, but not in the trigger. The other thing that’s really annoying, or rather, okay, actually, you know what? Screw it.

It's really annoying. It's really annoying because it's only available with merge, and people end up writing one-off merge statements that only insert, only update, or only delete, because merge has one superpower that regular insert, update, and delete queries don't have.

You can reference another table in the output with merge. So, the typical merge statement, you have the target, right, which is my table, right? And in my table, we have one column called ID, right?

And then, when you merge stuff in, you have all the stuff from the source. And in the source, I added some words in here, like 1, 2, and 3, that match the numbers 1, 2, and 3. And in normal situations, you can't output those columns anywhere.

But when you use merge, you can reference the source table, right? So, you have the stuff, you have the action column from merge, you have all the stuff that was inserted or deleted, and then we get that word column from the source, and we return that in the output, too.

And that’s why, down in here, we have 1, 2, and 3, in word form, for 1, 2, and 3. And then, I mean, we have nothing for 4, because 4 got deleted. That’s fine, you know.
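Here's what that looks like in sketch form, with a hypothetical Word column on the source:

```sql
/* A sketch of MERGE's superpower: referencing source-table columns
   in the OUTPUT clause — something a plain INSERT ... SELECT can't
   do from its OUTPUT. All names and values are hypothetical. */
MERGE dbo.MyTable WITH (HOLDLOCK) AS t
USING
(
    VALUES (1, N'one'), (2, N'two'), (3, N'three')
) AS s (Id, Word)
ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Id) VALUES (s.Id)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT
    $action AS DidWhat,
    inserted.Id AS InsertedId,
    deleted.Id AS DeletedId,
    s.Word;  /* comes from the source, not the target */
```

The deleted row (4) has no matching source row, which is why its Word column comes back null in the demo results.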

We can’t just make stuff up. We can if we want, but we might not be right if we just made stuff up. So, T-SQL. Whoever is in charge of you at Microsoft, for the love of God, make it easier to manage triggers.

Make action columns available so people can figure stuff out, so they don’t have to look at inserted or deleted, or inserted and deleted, or inserted or deleted, or some combination thereof. Also, it would be really nice if regular inserts, updates, and deletes could reference stuff from, like, source and target tables.

Just seems to make sense. Why can merge do it and nothing else? I don’t know.

It’s bizarre to me. This is stuff that you need to do, because other database platforms in the world are making versions of SQL, stapling their own chickens onto it, that make life so much easier for developers.

DuckDB is great. Postgres is pretty great, too. There's a lot of stuff in those languages that makes a lot of sense: when developers need to do something, they have really easy facilities to take care of those things.

T-SQL is missing a lot of that stuff. T-SQL is a very stodgy language in a lot of ways that desperately needs this sort of help. So, if there’s anything you can do, if you’re out there listening, please just start making T-SQL better for people.

You have to appeal to these developers. Right? You are no longer just selling C-levels on things.

Because now, C-levels can look at Postgres and be like, all my developers love Postgres and it’s free. Why am I going to pay seven grand a core or whatever ungodly cloud prices for this stodgy language that all my developers hate? You need to bring those people in.

You need to give those people a big hug. This is more like a strangle. So, let’s pretend that this is just a nice multi-arm hug on a person and not fingers around a neck. Because fingers around a neck is mean.

Unless it’s consensual. Right? So, this is just a nice hug for all the developers in the world. So, you can get them to be like, you know what? That SQL Server is all right.

Maybe we don’t need Postgres. Maybe we’d like a database platform that like, you know, works with all our Active Directory credentials or something. I don’t know. It’s crazy out there.

Anyway, I’m going to go now. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that someone at Microsoft, our friend Sam, will start fixing T-SQL so that developers can hate it less. And more people will use it.

So, I can keep having clients. You know? I’m going to keep this thing working here, you and me. Anyway, that’s good for me. Goodbye.


How To Return A Dummy Row Instead Of Empty Results In SQL Server




Thanks for watching!

Video Summary

In this video, I discuss a common scenario where you might want to return a row even when your query returns no results. This is particularly useful in analysis scripts or debugging sessions where it’s important to know that the search criteria didn’t match any records. To demonstrate, I show two different methods: one involving a clever use of `sys.databases` and a union operation, and another using an outer join with a derived table. Both approaches are effective and have minimal performance impact, making them valuable tools in your SQL scripting arsenal. Whether you’re a seasoned DBA or just starting out, understanding these techniques can help improve the clarity and usefulness of your queries.

Full Transcript

Erik Darling here with Darling Data. I'm cool. In today's video, we're going to talk about how you can return a row when you actually find empty results. This is something that I use a lot in my analysis scripts, because sometimes you run a query to try to find something, and whether you're debugging or returning results, sometimes people want to know that they didn't find rows for something, right? It's a good thing to know that, like, oh yeah, we checked that, but we didn't find anything. It's a reasonable thing for some people to want to do. So, I'm going to show you two different ways to do that today. If you like my channel, you can join the like 25 other people who have memberships and give me like four bucks a month, sometimes more. There are some very generous people out there in the world, and I thank you kindly for your generosity. If you'd like to join their ranks and also be thanked kindly, there's a link in the video description that says become a member, and you can become a member. If you cannot part with four dollars a month for whatever reason that you are keeping secret from me — we shouldn't keep secrets, you know, because we're in love — if there's some financial infidelity between us, you can like and comment, and you can join the over 5,000 other data darlings out there in the world who have subscribed to the channel, and who get hit on the head with a very small hammer every time I publish a video. If you need help with SQL Server, I am the finest SQL Server consultant known on Earth. There might be better ones elsewhere in the multiverse or something; on Earth, I just haven't found one yet.
So you can hire me to do this stuff or anything else. And as always, my rates are reasonable.

Why didn’t that? There we go. I’m the least good PowerPointer on the planet. That’s why all my presentations are just demos. If you would like some very high quality, very low cost training, you can get all of mine for about $150 USD for the rest of your life by either clicking on the link in the video description or going through multiple steps out of your way to go to that URL and enter that discount code. And it can all be yours. You can have me for 24 hours. It’s a bit of an indecent proposal, but I promise I’m wearing almost the exact same Adidas attire in those videos. I have no upcoming events. I’d love to have an upcoming event. Tell me about your events. I’ll get there eventually.

All right. Let’s talk about returning rows when you don’t have one. So I guess contextually, we should go into Stack Overflow. It doesn’t matter much for this demo. But let’s say that, you know, like normally you could do something like this, which is multiple steps and kind of annoying, right? So you create a table. Well, in this case, I’m using a table variable because it doesn’t matter. Performance is not an issue here. And then, you know, we insert data into that table variable.

And if the row count from that table is greater than zero, then we select data from the table. And if it’s zero, or I guess lower, then we return a message that says the table is empty. Well, you know, that’s okay, but it’s multiple steps, and you’ve got to begin and end and mind your P’s and Q’s, buster, and all sorts of other stuff that’s just kind of not fun to do.
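As a sketch of that multi-step pattern (the table variable and the query are stand-ins, not the video’s exact demo):

```sql
DECLARE @t table (id integer NOT NULL);

INSERT @t (id)
SELECT d.database_id
FROM sys.databases AS d
WHERE d.database_id > 100000; /* deliberately matches nothing */

IF @@ROWCOUNT > 0
BEGIN
    SELECT t.id FROM @t AS t;
END;
ELSE
BEGIN
    SELECT result = 'the table is empty';
END;
```

It works, but you have two statements, a branch, and a table variable just to say "nothing here."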

But there are other ways to do that, like this. And I’m particularly fond of this one, because I think it’s quite clever. We select from the thing, right? In this case, we’re selecting from sys.databases, and there’s no database ID higher than this, so we can never find anything. And then we union all that to this. But on the union all part of the query, we have this: where not exists, select from this CTE up here.

So if nothing ends up up here, then we return results down here, which is wonderful, right? Because if we don’t find a result, all in one query, we can just spit a row back out that says there was nothing in there. So that’s my favorite way of doing it. And of course the inverse works: if we do find rows, say where database ID is greater than one, we get all the rows back that we would care about, but not the dummy row from the bottom, because something did exist in the CTE up there.
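Reconstructed from the description, the single-query version probably looks something like this (the sys.databases predicate is just a stand-in for whatever you’re searching):

```sql
WITH found AS
(
    SELECT d.name
    FROM sys.databases AS d
    WHERE d.database_id > 100000 /* never true, so the CTE is empty */
)
SELECT f.name
FROM found AS f

UNION ALL

/* only fires when the CTE came back empty */
SELECT name = N'no rows found'
WHERE NOT EXISTS (SELECT 1/0 FROM found);
```

Flip the predicate to `d.database_id > 1` and you get the real rows back, without the dummy row, because the NOT EXISTS branch no longer qualifies.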

So that’s one way of doing it. Another way of doing it: I have the written version of this in a blog post on my blog, and someone in the comments left this as a suggestion. I think this is also clever, where you have like a derived thing, which is kind of almost the same thing. And then you left outer join that to whatever query you care about. And if you don’t find anything in that query, through the magic of is null, you can return a result that just shows you when the thing was empty.
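A sketch of the left join version from the comments, again with illustrative names rather than the video’s actual code:

```sql
/* a one-row derived table, left joined to the real query;
   when no match comes back, ISNULL fills in the dummy value */
SELECT name = ISNULL(d.name, N'no rows found')
FROM (SELECT x = 1) AS dummy
LEFT OUTER JOIN sys.databases AS d
    ON d.database_id > 100000; /* never true */
```

When the join predicate does match rows, every match is returned and ISNULL never has to step in.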

And of course, if you run that for a query where something does exist, you get all the rows back. So there’s two different ways that I’ve found. Well, one that I’ve found and one that was given to me on my blog by, I forget who, it’s been a while, but they gave me that suggestion. And that works pretty well, too. So if you are ever writing something and you’re like, well, I don’t want to just return an empty result set.

I want to return like a dummy row that tells people, hey, we didn’t find anything. Those are a couple of ways to do it. And they work pretty well. And there’s not really any performance impact to them. So good stuff there. All right. Cool. Well, that’s it for me in this video. Thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And remember, if you are a major network executive and you’re looking for a young, handsome talk show host to interview Hollywood celebrities about their lives, interests, loves, and passions, I’m available. Sooner rather than later. I won’t always be this young and good looking. Someday I’ll be old and good looking. Anyway. All right. I’ll have to submit some headshots then. All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

When Function Rewrites Need Query Rewrites In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into a tricky issue involving query rewrites for inline functions in SQL Server. Specifically, we explore the challenges that arise when rewriting scalar UDFs to inline table-valued functions and integrating them back into existing queries. The process can be quite frustrating due to obscure errors like “Aggregates on the right side of an apply cannot reference columns on the left side,” which require unconventional query rewrites using CTEs or derived tables to resolve. Despite these quirks, understanding how to navigate such issues is crucial for optimizing performance and maintaining clean, efficient SQL code.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk about query rewrites for inline functions. And I know this is kind of a hard one to title because it doesn’t have a very apparent title. We’ve spent a few videos talking about rewriting scalar UDFs to make them inline functions and some of the weird stuff you have to do to get that working sometimes. But this one has a very specific problem. What happens is, when you start using the inline version of the function in the query the way it was written before, you start getting weird errors. And we need to avoid those errors because no one likes errors. You would think this sort of thing would be easy, but it’s a database. So nothing is easy. That’s why I have the problems that I do in life. Anyway, if you like this channel, you can join the nearly 25 other people who support this channel by signing up for a membership, assuming you’re okay with parting with like four bucks a month. If you’re not okay with parting with like four bucks a month, you can like and comment on the videos, and you can even join the over 5,000 other data darlings out there in the known universe and subscribe to the channel so you get notifications whenever I publish a video. It’s a great trade. It’s an awesome trade-off. You get a ding and a video, and I get to sweat under hot lights. Everything’s coming up you. If you need help with your SQL Server, I am the best SQL Server consultant in the world. A lot of my clients have worked with other SQL Server consultants, thrown their hands up after not getting results, and come to me and gotten results. It’s wonderful being able to do that. It’s also very fun seeing the login names from other consulting companies in the world still on those SQL Servers.

Hi out there. If you would like some very high quality, very low cost SQL Server training, you can get all mine for about 150 US dollars a month, not a month, for life. It’s not a month. I don’t know why I said that. I think what I was going to do is tell you I don’t have a subscription where I charge you month by month or year by year. It’s for life. You can get it all. You can either go there and use that code, or you can just click on the link in the video description if you’re feeling particularly lazy, and you’ll end up at the site with the coupon code applied. It’s magical. Since this will be airing after Past Data Summit, I have no upcoming events. If you have an upcoming event, let me know about it. Maybe I’ll come to your event. I don’t know. We’ll find out. Anyway, with that out of the way, let’s finally do this. Let’s say that we have a scalar UDF. It looks something like this. It’s really not a big deal. Everything’s fine there. Everything is scalar UDF-y. Everything in there works.

And when we run the query, we get a particularly UDF-y plan. It’s a query that has all of the hallmarks of scalar UDF problems in it. This will run for a couple more seconds. And if we look at the execution plan, we’ll see that we have a compute scalar that sucks up the majority of the execution time in this query, right? Almost 11 and a half seconds of time. Well, I guess minus a little bit from over here. So I guess about 10 and a half seconds of time.

And then if we get the properties of this, we will have this fun warning over here. We will have this. I’m sorry that the properties window showed up like that. That is unintentional. We will have this non-parallel plan reason yelling about our scalar UDF. Now, I know what you’re thinking out there. Ah, SQL Server 2019, UDF inlining. I know. The thing is, not a lot of people are in compat level 150.

And even if you’re in compat level 150, there are still a lot of scalar UDFs in the world that cannot be inlined automatically by that feature. So there’s still a lot of UDFs out there to rewrite. Trust me, I know. I rewrite a lot of them. That’s another reason why I have many of the problems that I do.

So this is a fairly straightforward function to rewrite to be an inline table value function, right? This is just fine. You just do the same thing with returning a table rather than returning the value like you do up here, right? So whatever. Not a big deal. The problem is, if you try to stick that in the select list the way that you do, or the way that we did with the scalar UDF, we get this error. And it’s a very obtuse error.
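The transcript doesn’t include the on-screen function, but a hypothetical pair along these lines shows the shape of the rewrite (the Stack Overflow tables and the function logic are my guesses, not the video’s actual code):

```sql
/* A scalar UDF shaped roughly like the one described. */
CREATE OR ALTER FUNCTION dbo.total_score (@UserId integer)
RETURNS bigint
AS
BEGIN
    RETURN
    (
        SELECT SUM(p.Score)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @UserId
    );
END;
GO

/* The inline rewrite: same logic, but it returns a table. */
CREATE OR ALTER FUNCTION dbo.total_score_inline (@UserId integer)
RETURNS table
AS
RETURN
    SELECT total_score = SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @UserId;
GO
```

The rewrite itself is mechanical; the trouble only starts when you drop the inline version into the old query’s select list.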

Aggregates on the right side of an apply cannot reference columns on the left side. Well, I don’t see any applies in here. We can’t even get an estimated plan to see if there’s an applied pattern somewhere in there. It’s just, you know, where we have a subquery and that’s, I don’t know.

Like, you couldn’t write it as cross apply, because then it would definitely be an apply. Or even outer apply would still be an apply. Apply is the key word there. So there are two ways that you can fix this by rewriting the query.

And they’re both stupid. You shouldn’t have to do this. This is asinine. Asinine to the very core, to the back teeth, as some might say.

Okay. So you can either use a CTE and you can do the aggregates here and then pass the aggregates to the function here. And guess what? This will work just fine. We get a much faster parallel execution plan.

So this is okay. And if you’re like me and you just like to use derived tables because you don’t want to use CTE because you don’t want anyone to get the impression that CTE are really any better than derived tables, you can do that too.

And you can get the same fast parallel execution plan. Ain’t life grand. So if you’re rewriting scalar UDFs and you run into that error where SQL Server is like, oh, no, apply, right side, aggregates, blah.

All you have to do is rewrite your query a little bit to do the aggregates from something in something and then select from that something. And those aggregates magically work in your inline table valued function.
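In other words, something like this, where the function name is a placeholder and the Stack Overflow tables are stand-ins for whatever the real query touches:

```sql
/* Calling the inline function on an aggregate directly, e.g.
   dbo.some_inline_function(MAX(p.Score)) inside the select list,
   raises the apply/aggregate error. Pre-aggregating fixes it: */
WITH agg AS
(
    SELECT
        p.OwnerUserId,
        max_score = MAX(p.Score)
    FROM dbo.Posts AS p
    GROUP BY p.OwnerUserId
)
SELECT
    a.OwnerUserId,
    f.*
FROM agg AS a
CROSS APPLY dbo.some_inline_function(a.max_score) AS f;
```

The derived table flavor just moves the aggregating query down into the FROM clause; the optimizer treats both the same way.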

Who would have thunk it? Crazy out there. Anyway, I’m going to just go cry now. I wish I was good at something other than databases.

Maybe I’ll get discovered on here for something else. I don’t know. If anyone needs a young, handsome talk show host, I’d be happy to talk to Hollywood celebrities for almost any amount of money because they’d probably end up liking me a lot and bringing me into their inner circle, most trustworthy friends, and go to cool parties in the hills.

Say goodbye to all these pains, these sufferings. Anyway, thank you for watching.


Choosing Between Triggers And Foreign Keys In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into when you might want to consider using triggers over foreign keys in SQL Server. Erik Darling from Darling Data shares insights on how cascading actions can lead to unexpected issues, especially with the new cardinality estimator. I highlight a scenario where foreign keys cause performance problems during foreign key validation and discuss potential solutions like query hints. The video also covers best practices for writing efficient triggers, including setting transaction isolation levels and using hint options to ensure optimal performance. If you’re interested in more SQL Server tips or want to support my channel, consider becoming a member by clicking the link in the description.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we’re going to talk about when you might want to consider using triggers over foreign keys in SQL Server. There are times when you may want to do this, because leaving foreign keys up to their own devices can cause all sorts of weird stuff, especially when you have cascading actions involved, especially when you might care about SQL Server’s query plan choice when cascading those actions out, or even when validating foreign keys on insert, update, and delete. Of course, foreign keys and triggers are great ways to maintain referential integrity in OLTP databases. If you have foreign keys or triggers in your data warehouse, you should be dragged out into an alley and beaten with a rigid implement, and I will not apologize for that the way I have been apologizing for the length of some of my videos. If you would like to join the 20 or so other people who have been so kind as to become members of this channel (and I will not apologize for the length of my members list), you can do that by clicking the link in the video description. If for some reason you think you have something better to spend $4 a month on, you can like, you can comment, you can subscribe. You can do all sorts of nice things that make me feel less lonely when I wake up at five in the morning and look at my phone. If you need help with SQL Server, I am great at all of these things. And you know what else is great? My reasonable rates. Bam, sold. Pretty good there. Gotcha. Gotcha. You’re gonna be knocking on my door any second now. If you would like low cost, high quality SQL Server performance tuning training that’s far cheaper than anything you will find on Black Friday, for the rest of your life, not just for a year, you can get over 24 hours of performance tuning training from me at that link with that discount code.
There is also a fully formed link in the video description that you can click on without having to do any work or copying and pasting. If you would like to catch me live and in person: Seattle. You can go to Seattle. That increases your chances of seeing me by a bit. If you actually attend Past Data Summit, you increase those chances exponentially. And if you come to me and Kendra’s pre-cons on November 4th and 5th, you are nearing a 100% certainty that you will see me. Right? Like, can’t rule anything out. Maybe I’ll get struck by some kind of weird plasma bolt and turn invisible between now and then. But you will at least still hear my booming voice and see a floating lavalier mic going around the stage if I turn invisible for Past Data Summit. Well, I mean, just watch out on kilt day. We’ll have some hijinks going on. Anyway, let’s go talk about triggers versus foreign keys. And the post that, of course, inspired this, because unlike some other SQL Server websites out there, I like to give credit where credit is due.

And my friend, my good friend, Forrest McDaniel, who I got to catch up with at Data Saturday Dallas, wrote this post in 2018. God, I was still in my 30s. Is this really six years old? Yeah, yeah. In like a month, this thing is like exactly six years old. Anyway, here’s Forrest’s demo in a nutshell. You have a table called P that’s sort of like parent. You have a table called C that’s sort of like child. And you put some data in the parents, and you put some data in the children, and you add a constraint to that table. And let’s just make sure this thing is actually on there, so nothing weird happens. This is the problem that Forrest ran into. And this is a problem that I see a lot more people running into as they start flipping to higher compatibility levels and start succumbing, Oregon Trail style, to the new cardinality estimator.
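The on-screen demo isn’t reproduced in the transcript, but a minimal sketch of the setup probably looks something like this (the names and the cascading action are my assumptions based on the description, not Forrest’s exact script):

```sql
CREATE TABLE dbo.p
(
    id int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.c
(
    id int IDENTITY NOT NULL PRIMARY KEY,
    p_id int NOT NULL
);

/* the constraint that kicks off cascading deletes */
ALTER TABLE dbo.c
    ADD CONSTRAINT fk_c_p
    FOREIGN KEY (p_id)
    REFERENCES dbo.p (id)
    ON DELETE CASCADE;
```

With that in place, any delete from dbo.p also has to go find and delete the matching rows in dbo.c, which is where the plan choice starts to matter.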

So what I’m going to do is force the default with endless air quotes cardinality estimator. That’s the new one. And what you’ll see is a plan that looks like this. And this is not the kind of plan that you want to see when you are validating your foreign keys. This is a very bad plan for foreign key validation. This type of plan will generally be a lot slower than the nested loops variety that you would normally want to see here.

And this type of plan in a foreign key, much like going to Seattle and coming to my pre-con exponentially increases your chances of seeing me, greatly increases your odds of seeing a whole lot of deadlocks, especially if two things try to delete from these tables at the same time. One really sort of interesting, I’m not even going to call it a downside, just one very interesting effect of cascading foreign keys is that under the covers they will switch to the serializable isolation level.

And you can, you know, like that’s a pretty strict one. If you come to my session at Past Data Summit about isolation levels, you will learn more about that. Really tying things in today, this is a big sales pitch for Darling Data.

This is the kind of thing you really want to avoid. Now, there are ways to fix this. But if you are, you know, an Entity Framework or some other ORM-only shop, you might have a hard time injecting query hints into your queries.

You know, you might not be able to force a plan because of, you know, differences and stuff. It might not, just might not go well. You can fix this problem with a cardinality estimation hint like this.

Where SQL Server now, because you use the legacy cardinality estimator, SQL Server estimates joins differently, right? There’s a difference in how SQL Server looks at join cardinality between the two. That’s one of the biggest differences between them.
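In T-SQL, the hint described here looks something like this; the delete statement itself is illustrative, since the transcript doesn’t show the real one:

```sql
/* fall back to the legacy cardinality estimator for this statement,
   which changes how the cascading delete's join is estimated */
DELETE p
FROM dbo.p AS p
WHERE p.id < 1000
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

The USE HINT syntax is handy here because it applies per statement, without changing the database-level compatibility settings for everything else.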

And now we get a nested loops join and things will be much happier as far as when you actually have to cascade deletes out. Because, you know, when you cascade a delete, you’re not just deleting from one table anymore. You are deleting from two tables.

You are deleting from up here. And then you spool a bunch of data into this thing. And then you spool a bunch of data out of this thing. And then you join that spooled data to the other table where the cascading thing lives. And then you delete from another thing.

So all in all, you are deleting from two tables. And the more indexes that might be involved here, like say you have a whole bunch of indexes on both of those tables for all the different queries that you run, the more stuff you’ve got to delete from, the more stuff you’ve got to lock, the more problems you run into.

Goes without saying. All of that stuff. So another way of fixing that is to stick a loop join hint on here. Now, what you’re going to notice if we look at these two plans together is that these are nice, thin, friendly looking lines.

And these are not so thin, friendly looking lines. The reason why is not because of anything other than the cardinality estimator still. You would still want to see this plan even in this state.

The one up there, you get a nested loops join because SQL Server does a better job with cardinality estimation in this case for that join. The bottom one, we’re forcing a loop join, but SQL Server still uses the same cardinality estimation for the default cardinality estimator. That’s why those plans look different.

Both of these use apply nested loops. It’s not that there’s prefetching in one and not in the other, or optimized nested loops in one and not in the other. Everything is the same except the cardinality estimation model for that. So if we were to add option loop join and force the legacy cardinality estimator, we would see basically a plan that looked like the forced legacy cardinality estimator one.
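Combining both hints, as described, would be sketched like this (again, the delete itself is a stand-in):

```sql
/* force both the join strategy and the older estimation model */
DELETE p
FROM dbo.p AS p
WHERE p.id < 1000
OPTION
(
    LOOP JOIN,
    USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION')
);
```

With both in place, you get the nested loops shape and the estimates that make it look right.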

That one only looks different there because it’s only the loop join hint, not the cardinality estimation hint. Now, if you wanted to write triggers to replace this stuff, you would have some stuff to think about. Right.

Because, well, I know thinking isn’t your specialty. I’ve seen your queries, seen your servers, seen your schema, seen your indexes, seen a lot. I’ve seen a lot.

I know I’m like Santa Claus when it comes to SQL Server. I see everything. I see everything. And so there’s some stuff that you have to do inside of your triggers to make them work right or to make them work well or work better. A lot of that stuff comes at the very top.

For example, it’s very good to have this condition at the very beginning of your trigger so you can just bonk out if there are no rows. Right. So if someone does an insert that doesn’t actually insert anything, you would want to do this to avoid having to run anything else in the trigger.

Generally, even though XACT_ABORT is the default for triggers, I find it’s a lot safer to set NOCOUNT and XACT_ABORT on here, just in case any client options or settings changes may have tinkered with this. And then you want to set ROWCOUNT to zero, I guess because Paul White says so. So we’re going to listen to Paul on that one.

Now, since behind the scenes, cascading foreign keys use the serializable isolation level. If you are in an environment where that sort of thing might matter, you would probably want to set the transaction isolation level inside of your trigger to serializable as well. So that you get commensurate blocking results when these things execute.

Like I said before, that can tip the scales towards things being a lot more locky and deadlocky. But, you know, this is the price you have to pay if you need this level of consistency in your queries. Another thing that you might have to consider, and this comes down to sort of the control thing that I was talking about before, is that if you know you always want a loop join for this stuff, you can stick the option loop join hint in the trigger.

And you can always get that loop join plan that you’re after. You can also always put a forceseek hint into this part of the query so that you never have to worry about a scan happening on the inner side of anything. Right, not on a merge join or hash join, because we’re only getting loops here. As long as you have a good supporting index, that loop join will always seek, which is why you want to force the seek in there.

Where you have to think about stuff a little bit, though, is if you are using an optimistic isolation level, you might need to add this in because otherwise you might get strange results from reading stale data. Now, Paul White, and I’ll put a link to this post in the video description as well. Paul White goes over this in one of his isolation level series blog posts, sort of explaining something similar.

But, you know, so if you’re using RCSI, you would probably want to use this so you don’t mess anything up. You could, of course, use a serializable hint, too. But most people would probably just benefit from a regular old READCOMMITTEDLOCK hint in there.

That would probably be good enough for most scenarios. But, you know, as always, make sure that you’re testing for your actual use case, not for what I say. That might be OK here.

I don’t always know. So, you know, I have limited insights sometimes into exactly what is going to be the best thing for you locally. But you can do that for all sorts of triggers.

This is an example with an insert trigger. This is, of course, an update trigger that does nearly the same thing. And this is a delete trigger, which would probably come pretty close to doing exactly what we just did with that foreign key with the cascading delete, where we would delete from whatever got inserted into there.

Actually, I might want to do that with the deleted table. You know. Maybe I forgot to change that one when I copied and pasted it.

We’re going to leave that one on the editing room floor, though. That’s going to survive there. And I’m going to give myself 10 demerits. I’m going to go do some push-ups.

But anyway. Let’s just change that to deleted. The deleted. There we go. And we’re going to keep that.

We’re going to keep the as I in there. And now everything looks good. So anyway. Copy-paste errors aside, I’m pretty happy with this one. Though I might have to apologize for the length since we did go a little bit over the 10-minute mark. But anyway.
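Pulling the advice from this section together, a delete trigger standing in for the cascading foreign key might be sketched like this. The table names match the earlier hypothetical parent/child setup, and the hints are only appropriate when the child table has a good supporting index on the joining column:

```sql
CREATE OR ALTER TRIGGER dbo.p_cascade_delete
ON dbo.p
AFTER DELETE
AS
BEGIN
    /* bonk out early if the delete didn't touch any rows */
    IF NOT EXISTS (SELECT 1/0 FROM deleted)
        RETURN;

    /* safer settings, even where they're already the defaults */
    SET NOCOUNT, XACT_ABORT ON;
    SET ROWCOUNT 0;

    /* match the isolation level that cascading foreign keys use */
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    DELETE c
    FROM dbo.c AS c WITH (FORCESEEK)
    JOIN deleted AS d
        ON d.id = c.p_id
    OPTION (LOOP JOIN);
END;
```

Under RCSI, swapping the serializable setting for a READCOMMITTEDLOCK table hint on the child table is the lighter-weight option mentioned above; test against your own workload either way.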

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you’ll forgive my copy and paste error. And I hope that you’ll continue watching despite the fact that there was an obvious oversight at the end of this video. But anyway.

Goodbye, cruel world.


Simulating WAITFOR In Scalar UDFs In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into an intriguing and somewhat tricky aspect of SQL Server functions: making them wait for a specific amount of time. Erik Darling from Darling Data explores how to implement such functionality within user-defined functions (UDFs), which isn’t straightforward due to the limitations imposed by SQL Server. However, with some clever workarounds and insights from Alexander Kuznetsov’s blog post, we can achieve this through while loops that essentially create a delay. I walk you through creating these functions and demonstrate their use in practical scenarios, showing how they can be used for various sneaky and interesting purposes. If you’re curious about the full extent of what can be done with such functions, mark your calendars for Seattle to attend Past Data Summit where we’ll dive deeper into these and other advanced SQL Server techniques!

Full Transcript

Erik Darling here with Darling Data. In this video, we’re going to talk about something exceedingly tricky that you can do. It has to do with having functions wait for a specific amount of time. We’re going to talk more about this. It’s really hard to do a pithy intro here. But we’re going to talk about it. But before we do, we have some things to talk about. Like this channel. If you would like to sign up for a membership to this channel for as low as $4 a month, you can do that by clicking the Become a Member link in the video description. If you don’t have $4, even for one month, perhaps that just cuts into the ramen budget a little too heavily, you can do all sorts of wonderful free things that let me know you care. You can like my videos. You can comment on my videos. And you can subscribe to the channel. I do like seeing all those things. It brings me a very specific type of joy. If you need help with your SQL Server, probably not anything that we’re going to be talking about today, but I am a consultant, and I do some things with SQL Server very well. I don’t set up availability groups. I don’t really sit there and mind your backups. I don’t want to talk about capital R replication. But I can tell you if your SQL Server is healthy. I can tell you if your SQL Server is as fast as it could be. (The answer is no.) And I can do all sorts of other things, like make it faster. I don’t know. Some people enjoy that. Some people prefer that.

I can even reduce your cloud bills. How about that for a sales pitch? You want to give less money to Microsoft or Amazon? Call me. We can do that together. If you need some high quality, low cost training, you can get all 24 hours of mine for about $150 USD by going to that link up there and then using the discount code springcleaning. There is, of course, a link to automate all of that wonderfulness in the video description as well. And probably at this point, this might be after Past Data Summit. I don’t know. Maybe a little bit before, but you can still catch me there, November 4th and 5th. If it is past November 4th and 5th and you didn’t go to Seattle for Past Data Summit, you missed it. Sorry. Can’t do anything to help you there. But the runner-up prize is, if there is a SQL Saturday or Data Saturday or whatever Saturday event near you that is in search of a pre-con speaker, let me know. I will do my best to get pre-conned there.

But with all that out of the way, let’s talk about what I want to talk about, which is how you can get a function to wait for you. So, I realize that the logic in this function is not complete. Right. It just says, if delay is greater than zero, do this thing. Otherwise, we’re going to have sort of whatever in there. We could, of course, fix that with, like, putting '00:00:00' in there.

And then it would only change if delay was greater than zero. Otherwise, we would wait for zero seconds. Maybe that is enough. Actually, this function is now Turing complete. We’ve done it. Good job, us. But the problem is that if you try to do this in a scalar UDF, we’ll get an error message saying: invalid use of a side-effecting operator 'WAITFOR' within a function. That’s no good, is it?

We seem to have hit a wall here. Hmm. What can we do? What can the clever and devious mind do?

Well, a very clever and devious mind, long before I started thinking about this, actually had an example of what you can do. Smart guy named Alexander Kuznetsov. Kuznetsov. Something else. Probably I was pretty close on both of those. Left SQL Server for Postgres around 2013 or 14. Hasn’t been heard from since. Just kidding. He’s doing his thing.

But actually, maybe now that Pass has a Postgres corner, he’ll be back. I would love to give him a very big hug, probably out of nowhere and terrify him. But anyway, a long time ago, he wrote a blog post about scalar UDFs and he actually did the hard work for me.

And all I had to do was find a link to the hard work, because everything is in the Wayback Machine now. But what you can do in a function is a while loop, where, you know, you declare all this stuff and you set a delay in the function input. And this is just the default. You can change this, of course, when it actually runs.

But then you say while the current date time is less than that thing, you just, you know, run this stupid loop thing. And all you’re doing is setting a bit to null over and over again. So it’s very, very little work. But we can test that out and we can say, let’s just make sure this function is actually in there and created.

Sometimes weird things happen. Who knows? Who knows SQL Server? But if we say wait for three seconds here and we keep an eye on the clock that’s sort of next to me over here, you’ll see that that waited for exactly three seconds and then returned our column, right?

That’s pretty cool. And we can also have that, we can also have that act as input from a select list. So if I say select one, union all, select two, union all, select three, this function will wait for one second and then two seconds and then three seconds.

And we can, we can actually test that out by running this. Did I run that? No, I didn’t. I just hit R. Good job. Finger was off by one.

But don’t worry, this function will run for exactly, sorry, six seconds, right? To return those three rows. Now what can you do with something like this? All sorts of interesting, sneaky, outrageous things.
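A sketch of the loop-based function, reconstructed from the description here and from the general shape of Kuznetsov’s original (the name and the return value are my own choices):

```sql
CREATE OR ALTER FUNCTION dbo.fake_waitfor
(
    @delay_seconds integer
)
RETURNS integer
AS
BEGIN
    DECLARE
        @until datetime = DATEADD(SECOND, @delay_seconds, GETDATE()),
        @junk bit;

    /* no WAITFOR allowed in a function, so burn the clock instead */
    WHILE GETDATE() < @until
    BEGIN
        SET @junk = NULL; /* as little work as possible, repeated a lot */
    END;

    RETURN @delay_seconds;
END;
GO

/* waits roughly three seconds before returning */
SELECT dbo.fake_waitfor(3);
```

And because it runs once per row, feeding it 1, 2, and 3 from a union makes the whole select take about six seconds, just like in the demo.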

But you’re going to have to come to Seattle. You’re going to have to come to Past Data Summit in order to see all of those sneaky, outrageous things in action. So I suggest you buy your plane tickets now because it’s getting kind of late in the day.

It’s time to boogie. So anyway, thank you for watching. I hope you learned something.

I hope that you will be titillated to the point of travel by what I’ve discussed here. And you’ll be looking forward to seeing just how many awful things you can do with a function like this in SQL Server. Because trust me, there’s a lot.

Anyway, thank you for watching.
