A Little About Default Constraint UDFs in SQL Server
Video Summary
In this video, I delve into the fascinating world of default constraint user-defined functions (UDFs) in SQL Server, exploring how they behave differently from other UDFs when used as default constraints. I walk through creating and testing these functions to understand their performance implications, especially in scenarios where parallelism is involved. By examining both a simple processing function and a more complex badge check function, I highlight the nuances of how SQL Server handles these UDFs under various conditions. Whether you’re looking for insights into default constraints or just curious about the intricacies of T-SQL functions, this video offers plenty to explore and ponder.
Full Transcript
All right, Erik Darling here with Darling Data, your pal, your bosom buddy, your friend till the end. And today’s video, we’re going to talk about, there we go, thank you, PowerPoint, default constraint UDFs. Because I got an office hours question about these at some point in recent memory, and I was not prepared for the level of sickness that exists in this world.
So, I had to look at this more, and I did, and I regret every moment of it. So, if you look down in the video description, you’ll see all sorts of helpful links, where you can hire me for consulting, buy my training, become a paying, supporting member of the channel, like PBS or something. You can, for free, ask me office hours questions, and if you like this content, please do like, subscribe, and tell a friend, because, well, what’s the point of making these if no one watches them, I guess?
Good question, right? So, I’ve mentioned before, I do have a few announcements that I will be making, hopefully in short order, about upcoming speaking events. But the only one that is publicly available at the moment is Past Data Community Summit, taking place in Seattle, November 17th to 21st, where I have two days of the best T-SQL material with Kendra Little that you will ever see in your life.
And, you know, you should just be there, right? So, that’s fun. Anyway, it is still Rocktober, so we are going to continue to be spooky.
Anyway, we’re going to look at two different kinds of functions here today. We have this first function, which doesn’t do anything. We’re going to call this loosely processing in memory.
I have another one that performs data access, which is a little bit more interesting. But we’ve got this function here called serializer. It takes an input here and then spits that input back out here. Well, very simple stuff.
And what we’re going to look at is, you know, kind of how that works. Now, when I first started looking at this, I was like, man, like this could get real bad, right? Because like, you know, like a lot of the UDFs I have in the Stack Overflow database are ones that like, you know, you put in like a user ID or a post ID and you go get information about that user ID or that post ID, right?
So, I was like, man, if I can pass a column to a default constraint UDF, we’re in big trouble. Thankfully, we can’t do that, right? Thankfully, Microsoft was like, nah, right?
So, we’re not allowed to put a column in there. We can only put, let’s see, expressions, constants, constant expressions, and in some contexts, variables. But for us, we’re going to not do most of that stuff.
We’re going to, whatever. Anyway, that is not a name resolution thing. So, like if I create the table like this without anything, and then I try to add that constraint, right?
If I say, hey, why don’t we make that thing in here? SQL Server is still like, nah. All right.
So, same error there. Good. So, like that’s actually a load off my mind immediately. So, the best we can put in there is a literal value. So, if we just say the number one, which means this is going to just spit the number one out no matter what, and we create the table, we are now successfully do that. We’re all set, right?
SQL Server is happy. We’re happy. So, on and so forth. Anyway, if we, like, we have a default constraint, right? And obviously the column is not nullable.
So, we can’t like put a null in there. But, if we look at this and we actually supply the column with a thing to put in there, what happens is kind of neat. SQL Server actually doesn’t deal with the UDF.
Now, this will only be obvious when we look at the other plan, but usually when there’s a UDF, let’s say like in a query, there would be like a non-inlineable UDF for a UDF and a computed column or check constraint, and now default constraint, holy cow. We would get like a non-parallel plan warning somewhere in here. We don’t have that here, but we will in a minute.
So, when you supply a value for the column, SQL Server does not run the UDF because it doesn’t have to produce a value, right? So, if we look at sys.dmExec function stats, a brand new DMV in SQL Server 2016. Wow.
We will see that there are no executions of our UDF in there. But if we rely on the default constraint, so we’re just inserting into this sumDate column. We’re not inserting into the column directly that has the default constraint UDF in it.
If we do this, we’re going to have something slightly different happen. We’re looking at the query plan. Now, we have this tsql user-defined function is not parallelizable thing in the query plan.
Telling SQL is telling us we cannot generate a parallel execution plan. Not that we need it to insert one getDateRow, but you see my point, right? Like we can tell the UDF was invoked there.
And if we look in our wonderful brand new 2016 DMV, we will see that we have one execution of the UDF. Okay? All well and good.
Now, the same pattern is going to follow if we say insert a bunch of rows, right? So let’s put a hundred rows into the table. And let’s look in here, right?
So since we supplied a value for bad default, we’re not going to have any additional executions of the UDF after doing the insert, right? So even though we put a hundred rows into the table, we had a value for every row. So SQL Server was like, we can bypass invoking the UDF for those rows.
If we put more rows in, but we rely on the default constraint, right? We’re not, we don’t have that bad default column in here. So we’re relying on SQL Server to do its thing.
And we run this, we put a hundred rows into the table. But now we look at this. We only have two executions, right? So SQL Server really only invoked that UDF once. Since we’re supplying a literal value to that thing and the value that it produces is, you know, like always like a static value almost.
SQL Server is like, well, I just need to figure out what this is once and I can do all the inserts with that. Right? So we get, even though we put a hundred rows into the table again, right?
We did that. SQL Server only, only executed the function once when we didn’t reference it in the insert list and supply of value. Now, normally, if there’s a computed column in a, sorry, if there’s a UDF in the computed column or a check constraint, there are some weird, there’s weird stuff that happens around parallelism.
For a computed column UDF, it’s like, unless the column is persisted and used trace flag 174, you’ll get a fully, fully serial execution plan no matter what. If it’s in a check constraint, you get a plan that’s forced parallel if you reference the column that has the check constraint, right? So for these, it’s kind of funny because all of these queries, whether I reference the whole table or just the bad default column in any way, SQL Server is allowed to use parallel plans for all of these.
All of these plans have all the visible indicators of parallelism, whether it’s parallel exchanges happening in here like these, or whether it’s the little, you know, icons that say, hey, you went really fast, right? So all of these queries are allowed to go parallel despite the UDF default constraint. Now, if we run this and look just to make sure, we still only have the two executions of the UDF from the inserts that we did.
Good stuff. All right. Let’s look at a different type of function. This one is called badge check.
And what this one does is a little bit different. And this was actually kind of tricky to come up with because like I said, a lot of the, a lot of the UDFs that I have in Stack Overflow, except something like a user ID or a post ID. And then they go out and find some information about that user or that post.
So it was a little hard to figure out something that I might be inserting into a table that was something I could go out and look and like, like also insert like a value that exists in the Stack Overflow database as the same thing. So what this thing does is like you supply a badge name to the UDF. SQL Server goes and figures out if that badge exists in the badges table.
And if it exists, then you put the badge name in. If it doesn’t exist, then you just, you say, then there’s no badge, right? So pretty simple stuff there, but this is the best I could come up with.
So I don’t have an index. So I intentionally do not have an index on the badges table on the name column. So if I run this query, it’s going to take around 500 milliseconds.
It’s not going to return anything, which is fine because that’s, that’s what I expected. But I wanted to do this to sort of like exacerbate the issue with the UDF. So let’s clear up the plan cache and let’s redo our bad default constraint table with this.
Right. So we’re still supplying just a static value in here. Like we’re not like, like this isn’t going to execute, like do something different for every row. It’s always going to be nope. Right.
So we’re going to ever for every row that goes into the table, we’re going to go look at the badges table and say, hey, does a badge called nope exist? If not insert whatever like other value was in there. So it’s a little bit different from the last one.
And the almost the same thing happens, at least at first. So if we insert into the table here and we look at the query plan. Oh, thank you, tool tip.
I definitely wanted you to show up and not let me click on properties. We won’t get the non-parallel plan warning. Right. So all the same so far. If we look at the exec function stats, DMV, we see no executions. And then if we insert, but we make the default constraint do some work.
Ah, you notice that took about 500 milliseconds. Right. And you might notice that there’s a compute scale out right here that took about 500 milliseconds because we had to go look at the badges table.
Now we’re going to see that we could not generate a valid parallel plan because the function plan is not parallelizable. Right. So good, good, good. Right. So just about the same thing. And if we look here, we’re going to see that we have one execution.
And now we’re going to insert 100 rows, but we’re going to supply a value for the column bad default. Right. So just the same thing as before. You do this. That happens instantly. Right. So this, this, you know, the zero milliseconds. And we will not have the go away tool tip. I hate you.
We don’t, we don’t have that non-parallel plan warning over here. It doesn’t exist. So all like all kind of good stuff there. Right. And if we go look at our function stats, DMB again, we still only have the one execution from when we did the single row insert.
But, so you might notice that there’s not a hundred rows in here because 500 milliseconds times a hundred is like 50 seconds. Right. We don’t want to sit here for that long, proving that, proving these things. So we’re just going to insert 20 rows here. And this is a little different, right?
Cause now it’s 500 milliseconds times 20 rows, which I forget how long that is. It’s maybe 10 seconds, maybe 20 seconds. No, 10 seconds. All right, cool. So now we have our execution plan. And if we right click here and look at the properties, we have our non-parallel plan warning over here.
But far more interestingly, we have a compute scalar that takes up 10 and a half seconds of time almost. Right. Like there’s no time here in any of these, there’s no extra time. Oh, there’s a millisecond added there. And there’s a, well, there’s no time added there. So all like basically everything except one millisecond of execution time is in this compute scalar.
And of course this compute scalar. Oh, thank you. Tooltip. They keep showing up. You’re really helpful. If we dig in a little bit to through the defined values, oops, that’s the wrong one. That’s also the wrong one. We want this one. We’ll see our scalar string for badge check with nope. Right.
So that this, this compute scalar was responsible for invoking our UDF. Right. And if we look in here, right. And we say, Hey, what happened? Now we have 21 executions. So once per row that UDF executed to go look stuff up.
So depending on what your scalar UDF does, if it just processes data, let’s say in memory, right? Like it, like you take an input and you return some other input based on that without going out and touching user tables, it’s all fine. But if your scalar UDF goes out and looks at another table, right?
Then you’re going to get this really crappy, like once per row execution and maybe even very, very slow execution because of what your, if your function body is slow. And then for the parallelism stuff, well, let’s, let’s go look at that. Do, do, do, do, do, do.
We get three plans and all of them are allowed to go parallel. It’s all the same visual indicators of parallelism, whether it’s the parallel exchanges like gather streams or repartition streams, or whether it’s the little racing stripes on all of our little operators here. Right.
So all of these things. So at least it doesn’t hurt select query performance. But if you write a really monstrous T-SQL UDF that goes out and touches user tables, well, you might be in for a bad time for other things. And of course, if we go look at this, we should still have just the 21 executions of the function from the single row insert.
And then the 20 row insert that did not where we did not supply a value for the column that has a default constraint on. So anyway, I thought this was all very fun and interesting. I hope you also thought it was fun and interesting or maybe just fun or interesting.
Maybe one of the two, maybe you just found it incredibly sad and depressing, which is also a good outcome because now you won’t do this. Right. Or maybe you never would have. Maybe you’re just that much smarter than whoever asked me the question. I don’t know. But anyway, that’s good here.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where I think we’re going to talk about a bug in adaptive join plans with query execution time. So fun things to look at there. Anyway, thanks. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.