T-SQL Multi-Statement UDF Plan Caching and Reuse in SQL Server
Video Summary
In this video, I delve into the intricacies of multi-statement table-valued functions (MTVF) in SQL Server, specifically focusing on parameter sensitivity and plan caching issues. Building upon yesterday’s discussion about scalar UDFs, today’s episode explores how these functions behave differently when recompiled, leading to varying execution plans and performance impacts. I demonstrate this through a series of demos, highlighting the differences between using a function with and without an `OPTION (RECOMPILE)` hint, and explain why certain execution plans are chosen over others. By running multiple executions and analyzing query plans in QuickQueryStore, we uncover how SQL Server handles plan caching for these functions, revealing the parameter sniffing problem that can arise. If you have encountered similar issues with your own functions or need help optimizing them, consider reaching out for consulting services. Additionally, supporting my channel as a member of the community will ensure you don’t miss any future insights on T-SQL and database performance optimization.
Full Transcript
Erik Darling here with Darling Data. And in today’s action-packed episode of YouTube, we’re going to continue our conversation about UDF, plan caching, and parameter sensitivity, whatever. I forget what I call these things. But anyway, this is what we’re talking about. Because yesterday we talked about scalar UDFs and what happens there. So today we need to talk about these and what happens here because it is the next logical step in the hierarchy of T-SQL functions. Alright? So let’s do that. But down in the video description, if you would like to hire me for consulting, perhaps you have parameter sniffing problems with functions of your own and you’re like, well, Erik Darling, why can’t you help us too? You can do that. If you would like to buy my training, you can also do that down in the links below. Where these lovely fingers point. You can be. You can become a supporting member of the channel, just like PBS. You can ask me office hours questions. And of course, as always, if this kind of content pushes you in your happy place, well, please do like, subscribe, and tell 50,000 or so friends. Anyway, past data community summit, Seattle, Washington, November 17th to the 21st. Kendra Little and I will be tag teaming to you tantalizing, titillating days of T-SQL pre-cons. Like I keep saying, it’s going to be the best T-SQL pre-cons that humanity has ever witnessed. So you should be there and get a t-shirt to commemorate your presence there because otherwise no one will believe you were there, right? Be like that Skid Row concert you went to in 1987. Did you get the shirt? No, you weren’t there then. Anyway.
Let’s get this database party started. And let’s see. We’ve got our function queued up here. Now, just like the ScalarUDF version of this, I had to write two versions of this function. One, normal function that we’re going to be using in our demonstrations or majority of our demonstrations. And two is a version of this with an option recompile hint in it, which probably already gives away the whole thing that, well, we have a parameter sniffing thing that can happen with these two. That probably messes the whole gig up, right? There I go. Blowing the whole waterworks. But anyway, I have reapplied my filters to this extended event session. So that’s great.
And what we’re going to do is we’re going to run the recompile version of this multistatement table valued function. Well, we can see that it is indeed a multistatement table valued function because it returns a table variable. And the primary driver of this UDF is to insert data into that table variable. And then, well, we just say return here. If we wanted to keep things like normal, we might say return that table variable, but we just say return. Don’t ask me why. I swallowed a fly. I swallowed a fly. But let’s make sure that these things are both in here and all correctamundo and that good stuff. So if I run these two functions, or rather this one function with, again, we’re going to do the same couple, same two of the recompile things.
Actually, there’s a third one hiding on me down here. Look at that. All right. Wonderful. We’ve got three. Run these recompile things. You might notice that these taken a little longer than we might like. The days really drag on, don’t they? Anyway, if we go look at the live data for these, we’ve got a query plan over here that looks like this for the first one. It takes about 5.4 seconds to run. Now, in the last set of demos with the scalar UDF, where I talked about how the query that calls or invokes the scalar UDF, the query itself is inhibited from going parallel because it invokes a non-inlineable scalar UDF.
Multi-statement table value function. Part of what took me a little bit to write these demos was working around this part is that the insert into a table variable has the same general problem that invoking a scalar UDF does, where if we look at the properties over here, I’m just going to make this a little bit wider, apparently a little bit wider. We have this thing in our query plan now. So whereas the last one had a non-parallel plan reason talking about the presence of a scalar UDF, we are not allowed to insert into the table variable.
This is true of all table variables, not just ones in multi-statement table value functions. Table variables can’t be modified using a parallel execution plan, including inserts. So part of why this one is so perturbingly slow is that.
Let’s close that. And for the second invocation, we have a slightly different looking execution plan, right? This one does some stuff and then does a hash join down to here. This one does some stuff and this one does the loop joins again. So this is the key lookup plan.
This one runs a bit faster, right? The total time on this is 238 milliseconds. Whereas for this one, it was five and a half seconds, which is no good at all. And then this one, this one takes 4.9 seconds and uses a similar plan to the one before, except there’s a hash join here.
You know what? These were showing different plans before. This time, I guess it didn’t work out so well. This, this, oh wait, there was something weird and different. Oh yeah. So there is a difference here. This is an actually, this, this was a funny one.
This one, this one took me a minute to catch. This one does a hash match inner join to the post table, right? This one does a right semi join to the post table. So it is a slightly different execution plan.
That caught me off guard there for a second. Someday I’ll have this all memorized, but we’ve cut the query plans. We see we got three different ones when we recompiled. Cool. Now let’s go free the proc cache, right?
And let’s say you select the top one from this. Remember if we like, we don’t want to cross apply to this. We could theoretically outer applying at correct results, but for this, we want to have the scalar sub query in the select list because this will not reduce rows.
Cross apply would reduce rows because it’s a like relational operator. And it’s like an inner join where if rows on like, like the rows don’t exist here, we won’t get a row back, right?
And we want our results to maintain correctness across different executions. So we need to put the, put the function call in the select list for this one. We could outer apply to it. But again, that’s a little bit more of a relational thing, whereas this is just like, go find me a result.
So, uh, we’re going to run this for reputation equals one. Let me make sure I did this. I can’t remember. It was too long ago. And let’s run this.
And that’s going to take about five ish seconds to run and get a result back. And if we come over here, uh, we will have a new entry in this. And we can see that we, we used the query plan that we got when we ran this for the one row with a reputation of one, right?
So that’s that new thing. Now I’ve remembered a clear data and not clear filters because that was annoying last time. But, uh, if we now run this for the 800,000 people, this is going to take about 20 seconds because we, it’s about five seconds per run and we’re returning four rows.
So the rules of mathematics tell you that you should take the number five, like about five, and you should multiply it by four because we have four rows. Four is not negotiable on that.
So this will take 21, 22 seconds, 20 seconds even. Great. So we’ve got that. And now we’ve got, uh, four query plans and we’ll see that SQL Server reused that same plan each time, right?
We, we reused the hash match inner join plan where we go down and scan the post table. Fantastic. We did it, right? We’ve, we’ve, we’ve cracked the case. So let’s do the same thing that we did.
Uh, we’re, we’re only going to do three just because, um, the, the plan for 11 was also about five seconds. So it doesn’t really make a lot of sense to show that one here. Um, I’m just going to do a one and three for this one, just to move things along a little bit.
So now if we run this for three, we get a, we get like the faster query plan for this back. And if we run this for the 800,000 rows here, we get those four rows back pretty quick. So, um, what I’m going to do is because there’s not a view for multi-statement table valued functions, the way that there is for, um, scalar UDFs, uh, they’re not tracked in their own DMV.
Uh, I’m going to use, um, quickie store to look at this and, uh, we should see, well, if we come over here a little bit further, five executions of each one of these, uh, UDFs or rather it’s the same UDF.
We just got different plans for it, right? You can ignore the all plan IDs here from the last hour. We got, uh, for a query ID 28475, we got plan ID 6007 and 6008. Uh, there was another plan ID for this from an earlier run that still ends up in all plan IDs.
It was don’t, maybe don’t want to worry about it so much. Actually, that might’ve been the plan ID for the number 11. I forget the, would be nice if there was a row for it, but what the heck? So we get the five executions and we can see the differences looking in quickie store in average min max in total duration between these, right?
Uh, we can see that, uh, on average, uh, one of these did a lot better than the other one. So over those five executions, the, the execution time for the slower one really piled up. And this is again, uh, a parameter sensitivity issue inside functions in SQL Server.
So hopefully this has answered all of your deepest, most burningest questions about, uh, multi-statement table value function, uh, caching and plan reuse rolls right off the tongue, don’t it? Anyway, uh, we’re going to stop here because we have, we have reached the end of these demos clearly because the next function down there on the bottom is inline table valued functions.
So now we’re, we’re going to go do that next. That’s going to be tomorrow’s video because we still have to make another video for tomorrow. Otherwise we have no video for tomorrow and people will say, Eric, why is there no video today?
Tomorrow. All right. Cool. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you tomorrow where we will talk about inline table valued functions.
All right. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.