A Lot About Multi-Statement Table Valued Function In SQL Server
Video Summary
In this video, I delve into the intricacies of multi-statement table-valued functions in SQL Server, exploring their unique behaviors and performance implications. Starting with a practical example, I demonstrate how these functions can behave differently based on whether they receive literal values or columns as parameters, highlighting the impact on execution plans and performance. By walking through various scenarios, including interleaved execution, rebinds, and option recompile, I aim to provide insights that can help you make informed decisions when working with such functions in your database projects. Whether you’re a seasoned SQL Server professional or just starting out, there’s plenty of valuable information here to enhance your understanding of these often-overlooked features.
Full Transcript
Erik Darling here with Darling Data. And in today’s video, we’re going to talk about a multitude of interesting things around multi-statement table valued functions. That’s about as much as I can give you here because if I tried to explain it all here, the mind would boggle, you would implode, you would just have a bad day. You’d be like a fried egg in your head. So thank you, PowerPoint, for finally responding to me. If you look down in the video description, round about here, there are all sorts of helpful links. You can hire me for consulting, buy my training, become a paid subscribed channel member, ask me office hours questions, and of course, one of the most valuable things that you can do if you enjoy this content is to like, subscribe, and of course, tell a friend. Keeping it all to yourself is of course very selfish. So we want to avoid selfishness here at Darling Data. While you’re watching this, I will actually be in Utrecht. Amazing, isn’t it? I’m not going to be here where I’m recording this. I’m going to be way over there. I got on one of these things and flew. So that means the only thing left for me to talk about is past data community summit taking place in Seattle, November 17th to 21st, where I will have two days of delicious T-SQL pre-cons with Kendra Little. And then I’m home for a bit until someone else decides to pluck me from my domicile.
And send me spiraling out into the world to blather on about SQL Server stuff. Anyway, with that out of the way, let’s talk about multi-statement table valued functions and all of the interesting things that can happen. So I’ve got this function here, which is a multi-statement table valued function. You can tell because it returns a table variable, right? And I’m not saying that this is a good idea. Usually I prescribe against doing this sort of thing. But there are some interesting things to talk about if you ever want to know why. You shouldn’t do this sort of thing. So we’ve got that going for us. There are, of course, times when they’re fine and actually useful and stuff like that. But usually it’s in the hands of quite a skilled T-SQL practitioner. And if you do not, if you have not elevated yourself to that level, you should probably leave the dangerous stuff on the ground. Don’t pick it up.
This particular function has a thing that we’ll talk about later. We can call this some ominous foreshadowing here, where I’ve sort of treated the two parameters that this thing accepts as optional parameters. And I’ve done this. This is also something that I generally prescribe against. So don’t look at this and like aspire to this code. This is not what you should be doing. Okay, so leave this out, right? Like, say, Eric said, probably don’t do this unless you’re a skilled T-SQL practitioner who understands the full ramifications and is making an informed choice about writing bad code for a demo in Minecraft.
So what’s interesting here is actually, we should explain a little bit. Actually, I need to do one thing just to make sure that we don’t end up with anything funny. I don’t want to spoil any surprises. So we’re going to make sure I do that. But so what I’m doing is I am essentially selecting from the users table and I’m getting everyone with a reputation over 800,000. Right. And let me just add a column to this. Say c equals count big. So make things a little bit more clear here.
So the reason why I’m passing in the literal values that you see here is because of the data that I get back from this query. Right. So if I look at the users who have a reputation over 800,000, I want to know what their min and max creation date and last access date are. And just for you later, I want to show you how many of them there are.
So the min and max for those are 2008, 913 and 2018, 0902. And we get four of those rows back. Four. Remember the number four from this query. And now what I’m going to do is I’m going to execute this one and we’re going to look at what happens.
This takes about seven seconds. We did it. And we get eight rows back. But the eight rows that we get back are essentially for four users. Right. So these are the four users that we get back rows for. I know that it’s eight rows, but it’s four users that we get rows back for.
So each execution of each row that we produced from the four rows that we produced from the users table produces two rows from the function. Right. Four times two, eight. Cool. Let’s go look at the query plan. And what you might think looking at this query plan is, Eric, this took seven seconds, but I don’t see seven seconds of time anywhere in here.
And you would be absolutely correct. There is not seven seconds of time displayed anywhere in here. This all looks like it took 256 milliseconds. Well, if I were me and I were looking at this, I might think, well, did that thing just take a long time to compile?
But no, the compilation on that was very low, four milliseconds. All right. That’s not it. So what happened? Well, this function has a particular thing applied to it.
The particular thing applied to it is called interleaved execution. And we’re going to see that in the properties over here. I clicked on the wrong thing. Sorry about that.
So is interleaved executed is true. If it were not interleaved executed, we would simply not have this node. It would not say is interleaved executed false. It would just not be there.
Love how Microsoft does that. It makes you feel crazy. So this has a optimizer feature released in 2017 called is interleaved execution is true. And what that means is SQL Server ran the function and then kind of paused and then like tried to do some cardinality estimate stuff based on the function.
The way that it does that is by using this thing called a sequence operator. The sequence operator ensures that things happen in a sequence. It has sequenced the events.
So the sequence is it executes up here and tries to do some interleaved execution stuff where it tries to make a cardinality estimate based on what ends up in the table variable inside of the multistatement table valued function. Of course, the guesses aren’t really all that great, but we’re using a table variable. So you get what you get and you get what you deserve.
Okay, cool. Well, why is why is all this interesting? Well, let’s run this again. Let’s run this a second time.
And I know I apologize for making you wait another seven seconds. It’s not my fault. This is Microsoft’s fault. So if you would like to blame anyone or invoice anyone for your seven seconds, make sure it’s Microsoft. A second execution of this query plan, once it’s in the cache, now we see seven seconds.
Look, it showed up in here. This is no longer 200 something milliseconds from the stuff down here. Now we see the 7.1 seconds here.
So when you have a multistatement table valued function that receives interleaved execution, only the second execution shows the time spent in the function. Isn’t that fun? Isn’t that just great?
It’s wonderful. Doesn’t drive you nuts at all, does it? But what is interesting here, aside from that, is because we have passed in two literal static values here, we have done something. What is the thing that we have done?
Well, we have made sure that this function will only execute once. You can tell how many times this function has executed by the rebinds. All right?
So we have a rebind is something that happens with like spools primarily, where you might see rebinds and rewinds. A rebind means that you have gone out and you have fetched data into something. A rewind means you have used data in that something.
So primarily a spool. If you see like a lazy spool or something, you’ll see rebinds and rewinds. And the rebinds tell you how many times you went and put data into that spool. And the rebinds, like every time you like truncate the spool and put new data into it.
And the rewinds are every time you reuse the data in this spool. So it’s sort of like cache hit versus cache miss. Great.
What’s next? Well, if we run this a little bit differently and we use the creation date and last access date columns from the users table now, rather than the two values that we put in there, we’re going to use the creation date and last access date.
We’re going to do everything the same except that. This is going to take a little bit longer. And by a little bit longer, I mean 30 seconds.
And what are we going to do for 30 seconds? I don’t know. I could ask you how your day was. Ask you how school was.
What did you get up to, kid? Stuff like that. You can bond a little bit. Little father whatever you are bonding while this query executes. And, you know, you could answer me politely and say, I don’t know, whatever.
Whatever. School was school. But 30 seconds later, our chat will be over. Our bonding time will be done. And we’ll have a query plan to look at.
We get the same rows back, right? There’s the same, you know, four people with producing two rows apiece that we got back before. All well and good. But our execution plan is much different now.
There has been no sequence of events. And we get all the execution time up front. If we look at the properties of the multi-statement table valued function now, we have four rebinds, right? So we went from one rebind to four rebinds.
Because every time we correlated to this function, we had to pass in two new values. The two new values or whatever the creation date and last access date were from the user’s table. So not a good time here.
And another thing that’s not a good time here is we have this filter operator. And this filter operator is saying where the owner user ID column coming from the function equals the ID column coming from the user’s table. So this isn’t a great strategy in general.
Well, you know, if you’re going to do this sort of correlation outside, you should probably think about pushing this correlation inside, right? It seems like a reasonable thing. So we’ll add a parameter to this, to a new version of the function called multi-optional pushed, right?
And we’re going to add owner user ID. And we’re going to expand our bad idea where clause to include owner user ID here, all right? So let’s, and again, more ominous foreshadowing is the option recompile here.
So let’s create or alter this function. And now when we run this, right? So now we take the correlation out of here and we use the ID column here.
We’re going to start by passing in two literal values. Because what I want to show you is that as soon as you start passing in a column here, we have to start executing and rebinding the function more than once. This is where the extra time comes in, right?
So like when we had a situation earlier where, you know, the function took seven seconds, now we can sort of get that back by pushing the correlation in. And granted, we lose the multi, whatever, we lose the multi-statement table-of-added function interleaved execution thing. We don’t have that sequence operator anymore.
But now we have, and we do rebind this thing four times. Fine. Okay. Now we’re going to see the exact same thing here, right? And this is, the annoying thing is not that like this still takes seven seconds.
The annoying thing is that pushing that predicate in is not improving life for us, right? So like, we want to improve this in a way. What we’re messing up here is when, like, you can’t see it in here, right?
Because when you look at an actual execution plan for a multi-statement table-valued function, you don’t see what happens inside the multi-statement table-valued function. Like, up here, like, we can see what’s going on in the function. We select from posts.
We, you know, group by these things. We get a sum of this. We filter on the sum. Like, we can see what’s going on, right? If we get the estimated execution plan for this query, now we get an additional query plan back, right? The additional query plan that we get back is for the function.
We don’t get this back with the actual execution plan because then we’d see, like, four execution plans for the function, right? One for each row that comes out of the user’s table. If we’d returned a lot of rows from the user’s table, like 1,000 or 2,000, we’d kill SSMS.
But if we look at the scan over here, we can see that we’re doing the typical thing that happens when we use this sort of optional parameter query syntax, where we have, like, all of these predicates expressed out fully, like, owner user ID equals isn’t all this. Creation day is greater than isn’t all that.
Creation day is less than isn’t all that. So, like, this is why you don’t want to write queries that way. What’s particularly annoying is that if you add option recompile outside here, we still get the same problem, right? So, this is where things get a bit sticky and tricky, right?
We still get the same seven seconds here, and we still get the same index scan in here, where SQL Server does not get the parameter embedding optimization involved inside of the function. Even though we’re like, hey, query, option recompile, when we don’t have that inside of the function, things break, right?
Or rather, things don’t go all the way we planned. So, I’m going to add option recompile inside of the function, right? And we’re going to recreate this thing.
And now, when we run this, it’s very fast, right? And now, even when we put in all of the columns, it’s very fast, right? So, now, rather than seven seconds, this takes 200 milliseconds.
The cost is, of course, recompiling the function every time that it runs. But when you look at the execution plan in here, rather than doing that big index scan with all of the is nulling on there, we do get the seek predicates of the literal values in here.
So, there are ways that you can, like, if you’re just not allowed to rewrite multi, like, so, think of it two ways. If you’re not allowed to fully rewrite a multi-statement table-valued function, or if your multi-statement table-valued function is simply, like, too long and involved in too many, like, if, then, else, blah, blah, blah, things for you to turn into a single inline table-valued function, there are some things that you can do under certain circumstances to get your multi-statement table-valued function faster.
One of those things is, of course, like, option recompile, if you have the, like, sort of, like, is null parameter column thing going on. And the other is to start pushing, like, predicates that you would normally have outside of the function as things inside of the function.
It’s two ways of getting around that stuff. So, like I said, this was going to be a whole bunch of stuff about multi-statement table-valued functions all at once. We talked about the rebinds and the rewinds.
We talked about interleaved execution. We talked about sequence operators. We talked about all sorts of things that were interesting and fun. But now I’m out of things that are interesting and fun to talk about. And so it’s time to go.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video. And don’t forget to look in the video description for helpful links. All right.
Should I say that every time? Probably not. There’s no one over there. Don’t worry. I’m talking to myself. All right. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.