Another Trick For Working Around Scalar UDF Inlining Restrictions In SQL Server
Thanks for watching!
Video Summary
In this video, I dive into another fascinating UDF inlining restriction trick, this time focusing on scalar UDFs and their limitations when it comes to string aggregation functions like `STRING_AGG`. I explain how Microsoft introduced the SQL Server 2019 feature of UDF inlining, which aims to improve performance by treating certain scalar UDFs as inline code. However, there are restrictions—specifically with XML methods and the `STRING_AGG` function—that can prevent this optimization from occurring. To address these limitations, I demonstrate how you can replicate `STRING_AGG` behavior using XML without breaking scalar UDF inlining. The video also covers practical examples of rewriting scalar UDFs as inline table-valued functions to maintain performance benefits while avoiding the restrictions. Whether you’re looking for tips on optimizing your SQL Server queries or just curious about the intricacies of UDF inlining, this video has something valuable to offer.
Full Transcript
Erik Darling here with Darling Data, doing some Darling Data stuff. Boy, oh boy. My arm’s tired. In this video, we’re going to be talking about another UDF inlining restriction trick. I’ve talked to, I don’t know, I think I have a few other videos about how Microsoft around SQL Server 2019 introduced this new query processor feature called SQL Server. UDF inlining, where it would attempt to take some of your scalar UDFs and basically treat them as inline versions of the code, reducing some of the performance problems that you’d see there. Again, typical performance problems with scalar UDFs is they disallow parallel execution plans from the query that calls them, and they don’t run once per query. They run once per row that the function has to process, which could be way more or way less than you’re thinking of, then based on where the function is in the query. If it’s in the select list, it’s typically the number of rows returned by the query. If it’s somewhere weird, like in a join or where clause, you could have to process way, way more rows to produce a result and then compare it to either another column in the join or a predicate in the where clause. So before we get into all that, though, let’s talk about you and me. Let’s talk about getting serious here. No, not actually serious. We don’t, we, you know. Well, the things we talk about are serious. We don’t talk about them.
in an overly serious manner because you would get sick of me real quick if I was, if I was a bore. So if you would like to say thank you in some small way for producing all of this content, you can subscribe to the channel with a membership for $4. There is a link to do so in the video description. If you can’t do that for whatever reason, because everyone has reasons, you can like, you can comment, you can subscribe, and, you know, you can keep me company in between, in between recording takes. You can, you can, make me a happy fella. If you need help with SQL Server, I’m really good at all this stuff and my rates are reasonable. If you need help with something else involving SQL Server, let me know what it is. My rates will remain reasonable. If you would like some very high quality, very low cost SQL Server training to the tune of 24 hours of it that will last you for the rest of your life without the need to renew a subscription, you can get all of that stuff from me for about $150.
Again, there is a link for that in the video description. It’s a good, good way to spend your money. If you would like to see me live and in person, right now, the only thing on my calendar is Past Data Summit in Seattle, Washington, where I will be co-hosting two days of performance tuning excellence with her excellence, Ms. Kendra Little. So you should, you should come to that. If you can’t make it to that. If you can’t make it to that, but you go to other events and you think, boy, it’d be great if Erik Darling came to these other events, you can tell me what they are and I’ll show up. This is assuming they need a pre-con speaker because that’s, that’s how I pay for some of my flight and hotel to get there. Independent consulting does means I don’t have a company paying for it. That’s not me. So we make the trade off.
Anyway, let’s talk about scalar UDF inlining and all the more restrictions that it has. So the two things that I want to highlight from the scalar UDF inlining docs are here. UDF doesn’t reference XML methods and here. UDF doesn’t reference the string ag function.
There’s another XML thing down here about namespaces, but we don’t care so much about that one. I am always going to laugh at this one though, because CTE, such garbage. They’re so bad. They had to get disallowed from UDF inlining.
Who would have thunk it? Anyway, what we have here is an example of a scalar UDF that references the string ag function. Right. And the only reason that I have this one is because I need to show you that you can replicate the string ag behavior using XML without breaking scalar UDF inlining.
Now, if, now, usually when I’m writing scalar UDF, or sorry, when I’m writing XML, like string concatenation stuff, I’ll do something like this. Because this gives me a lot of type safety and, you know, just gives me like pretty much the simplest version of an XML query plan that you can get. The problem is that using this is the type of XML method thing that scalar UDF inlining gets mad about.
So this dot value clause, right? The dot value thing in there messes that up. So since we can’t use that, we have to change our XML query a little bit. We can’t use for XML path comma type, and we can’t use the dot value thing to produce a scalar value from the XML.
So we have to do these two things like this. I’m just going to make sure that both of these functions are created the way that I intended them to be, since I am the god of the SQL Server. That’s just what we have to deal with.
Now, the first thing that I’m going to show you is something that I feel like I have to caution everyone about when they first start rewriting scalar UDF’s inline table valued functions. And that is that you really have to mind your indexes when you do this. If you haven’t minded your indexes when you do this, you will end up with awful query plans.
So what I’m going to do is I already have compat level set to 150. So if I run these two queries, we’re going to see some slight differences in here. The one down the bottom is the scalar UDF version, where we see the query that calls the scalar UDF, right, which is this thing right here.
And this thing is going to have all of the standard scalar UDF problems. If you look at the properties, we’ll see a non-parallel plan reason. And because I’m on SQL Server 2022, I have the T-SQL user defined function, not parallelizable reason.
If you’re on an earlier version or a non-cloud version of SQL Server, you’ll just see you’ll just see could not generate valid parallel plan. We’ll also have this compute scalar. And the compute scalar will be responsible for calling the scalar UDF over here.
And this will, again, like run once per row that the query returns, right? So good, good, good. We have all that figured out.
Since we’re selecting 5,000 rows out, that function will have run 5,000 times. For the first query plan, we see this big parallel view, right? Look at all this stuff it’s doing with all this parallelism.
Isn’t that wonderful? Isn’t that great? We just parallelize the whole thing. It’s fast. It’s wonderful. Ooh, blah, blah. Really put those CPUs to work. The problem is that we don’t have a good index on the badges table.
So we’re going to end up with this eager index spool over here. And this eager index spool is going to slow things down. If you are replacing T-SQL scalar UDFs with inline table valued functions, T-SQL inline table valued functions, you really need to pay attention to the query plans. Because if your query slows down even more so than it did with the scalar UDF in there, this is the kind of query plan pattern that you have to watch out for.
In our case, it’s pretty simple to fix. All we need to do is add an index to the badges table, and both of those will go away. Or rather, that eager index spool will go away.
So if we look at these two plans now, just the estimated ones, we’ll see the eager index spool is gone from here. All right. So now, how does this work?
How much better is one than the other? Well, we’re going to let these run. And we’re going to hang out and do some stuff. And we’re going to have, I mean, the results are correct for both.
I’ve, you know, I’ve validated that the results are correct for both of these. But if we look at the query plans, our query plan that calls the inline table valued function is at about 1.6 seconds. And our query that calls the non-inlineable string ag version of the function takes almost 7 seconds.
And if you pay careful attention to some of the operator times in here, you’ll see about 3.6 seconds here. And then a little time here and a little time here and a little time here. And then the rest of the time.
So 6.7 minus 3.6 is about 3 seconds even. So we spend about 3 seconds just in the scalar UDF right here. Because remember, this compute scalar is where the scalar UDF gets invoked.
All right. So this is where, oops, I forgot to expand that. This is where string ag gets called in here. And this is where things get messy for our query.
This is what causes the majority of our query problems. So if you are writing or rewriting T-SQL scalar UDFs and the purpose of one was to do some string aggregation stuff and come up with some sort of delimited list of text, and you’re really excited about using maybe the new string ag function and maybe some of the new enhancements to the string ag function like ordering within groups and stuff, it’s going to suck for you.
You can’t do it in a function and still get UDF inlining. You could, of course, rewrite it as an inline table-valued function, but you wouldn’t get the same sort of, if you felt like it.
It really depends on what you want to do. But if you have just a really big, complicated function and you don’t want to, rather you’re unable to rewrite it as an inline table-valued function, having it remain as scalar UDF can allow you to keep using, or rather to use the XML stuff without the dot value method or anything like that and still get good results.
So string ag in scalar UDF prevents inlining, XML stuff does not, and that can really mean the difference between you having good performance and bad performance. So if you asked me what I would do, I would probably just rewrite it as an inline table-valued function where there are really no restrictions on anything, but you just might not be in a position where you can do that easily or do that without changing a whole bunch of code.
Part of the beauty of the scalar UDF inlining feature is that you don’t have to change the way that your query gets called in the select list like you would if you rewrote it as an inline table-valued function.
So there are some upsides to remaining on a TSQL scalar UDF, but having it be eligible for inlining rather than just doing the full rewrite to an inline table-valued function.
So anyway, with that out of the way, me and Bats are signing off for the night. Thank you for watching. I hope you enjoyed yourselves.
I hope you learned something. I hope that you will continue to watch these videos and all that good stuff. So anyway, I’m out of here. Really, I’m leaving.
Get, what is that, Ferris Bueller? Get, go. Go.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.