Learn T-SQL With Erik: Scalar UDFs
Video Summary
In this video, I delve into scalar UDFs in T-SQL, discussing their potential pitfalls and how they can impact query performance. I explore the differences between non-inlinable scalar UDFs and those that are inlined, using practical examples to illustrate the hidden work these functions can do when not inlined. By walking through an example with a non-inlinable scalar UDF, I demonstrate how it executes once per row, leading to inefficiencies that can be mitigated by adding parameters or rewriting the function as an inline table-valued function. The video also includes links for those interested in my training courses and upcoming events where you can interact directly with me.
Full Transcript
I’ve been listening to a lot of Tom Petty lately. I don’t know why. Anyway, Erik Darling here with Darling Data, joined today by Bats Maru, who has some thoughts and feelings on what we’re going to talk about. We are going to, of course, be doing some more T-SQL learning. These are all little tidbits and morsels from my larger course called Learn T-SQL. I think with Erik might just be Learn T-SQL. I forget how much I involve myself in the course name. But that’s okay. There are more important things like content, accuracy, you know, polarity, things, other things that I care deeply about. But before we go talk about Scalar UDFs, if you would like to buy any of my training, including the full Learn T-SQL with me course, all these links are available down in the video description. You can find all sorts of useful ways to interact with me, both by giving me money for things and by getting things for free from me, like asking office hours questions or liking, subscribing and telling a friend. So look down and you’ll find links for all the useful stuff in the world.
If you would like to have some useful interactions with me in person, the fine folks at Redgate are yanking me from the comforts of home and dragging me to various business class hotels in Dallas, Texas, September 15th and 16th. Utrecht, Netherlands, October 1st and 2nd and Seattle, Washington, November 17th to 21st, where I will be attending various past events and talking about SQL Server stuff, including, but not limited to, two days of T-SQL pre-cons with me, hosted by myself and Kendra Little. So you should come to those because you won’t be disappointed like you would be if you went to any other pre-con.
It’s the tremendous disappointment, not just fear, but the knowledge of missing, the como, the knowledge of missing out that you would feel by going to a different pre-con would just ruin your life. You would be a complete dark downward spiral for you from that day on. You would be able to just mark it on the calendar.
Life, graph, down. If you were making a tapestry of your life, it would just be black thread from that moment onward. Frayed at the edges.
All right. Anyway, let’s talk about T-SQL. Let’s talk about something. Well, we are in dark mode here, aren’t we? Coincidence?
I don’t know. But let’s talk about scalar UDFs. Now, scalar UDFs, of course, if you have spent any amount of time in your life dealing with T-SQL and perhaps Googling, or let’s just say Googling. There’s no one using Bing.
Let’s not lie to ourselves. You will undoubtedly have come across many articles, maybe even some articles by yours truly, about the perils of scalar user-defined functions in T-SQL code. And you may have even read something added to SQL Server 2019 called scalar UDF inlining.
Of course, scalar UDF inlining has many restrictions and limitations put on it that make it so it is ineligible for certain forms of functions. If you would like to explore the full list of that, Microsoft has just about, I think, just about all of it documented. There may be some stuff that we don’t know about, we don’t see.
But, you know, look up scalar UDF inlining. You’ll see the Microsoft Learn or Docs or whatever they’re calling it these days page. And you’ll be able to see the full list of restrictions and limitations.
So what we’re going to do is, since my copy of the Stack Overflow database is currently in compatibility level 160, I am currently using SQL Server 2022. I need to play a slight trick on SQL Server in order for this function to not be eligible for scalar UDF inlining. The trick that I am playing on SQL Server is incorporating one of these non-deterministic date functions in the function body.
So sysdate time, get date, any of those things would have the same effect on disallowing inlining for this code. And what I want to show you first is the estimated execution plan for this. So when we run the estimated execution plan, when we collect the estimated execution plan, we get two things back.
We get back the query plan for the query. And then we also get back an additional query plan that shows us what the scalar UDF would have done. We will not see that with the actual execution plan for this query.
Because SQL Server does two things with non-inlineable scalar UDFs. The thing that immediately affects this is that the scalar UDF is not run once per query, since it is not inlined. It is run once per row that the query has to process.
If we look at this query plan, we’ll see an index scan, we’ll see a sort, and then we will see a compute scalar. The compute scalar is where the scalar UDF is… Rather, the compute scalar is the operator responsible for executing our scalar UDF.
You can see that right here. These are the properties of the compute scalar. And we will see the name of the UDF in there.
So that is where the UDF was executed once per row that the query projected out. All right. So we do not see the actual execution plan for the scalar UDF because it executes once per row.
And if we executed that UDF for a thousand rows, we would not want to get back 1,000 execution plans, would we? No. No siree, Bob.
The second thing that scalar UDFs will do to us is prevent a… Rather, non-inlinable scalar UDFs will do to us is prevent the calling query from using a parallel execution plan. Not that one is warranted here, but it is restricted from using one even if one were warranted.
So those are the two big downsides of non-inlinable scalar UDFs. To prove out that this UDF does, in fact, execute once per row, that query returned six rows. And we have six executions of the scalar UDF in the sys.dm exec function stats view, which only collects data about scalar UDFs.
Ah, I didn’t want that. One way that we can work around that situation with the UDF not being inlinable is rather than have sysdate time in the body of the UDF, we can just add a third parameter to the function. And the third parameter will get used here in place of the sysdate time function.
And if we do that and we pass in sysdate time from outside of the function call, then it will be perfectly inlinable. So if we run this query and look at the execution plan, this will look a little bit different. Rather than seeing the compute scalar with the scalar UDF in it, now we have some extra stuff in here.
And this is where the function was inlined into the query. All right, these constant scans and the nested loops join and stuff. This is where the function, this is where the body of the function ran, but it was inlined into the execution plan.
So now we have this little bit of additional information. If we get an estimated execution plan for this, we no longer have that second plan down here, right? We can only prove a negative so much, but there’s no second execution plan that shows us what the body of the scalar UDF would have done.
And if we go and interrogate the sys.dmexec function stats dmb, it will be empty for the inlinable UDF. All right, so that is one sort of peril of the UDF is when they are not inlinable, they hide some work for you. Of course, the work that they hide from you is only obvious if we look at the use of a function where user tables are touched.
Right, so here you can see in the body of this function, we touch the post table and we touch the comments table. But if we get, and we’ll only, rather, let’s start here. We’ll only see that if we get the estimated execution plan, right?
So like the query plan up here with the UDF in it just shows us clustered index c compute scalar select. The estimated execution plan for the function, right? This is our non-inlinable scalar UDF here.
This shows us two index scans of posts and comments, some parallelism, some stream aggregates, some compute scalars and so forth. And when we execute this query, we get the actual execution plan. We will once again not have a, we will not have the benefit of the execution plan for the UDF.
We will have absolutely no information that it went out and touched the post table and the comments table and scanned the clustered index. That will not be visible to us. All we’ll see is that we scanned the clustered index of the users table, which took zero seconds.
And then we spent 1.2 seconds in the compute scalar. And of course, the 1.2 seconds that we spent in the compute scalar was executing our user defined function once per row that came out of the original query, which for our results was just once, but that one execution took 1.2 seconds.
So think about that if we returned 10 rows, we’re looking at 10 something sec, maybe 12 seconds at that point. If we executed this 20 times, well, what’s 1.2 times 20? It’s a higher number, isn’t it?
So when scalar UDFs are a particularly nasty piece of work, if you are, you know, given the number of limitations and restrictions, we will, we will have work to do rewriting these as inline table valued functions for many years to come. So a really smart fellow named Sam Arch wrote a paper a little while back about scalar UDF inlining and how, I think it’s called like, scalar UDF inlining first start with outlining or something.
And it was actually implemented in DuckDB. And what it did was it took cases where scalar UDF inlining broke like queries completely and fixed them. So it was a very, very cool paper.
So it’s Sam Arch, A-R-C-H. If you have the wherewithal to read a white paper about query optimization, I highly suggest checking that one out. Very, very cool thing.
Anyway, scalar UDFs, you know, you can pretty much tell when someone has used one because they just return a data type like integer or date or date time or varchar max or something. And when you see these in your queries, you better hope that they’re simple and that SQL Server can rewrite them. Otherwise, you’re going to have to hire a young and handsome consultant with reasonable rates to come and rewrite them for you.
Isn’t that a shame? Isn’t that something there? All right.
Anyway, thank you for watching. Hope you enjoyed yourselves. Hope you learned something. And I will see you in tomorrow’s video. I believe we’re going to be talking about multi-statement table-valued functions. And we’re going to learn a little bit about how the T-SQL that goes into those interacts with the queries and query plans that you see presented to you.
When you go trying to tune queries and whatnot. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.