A Trick For Working Around Scalar UDF Performance Issues In SQL Server

Mr. Duster



Thanks for watching!

Video Summary

In this video, I share a performance tuning shortcut that came in handy during a recent client project. When working with complex queries involving scalar-valued functions, sometimes the best approach is not to rewrite everything from scratch but to take a “band-aid” route. By temporarily removing these functions and dumping the data into a temporary table, we can significantly improve query performance without losing functionality. This method allows us to run our big, complex queries faster while still applying necessary formatting later on, ensuring that scalar-valued functions don’t hinder overall performance. It’s a practical solution for those of us who need quick wins in SQL Server tuning until more comprehensive optimizations can be implemented.

Full Transcript

Howdy folks, Erik Darling here with the old Erik Darling data. And I want to show you a kind of a funny little performance tuning shortcut that actually came in handy earlier this week. Now, when I’m working with clients, my job is to get them the fastest possible solution to their problem that, you know, that it’s like possible, right? So it’s like, some times I can touch more stuff than others. Other times I can touch less stuff than others. Either way, my job is to get people moving faster in a hurry. Sometimes, like, you know, in a perfect world, we can all take a lot of time to perfectly tune this, that, and the other thing to change, you know, functions and indexes and queries all over the place to get the perfect code. Other times, it’s, we can take what I like to call the band-aid route and we can put a band-aid on it. That gets us to a much, much better place for now until we can get to that wonderful Arcadian pasture that is perfect performance. I know, right? So what I’m going to show you is something that came in handy earlier this week. When I was tuning, it was a much bigger query than the one you’re looking at on the screen. It was a big tangle with, like, a 500-line where clause and all this other stuff.

And there’s a lot of complex stuff going on, even in the select case expressions and whatnot. But what it came down to was that there were a bunch of functions. Now, in the query that I’m looking at, I only have one function, and it’s called initial cap. And it does exactly what it sounds like. It’ll take a string, and it’ll make the first letter a capital, and it’ll make any letter after a space a capital. It’s a scalar valued function. And while we’re all breathlessly awaiting these things not sucking in SQL Server 2019, there are a whole heck of a lot of us who have to deal with versions of SQL Server that do not have the magic of Freud. Now, the real query that I dealt with had a bunch of formatting functions in it.

So, like, there were a whole bunch of things that formatted, like, phone numbers and postal codes and dates and all this other crazy stuff. And I know what you’re thinking. You can’t just, like, use convert to formatted date? No, apparently not. Apparently, you need to scale our valued function to do that. So, rather than sit there and rewrite every single function and mess with logic and results, here’s what I did. Now, I’m going to show you exactly how this reproduces with this query. I’m going to turn on query plans, and I’m going to fire this thing off. And this thing is going to run for around about 11 or 12 seconds.

And part of the problem with the way this query runs is that because of this scalar value function in there, it’s forced to run serially. Now, another problem that scalar valued functions have is that they run once per row, not once per query. We can kind of see that if we look in the query plan and we look at this compute scalar operator.

If we hit F4 here, we can see that this is where that init cap function gets called. All right? So, that’s where we do it. But we do it at the very end. And we only do this over 500 rows. So, this function running 500 times isn’t going to be a big deal.

What’s a much, much bigger deal for this query is that it’s forced to run serially. If we look at the properties of the select operator, we have this non-parallel plan reason could not generate valid parallel plan. This started in the SQL Server 2012 plan XML. This would start showing up in there.

So, if you’re on 2008 or 2008 R2, you’re not going to see this. If you’re on 2012 plus, you will see that if you have some construct in your query plan forcing things to run serially. Now, since there were a bunch of functions in the actual code that I was tuning, what I decided to do was, rather than spend my time and waste someone’s money rewriting a bunch of functions and trying to toggle with indexes and this, that, and the other thing, I just wanted to see how the query ran if I took the functions out.

And that’s what I did in this query right here below it. Is rather than have a reference to the formatted display name, I just have the display name. So, I quoted out the functions and I kept the columns in that the functions were formatting.

And, oops, I hit F5, but I highlighted the whole thing as a professional query performance tuner. I cannot impress upon you enough the importance of highlighting the entire query. So, I highlighted the entire, in real life, I highlighted the entire query.

And I hit F5, and it ran much faster. That finished in 2 seconds, rather than, what, 12 seconds or something? Yeah, it took 12 seconds.

So, we improved this query by 10 seconds, just by not having the scalar function in there. Of course, the reason this query runs much faster isn’t because that function did anything so heinous. It’s because this query was allowed to go parallel.

So, now that we have all this parallelism going on in here, the root query finishes much faster. Now, there are a lot of caveats, and I’m not saying that a query going parallel is always a good thing, or that you should have as many queries going parallel at once as possible, because that has effects on concurrency, how many threads get used by queries, how many threads get used by overall workload.

But, in this case, it didn’t hurt too much. So, what I did is I took that query, and I said, well, you know what? This query gets us close to what we need. We just need to format that data at some point.

And I’m going to tell you something about scalar valued functions. In small doses, they’re not that bad. In small doses, they’re okay.

They’re not too terrible, which I know is not a great thing to say. Everyone’s just going to come up with torches and pitforks and tell me that scalar valued functions are the devil. And a lot of the times, I’ll agree with them.

But in this case, where we’re just formatting data, the scalar valued functions aren’t going out to other tables and doing all sorts of other crazy things. We’re just formatting some data in place.

Well, that still will suck, but we can make it suck a whole lot less if we don’t let it impact our big queries and if we only do it on a limited number of rows later. So, what we can do, and this is a trick that you can try out, is if you have your big queries and lots of scalar valued functions in them, you don’t want to rewrite a bunch of scalar valued functions and have a whole bunch of awful things.

I get it. It’s hard. It’s daunting. But what you can do is you can take your big, mean, nasty, awful queries, and you can dump them into a temp table.

And the dump into a temp table will still take about two seconds to finish. All right. So, that still takes, you know, a couple seconds to get the data into the temp table, but it’s a lot faster than the 12 seconds that we waited for when we just needed to select all the data.

And then after we have data in a temp table, well, now we can just call our functions on what’s remaining. So, now if I hit F5 here, well, that finishes just about as quick as you could want it to finish. So, we still have, like, the old, like, we still have a big query that we have to run and do stuff with, but that query isn’t impacted by the side effects of scalar valued functions.

So, that’s one way to kind of skirt around the issue and then get the data that, get whatever the scalar valued functions need to do at a later point. Anyway, I thought that I would share that with you in video form because I care about you and I care about the health and well-being.

No, I’m kidding. Yeah. No. No. Good luck. Good luck. I wish you luck out there. Anyway, I’m Erik Darling.

Thanks for watching. I hope you learned something. I hope you enjoyed yourselves. And I will see you in some other video some other time. Goodbye. Bye. Bye.

Video Summary

In this video, I share a performance tuning shortcut that came in handy during a recent client project. When working with complex queries involving scalar-valued functions, sometimes the best approach is not to rewrite everything from scratch but to take a “band-aid” route. By temporarily removing these functions and dumping the data into a temporary table, we can significantly improve query performance without losing functionality. This method allows us to run our big, complex queries faster while still applying necessary formatting later on, ensuring that scalar-valued functions don’t hinder overall performance. It’s a practical solution for those of us who need quick wins in SQL Server tuning until more comprehensive optimizations can be implemented.

Full Transcript

Howdy folks, Erik Darling here with the old Erik Darling data. And I want to show you a kind of a funny little performance tuning shortcut that actually came in handy earlier this week. Now, when I’m working with clients, my job is to get them the fastest possible solution to their problem that, you know, that it’s like possible, right? So it’s like, some times I can touch more stuff than others. Other times I can touch less stuff than others. Either way, my job is to get people moving faster in a hurry. Sometimes, like, you know, in a perfect world, we can all take a lot of time to perfectly tune this, that, and the other thing to change, you know, functions and indexes and queries all over the place to get the perfect code. Other times, it’s, we can take what I like to call the band-aid route and we can put a band-aid on it. That gets us to a much, much better place for now until we can get to that wonderful Arcadian pasture that is perfect performance. I know, right? So what I’m going to show you is something that came in handy earlier this week. When I was tuning, it was a much bigger query than the one you’re looking at on the screen. It was a big tangle with, like, a 500-line where clause and all this other stuff.

And there’s a lot of complex stuff going on, even in the select case expressions and whatnot. But what it came down to was that there were a bunch of functions. Now, in the query that I’m looking at, I only have one function, and it’s called initial cap. And it does exactly what it sounds like. It’ll take a string, and it’ll make the first letter a capital, and it’ll make any letter after a space a capital. It’s a scalar valued function. And while we’re all breathlessly awaiting these things not sucking in SQL Server 2019, there are a whole heck of a lot of us who have to deal with versions of SQL Server that do not have the magic of Freud. Now, the real query that I dealt with had a bunch of formatting functions in it.

So, like, there were a whole bunch of things that formatted, like, phone numbers and postal codes and dates and all this other crazy stuff. And I know what you’re thinking. You can’t just, like, use convert to formatted date? No, apparently not. Apparently, you need to scale our valued function to do that. So, rather than sit there and rewrite every single function and mess with logic and results, here’s what I did. Now, I’m going to show you exactly how this reproduces with this query. I’m going to turn on query plans, and I’m going to fire this thing off. And this thing is going to run for around about 11 or 12 seconds.

And part of the problem with the way this query runs is that because of this scalar value function in there, it’s forced to run serially. Now, another problem that scalar valued functions have is that they run once per row, not once per query. We can kind of see that if we look in the query plan and we look at this compute scalar operator.

If we hit F4 here, we can see that this is where that init cap function gets called. All right? So, that’s where we do it. But we do it at the very end. And we only do this over 500 rows. So, this function running 500 times isn’t going to be a big deal.

What’s a much, much bigger deal for this query is that it’s forced to run serially. If we look at the properties of the select operator, we have this non-parallel plan reason could not generate valid parallel plan. This started in the SQL Server 2012 plan XML. This would start showing up in there.

So, if you’re on 2008 or 2008 R2, you’re not going to see this. If you’re on 2012 plus, you will see that if you have some construct in your query plan forcing things to run serially. Now, since there were a bunch of functions in the actual code that I was tuning, what I decided to do was, rather than spend my time and waste someone’s money rewriting a bunch of functions and trying to toggle with indexes and this, that, and the other thing, I just wanted to see how the query ran if I took the functions out.

And that’s what I did in this query right here below it. Is rather than have a reference to the formatted display name, I just have the display name. So, I quoted out the functions and I kept the columns in that the functions were formatting.

And, oops, I hit F5, but I highlighted the whole thing as a professional query performance tuner. I cannot impress upon you enough the importance of highlighting the entire query. So, I highlighted the entire, in real life, I highlighted the entire query.

And I hit F5, and it ran much faster. That finished in 2 seconds, rather than, what, 12 seconds or something? Yeah, it took 12 seconds.

So, we improved this query by 10 seconds, just by not having the scalar function in there. Of course, the reason this query runs much faster isn’t because that function did anything so heinous. It’s because this query was allowed to go parallel.

So, now that we have all this parallelism going on in here, the root query finishes much faster. Now, there are a lot of caveats, and I’m not saying that a query going parallel is always a good thing, or that you should have as many queries going parallel at once as possible, because that has effects on concurrency, how many threads get used by queries, how many threads get used by overall workload.

But, in this case, it didn’t hurt too much. So, what I did is I took that query, and I said, well, you know what? This query gets us close to what we need. We just need to format that data at some point.

And I’m going to tell you something about scalar valued functions. In small doses, they’re not that bad. In small doses, they’re okay.

They’re not too terrible, which I know is not a great thing to say. Everyone’s just going to come up with torches and pitforks and tell me that scalar valued functions are the devil. And a lot of the times, I’ll agree with them.

But in this case, where we’re just formatting data, the scalar valued functions aren’t going out to other tables and doing all sorts of other crazy things. We’re just formatting some data in place.

Well, that still will suck, but we can make it suck a whole lot less if we don’t let it impact our big queries and if we only do it on a limited number of rows later. So, what we can do, and this is a trick that you can try out, is if you have your big queries and lots of scalar valued functions in them, you don’t want to rewrite a bunch of scalar valued functions and have a whole bunch of awful things.

I get it. It’s hard. It’s daunting. But what you can do is you can take your big, mean, nasty, awful queries, and you can dump them into a temp table.

And the dump into a temp table will still take about two seconds to finish. All right. So, that still takes, you know, a couple seconds to get the data into the temp table, but it’s a lot faster than the 12 seconds that we waited for when we just needed to select all the data.

And then after we have data in a temp table, well, now we can just call our functions on what’s remaining. So, now if I hit F5 here, well, that finishes just about as quick as you could want it to finish. So, we still have, like, the old, like, we still have a big query that we have to run and do stuff with, but that query isn’t impacted by the side effects of scalar valued functions.

So, that’s one way to kind of skirt around the issue and then get the data that, get whatever the scalar valued functions need to do at a later point. Anyway, I thought that I would share that with you in video form because I care about you and I care about the health and well-being.

No, I’m kidding. Yeah. No. No. Good luck. Good luck. I wish you luck out there. Anyway, I’m Erik Darling.

Thanks for watching. I hope you learned something. I hope you enjoyed yourselves. And I will see you in some other video some other time. Goodbye. Bye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



2 thoughts on “A Trick For Working Around Scalar UDF Performance Issues In SQL Server

  1. Great post Erik! We use this trick, often it’s really effective! Sometimes it also runs loads faster because when we do this we also remove the ORDER BY when we do the SELECT INTO.

Comments are closed.