bit OBSCENE Episode 11: Scalar UDF Inlining

bit OBSCENE Episode 11: Scalar UDF Inlining


Summary

In this video, Joe Obbish and I delve into the topic of scalar UDF inlining, discussing its relevance and impact on SQL Server. After some contractual disputes that temporarily sidelined our Bit Obscene podcast, we revisited the subject with a fresh perspective. I share my initial impressions from five years ago, noting how the landscape has evolved significantly since then. The video explores the challenges and limitations of scalar UDF inlining, emphasizing its fragility and the complex rules developers must adhere to for successful inlining. We also touch on the broader implications for existing codebases versus new development, arguing that while it might offer a free performance win for legacy systems, it’s not as compelling for modern database development practices.

Chapters

  • *00:00:00* – Introduction
  • *00:01:33* – Sponsor Announcement
  • *00:02:46* – Scalar UDF and Compilation Issues
  • *00:07:12* – Reporting Privately vs Publicly
  • *00:10:54* – Query Optimization Rules
  • *00:18:04* – CTE and Inlineable Functions
  • *00:23:05* – Table Valued Functions
  • *00:26:14* – UDF Inlining Constraints
  • *00:29:32* – General Take on Scalar UDF and Inlining

Full Transcript

Hello, and welcome to another episode of the Bit Obscene podcast. There were some contractual disputes between myself and the temporary guest co-host that I stupidly let form a union, but we’re all settled, we’re all resolved now. The union bosses have driven off in their Cadillacs, and we are ready to record again. So with that out of the way, uh, welcome temporary guest co-host and union benefactor, Joe Obbish, uh, to the, to the show. I have to rename Joe in a moment. I just noticed that. And, uh, in this episode, we’re going to discuss Scalar UDF inlining. Uh, maybe, maybe, maybe not the hottest topic on the block, since it is about six years old at this point, but, you know, there’s always, always room to discuss. So here we are. And, uh, Joe, take it away. I’m sure there’s someone upgrading to SQL Server 2019, and it’s brand new technology to him. I did hear we got some complaints about the sponsor. I don’t know if you’ve been able to get the, uh, pistachio toothpaste as a sponsor yet.
Uh, oh, well, would you like to do your proud demonstration? Uh, no, you know what? I’ll, I’ll, I’ll save that. If you would warn me that you were going to make me like use props in this, I would have go gotten, gotten, gotten it. But here we are. So here you are sabotaging me when I’m not sabotaging you. We need to make sure our sponsors get their, get their money’s worth, right? So, okay. So we are brought to you by pistachio poppy toothpaste. Uh, it tastes like pistachios. Uh, I’ve heard that nine out of 10 DBAs say that it improves their, uh, query tuning.
Yeah. Uh, it also makes you a good kisser. So it depends on, depends on your, depends on your priorities there. I remember, uh, what do you remember, Joe? I remember my first impression was scalar UDS. And I think I remember you like being really excited about it and like reading like white papers, like the research papers that were, that really wasn’t for me. Um, uh, uh, uh, I don’t like white papers so much, but we had tried it on a preview version of SQL storage 2019.
Yeah. Yeah. I remember it used to be cool and get all sorts of preview stuff. Yeah. I used to be a big deal. Um, mighty have fallen. And the thing that we noticed was we had a function. It was about like 50 lines long. I think, uh, basically a series of if this return this, if this return this and so on.
Um, and it added 30 seconds of compile time with scalar UDF and lining being enabled. Yeah. So as, as, uh, good SQL Samaritans, we tracked down the, the, uh, scalar UDF guy.
Yeah. Who I don’t remember anymore. And we emailed them.
Power thing. And I, I, I think this was in like January of 2019. So about a year before release. And he said that they had much bigger issues to solve than that, which to be fair. Turned out to be true.
Yeah. Like, I mean, sure. You know, like that could definitely be true. Um, personally, I think 30 seconds of extra compile is pretty bad, but yeah, you know, it’s sure he had other things on his mind.
Um, I think we, I think we followed up six months later and didn’t get a response. And we were like, well, whatever we tried. Um, yeah.
And SQL Server 2019 has glorious debut. Yeah. It’s full release. And some of our customers were going to use it and they wanted our guidance on, you know, should this be turned on or not.
And we found that our 30 second compile issue had not been addressed because, you know, there are more important things to address. And my colleague at the time, who was also a good, who’s also a SQL good Samaritan or a good SQL Samaritan or whatever.
Um, he had like a, I think like a blog post about it, um, which, you know, like as a database professional, I feel that it is good for the community to let them know your learnings and, you know, to maybe issue some warnings like, Hey, this brand new feature might not be quite there yet because it wasn’t, you know, 30 seconds of compile time is a long time.
If Eric and I just stopped talking for 30 seconds to the demo of that, it would be really awkward and boring for everyone. Yeah. Um, so we’re not going to do that. Um, anyway, some overrated consultants took personal offense.
Ooh, at my colleague’s warning and accused him of SQL cloud chasing, whatever. Oh, right. Yeah.
I remember that. I don’t know what that means. I’ve, I, I, I, I, yeah, I don’t know. Well, there used to, there used to be this website called cloud that you could use to, uh, sort of get your social media credit score.
Perhaps that’s what they were referring to. Could be. Um, it was close with a K. Oh, well, surprisingly, he, he, he wasn’t the only one. And there was a bit of like a, like a mob with pitchforks.
Oh, I think. Really? Do they all have, do they all have MVP after their name? Cause if so, they can beat it. It’s, uh, it’s possible.
Um, yeah, I, I, I, I think the critique goes, Oh, well, you should have contacted Microsoft. Microsoft. Right. Yeah.
Let them know. And, you know, they totally would have fixed it. Like, why are you. Yeah. People act like, like that, that doesn’t happen. And it goes nowhere. Like a number of things I’ve like, okay. So like the number of things I’ve content contacted Microsoft about that, they’re like nothing.
Just like crickets on, um, you know, like, like, like when Joe Sack was there, it was different. When Joe Sack was there, it could be like, Hey Joe, this is weird and interesting. Check this out.
And like, it might not get like immediate traction and might not get like fixed in like the next CU. Uh, and to be fair, Joe did backlog me on a couple of things, but like, for the most part, when I was like, Hey, this is weird.
Joe would be like, like, it has an engineer. I’d be like, like that was, that was the, that was like Microsoft action. I don’t know who’s there now, but I don’t know. It’s all fabricating themselves.
Oh, I’m sure if you had an MVP by your name, they would take you more serious. Yeah. It’s possible. Not all of us do. Um, so that was a very weird thing.
I, I had to play peacemaker of all, like me, of all people. And I was like, Hey, um, Hey, Microsoft person who is in swept up. And outrage, we actually like did let, you know, let’s like, Oh, like calm down and be like good partners.
And I think it got fixed in a senior too, which, you know, if, if, if Microsoft adjacent people want to extol the virtues of reporting privately and not publicly, it’s not really a, it’s not really a good, uh, case study.
I also realized that I totally told that story in the wrong way. I could have had like, like the, the, the dramatic reveal, you know, if, if, if, if I had started with the 2019 release, but, uh, Oh, well, we’re, we’re, we’re, we’re not going to record this again.
It’s, you know, no, no, we’re, we’re, we’re, we’re too far into it now for sure. Plus if, if, if, if we, if you get overtime, the union is going to make me pay you double for it.
So I can’t, can’t afford that these days. AI is, AI is taking everyone’s money. I believe I am sharing my web browser. Do you, do you, I see, I see, I, I do think you are doing that now, you know, to be fair, I only gave a first impression, which is like five years ago.
Um, that list has grown significantly over the past five years. Yeah, it has. Cause like, I remember like doing this page a long time ago and I don’t remember it being very long and now it’s like.
Yeah. Much longer. And it’s not even a complete list. You have to go to some KB, which I’m surprised they haven’t like deleted yet. Yeah.
There are some things in this KB that aren’t in there and so on. Uh, so, so the thing that really, I know what you’re going for. Yeah.
The thing that really, uh, sticks out to me is I feel like general developers have like, uh, have like a small bucket that, that you can fill with database knowledge. Some of them have a very, very small bucket.
And you know, like if I can, if, if I can only teach a developer, like a few things about databases per year, one of the things that I don’t want to teach is, Hey, if you want to make UDF, which can be inlined, it’s simple.
You just have to like follow these like 30 rules and then, then, then, then the queries to call it have like 10 rules and these can change over time. So, you know, be sure to bookmark this page and, uh, it’ll be great.
Right. Like, like, like, I want them focusing on like big, impactful, wide ranging things and not, you know, correct.
This, um, is like, you know, even if someone does make a successfully inlineable UDF, which can also be successfully inline based on the query definition, which the optimizer also chooses to inline when it’s compiling the plan.
Like I hope, like I want developers thinking in, in terms of sets, right. Like that, you know, oh, I’m going to arrange my declares and sets and my ifs and my else’s correctly to get inline.
Right. Yeah. But inline tilde-like functions, if I can get a developer to create an inline tilde-like function and not a scalar UDF. You have won a battle.
Yes. Yeah. Yeah. The thing is that like, it’s real hard to get developers out of the, like, I need a loop for this mindset or like, I need to return one value. Like I’m just going to write, like, like it’s real hard to break them out of that.
Um, and like, I, I, like a lot of the UDFs that I see out there are not complicated. They’re mostly created out, like out of some weird convenience. So like, you know, uh, like, like the most common one that I see is like, like date formatting.
And I’m like, why, why, but you, but it doesn’t have to be a scalar UDF. And like, like, like, is that really so hard that you need to like encapsulate it? It’s very, very weird to me.
And even if someone does successfully make one of these inlineable scalar UDFs, you know, someone else might come along and make what they think is a small and consequential change. And they could be violating one of these rules or Microsoft could change one of the rules or one of my, my favorite rule on the list is the one about CTE.
Oh yeah. Cause it’s, I was just about to get there. Or, you know, like the UDF isn’t useful unless it’s being used by, by a query and their rules for the query.
So someone could change the query, like, I don’t know, adding a CTEand suddenly none of your scalar UDFs are getting inlined and performance tanks or this set of rules could change. And remember, like, this actually isn’t the full set of rules.
There’s a KB article too. There’s a really, so like, there’s a funny thing about all this too, where it like go up a little bit.
So like where it’s talking about, like, you can’t use string ag and you can’t build strings like, like with the assignment thing. So like the, but so like there’s, there’s actually misleading in here too, because it says that the UDF doesn’t reference XML methods.
So XML methods are things like value and exists and nodes, right? But if you just use, and so like, if you build a string, like a comma separated string in a UDF with just for XML path without like values or anything else in it, it’s in lineable.
But if you use string ag, or if you use the string builder method to do it, it’s not right. It’s like, and stupid. It’s like, just like inconsequentially stupid.
I can’t, I cannot believe some of the weird stuff that they were like, no, you can’t do that, but this is fine. Like, it’s just weird to me. Yeah.
I feel like some of these are reasonable or you get over it. Like, yeah, some of them just like, okay, you get, you’re getting what you deserve if you’re doing it. And I haven’t done any like research here, but based on how this list is arranged, it feels like the list only gets longer over time.
And there’s never an update where they’re like, hey, good news. You can now use XML methods in your UDFs, right? The UDF doesn’t contain a select with order by without a top one class.
Like, um, it’s all funny. So in terms of new developments. Oh, nothing’s getting it.
It would be nice to, well, like in, in terms of new development by database developers or people writing TC code. I certainly feel like I, I, I can’t get to a place where I would say, yeah, for, for, for, for new development, if you want to use a scalar UDF, go for it.
Cause it’s inlineable and it’s like super easy and convenient and it works the way you want to write your code. Like, no, it’s, this seems so fragile. Yeah.
I think a fair word for it. Yeah. No, it’s, it’s, it’s fragile. And, um, you know, uh, excuse me. Like when I first heard about the feature, this list didn’t exist. So like, I was a lot more excited about it because I, cause like, I mean, like really like, like investment into a problem that big, like it’s a big problem space for a lot of, a lot of people who have like, you know, at SQL Server, like code bases get developed over this point, like 20, 25, 30 years back when no one knew how bad this stuff was or like, there weren’t enough rows to see how bad this stuff was.
And so when I first heard about it, I was like, I mean, like a, like Microsoft invested like serious time in this and it’s like, that’s a crazy problem to solve. But then like, you know, like, like everything else, once, once the, once the fine print starts making it out into the, into the, the documentation, you’re like, oh, well, I can’t, I’m not, that’s not going to help any, like even when I see it kick in, like there’s a good chunk of the time where like, and this isn’t scalar UDF end linings fault, but there’s a good chunk of the time where like you had a UDF, it was slow, but like when inlining kicked in, it got slower because like all of a sudden your plan was full of eager index spools off like giant tables.
And it’s like, like, okay, well, like I don’t know if the optimizer in UDF inlining, like didn’t shake, couldn’t quite shake hands on like, Hey, can you make me look good? No, I’m just going to keep doing the eager index pool thing.
I think the point you’re making is where I wanted to get to next, which is, when we think about the, the tens of billions of lines of old TC code, which are never getting updated. Yeah.
Maybe when you upgrade the SQL service 2019, you know, if you have a hundred UDFs, half of them get inlined, your overall CPU goes down, a lot of queries run faster, a few run slower, like maybe it just feels like a free win in that sense, if no one’s touching the code anyway.
And that does feel like the sweet spot for this feature, which, I mean, you know, don’t get me wrong, making old code better for free. Yeah.
Sort of magically. This is certainly good. But if you’re thinking about all, all the new development you plan to do, this, this really doesn’t seem to be a good fit. Uh, cause looking at the rules for the queries, like, you know, I mean like, like some of these, some of these seem pretty weird or easy to avoid.
Like I’ve like, personally, I know that group grouping sets exist. I haven’t used it in like 10 years. I can go another 10 years without using it.
I think. Yeah. Didn’t you have a question about grouping sets on stack exchange? Isn’t that an early question? Like, yeah. Like ancient history. Yeah.
Yeah. That was very old. Um, or like, yeah, like don’t order by UDF. Okay. Yeah. Fair enough. But I mean, like this one, I don’t know what this means exactly. Cause I did try nesting a few UDFs and it worked.
Like, yeah, I don’t know if this is a return for a start procedure or if there’s a long time ago, I wasn’t sure about that. And I’m pretty sure Paul gave an example of it.
Um, but I, I, I, I, I didn’t follow it. Like, like the code was just so strange. I was like, all right, you win. That that’s, that just is what it is.
Um, but one of these things is not like the others. Yep. Cause what this actually means is if there’s, if there’s a CT in the query, even, even if it’s like doing nothing, you know, select one from nothing, the CT isn’t even used in the query, it could be, it could be simplified away by the query optimizer and it is simplified away that will block UDF inlining a totally useless simplified out CT is enough to block inline.
I am pretty sure that a lot of the UDF inlining rules are almost just like, they’re not implemented. Like, I’m pretty sure this is like T-SQL linting for some of them. Like, I don’t, I don’t think they actually like, look at what the resulting plan would be.
Yeah. Like if, if, if I, I tried to get more information about this and couldn’t, if I had to make a guess, someone had some weird recursive query with a UDF and it didn’t work and, you know, Microsoft wanted to fix it in a CU and the simple fix was okay.
If it has a CT, then no inlining and that, that, that, that was done in CU 11. So it certainly feels like we got away with using CTE for a while until someone ran the wrong bizarre query and now no one can ever use CTE with inlineable scalar UDFs ever again.
Like, you know, we were talking about things like you can add restrictions, but can you please eventually get rid of some of them? Like, well, I mean, the CTEthing. Could this be like, maybe it doesn’t work if you’re doing the recursive CTE or if the CTE don’t reference the UDFs, it’s fine.
This is just like, so broad. No, that’s a, that’s a giant one, but, but I can, I can actually sympathize a little bit because people abuse CTE to the nth degree in like, in like, you know, the, the regular work that I do, like they are just beat up.
So I can imagine cases where like either someone, um, so I imagine like three, three overarching cases that would make it reasonable to do this. One would be sort of your example with the recursive CTE, especially if other UDFs are called in the sort of like the body of one of the recursive parts of the CTE.
The other is just like a bunch of like highly stacked CTE with a rather like, um, maybe it’s like complicated set of joins at the end. And the third is someone like with one CTE that they referenced like 50 million times and all of that stuff would do kind of like the issue you ran into with the compilation time, just blowing up the plan.
So I can absolutely imagine situations where inlining like arbitrary CTE blocks in a UDF would be nuts. Cause remember like UD, UDFs aren’t, aren’t just like, you know, like, like, like inline table valued functions where you have like one simple select that just returns a thing.
People probably had all sorts of crazy procedural code and like, you know, like, like, like maybe could like imagine someone with like 10, like if blocks of code in a scalar UDF and they all have like crazy CTE in them to figure out some values, uh, life would be miserable.
So yeah, I would say in response that there is some kind of compile based check and like, I would like to see that type of thing. Oh, I have a UDF getting used a hundred times in the plan.
It has five joins. QueryAthema isn’t going to add 500 joins to the query. So this UDF isn’t getting in line for, for this query plan. Like that makes perfect sense to me.
Um, I, I, I think you’re being too generous here. Uh, I don’t know, man. It’s like, like, like, like, you know, it’s, it’s one thing to look at this from the perspective of like code we would write.
It’s another thing to look at this from the perspective of like the corpus of SQL Server installs that Microsoft has to deal with as support cases. Like I, like, cause I see some nutty stuff out there.
It’s not anything I would ever write, but I see some real nutty stuff out there. And I can only imagine what Microsoft support gets hit with on some of these where you’re like, Oh girl, like, I’m lucky.
Cause at some point I can say like, like, like that is way too much to deal with. Um, but, uh, Microsoft, you know, that’s a, it’s a paid support ticket there. They, they got to do something with it.
Let’s show speech. It’s true. You almost had 30 seconds. I was, uh, compiling my response.
Um, it’s, it’s somewhat similar, right? Cause I think this is really broad. I think the examples you’re given could be pretty easily included in whatever heuristic they would want to use to make this less broad.
Sure. But instead what we get is see you live in which one even was that like 2020, 2021? I don’t know.
Click on the link. There is no link. Oh, well, I don’t know. Yeah. They don’t, they don’t really believe in links. Um, who writes this documentation? Linkless.
There’s all, I think that’s also a typo in there. All of the following requirement of the execution context must be true. It’s true. You know, if you were a good citizen, you were a good citizen, you would open up a GitHub issue and add an S there.
Not a good citizen. I wouldn’t want to be accused of chasing clout. Ah, well, yeah. Your choice.
This is why you’ll never be an MVP, Joe. Yeah. That’s, that’s why. Speaking of arrogance, I feel like Microsoft did this totally wrong and I have a better design and they should have done it the way I’m picturing.
All right. They should have done is whenever you create or alter the function at like creation time, the engine tries to compile an inline televalued function version and then like saves that internally.
Because if, if you look at how, how narrow this is, it certainly does seem to, to fit within an inline televalued function. Like you only have one return.
If, if, if, if, if you put it, if you put a while loop in your UDF, it’s not inlineable anymore. Like, right. That just doesn’t work. So, and then when you have a query that’s using the UDF, replacing it with a inline function is more of a parsing thing.
Like, is it allowed to actually do the replacement? Cause in, in some like edge cases, you can have a UDF, but you can’t have a sub career for a different reason.
And if, if, if, if, if it’s allowed, put it in there and see what happens. Cause there’s just so many, it’s just like so many weird, like, like, like, like, how are these even like, like, like, like, how do you get to where, oh man, string ag.
Like, well, you can’t use that. I always wonder about that one too. Like, oh man. Like the, the, the, the, the, the technological issues of making get data inlineable.
We, we just don’t have the, the, the technology. Like, I don’t know, maybe this is something like if you’re executing a scalar UDF has get date will return different values as the, as the function that executes, like, is it trying to maintain that behavior?
Do you think? I am, I am like new sequential ID. I understand a little bit, but get date is just a very strange one. That’s the only thing I can think of. Cause if you have a non inline UDF, then that’s, it’s a different context when it executes.
So when you get like different get date values is the, um, so yes, so yes. And I think, um, I, I, I wrote a demo recently about, um, RCSI and UDFs where, um, like things just get extra weird with that.
I guess not, not a good time. Um, but I can, so I can imagine the separate context of get date presenting issues in, in some circumstances, I guess.
Um, if there’s anyone out there who has a scalar UDF and they’re relying on their get date function within that scalar UDF, returning different time values as the query executes, please leave a comment.
Cause I w I would like to scold you, but I’m also interested in, uh, whatever weird thing you’re doing. Yeah. But I don’t know. It just, well, oh no, that would be the opposite of fixing it.
Nevermind. I get, I said anything. It’s forgotten. All right. We can move on. Let’s move on. Uh, well, I don’t, I don’t know where you can move to. I’ve said just about everything I want to say on this.
All right. So, um, uh, yeah, I mean, you know, my, my general take on scalar UDF and lining is I think it was a very good investment in a very hard problem that the, the, the realities of things, um, kind of just killed it a little bit.
Uh, like just to, just to beef up the content a little bit, uh, I’m going to go inside baseball on scalar UDF and lining. So your friend who didn’t respond to you for months about the 32nd compilation stuff, uh, this guy Karthik, I mean, it was like very smart, very nice guy.
No, no beef with him, but the whole scalar UDF and lining thing was, um, was basically like written, uh, down the street from you at the university of Wisconsin and, uh, they, they did all the like Microsoft, like, cause like Karthik’s like in with Microsoft, they have like research stuff or whatever, but, um, like, uh, so Karthik and his team at, at, at, at Wisconsin, like basically built this whole thing with Microsoft source code sitting there and then, uh, like basically stapled it into the engine.
Um, and then, um, when Microsoft got it back, it was up to them to sort of do their millions and billions of unit tests on it. And I think just slowly over time is, you know, like either support cases or those millions and billions of unit tests finished, uh, a lot of stuff ended up getting like, sort of just like cutting this and like, especially a lot of security stuff, um, I would imagine posed a problem, but like, like really like the number of limitations on it, um, you know, really, really messes up the, the scope of usefulness for the feature.
But I do think, oh, like at the end of the day, you know, it was a very good investment in a very hard problem and it does solve a pretty good set of problems for people, but you know, the number of UDFs I see that are still not in lineable or, you know, even if someone’s on 2019 or better, um, cause this isn’t standard edition, even if they’re on 2019 or better, they’re not in a compatibility level where UDF and lighting kicks in.
And so there’s just a lot of stuff that like, is sort of unfortunate, um, that it got gated behind, but anyway, that’s about it there. Do you have anything else?
I do remember hearing about the university thing and I vaguely feel like there might’ve been an opportunity to say hello during the development process, which we didn’t do. And man, if only, if only, if only, if only I had gotten in there, I could have whipped them in the shape, you know, but missed opportunity.
I did try the UDF with get date and query and the value does change throughout the query execution as we thought. So I suppose that get date thing maybe isn’t a technical limitation, but, uh, you know, there’s some dumb ass out there who’s, who’s relying on the old behavior where, you know, I’m getting a new get date every time I execute that, that function and the rest of us suffer.
Yeah. Cause the thing with RCSI was that every time the function executed, it read a different set of row versions.
So it is really like a completely separate context. Didn’t like every, like every iteration of it just starts afresh. It’s wild in there. All right. Uh, do you want to, do you want to call this one and then talk about the next thing?
Final comment. Don’t forget. Uh, nine out of 10 DBAs get inlining more frequently than they use. Disashio.
Based text poppy to a four or five, four, three, four. I don’t know what that does, but it might be funny. All right.
We good. All right. Thank you for watching this episode of the bit obscene radio program. Uh, we will see you in the next episode in which Joe will talk about and interesting insertion problem.
So, uh, we will see you then. All right. Thank you for watching. Well, I want this.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.