Bit Obscene: Ordered Column Store


Video Summary

In this video, I delve into the intricacies of ordered columnstore indexes in SQL Server 2022 and 2025, exploring their benefits and limitations. I argue that while these new features offer some advantages, particularly for large ETL loads, partitioning tables by date or another logical key remains a more robust solution for most use cases. To illustrate my points, I walk through a demo using the latest SSMS preview (version 22), showcasing how turning on trace flag 58008 can prevent the creation of unpartitioned columnstore indexes in large tables, ultimately guiding users towards more sensible design practices.

Full Transcript

All right. Welcome, everyone, back to the unofficial maybe relaunch of the Bit Obscene podcast brought to you by Darling Data and BeerGut Magazine. I’ve got my temporary co-host, Joe Obbish here. We’ll see how he does this time around. And today, Joe is really excited. You can tell by the look on his face. Look at that face. That’s an excited man. Joe is excited to talk about ordered columnstore indexes, which, correct me if I’m wrong, Joe, that, that, I’m sorry, I have to get rid of his background. It’s making me look like an insane person. It came up with SQL Server 2022. It’s made maybe some, some progress in SQL Server 2025, but no one knows, because SQL Server 2025 is, of course, the unofficial AI and Fabric release. SQL Server is billed third on that one.

So maybe there’s nothing good for SQL Server in it. But anyway, take it away, Mr. Obbish. First of all, Oh, shoot. Here we go. I learned that my dear friend, Erik, is now living under communist rule. And I thought, what’s a small token gesture that I can do to appear like, like I care, and I want to help but not really do much. And I thought, Oh, I can just show up as a guest on his YouTube channel. So that’s why I’m here. I’m, I’m, I’m in full solidarity with Erik. I’m wearing the plain white t-shirt.

He’ll probably be wearing one in a couple of months along with everybody else in New York City. But I, I’m not here to help Erik. We’re, we’re all pulling for you. All right. Thanks, Joe. Hopefully, hopefully, my, my means of production are seized in a gentle way.

So for, for ordered columnstore, Erik and I actually both blogged about this, a few days apart, back in 2022. And the way I remember it was, that’s right, no collusion. The way I remember it was Brent blogs about it. And he was like, Hey, I tried this and it didn’t work. And, you know, Erik and I live to serve. So we, we both took our own little look at it.

But, um, I think Erik can have the links in the description. Oh, yeah. And the viewers can vote on who, uh, did it better. You know, we can get some engagement, maybe. Of course, there’s no, there’s no voting in communism, Joe. So that’s, that’s very, that’s a good point. I take it back.

Uh, now I’ll admit that I haven’t looked at it since. I remember ending the blog post like, Oh, this is CTP 2.0 or whatever. Maybe Microsoft will make it better. Wink, wink. Now, they, they like never do. Um, but I haven’t looked at it. I’m assuming it’s not better. If, if they did make it better, feel free to let us know in the comments.

Now, in terms of what, okay. So why does ordered columnstore exist? So there’s a very old workaround for columnstore where you create a clustered index on the column you want to order by first, and then you create a columnstore index with MAXDOP one. The idea there being, it’s like an unofficial way to get an ordered columnstore after you built it.

That’s where you create the clustered index on the table on some column or key columns. And then you do the clustered columnstore index with DROP_EXISTING. Yeah. Yeah. Yeah. Okay. And like, you know, like that could be improved. You’re writing the data twice. It’s a bit of an annoying workaround. So I guess like from my point of view, you know, getting rid of that workaround could be a good thing. Um, because now you can just create a clustered columnstore index with ordering.
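The old workaround being described might look roughly like this, with hypothetical table and column names, purely to illustrate the pattern:

```sql
-- Hypothetical table (dbo.Sales) and column (sale_date).
-- Step one: write the data once, sorted, as a clustered rowstore index.
CREATE CLUSTERED INDEX cx_sales
    ON dbo.Sales (sale_date);

-- Step two: write it all again as columnstore, replacing the rowstore
-- index. MAXDOP 1 keeps parallel threads from shuffling the sorted input.
CREATE CLUSTERED COLUMNSTORE INDEX cx_sales
    ON dbo.Sales
    WITH (DROP_EXISTING = ON, MAXDOP = 1);
```

The downside mentioned here is visible in the sketch: the table gets fully rewritten twice.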

Problem is, and this is the thing that Brent blogged about. It’s not always ordered, which, which is not a great start. Um, but like, I’m even gonna set that aside. So I found that when you add ordering, SQL Server is making three changes. First being when you create the index, it’ll add a sort operator. It’s a soft sort. Um, there are all kinds of reasons why the data isn’t going to be fully sorted.

When you’re doing an insert into the table, it’ll again add a sort. And then once again, the data might not actually be sorted, or it’ll be sorted in an unintuitive way. And then when you’re doing a rebuild, it once again adds a sort. And we’re three for three, and that sort may also not fully sort the data. So that’s what the feature does. Um, my big beef with it is how it changes the, the insert queries, but you had a comment here. Oh, I was gonna say we’re now at three sorts and actually no sorting. Yeah, yeah. So.
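For reference, the SQL Server 2022 syntax under discussion looks like this (hypothetical names again); the ORDER clause is what triggers the best-effort soft sort on creates, inserts, and rebuilds:

```sql
-- One statement instead of the two-step workaround; the sort is
-- best-effort, so the data is not guaranteed to end up fully ordered.
CREATE CLUSTERED COLUMNSTORE INDEX cx_sales
    ON dbo.Sales
    ORDER (sale_date)
    WITH (MAXDOP = 1);  -- per the docs, parallelism degrades the ordering
```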

All right, I’m gonna ask you a non trick question. You ready? I know what that means. Why are clustered indexes generally useful?

Why are clustered indexes generally useful? Uh, well, uh, there, there are lots of reasons why they’re generally useful. Looking for the simplest, the simplest answer.

Well, it, it gives you free access to all of the columns in the table in an ordered fashion. Yes. Yes. It’s because the data is sorted, right? The data is sorted globally. If you insert some old data, according to your key, it’ll do the, you know, page splitting, and the data is always sorted.

Now, what if I told you that we created a new type of index and it was kind of sorted. Sort of sorted. And whenever you inserted new data, it would sort the new data you inserted and append at the end of the table.

And it would, it would, it would keep doing that. So whenever you do an insert, it’ll locally sort that data and append it at the end. Does that sound very useful to you?

Let me, let me just make sure I have the right mental model of this. Let’s say you have a table with a million rows in it and it’s the numbers one through 1 million and they’re in order. And then you insert 10,000 rows to it.

And those 10,000 rows, let’s say they’re, they’re also the numbers one through 10,000. So they fit in with the numbers one through 1 million. So instead of the numbers being dispersed in the table where they should be, it would just be like an appendage on the table with the rows in order from one to 10,000.

So it would be like one to 1 million and then one to 10,000. Yes. Yes, that’s correct.

And that’s, that’s the fundamental problem, from my point of view, with ordered columnstore. Like I find that kind of worse than the we’re-not-going-to-always-sort-data-for-you part. Other people could disagree. Because, you know, like, with clustered indexes, you have your, your 8K pages.

They can be split; all kinds of great things can happen. With columnstore, you have your compressed row groups.

Like there isn’t any way for, there isn’t any reasonable way. Like if you’re loading a million new rows, like it’s, it’s not going to find the most appropriate compressed row group uncompress it, add the new data and then recompress it. That that would take forever.

Like it’s just, it’s just not going to do that. Like it’s going to append the data at the end. I’m surprised though that on rebuild, it doesn’t like fix that. It could fix it.

It could. If, if your sort manages to work fully. Yeah. Okay. Great. Good. Good. So what, what makes a sort work fully or not? Uh, there are a number of things: how much memory you have, how much data you have, what DOP you’re at.

Um, which brings me to something, which, man, I, I have these great notes that I’m like not reading. So, you know, like one of the, uh, annoying things about the old workaround to get like pseudo-ordered columnstores before 2022, you know, the clustered index then CCI. You have to create the CCI at MAXDOP one, which, which could be slow for a big table.

Per the documentation for ordered columnstore, it says, if you create the index in parallel, it’s not going to be ordered. And in fact, if you want perfect ordering, it’s.

MAXDOP one. Yeah. MAXDOP one. So like, are you better off just doing the old method? The, the, the problem.

I view the sorts during insert as not that useful or, or even harmful in some cases. And you, you get this partial-to-full sort when creating the index, but like the penalty for that is you, you lose online rebuilds. Hmm.

At least on 2022. In 2025, they’re adding online rebuilds for columnstore. Okay. So if you asked me if it was worth it, I would say you should just partition your tables, but I’m going to try to make my case a bit better. Um, on this subject of inserts.

So let’s say you, you have an ordered columnstore index. Cause you’re on 2022. Yeah. Use all the new features as soon as they come out. Read the documentation.

You’re hot on the heels. Why not? Yeah. Yeah. Why not? Now, if you’re inserting less than a million rows per partition, what’s going to happen? Well, it’s going to sort the data pointlessly. Yeah.

And it’s going to try to compress the new single row group. If you get some kind of, uh, what’s it called? Uh, pressure, dictionary pressure, memory pressure, whatever, you can get multiple row groups. But like in general, that, that sort’s not useful.

Like it’s, it’s almost like sorting a single 8K page before you insert the data, which maybe it happens, but, you know, like, it’s, it’s not doing anything for you. Now you want me to say, all right, well, what if I insert more than a million rows? What if I insert 10 million rows at a time?

Well, it’s an ambitious insert. First of all, if you, if you have, like, you know, a normal ETL load, where you’re loading, like, yesterday’s data, it’s going to be like new data probably.

Right. So the thing you want to order by is probably like yesterday’s date. So it’s, it’s not gonna matter in that case. Or like, let’s suppose you’re loading a week of data at a time. Cause you’re, you know, you, you live under communism.

You’re not going to work every day of the week. Right. Right. So in that case, you, you, you have a week of data, 10 million rows. So it’s, if it’s ordered, then your row groups have like one to three days of data instead of seven.

And if you happen to run a select query, which is like, you know, looking at a single day in that range, then maybe you read a couple of fewer row groups. So it’s, it’s not, it’s not, yeah, maybe like, that’s not really giving you a whole lot in terms of performance. No, but I think, I think that gets, I think that gets back to your point though, about if, if this is something that you care deeply about, regardless of ordered columnstore.

Like, but if your table is truly that large and you’re truly doing that sort of ETL, like what you’re talking about, where there’s daily loads. I think the most sensible thing to do is have that table partitioned by some date. That gets you, I mean, maybe not like from a data loading perspective, but from a querying-that-data-later perspective, that at least gets you the partition elimination and then segment elimination based on, you know, the criteria for your query, looking for data for some date range.
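A minimal sketch of the partitioning being advocated here, with made-up table names and monthly boundary dates:

```sql
-- Hypothetical monthly boundaries; RANGE RIGHT means each boundary
-- date starts a new partition.
CREATE PARTITION FUNCTION pf_monthly (date)
    AS RANGE RIGHT FOR VALUES ('20250101', '20250201', '20250301');

CREATE PARTITION SCHEME ps_monthly
    AS PARTITION pf_monthly ALL TO ([PRIMARY]);

-- Build the columnstore index on the scheme: now no row group can
-- span months, no matter how sloppily the data gets loaded.
CREATE CLUSTERED COLUMNSTORE INDEX cx_sales
    ON dbo.Sales
    ON ps_monthly (sale_date);
```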

I agree fully. And I’m going to jump into my demo ahead of time. Ooh, my, my, uh, four notes.

They’re just, they’re just basically just, you know, the, the, a cast of Erik Darling is just down the memory hole. Yeah. All right.

Can you see the beautiful modern SSMS? And I can’t, is that 21 or 22? It’s the newest one. That’s SSMS 22 preview. It’s glorious.

It’s glorious. Look at it. Oh, it’s not 21. 21.6.1. Wow. I’m already out of date. I guess. Let’s do that. Check for an update too.

All right. All right. Um, I, I, I hear this stuff. All right. So what’s that trace flag? It looks, it’s that trace flag, Joe. Microsoft has heard my, my desperate pleas.

Yep. On the subject of partitioning: it’s super important for columnstore. So, so why do I say that?

It’s the same reason you said it. If you, if you partition logically, then that’s going to force all the row groups together. It, it basically gives you like, like a minimum sort of segment elimination for free.

Like no matter how badly the developers load the data. If it’s a row at a time, if it’s a hundred thousand rows at a time. If, if they don’t load the data correctly the first time and they delete it and update it over and over again, it’s still in that partition.

Even if you don’t get partition elimination, you can still get segment elimination. I mean, I’ll go further and say that I’ve never had a columnstore performance issue get resolved by, well, you know, we, we had bad segment elimination because the data wasn’t ordered.

So we were reading like, we were reading like 20 row groups and then we made the segment elimination perfect. And now we’re reading three and performance is like amazing.

Cause we, we got rid of those 17 compressed row groups. Like I, I’ve never seen that. The things I’ve seen are you have a big table. It’s not partitioned at all.

Data is basically in a random order. You know, you can get hurt there. Um, or you have like a billion soft-deleted rows that never got cleaned up.

I’ve seen that in production, you know, that’ll hurt you. Or even just like all of the normal performance issues that you can find with really any query.

Like for me, it’s, it has never been like I’m partitioning my data by quarter or whatever, but well, my data is too unordered within the three month window. And like, I really need that perfect segment elimination.

Like I’ve never seen that. Maybe there’s somebody with high frequency trading or online gambling, or who was like super hardcore. And that they really need those perfect pristine row groups.

In which case I would think that, you know, a sort that might not sort isn’t good enough. Definitely not. And you, you gotta fix your ETL anyway.

Anyway, I’m a big fan of partitioning, and Microsoft has heard my pleas. They gave me permission to introduce this new trace flag 58008. We go ahead and turn it on.

All right, it’s turned on. And now I’m going to try to create a columnstore index on a big table, which I’m creating as an, an unpartitioned index.

I’m trying to, that’s, that’s, oh, look at that. This is not safe for children. Look at that. The operation failed because the table is too big. Like, create a partitioned columnstore index instead.

Finally, SQL Server is finally stopping us from, from making dumb mistakes and bad decisions. Isn’t this great? This is fantastic.

Now, you know, I’m sure, you know, like, now, like, you know, maybe your DBA thinks they always, they always know better or, you know, like, it’s, it’s important to be able to, to, you know, have full control of the software.

So we had this new create bad index mode, uh, option. And if we, if we set it to on, then, uh, we can, uh, avoid Microsoft’s guardrails. So I, I’m really excited about this new feature.

We can finally, you know, encourage people to partition their damn tables, which I think is really, it’s probably like the, I’d say it’s the most important thing if you’re doing columnstore at any reasonable size: you got those tables partitioned.

Great performance, ETL, maintenance. There’s just so many reasons to do it. And, uh, I’ve seen what happens when you, when you don’t do it and it is not, it is not pretty.

Yeah. I mean, uh, you know, I don’t, I don’t do a ton of production work with, uh, with column store. I do a bit, but, um, you know, most of the, most of the tables that I run into that are, are, that are column stored are well beyond like, you know, my ability to, um, have my consultant way with them and like, maybe go back and partition them or, you know, do anything useful in that way.

Um, you know, I wish, I wish more people would involve me in, like, the implementation of something rather than, like, the, Hey, all of a sudden this sucks. And I’m almost like, well, guess what? At 3 billion rows, we’re a little past the point where we can easily do this. So, um, you know, not, not undoable, but certainly, uh, it makes the experience a little bit more painful.

But, um, you know, a lot of what I run into is, um, people who have, uh, you know, like rowstore partitioned tables, cause they bought into the, the meme that, uh, you know, partitioning tables makes queries faster with rowstore indexes.

And boy, are they surprised when they start doing things like MIN queries and MAX queries, and even some, like, you know, some other types of queries. All of a sudden they, they can’t just, like, start at one end of an index and get something. They, all of a sudden, have to scan every single index, like all of the partitions.

And you’re like, again, like, why didn’t you call me, like, you know, five years ago when you first got this bad idea in your head? So for the, for the folks out there who have perhaps large columnstore tables implemented, but did not have, uh, the foresight, and certainly, uh, at this point may or may not have the hindsight, to, uh, partition their, their large clustered columnstore tables.

Uh, is there any hope? Is there anything they can do to make their lives better or easier? Or do you just recommend creating the partitioned columnstore table and loading data over and doing the old sp_rename switcheroo?

I don’t remember how I did it, cause it was so long ago, but we had that same problem where we had, like, you know, 300 customers in the data warehouse, nothing was partitioned. Some tables had billions or tens of billions of rows.

And we did ship the code to make that happen. Um, how did we do it? I don’t remember how we did it.

Uh, I, I, I think we just did the usual scam of, oh, you know, you’re now in a maintenance window and we’re going to rewrite all of your data. Um, I think we had like a lot of parallel jobs doing the inserts.

Sure. There is, there is some trickery there. It, it, it definitely wasn’t easy, but it’s, you know, if you want to make the best long-term decision, you gotta have partitioning.

It’s only going to get worse the longer it goes on without partitioning. Um, in terms of this new feature for columnstore, I mean, maybe it’s better than nothing. You know, there probably are some use cases where it’s okay, or better than the status quo.

I do think it’s kind of fundamentally flawed, in that, you know, as we talked about, your row groups are already compressed. You’re not going to uncompress them to get perfect segment elimination.

And that’s just, like, an unsolvable problem, pretty much. Like, you know, the, the new data you load, at best it’s going to sort that, but it, uh, might not even do it. You know, you might get a partial sort, or if, if you’re doing a parallel insert, it’s just, it’s just not going to work in the way you’re expecting.

That’s covered in the, in the blog post I did. Yep. So I’m looking, I’m looking at the 2025 notes here. And, uh, apparently Microsoft has really expanded its repertoire of, of 75% complete features by adding ordered non-clustered columnstore indexes.

So we’re very excited for that. Aren’t we? Yeah. Yeah. That’s a big, that’s, that’s, that’s going to change, change the data landscape forever.

I was actually trying to fathom why ordered columnstore even exists. Uh, I do remember hearing a rumor about the, the, uh, sql_variant data type, where, like, some big, important customer gave Microsoft a really big bag of money and then Microsoft implemented the sql_variant data type just for them.

And all of us could depend on it. So like maybe it was that, uh, maybe there was some customer who was like, Hey, you know, we have, we got this big problem.

We really need ordered columnstore. And they managed to convince them without the big bag of money. Yeah. Maybe. I have a third theory, though, without any factual basis whatsoever, but I feel like those are my favorite kind of theories.

So I’m just going to throw it out there. Like they’re like, really there, there, there are like two things that, uh, DBAs love. Right.

First one is a good clean demo. Second one is attempting to remove fragmentation. And the problem with columnstore is it offers you both. Cause like, you know, like, what’s the easiest possible demo to do with columnstore?

Oh, I load my data like, uh, like a total dumbass. And now I’m not getting any segment elimination. I can, I can view the little print screen, and it says segments eliminated: zero, segments read: 99, or whatever.

Yeah. Then I heroically load my data the right way. And now it’s ordered.

And now, you know, the, the, uh, little numbers and the segments, you know, I’m, I’m skipping segments, doing less work. Like, you know, like I’ve done that demo. You’ve done that demo.

Like everyone’s done that demo. It’s really easy to do that demo. Yeah. You see all these blog posts talking about how great and easy and important segment elimination is. Nobody talks about partitioning. So, so like, you know, the, the important thing, due to the ease of demoing it, became segment elimination.

And then there’s the second thing, right? Who doesn’t like to get angry over fragmentation? How many people are still rebuilding all their indexes every night?

Too many, too many, too many Joe. You gotta get rid of fragmentation. Right. So I feel like there’s this way too much importance on that subject. Um, I mean, I, I already said, practically speaking, what I think people should do.

I think partitioning is, is way more important. And, you know, like it’s, it’s even worse, cause, like, I don’t know, I’ve seen plenty of complaints like, columnstore indexes aren’t ordered, this is totally useless.

Microsoft needs to add ordering. Like I’ve seen that come up a lot. And, you know, like, maybe finally Microsoft listened to the community when they shouldn’t have. And now we have this super half-baked feature that can, like, never truly work, which sorts the data sometimes, and removes online index rebuilds, and adds pointless sorts.

And, and so on. Um, you know, it, it, pointless sorts are like, I don’t know. I think a good chunk of the SQL Server source code.

Um, so in, in terms of what I want, I already showed what I want: the, uh, magic new trace flag. So that’s, that’s great. Look forward to that.

Um, I kind of feel like, maybe, like, uh, you know how there, there’s the SORT_IN_TEMPDB option, which only applies during index creates or rebuilds. Yep. I think that would have been better.

Like, Hey, you know, you’re creating a columnstore index, like, say which column you want to order by, and it’ll make an attempt to do it. Mm-hmm. And it’s not going to fundamentally change every insert, and it’s not going to remove online rebuilds, because it’s just something that happens, like, once during creation. Mm-hmm.

And, you know, if you’re the kind of guy who’s like rebuilding your indexes every night, well, it’s going to be offline now. Like maybe that works for you. Maybe it doesn’t. I don’t know, but.

At MAXDOP one. Right. Yeah. Like, I don’t know. I, I don’t think this is a very good feature. And if, if you read the documentation, like, there isn’t anything even about inserts. You know, it’s just all like, Oh, we’ll, we’ll sort the data, or maybe we won’t, but even partially sorted data will improve query performance.

Like they make it sound like a no-brainer. And I, I just think it’s, it’s fairly flawed, you know, due to the whole, once again, we’re not going to uncompress row groups and, you know, slice and dice and surgically insert things.

No, I, I honestly like this should be, this should be one of those things where like when you use it. Uh, like. So there, there are other database systems where like you can.

Uh, like, create a table and partition it all in one go. Right. Like, in SQL Server, you have to create the partition function and partition scheme. Other databases...

You can be like, create table with this, like, partition-by syntax type thing, which is like when you write a window function and you’re like, partition by blah, blah. Like you can write the table as partitioned by that. And you can specify the column and the partitioning ranges for it.
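The kind of inline declaration being gestured at here exists in PostgreSQL, for example, where the table and its partitioning column are declared in one statement (hypothetical table names):

```sql
-- PostgreSQL-style declarative partitioning: no separate
-- partition function or scheme objects.
CREATE TABLE sales (
    sale_date date NOT NULL,
    amount    numeric
) PARTITION BY RANGE (sale_date);

-- Each partition is still its own object, but the ranges live
-- right in the DDL.
CREATE TABLE sales_2025_01 PARTITION OF sales
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```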

Uh, I think that, with ordered columnstore, like, if it’s not on a partitioned table, it should force you to choose, like, whatever columns you’re ordering by, it should force partitioning on those columns.

Like, I don’t care. I don’t care if it makes a stupid choice doing that. Not doing that is a stupider choice than any choice it could make. Like just.

Bucket things somehow. Like, just, why not? That’s interesting. I mean, I don’t, yeah, you know. I think they’d be scared to try to pick partitions for you, because of course they would, but yeah.

But yeah, no, it’s, it’s definitely a good point. Possibly even superior to my, you know, having the trace flag to tell you when you’re being an idiot or not. So how did, how does that trace flag decide what a too-big table is?

That’s, that’s under NDA. So, okay. It’s proprietary. All right. Gotcha. Gotcha.

Well, maybe, hopefully someday, uh, you’ll, you’ll be able to write anonymously on, on some blog about the thresholds for, uh, when a columnstore table is too big. Well, I mean, so, so the great thing is if I violate NDA on your blog, everyone will think that you wrote it. So I won’t get in trouble.

That’s true. I won’t get in trouble. I won’t get in trouble. I don’t have an NDA. What NDA do I have? Oh, okay. All right. Can’t sue me if there’s no paperwork. Okay. All right. That’s, that’s, that’s, that’s the defense you’re going with, huh? Yeah.

Okay. I was drunk when I wrote it. I was just guessing. Uh, I didn’t sign any paperwork unless I was drunk when I signed the paperwork. So make your own decisions there. Well, I mean, I understand why the NDA is certainly annoying.

I remember I was once told by, uh, uh, Dr. Now that, you know, here’s some information that’s under NDA, but I already knew it. Yeah.

But you’re gonna tell me something I already know and now it’s under NDA and I can’t tell the people. No, like I already, I already had that information. I already knew it. That’s why you have to, that’s why you have to blog about everything.

Yeah. Believe it or not, we, oh, I see what you did there. Yeah. Believe it or not, we can figure out things on our own sometimes. So the real trick of what to do is to write the blog post, but make the posting date, like, a month before anyone said anything to you, and then publish it, and then be like, well, when did you tell me that? Oh, such and such a date.

Well, this post was published a month before that. It’s not covered by NDA. I bet everyone who’s watching this is learning a lot, getting a lot of good tips. Look, there’s one thing I’m good for.

It’s sound legal advice. So that was, that was all my complaints about ordered columnstore. Um, I don’t think it’s very good. I think you should partition your tables, and that’s going to do way more for you.

So, but I mean, you know, maybe there’s some workload out there where this is the best thing since communism, and they’ll get a lot of value out of it. Well, I think the only thing that we can say about this feature is that it truly is for the people. It’s free, kind of, for $7,000 a core. It’s free.

Maybe not. All right. Yeah, I’m gonna, I’m gonna have to workshop that one. But, uh, I don’t, I don’t have any specific complaints about ordered columnstore. Uh, you know, my, my complaints are far more generalized than that, like, you know, like, okay, you’re gonna spend development time and cycles on another feature that’s like, you know, 60, 70% done.

And like, try to get us excited about it. And then, you know, they trot poor Bob Ward out to another conference and everyone’s like, Oh, like, what are the internals like get the debugger Bob. And then poor Bob has to sit there and be like, there’s no, there’s no debugger for it.

All right. Sorry. Like, you know, it’s just like one of those, one of those things. It’s like, we’re supposed to get excited about this, this thing that, that is, like, not solving a problem for 99% of SQL Server users. So I think the analogy you wanted before was, uh, you know, think about a, a parallel rebuild of an ordered columnstore index.

You have all these individual threads. They’re all working together, but by the end of it, they haven’t accomplished anything. Just like communism.

Man, you’re good. You’re going to get me, like, flagged in the EU with this one. I’m, I’m not going to be a big fan of the EU. You’re not going to be allowed to travel internationally when, when this goes live.

No, it’s, it’s the, you should have it go live after you’ve, uh, fled the earth. Uh, well, maybe, maybe Elon will take me to Mars. I can open up a saloon somewhere in a gully or that’d be nice.

Probably not a lot of demand for SQL Server consultants on Mars. But there will be for saloons. Cause it’s, you know, the new wild west, nice place.

You can, you know, tie up your robot horse, buggy thing, go in for a nice cold beer, Mars beer, and take a load off. You know, it would be, like, just the ultimate. We’ll have one of those pianos that plays itself.

Be great. Got a whole vision here. Okay. No, my original, my original plan was to open a bar on the beach in Bora Bora. I think that market is saturated at this point.

So I’m going for a saloon on Mars. I wish you well, you know, I don’t drink as you know, it’s true. So I’m not going to support you.

That’s why you’re going to be the perfect bartender. I can trust you not to drink my whole supply of Mars beer. Unless you know, Mars beer is just so delicious.

You change your mind or you get bored on Mars. Then again, you’re not, you’re not bored in Wisconsin. So you can, I don’t know. I don’t know where you could get bored. Sorry.

Did I give away too much personal information? Oh yeah. I actually thought you stopped recording a long time ago, but apparently you’re still going. Oh no. Do you want me to stop recording? Yeah. Oh, all right. Cool.

All right. Well, thank you everyone for tuning in to the mildly resuscitated Bit Obscene podcast with my temporary co-host, Joe Obbish. As always, this is brought to you by Darling Data and the kind folks at BeerGut Magazine who supply me with stickers and magazine covers.

So thank you. And I’m going to hit the button now. All right. But I don’t know the beauty of the light in the internet. Now, I mean, we can do continues. Coming.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 42




Questions:

  • My developers overuse table-valued parameters in application code. Are there any server/database settings I can change to make the queries using them run faster? I know about in-memory tempdb and memory-optimized table variables, but I don’t have latch contention issues so I doubt that they would help.
  • Why aren’t developers good at database performance yet?
  • Why can’t you make an ErikAI to answer these questions?
  • Do you have any training content that specifies performance tuning efforts and limitations for SQL Server Standard Edition? I get excited to see posts from you and the community around Batch Mode in Rowstore only to get let down by licensing limitations yet again.
  • Are you doing a Black Friday sale? (This was a blatant lie.)

To ask your questions, head over here.

Video Summary

In this video, I delve into answering five community-submitted questions during an office hours session, providing insights and advice on various database-related topics. We cover issues like table-valued parameters in application code, performance tuning efforts for SQL Server Standard Edition, and the limitations of licensing. I also touch on upcoming pre-conference events at Data Tune Nashville and Data Saturday Chicago, encouraging viewers to purchase tickets early. Throughout the session, I emphasize the importance of regular practice in database performance tuning and share my thoughts on why developers might not excel in this area due to time constraints and project deadlines.

Full Transcript

Erik Darling here with Darling Data. It is Monday in YouTube. Actually, it is Monday in regular land, too. Which Monday it is, you will never know. It’s between me and my Monday. Anyway, we are doing office hours in which I answer five community submitted questions and hope that I can give good answers at a reasonable rate. They’re free. Shut up. If you want to ask your own question, you can head down to the video description where these helpful yellow fingies are pointing. And there’s a link there where you can go and submit your questions to me. If you would like to participate in some sort of monetary exchange, there are ways to do that as well. You can hire me for consulting, buy my training, sponsor this channel, all sorts of other stuff. And of course, if you enjoy this content, please do like, subscribe, and tell a friend. Because that’s how the good word spreads, my friends. Anyway, coming up. Of course, by the time you see this video, PASS Data Summit will actually be occurring. It’ll be actually Monday. I’ll be teaching a pre-con. Isn’t that wild?

So I should probably start promoting other stuff that has recently been acceptified. I will be doing a pre-con at Data Tune in Nashville. That event is taking place March 6th and 7th. And I will also be doing a pre-con the very next weekend. Data Saturday, Chicago, March 13th and 14th.

So buy those tickets now. So I can gloat about people buying those tickets now or something. I don’t know. Well, it would just be nice of you to do. Just buy the tickets. It’s great. Anyway, let’s get on with this whole database party here.

I need to go to the Excel file that has the questions in it. And let’s make sure there are, let’s see, one, two, three, four, five questions. And let’s go answer these.

Did it, did it. What do we got here? My developers. Oh, your developers. Oh, all right. See the way it is? My developers.

Let me tell you this. If they’re your developers, tell them to stop using table-valued parameters if they’re such a problem. All right. Not to spoil the question too much here. But my developers overuse table-valued parameters in application code.

Are there any server or database settings I can change to make the queries using them run faster? I know about in-memory TempDB and memory-optimized table variables. Oh, boy.

But I don’t have latch contention issues, so I doubt that they would help. Boy, are you, boy, are you right there. The problem with this question, though, is you haven’t told me what’s slow about them. What’s, what’s going wrong?

Some details would be nice. There are not really any server or database level settings that, you know, wouldn’t already be turned on if you’re in a position to have them that wouldn’t be helping you out. So I don’t really know what to tell you here.

Tell me what’s slow about them. Better yet, click on, click on that consulting link. And then we can do a fully detailed investigation into these table-valued parameters and figure something out for you.
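For what it’s worth, the usual fix for slow table-valued parameters is query-level, not a server or database setting. TVPs don’t get column-level statistics, so a common workaround is to dump the TVP into a temp table, which does get statistics. A minimal sketch, with invented names (dbo.IdList, dbo.FindPosts, and a dbo.Posts table standing in for whatever the real schema looks like):

```sql
/* Hypothetical type and procedure names; the point is the pattern,
   not these specific objects. */
CREATE TYPE dbo.IdList AS TABLE (Id integer PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.FindPosts
    @Ids dbo.IdList READONLY
AS
BEGIN
    /* Materialize the TVP into a temp table so the optimizer
       gets real statistics instead of a fixed guess. */
    SELECT i.Id
    INTO #Ids
    FROM @Ids AS i;

    SELECT p.*
    FROM dbo.Posts AS p
    JOIN #Ids AS i
      ON i.Id = p.OwnerUserId;
END;
```

Alternatively, adding OPTION (RECOMPILE) to the statement that uses the TVP lets the optimizer see the actual row count at execution time, at the cost of a compile on every run.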

I don’t know. This level of vagueness just can’t be, can’t be overcome. Why aren’t developers good at database performance yet?

Well, because they don’t practice it. They develop features. They need to get something working quickly because they’re under, you know, whatever, you know, time constraints and deadlines they have to deal with. And they have to make something work.

It’s the same reason why developers aren’t typically good at database security or security in general. It’s why, like, every time you read, oh, we found a database on the internet with no password. Surprise!

You know why? Because someone just had to get something working. Security gets in the way of stuff working, right? Like doing performance tuning. Oh, gets in the way of just making the thing work, right? It’s like, oh, I figured out how to do it.

I don’t know if it’s very fast, but it works on my machine. You don’t pry. Unless you are investing time and effort into learning and practicing a skill, you are never going to get better at it. That is true of any mental or physical endeavor in life.

If you are not regularly engaging in that endeavor, you stand no chance at progressing in that endeavor. Let’s see. Why can’t you make an Eric AI to answer these questions?

Have you seen the cost of training a model? Right? I mean, like an MCP server ain’t going to cut it. There’s not a lot of funding rounds here at Darling Data to get into an AI training situation.

So that’s way out of the question. And I don’t know. I always assume part of the charm was me showing up with my alacrity and effulgence.

And effervescence and answering these questions. Am I wrong? You just want AI to do it. You want a chatbot.

Well, I don’t know. What else? It’s funny because like LLMs are basically an MLM, right? Because like, I mean, just like the industry in general at this point is an MLM.

But like the people making money off LLMs are like selling prompts. Like you’re selling me a question to ask them? Like you’re just like, oh my God.

Anyone who does that is an absolute scavenger. All right. But that’s why. All right. Because Eric AI would be too expensive to train. Do you have any training content that specifies performance tuning efforts and limitations for SQL Server Standard Edition?

I get excited to see posts from you in the community around batch mode and rowstore only to get let down by licensing limitations yet again. No, I don’t. Because 90 something percent of the time performance tuning with Standard Edition is exactly the same as performance tuning with Enterprise Edition.

You have some additional restrictions and limitations around hardware and what is automatically available to you. And even some restrictions on, you know, things like batch mode where it is limited to a DOP of 2 in Standard Edition. But most Standard Edition performance tuning is no different from Enterprise Edition performance tuning.
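As an aside from the transcript: the DOP cap mentioned above is documented behavior, with batch mode operations in Standard Edition limited to a max degree of parallelism of 2. A quick sanity check, not from the video, for confirming which edition you’re actually running before chasing edition-gated features:

```sql
/* SERVERPROPERTY is a built-in function; EngineEdition returns
   2 for Standard Edition and 3 for Enterprise/Developer. */
SELECT
    SERVERPROPERTY('Edition') AS edition,
    SERVERPROPERTY('EngineEdition') AS engine_edition;
```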

There is just not a whole lot that you have to do differently. You know, you work within your restrictions and limitations and you make the best of it. You know, a lot of the training that I have would work just fine on Standard Edition.

There’s nothing specific about it. There’s nothing specific enough about it, especially since Microsoft gave up on some of the earlier restrictions that existed with it. Some of the programmability stuff with 2016 SP1 that became available in Standard Edition.

There’s really not a whole lot too, too different there. Microsoft was even kind enough to give UDF inlining to Standard Edition. So, you know, there are some, of course, some additional considerations.

But, again, my rates are reasonable. Final question of the day. Oh, boy.

Are you doing a Black Friday sale? No. My training is already priced so that normal people in the world can afford it. I already bake a lot of coupons and discounts into stuff.

You know, I don’t want you to need, you know, a payday loan or a stipend from work in order to buy my training. I want lots of people to be able to buy it and benefit from it. So I don’t price it at a point where, like, a Black Friday sale would really get a lot more butts through the door.

So, you know, right now, no, I can’t see a reason to discount things further. You know, I do have a family to feed and all that. You know, I have actual children.

You know, actual life. There are people, like, you’ll never see on camera who I have to take care of. So, no, I can’t see a Black Friday sale being very provocative. I can’t see too many provocative incentives in knocking stuff down further.

So it already sells fine at the very low and reasonable prices that are set for it. So, no. No.

I’m not embarrassed by my prices. So, anyway. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video, which, if I have things right, will be a very special revival video of sorts. So stay tuned for that.

Anyway, thank you for watching.


Dynamic SQL Insert Bomb




T-SQL Inline UDF Plan Caching and Reuse in SQL Server




 

Since it’s also Christmas, enjoy my favorite Christmas song.

Video Summary

In this video, I delve into the nuances of inline table-valued functions (inline UDFs) in SQL Server, focusing on how they differ from scalar and multi-statement table-valued functions when it comes to parameter sniffing and plan caching. By demonstrating with an example, I show that inline UDFs are automatically inlined into the calling query, which means traditional methods of tracking execution plans using extended events become less useful. Instead, I highlight how running queries with `OPTION (RECOMPILE)` can yield multiple execution plans, providing a more dynamic and realistic view of performance under different conditions. Through this exploration, I aim to help you understand when and why certain query plans are chosen by SQL Server, offering insights that can lead to better optimization strategies for your database operations.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we are going to finish up our amazing, thrilling coverage of various user-defined function parameter sniffing and plan caching scenarios in SQL Server. And we’re going to finish up by talking about the type of UDF that I generally prefer, which is the inline table-valued function. To put it briefly, it is the inline UDF, the actual inline UDF. If you like this type of stuff, well, I do this type of stuff all day. Occasionally, I do get paid for it, too. If you would occasionally like to pay me for it, you can hire me for consulting. You can also buy my training, become a supporting member of the channel, ask me office hours questions, and if you enjoy this type of stuff, please do like, subscribe, and tell a friend.

PASS Data Community Summit, geez, closer by the day, ain’t she? Of course, Seattle, Washington, November 17th to 21st, two days of T-SQL pre-cons, the likes of which you have never seen. It’s a great sell line, presented by me and Kendra Little. I’m not sure if that’s a great sell line. You should do, what was the one from Dune? We got T-SQL pre-cons the likes of which God has never seen. God has not seen? I forget the exact quote. I’m a terrible nerd these days.

But anyway, you go, you come, you show up, you learn things. It’s good for everyone. Anyway, let’s talk about inline UDFs here. So I’ve got an inline table-valued function here, denoted by the fact that the RETURNS portion of the function body just says RETURNS TABLE, right? We do not return a table variable. We are not returning a data type. We are returning the result of a select. The select that we’re returning the result of is exactly what we’ve been doing in prior iterations, where we’re just doing this whole thing, right? So we’ve got that going for us.

Now, the first thing that I want to say here is that inline table-valued functions, the reason why they’re so different, is because the code from them is automatically inlined into whatever query you run. So this extended event that we’ve been using to track other executions is no longer useful to us. If we run this query and we look at the query plan, we will see that we got this query plan back, but we actually got the query plan back right here. With other types of UDFs, the plan for the UDF is hidden away from you, but for this we don’t get anything back in the extended event, because the filters on it want the query post execution plan for an object name that contains CohortScore. And even though the name of our UDF does in fact contain the name CohortScore, it does not execute as a separate object. The query inside of it is inlined into the query that calls it, right? So we don’t have to worry about this extended event anymore. We can close that out. And please, God, don’t crash SSMS. Please, God, don’t crash. Thank you. Always touch and go with these things.

But let’s run all three of these with OPTION (RECOMPILE), just because, you know, I talked about how with the other versions of this I needed to make specific versions of the functions with the OPTION (RECOMPILE) hint to see different query plans, and write big loops and everything. But with this one, we just naturally get three different query plans with OPTION (RECOMPILE) at the end of the query. This one up here is a parallel thing: we scan an index, we scan an index, we do some hashing. This one down here has a bit more going on in it: we’ve got some compute scalars, we’ve got a seek, and then we’ve got this parallel zone in the plan. And then this one down here is sort of a weird mix between the two. Sorry, the second query (I wish this thing would hold its framing a little bit better) is just index seek, loop, loop, index seek, and we’re done. Or rather: index seek, loop, loop, index seek, loop, index seek. This one down here is, of course, index seek, loop, hash join down to the Posts table. So we got three different plans with the recompile hint right there, and we could see the plans without having to use extended events to track what’s going on inside the function, because with the other function types we wouldn’t see the actual execution plan for the function without the extended event. So we’re getting somewhere. Yay.

So, just because we’ve done it every single time, we run DBCC FREEPROCCACHE and we select the top one here. This is kind of annoyingly slow, and we’re going to talk about this later, but this takes like seven seconds. And it takes seven seconds because SQL Server chooses, really, a non-parallel plan, and it does some silly things. Like, this part is probably fine, 152 milliseconds, but then we did a single-threaded scan of 17 million rows in the Posts table. Maybe not a great idea, SQL Server. Maybe not such a great idea. But the point here is that we get this version of the plan back, and we have the whole query plan nestled in with our query. Again, we’re not tracking stuff in extended events like we were with the scalar or multi-statement table-valued functions. But now, when we run this with the wider range of things (really, because this is where the reputation is over 800,000), we get back results really quickly, and SQL Server chooses a different execution plan from the one we just saw. This one here, we’re not doing that big, ugly hash join and single-threaded scan of the entire Posts table. We do a seek and a loop and a loop and a seek and a loop and a seek, and all this returns relatively quickly. We could do some performance tuning work on this, but I intentionally had to make parts of this not perform great to show you the differences in the different plans. If everything had a perfect plan, there wouldn’t be much of a demo here, would there? No, you’d be bored. There would be no YouTube video. Crap. Couldn’t have that. What would you do with your life?

So now let’s try this for three. In prior runs, when we did the scalar and the multi-statement one for just three, it was usually really quick. But for this one, we get the same kind of crappy query plan back that we got for reputation equals one. It takes like seven seconds, and that’s not so great either. I’m not thrilled about that. But again, when we run this in the context of the full query, we get a better plan again, and this all finishes relatively quickly for the four rows that we care about. Again, it’s kind of like a seek, loop, seek, loop, blah blah, so on and so forth. You get the idea. But this all works out a lot better.

Now, we are going to request performance assistance. I’m going to say: performance assistance, please. And we’re going to try to figure this thing out. So, you know, you might start by thinking, well, SQL Server just didn’t do a loop join. It should have done a loop join. The hash join plan is stupid. SQL Server, please just use loop joins. But when we just hint the loop join at the end, this ends up taking a little bit longer. The last time we ran this, it was seven seconds, and now it’s like 12 and a half seconds. So that loop join hint was clearly not effective, and SQL Server has made an even sillier mistake here. It has done all this, again all single-threaded: scanning the Posts table, aggregating some stuff, then a loop join down here, and then another thing down here that takes like three seconds. So this all adds up to about 12 and a half seconds of time down in this portion of the plan.

Now, normally I would do this hint inside of the function body, but just for code brevity’s sake, I’m going to take the function body, inline it into the query, and add the hint to that, the hint I would usually put in the function body. And that is to just add a FORCESEEK hint right here on the Posts table. Because really, what this hinges on is SQL Server seeking into, or scanning, the Posts table. And with a lot of inline table-valued function rewrites, you do have to do this sort of hinting. Despite them being the superior type of function in just about every single way imaginable (they can return more than one row, they can return multiple columns, there’s no table variable involved, they don’t force your query to run single-threaded for any reason), this is just the optimizer picking a bad single-threaded plan, for whatever reason the optimizer is doing that. I haven’t gotten so far as to experiment with the parallel versions of these plans, because I wanted to get this part done and recorded. So, not terribly interesting there yet.

Anyway, if we add a FORCESEEK hint to this query and we run it, well, now we’ll get sort of the plan that we want. And like I said, normally I would take this and put it in the body of the function, and I would say: SQL Server, every time this function runs, I am going to force you to seek into the Posts table, because we have a great index there for that. Now, I did try other stuff in this. You know, I’ve been talking a lot about row goals, and, oh, you can add TOP to this and that and speed things up. TOP did nothing here. TOP was ineffective across this entire thing. FAST 1 hints: ineffective. I tried all sorts of hinting around things, and really, the FORCESEEK hint here was the only one that kind of stuck and made things tolerable.

So the bottom line here is that inline table-valued functions, even though they accept parameters, get inlined into the code, and you don’t really see the same type of parameter sensitivity and plan reuse that you would when you are dealing with scalar and multi-statement table-valued functions. We can tell that this is the end, because down here is where I was working on the loop code that I showed you in the first video. We’re not going to look at that again, because we just don’t need to see it. But I think that’s the end. So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video. I have a short video on sort of a funny take on SQL injection that I hope you enjoy, too. It’ll be a Friday short, we’ll call it, because it’s not long. Because it’s not long, it’s a short. All right, I think I’ve taken enough of your time. Thank you for watching. I’ll see you tomorrow. All right, that’s good. Goodbye.
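The shape being described, the inline function and the FORCESEEK rewrite, sketched in T-SQL. The object names here are invented stand-ins for the demo objects (the transcript only tells us the function name contains CohortScore and that it touches the Posts table):

```sql
/* A hedged sketch, not the actual demo code. RETURNS TABLE with a
   bare RETURN (SELECT ...) is what makes this an inline TVF: the
   optimizer expands it into the calling query like a view. */
CREATE OR ALTER FUNCTION dbo.CohortScore (@Reputation integer)
RETURNS TABLE
AS
RETURN
    SELECT
        u.Id,
        TotalScore = SUM(p.Score)
    FROM dbo.Users AS u
    /* FORCESEEK is the hint that stuck in the demo; it needs an
       index on Posts that supports a seek on OwnerUserId. */
    JOIN dbo.Posts AS p WITH (FORCESEEK)
      ON p.OwnerUserId = u.Id
    WHERE u.Reputation = @Reputation
    GROUP BY u.Id;
```

Because the body is inlined, hints placed inside the function apply wherever it’s called, which is why the video suggests putting FORCESEEK in the function body rather than in every calling query.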


Merry Christmas from Santa and Definitely Not Erik




 

Video Summary

In this video, I’m stepping into the jolly spirit of Christmas and sharing a bit of magic with you all! As Santa Claus himself, I’m offering something special: use the code HOHOHO from now until January 1st, 2026, to get my LearnTeenCast.com course at a whopping 75% off. But wait, there’s more—Erik’s Performance Engineering course is also up for grabs with his own unique discount! So, spread some holiday cheer and share this video with your friends and family. Ho, ho, ho! Remember, Erik wouldn’t be so generous with such a big discount, but as Santa Claus, I am all about spreading joy and generosity. Happy holidays from the North Pole!

Full Transcript

This is definitely not Erik Darling. This is definitely Santa Claus. Ho, ho, ho! Can’t you tell how jolly and merry I am? Anyway, it’s Christmas, and in the spirit of Christmas, I’m gonna give you something. Ho, ho, ho! I’m Santa. If you use the coupon code HOHOHO from now until January 1st of 2026, you can get my LearnTeenCast.com You can get Erik’s LearnTeenCast.com with, definitely not Santa’s course, definitely Erik’s course, LearnTeenCast.com with Erik. And Erik’s course, Performance Engineering for 75% off. Ho, ho, ho! I’m Santa. Look at me. Definitely not Erik. Erik would never give you a 75% off coupon code for that stuff. He lacks the generosity that Santa has. Ho, ho, ho!


T-SQL Multi-Statement UDF Plan Caching and Reuse in SQL Server



Video Summary

In this video, I delve into the intricacies of multi-statement table-valued functions (MSTVFs) in SQL Server, specifically focusing on parameter sensitivity and plan caching issues. Building upon yesterday’s discussion about scalar UDFs, today’s episode explores how these functions behave differently when recompiled, leading to varying execution plans and performance impacts. I demonstrate this through a series of demos, highlighting the differences between using a function with and without an `OPTION (RECOMPILE)` hint, and explain why certain execution plans are chosen over others. By running multiple executions and analyzing the cached query plans, we uncover how SQL Server handles plan caching for these functions, revealing the parameter sniffing problem that can arise. If you have encountered similar issues with your own functions or need help optimizing them, consider reaching out for consulting services. Additionally, supporting my channel as a member of the community will ensure you don’t miss any future insights on T-SQL and database performance optimization.

Full Transcript

Erik Darling here with Darling Data. And in today’s action-packed episode of YouTube, we’re going to continue our conversation about UDF, plan caching, and parameter sensitivity, whatever. I forget what I call these things. But anyway, this is what we’re talking about. Because yesterday we talked about scalar UDFs and what happens there. So today we need to talk about these and what happens here because it is the next logical step in the hierarchy of T-SQL functions. Alright? So let’s do that. But down in the video description, if you would like to hire me for consulting, perhaps you have parameter sniffing problems with functions of your own and you’re like, well, Erik Darling, why can’t you help us too? You can do that. If you would like to buy my training, you can also do that down in the links below. Where these lovely fingers point. You can be. You can become a supporting member of the channel, just like PBS. You can ask me office hours questions. And of course, as always, if this kind of content pushes you in your happy place, well, please do like, subscribe, and tell 50,000 or so friends. Anyway, PASS Data Community Summit, Seattle, Washington, November 17th to the 21st. Kendra Little and I will be tag-teaming two tantalizing, titillating days of T-SQL pre-cons. Like I keep saying, it’s going to be the best T-SQL pre-cons that humanity has ever witnessed. So you should be there and get a t-shirt to commemorate your presence there because otherwise no one will believe you were there, right? Be like that Skid Row concert you went to in 1987. Did you get the shirt? No, you weren’t there then. Anyway.

Let’s get this database party started. And let’s see. We’ve got our function queued up here. Now, just like the ScalarUDF version of this, I had to write two versions of this function. One, normal function that we’re going to be using in our demonstrations or majority of our demonstrations. And two is a version of this with an option recompile hint in it, which probably already gives away the whole thing that, well, we have a parameter sniffing thing that can happen with these two. That probably messes the whole gig up, right? There I go. Blowing the whole waterworks. But anyway, I have reapplied my filters to this extended event session. So that’s great.

And what we’re going to do is we’re going to run the recompile version of this multistatement table valued function. Well, we can see that it is indeed a multistatement table valued function because it returns a table variable. And the primary driver of this UDF is to insert data into that table variable. And then, well, we just say return here. If we wanted to keep things like normal, we might say return that table variable, but we just say return. Don’t ask me why. I swallowed a fly. I swallowed a fly. But let’s make sure that these things are both in here and all correctamundo and that good stuff. So if I run these two functions, or rather this one function with, again, we’re going to do the same couple, same two of the recompile things.

Actually, there’s a third one hiding on me down here. Look at that. All right. Wonderful. We’ve got three. Run these recompile things. You might notice that these taken a little longer than we might like. The days really drag on, don’t they? Anyway, if we go look at the live data for these, we’ve got a query plan over here that looks like this for the first one. It takes about 5.4 seconds to run. Now, in the last set of demos with the scalar UDF, where I talked about how the query that calls or invokes the scalar UDF, the query itself is inhibited from going parallel because it invokes a non-inlineable scalar UDF.

Multi-statement table valued functions (part of what took me a little bit to write these demos was working around this part) have the same general problem: the insert into a table variable inhibits parallelism the same way invoking a scalar UDF does. If we look at the properties over here (I’m just going to make this a little bit wider, apparently a little bit wider), we have this thing in our query plan now. So whereas the last one had a non-parallel plan reason talking about the presence of a scalar UDF, this one’s reason is that we are not allowed to insert into the table variable in parallel.

This is true of all table variables, not just ones in multi-statement table value functions. Table variables can’t be modified using a parallel execution plan, including inserts. So part of why this one is so perturbingly slow is that.
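To make the shape concrete: a generic skeleton of the kind of function under discussion, with invented names and columns (the demo’s real definition isn’t shown in the transcript):

```sql
/* A hedged sketch of a multi-statement table-valued function.
   The RETURNS @Results TABLE (...) declaration is what makes it
   multi-statement, and the INSERT into that table variable can
   never run in parallel, which is the slowdown described above. */
CREATE OR ALTER FUNCTION dbo.CohortScoreMulti (@Reputation integer)
RETURNS @Results TABLE
(
    Id integer,
    TotalScore bigint
)
AS
BEGIN
    INSERT @Results (Id, TotalScore) /* serial insert, always */
    SELECT
        u.Id,
        SUM(p.Score)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
      ON p.OwnerUserId = u.Id
    WHERE u.Reputation = @Reputation
    GROUP BY u.Id;

    RETURN; /* a bare RETURN, as in the video */
END;
```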

Let’s close that. And for the second invocation, we have a slightly different looking execution plan, right? This one does some stuff and then does a hash join down to here. This one does some stuff and this one does the loop joins again. So this is the key lookup plan.

This one runs a bit faster, right? The total time on this is 238 milliseconds. Whereas for this one, it was five and a half seconds, which is no good at all. And then this one, this one takes 4.9 seconds and uses a similar plan to the one before, except there’s a hash join here.

You know what? These were showing different plans before. This time, I guess it didn’t work out so well. This, this, oh wait, there was something weird and different. Oh yeah. So there is a difference here. This is an actually, this, this was a funny one.

This one, this one took me a minute to catch. This one does a hash match inner join to the post table, right? This one does a right semi join to the post table. So it is a slightly different execution plan.

That caught me off guard there for a second. Someday I’ll have this all memorized, but we’ve cut the query plans. We see we got three different ones when we recompiled. Cool. Now let’s go free the proc cache, right?

And let’s say you select the top one from this. Remember, we don’t want to cross apply to this. We could theoretically outer apply and get correct results, but for this, we want to have the scalar subquery in the select list because this will not reduce rows.

Cross apply would reduce rows because it’s a like relational operator. And it’s like an inner join where if rows on like, like the rows don’t exist here, we won’t get a row back, right?

And we want our results to maintain correctness across different executions. So we need to put the function call in the select list for this one. We could outer apply to it. But again, that’s a little bit more of a relational thing, whereas this is just like, go find me a result.
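The row-preservation point can be sketched with the three call shapes. The function and column names here are hypothetical stand-ins for the demo’s MSTVF:

```sql
/* Scalar subquery in the SELECT list: never changes the row count;
   users with no function result get a NULL back. */
SELECT
    u.Id,
    TotalScore =
        (SELECT f.TotalScore
         FROM dbo.similar_rep_score(u.Reputation) AS f)
FROM dbo.Users AS u
WHERE u.Reputation > 800000;

/* CROSS APPLY: relational, acts like an inner join; users with no
   function result disappear from the output entirely. */
SELECT u.Id, f.TotalScore
FROM dbo.Users AS u
CROSS APPLY dbo.similar_rep_score(u.Reputation) AS f
WHERE u.Reputation > 800000;

/* OUTER APPLY: row-preserving, like a left join; also correct here,
   just a more relational way of asking for the same thing. */
SELECT u.Id, f.TotalScore
FROM dbo.Users AS u
OUTER APPLY dbo.similar_rep_score(u.Reputation) AS f
WHERE u.Reputation > 800000;
```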

So, uh, we’re going to run this for reputation equals one. Let me make sure I did this. I can’t remember. It was too long ago. And let’s run this.

And that’s going to take about five ish seconds to run and get a result back. And if we come over here, uh, we will have a new entry in this. And we can see that we, we used the query plan that we got when we ran this for the one row with a reputation of one, right?

So that’s that new thing. Now I’ve remembered to clear data and not clear filters, because that was annoying last time. But, uh, if we now run this for the 800,000 people, this is going to take about 20 seconds because it’s about five seconds per run and we’re returning four rows.

So the rules of mathematics tell you that you should take the number five, like about five, and you should multiply it by four because we have four rows. Four is not negotiable on that.

So this will take 21, 22 seconds, 20 seconds even. Great. So we’ve got that. And now we’ve got, uh, four query plans and we’ll see that SQL Server reused that same plan each time, right?

We, we reused the hash match inner join plan where we go down and scan the post table. Fantastic. We did it, right? We’ve, we’ve, we’ve cracked the case. So let’s do the same thing that we did.

Uh, we’re, we’re only going to do three just because, um, the, the plan for 11 was also about five seconds. So it doesn’t really make a lot of sense to show that one here. Um, I’m just going to do a one and three for this one, just to move things along a little bit.

So now if we run this for three, we get like the faster query plan for this back. And if we run this for the 800,000 rows here, we get those four rows back pretty quick. So, um, what I’m going to do is, because there’s not a view for multi-statement table-valued functions the way that there is for scalar UDFs, they’re not tracked in their own DMV.

Uh, I’m going to use sp_QuickieStore to look at this, and we should see, well, if we come over here a little bit further, five executions of each one of these UDFs, or rather it’s the same UDF.

We just got different plans for it, right? You can ignore the all plan IDs here from the last hour. We got, uh, for a query ID 28475, we got plan ID 6007 and 6008. Uh, there was another plan ID for this from an earlier run that still ends up in all plan IDs.

Maybe don’t worry about it so much. Actually, that might’ve been the plan ID for the number 11. I forget. It would be nice if there was a row for it, but what the heck? So we get the five executions and we can see the differences, looking in Quickie Store, in average, min, max, and total duration between these, right?

Uh, we can see that, uh, on average, uh, one of these did a lot better than the other one. So over those five executions, the, the execution time for the slower one really piled up. And this is again, uh, a parameter sensitivity issue inside functions in SQL Server.

So hopefully this has answered all of your deepest, most burningest questions about multi-statement table-valued function caching and plan reuse. Rolls right off the tongue, don’t it? Anyway, we’re going to stop here because we have reached the end of these demos, clearly, because the next function down there on the bottom is inline table-valued functions.

So now we’re, we’re going to go do that next. That’s going to be tomorrow’s video because we still have to make another video for tomorrow. Otherwise we have no video for tomorrow and people will say, Erik, why is there no video today?

Tomorrow. All right. Cool. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you tomorrow where we will talk about inline table valued functions.

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

T-SQL Scalar UDF Plan Caching and Reuse in SQL Server



Video Summary

In this video, I delve into the intricacies of scalar UDF parameter sniffing, specifically focusing on how different parameters can lead to varied execution plans. I walk through a detailed example using two versions of a scalar user-defined function—one without and one with a recompile hint—to illustrate why parameter sniffing is crucial in understanding query performance. By leveraging extended events and dynamic management views, I demonstrate the nuances of plan caching and reuse, highlighting how these functions can significantly impact execution times based on input parameters. This video aims to provide insights that go beyond just stored procedures, emphasizing the importance of considering function behavior when optimizing SQL Server queries.

Full Transcript

Erik Darling here with Darling Data. And today’s video is going to be about scalar UDF parameter sniffing. There was a question in recent office hours where I was like, I’ve got blog posts about that. And then I went back and I read the blog posts. And it’s funny reading old content, because you’re like, oh, I could make that better. Oh, I could fix that. But, you know, updating old material always seems to get pushed away in favor of forging ahead and creating new material. I went back and I actually reviewed the file that I had on that stuff, and I was like, well, I had to get more into some of this stuff. So I actually ended up completely rewriting it with completely different functions and giving much more interesting tests on these things. So the next three videos are going to be about scalar, multi-statement, and inline user-defined function plan caching and parameter sniffing. So we will endeavor to do our best covering those. Down in the video description, you will find all sorts of useful links. You can hire me for consulting. You can buy my training. You can become a supporting, that means with money, member of the channel. You can also do other fun things with your life, like ask me office hours questions. And of course, if you like this content, please do like this.

Subscribe and tell a friend. PASS Data Community Summit. Oh boy. These probably sound like the last pitches that I can do on this thing. But November 17th to 21st, Seattle, Washington. As usual, me and Kendra Little are going to do two days of the absolute most magnificent T-SQL pre-cons that have ever existed. Just the entire span of human existence. So I hope that you’ll be there getting swag and other things from me. T-shirts and stickers and buttons and all the other stuff. So anyway, let’s start this database party up here. So there are a few things that I need to walk through before I show you exactly what happens when these functions get executed.

The first one is a view that I’m going to use to look at scalar UDF details. It hits a few dynamic management views, and the reason why this is a view here is because I don’t feel like having this code interspersed throughout the presentation and having to reference the entire thing over and over again. So we’re going to look at the sys.dm_exec_function_stats DMV. Brand spanking new in SQL Server 2016. Glad you’re keeping up. And we’re going to use this to look at scalar function performance metrics. The second thing is, of course, the scalar UDF itself.
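A minimal sketch of that kind of view, with column picks of my own choosing (the demo’s actual view isn’t shown in the transcript):

```sql
CREATE OR ALTER VIEW dbo.scalar_function_stats
AS
/* Function-level performance from sys.dm_exec_function_stats,
   new in SQL Server 2016. Time columns are in microseconds. */
SELECT
    database_name = DB_NAME(defs.database_id),
    object_name   = OBJECT_NAME(defs.object_id, defs.database_id),
    defs.execution_count,
    defs.total_worker_time,
    avg_worker_time  = defs.total_worker_time  / defs.execution_count,
    defs.total_elapsed_time,
    avg_elapsed_time = defs.total_elapsed_time / defs.execution_count,
    defs.cached_time,
    defs.last_execution_time
FROM sys.dm_exec_function_stats AS defs;
```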

All right. So this is going to be the scalar UDF that we’re going to use. There’s actually two of them. And there’s two of them for a reason that I’m going to explain in a moment. But the idea here is to find out, based on someone’s reputation, we find people whose reputation is like 25% lower and 25% higher.

We can, of course, manipulate this if we wanted to, but it seemed like a good starting number. And we sum up all the post scores for other people who have a similar reputation. And we look at those. It’s kind of silly, but it sounded like a fun function to write, so I wrote it.

So I don’t really care if it has a, I don’t really care if it delivers actionable business insights for anyone. Because the idea is to show you a behavior, not to show you actionable business insights. I’m on a dashboard. My eyes are up here, as they say.

So there’s this one. But then I also had to create a version where there’s a recompile hint inside it. And the reason I had to do this might surprise you. If you run a query, let’s say like a select query from blah, blah, blah, blah, blah, blah, blah, and you reference a scalar or multi-statement user-defined function in that query, and you have a recompile hint on the outer query, it does not recompile the scalar or multi-statement function in your query.
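In other words, the hint has to live on a statement inside the function body. A sketch of the shape, with logic approximated from the description rather than the demo’s exact two-parameter function, and names that are my own:

```sql
/* Outer hint: does NOT recompile the function's cached plan. */
SELECT TOP (1)
    score_total = dbo.similar_rep_score(u.Reputation)
FROM dbo.Users AS u
WHERE u.Reputation = 1
OPTION (RECOMPILE); /* only recompiles the calling query */

/* The version that actually recompiles per call puts the hint inside: */
CREATE OR ALTER FUNCTION dbo.similar_rep_score_recompile
(
    @Reputation integer
)
RETURNS bigint
AS
BEGIN
    DECLARE @s bigint;

    SELECT
        @s = SUM(p.Score)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id
    WHERE u.Reputation BETWEEN @Reputation * 0.75 AND @Reputation * 1.25
    OPTION (RECOMPILE); /* the hint that fires on every invocation */

    RETURN @s;
END;
```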

Fun, right? So the reason why that’s important is because over here, I had written both for the scalar, but this version has the multi-statement function column in it.

But I had written a loop. And the reason that I wrote a loop is because I needed to find out which of these values generated different plans. And the easiest way for me to do that was to grab the histogram from an index, put that into a temp table, and then have a cursor go over the temp table.

I did this in two different ways. One was just grabbing all of the range high keys. And then one way was grabbing all the range high keys plus one. So like I was trying to see if like the in-between numbers gave me any different plans.

Mostly they didn’t. And mostly I think they didn’t because I do not have a direct equality predicate in this. It’s a greater than, equal to, less than, equal to predicate.

So this was maybe a little bit of a wasted effort adding the plus one, but it was a fun experiment anyway. And then basically what I do is I cursor over the table, grabbing the range high key, and then execute.
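The loop being described might look roughly like this. The index name and the function being called are assumptions, not the demo’s exact code:

```sql
/* Dump the histogram steps for a reputation index into a temp
   table, then cursor over them, calling the function at each
   range_high_key and at range_high_key + 1. */
SELECT
    rep = CONVERT(integer, dsh.range_high_key)
INTO #histogram
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS dsh
WHERE s.object_id = OBJECT_ID(N'dbo.Users')
AND   s.name = N'ix_users_reputation'; /* hypothetical index name */

DECLARE @rep integer, @r bigint;
DECLARE h CURSOR LOCAL FAST_FORWARD FOR
    SELECT rep FROM #histogram;

OPEN h;
FETCH NEXT FROM h INTO @rep;
WHILE @@FETCH_STATUS = 0
BEGIN
    /* hypothetical function name standing in for the demo's */
    SELECT @r = dbo.similar_rep_score_recompile(@rep);
    SELECT @r = dbo.similar_rep_score_recompile(@rep + 1);
    FETCH NEXT FROM h INTO @rep;
END;

CLOSE h;
DEALLOCATE h;
```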

Well, you can see there’s a recompile in the name of this function. This is clearly not the scalar version of the function, but this is the multi-statement version. And basically it’s just me trying to find different plans getting generated.

The way that I examined those plans getting generated was with extended events. And of course, I used my fabulous stored procedure, sp_HumanEvents, to spin up an extended event that grabs actual execution plans so that I could see what plans the function generated on each call.
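Under the hood, that boils down to something like this session. The database name and session options are assumptions, and this particular event is expensive, so treat it as demo-only:

```sql
/* Roughly what sp_HumanEvents sets up for this: capture post-execution
   plans so the ones generated inside function bodies become visible. */
CREATE EVENT SESSION function_plans
ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
(
    WHERE sqlserver.database_name = N'StackOverflow' /* assumed */
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION function_plans
ON SERVER
STATE = START;
```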

Because getting execution plans normally, you don’t see that, right? Like when you get an actual execution plan for a query that calls a scalar UDF or a multi-statement table-valued function, the actual execution plan for the function doesn’t show up in the query plan.

That only happens for functions that are inlined into the code. So like if you have a non-inlinable scalar UDF as mine is, you get nothing, right?

It doesn’t show up in there. You can get an estimated plan, but the estimated plan often doesn’t match the actual plan. That’s another funny thing that I ran into quite a bit with this. It was a bit of a head scratcher at first.

So that’s what this session over here is. And there are some filters on this. That’s why you might see displaying zero of filtered event. Displaying zero of filtered zero events.

Total. What? Okay, whatever. Microsoft English. It’s a nightmare.

Anyway. So that’s this option recompile version of the function here. So what I want to begin by showing you is we can turn off actual execution plans for the moment. They’ll only get in our way.

It’s that if I run the recompile version of the function, this was my big finding with the loop that I ran. If I run this and we keep an eye on this window over here, I might have to tinker with the filter, so don’t be mad.

If I run these three queries, all of these three queries will generate different execution plans because they’re going to recompile. There’s an option recompile on this each and every time.

So that’s running this version. And if we come over here and look at the live query data, wow, it worked. We have three query plans for this. And the reason why we generated three different plans is notice this number right here changed.

That’s the second parameter in the function. That’s the reputation one. This is the one that really drives cardinality estimation for the query plans that we get. So running these for one, three, and five, clearly I didn’t need that whole loop through 195 histogram steps because the first three, four lines in the histogram gave me everything.

So great. Anyway, for reputation one, we get this plan. Which takes about 1.2 seconds.

We can see that it is a big parallel plan. We scan an index over here. We scan an index down here. And the whole thing takes, again, about 1 point. Well, actually, it’s about 1.3 seconds.

The second query plan looks like this, where we have some constant scans and some compute scalars and blah, blah, and a nested loops join.

And then we go parallel over here and we do some stuff. The third plan looks like this, where this one’s a little bit different from this one, right? This one, we have one, I think it’s one set of constant scan stuff over here. But this one, you know, we still have one, but it’s different, right?

Because it’s a different, like, join over here. This one is nested loops, nested loops, nested loops. This one is nested loops hash join, right?

So we change things a little bit here, right? Because there’s a key lookup in this plan. Oops, this one. There’s a key lookup in this plan. And this one, we just scan the index on the post table.

This one takes about 735 milliseconds. And the one prior takes about 75 milliseconds. All right. So let’s keep those numbers in mind as we forge forward in our path to figure all this stuff out.

So there’s some stuff in here about the histogram that is unimportant. And there’s also some stuff in here about getting some frequencies. When I used to work in market research, you would call getting counts of various things frequencies.

How frequently did this happen? What’s the frequency of this? I’m going to throw the whole thing into the garbage. But this stuff is not pertinent to us.

This was pertinent to me writing the demo. So we can move on from that. But what I want to show you here is: let’s come over here and let’s clear data out of this thing. And let’s run DBCC FREEPROCCACHE, because we’re going to run the non-recompile version of this first scalar UDF.

And what we’re going to do is we’re just going to get the top one row for where reputation equals one. And that takes about 1.3 seconds. Crazy, right? If we come over here and look at live query data, well, I guess it was more like 1.2 seconds.

I’m sorry. I misled you by 0.1 of a second. I hope you can forgive me. I hope you can find it in your heart.

Now let’s run this query. And let’s get everyone with a reputation over 800,000. Run this. And this is going to return four rows in about five-ish seconds or so.

And if we come over here and we look, there are four rows. There are four new execution plans. We reuse that hash join execution plan every time, right? And some of the times are a little bit different.

Like this is 952 milliseconds. There’s 990 milliseconds. There’s 993 milliseconds. And there’s 1,003 milliseconds, or 1.003 seconds. So every time that we ran the function, the function reused the plan that was cached for it.

All right? Let’s clear that out. And let’s repeat our experiment. First, let’s see in here. Let’s validate some stuff.

Here is our function. Here are the five executions. And we have the total and average worker time and the total and average elapsed time. Something that I didn’t talk about up there that is kind of funny is that all of the function plans that we saw are from inside of a scalar UDF, right?

And one thing that a lot of people mess up about scalar UDFs, and I’m actually going to come back to these because it’s interesting enough to sort of backtrack a little bit. What everyone says about scalar UDFs is, oh, they prevent parallelism, which is true, right?

A non-inlineable scalar UDF will inhibit parallelism for the query that calls it. But the body of the function is completely free to go parallel, right? Like this is clearly a parallel execution plan.

There is clearly parallelism in this execution plan, right? We can see the parallelism, right? We can smell it. It smells like fast. And we can see the parallelism in this one.

So the body of the scalar UDF is completely free to go parallel, but the query outside that calls that function is not free to go parallel, right? So I’ll actually show you that in the next demo just in case you don’t believe me.

Anyway, let’s free the proc cache and let’s run this now for reputation three, right? So let’s run this. And actually, we’ll turn on query plans for this one.

We’re going to have an extra execution of this query, but that’s okay. If we go to the properties over here, we will have this little thing in the query plan. Non-parallel plan reason. T-SQL user defined functions not parallelizable.

So they inhibit parallelism for the query that calls them, but the body of the function is perfectly fine to go parallel, right? Good for us. Anyway, if we come over here and look at the live query data, we’ll see the two executions of this and the two query plans for this that do in fact use parallelism in quite a bit of the plan.
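If you’d rather hunt for that property across the plan cache than eyeball one plan’s properties, a sketch:

```sql
/* Find cached plans whose root was serialized specifically because of
   a T-SQL UDF; the attribute value matches what SSMS displays. */
WITH XMLNAMESPACES
(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    qs.execution_count,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist
      (N'//QueryPlan[@NonParallelPlanReason =
         "TSQLUserDefinedFunctionsNotParallelizable"]') = 1;
```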

And now, so remember the last one took, I don’t know, like three, four seconds? But that was because the query plan for that one took around 991 milliseconds to run for each execution.

These function calls, rather, these function calls specifically take about 68, 69 milliseconds to run. So when we come over here and we run that 800,000 query for the same four rows, this comes back a lot faster because we reuse a slightly faster execution plan.

So we can already see that T-SQL non-inlineable scalar UDFs get a plan cached for them, and further executions will reuse that execution plan, right?

So we come over here now and we look, remember the first two in here were the ones from our executions. The next four are from what our query returned, right? So this one, two, three, and four, these all took around 90 to 100 milliseconds or so.

But we got faster results back because we were using a generally faster execution plan for this query. So the plan reuse thing here can be a pretty big deal. Not a lot of people think about functions when they think about parameter sensitivity.

Everyone looks at stored procedures and SP execute SQL and like, ah, you parameter sensitive jerk. But functions, just as likely to happen. Fun stuff, right?

And just to sort of prove things out, let’s come over here and we’ll notice that the total and average times for these things are way down, right? From the initial invocation there. And just to sort of prove things out a little bit further, we’re going to do this for five as well.

So if we run this, did I clear this? I didn’t. Let me make sure this is cleared out. Oh, oh, I didn’t mean to clear all filters. I just meant to clear data. There we go.

Anyway, we’ll just deal with it. All right. So there’s our first execution. And we can turn execution plans back off. And this is why I had things filtered out. So here is our query plan for that.

There’s one other query plan in here for the calling query, right? Well, it’s, but this is the one that we cared about. But now if we come back over here and we run this for the 800,000 people, this is going to take a few seconds, but we get results back.

And then we have, if we scroll a little bit, we will have the query post execution plan for all of our executions of that. Oh, we’re up at the top.

So let’s scroll down a little bit, but you get the point. Oh, that’s not the point. Point is down here a little bit further. So there’s one reuse of the plan. There’s two.

There’s three. There’s four reuses of the same plan. So T-SQL scalar UDFs most definitely, most definitely cache execution plans and reuse them.

And if we look over here, then we will see, of course, the times on that. And these times went up a little bit. That query plan is a little bit slower for the results that we were looking at there.

So as always, I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video where most definitely we are going to talk about the multi-statement variety of UDF.

And we will find out how plans are cached and reused for these. So we have that to look forward to, don’t we? All right.

Thank you for watching. Thank you. Thank you.


SQL Server Performance Office Hours Episode 41




Questions:

  • Hi Erik! How do you see the DBA role evolve in the next couple of years? Will it go to a more coding level or maybe more to a multi-database role? Thank you for the great content.
  • Hi Erik! When will you be coming to Sweden?
  • What setup do you use to record your YouTube videos?
  • First of all, thank you for everything you share! When it comes to SQL Server 2025, which new performance-related feature do you believe has the greatest potential to become a real success?
  • You’ve already told us all the SQL books you like, what are the books about lifting that you like?

To ask your questions, head over here.

Video Summary

In this video, I discuss the evolving role of database administrators (DBAs) over the next couple of years and share my thoughts on how it might shift towards coding or a multi-database role. I also dive into some personal questions from viewers, such as my travel plans to Sweden for SQL Server conferences and the setup I use for recording YouTube videos. It’s been interesting to reflect on the future of DBAs while sharing practical advice on equipment and techniques that can help improve your own video production process. Additionally, I touch on what new performance-related features might be coming in SQL Server 2025, or rather, why they may not live up to expectations based on current demos and updates. Overall, it’s been a great opportunity to engage with the community and share insights that can help both DBAs and developers navigate these changes.

Full Transcript

Erik Darling here with Darling Data, and it is my favorite Monday of the week because I get to answer your office hours questions. Five of them, this many. I’ve counted and double counted to make sure that I don’t shortchange anyone or overcharge myself. So we’ve got that going for us. If you want to ask me office hours questions, you can do so. There’s a link down in the video description. It will take you to my website where there’s like, you know, a little language that’s going to be in the description. So I’m going to have the language around like, hey, if you want to ask a question with some details, here’s how to do it. But there’s also a handy link to the Google doc form that you submit your questions to no email required, right? Just send it in there, right? No, just drop it in. While you’re down there in the video description, hunting for the office hours link, you can do all sorts of other wonderful things like hire me for SQL Server consulting, buy my SQL Server training, become a member of my SQL Server channel and all sorts of other stuff. And of course, if you enjoy this content, well, you can always like, like, subscribe, tell all your friends, you know, all that good stuff. Past Data Community Summit coming up short order. Very close now. You can smell the pre-cons that me and Kendra Little are doing. Two days of them, T-SQL, the best T-SQL pre-cons that have ever been pre-conned. So, you know, if you’re not there, you will be missing out entirely.

on the most formative event of your life, right? But anyway, let’s get this T-SQL, or not T-SQL, SQL Server Office Hours, Performance Office Hours, Party Started. Oh, that database is missing an arm. Ah, AI. That’s not, that’s, that’s not nice. Chopping arms off of these poor databases. Anyway, let’s go look at what we’ve got here. So, hi, Eric. How do you see the DBA role evolve in the next couple of years? Will it go to a more coding level or maybe more to a multi-database role? Thank you for the great content. I am notoriously bad at forecasting these things. I’m just straight up not good at it. You know, I’ve, I’ve lived through, I think like three DBA extension, extinction events now. You know, it was like VMs, the cloud and well, I guess we’re living through the AI one at current.

But, you know, I think that if you are good at what you do and you are good at learning things, you know, you, you will, you are, you are not going to be extinct. But, um, as far as like the, the two things you mentioned, I don’t think that it’s going to go to, I don’t, I don’t think that it’s going to go to either one specifically. Um, you know, uh, you, you might see a shift to, uh, or you might see yourself shift to a different technology, a different database platform more and more.

I don’t know. Uh, I, I, I don’t know what industry you’re in, where that might happen. Uh, a lot of my clients are firmly entrenched in the Microsoft stack for better or for worse. Um, you know, so like a lot of them, there is no like multi-database thing, unless it’s for like some microservices or like some outside feature. Like, um, you know, you’ll see them move some micro microservices to like, uh, like Aurora Postgres, but it’s not stuff where they’re like sitting there staring at performance and like, like biting their nails over it. Um, they might go to Elasticsearch for full text stuff rather than keep, uh, like punching themselves in the face with full text indexes and SQL Server.

But, um, you know, uh, like when you think about like what like DBAs generally do, you know, you’ve got your, like your infrastructure DBAs, you’ve got your performance, like, uh, oriented DBAs, which is like what I do. And then you have the people who like, like, who are like developers and stuff. Uh, the people who do, uh, performance tuning work and development work have already, like already do a lot of coding type stuff, right?

You rewrite queries or, you know, like, like the developers make the features and the performance tuners make the features faster, right? Type things. Uh, the infrastructure type DBAs, they’re just doing infrastructure in the cloud mostly now, right? You still have a lot of on-prem people, of course, but for people who are part of that shift, you know, like whatever they were, whatever they were doing on-prem, they now just have to do like through a crappy portal and like maybe some like command line scripts and stuff.

So, um, you know, I, I don’t generally know where, where the whole thing is going to go and end up. Um, but as far as I’m concerned, uh, people are always going to have performance problems. And so that’s why I’m very happy to keep doing my performance tuning stuff.

So that’s the best answer I can muster on that. I just had lunch, so there’s like a 90% chance I’m going to fall asleep during or after this video. Uh, hi Erik, when will you be coming to Sweden?

Well, uh, I don’t have any vacation plans to Sweden currently. Uh, but if there is a SQL Server conference in Sweden that you feel that I could make a meaningful contribution to, uh, and they are the type of conference that accepts pre-con events, uh, where I can, I can, you know, defray some of the cost of traveling overseas, uh, then I’m, I’m, I’m always happy to show up.

So you tell me when I should come to Sweden based on when there is an event in Sweden that I can attend. And again, defray some travel costs and potentially have a little tax write-off as well. So you tell me when I’m coming to Sweden.

I can’t tell you. All right. Uh, what setup do you use to record your YouTube videos? Well, it’s a good question. Um, I have a Nikon Z30 pointed at me. Um, I didn’t, I bought it because they were like, this is the ultimate content creator camera.

And then like, it got like one firmware update and like in like, I don’t know, I guess it’s 2019 or something and nothing since. So, uh, I don’t know if it is the ultimate creator camera.

It took a lot of work for me to get it to the point where it didn’t overheat and shut off after like 30, 40 minutes. So maybe, maybe not. Uh, I don’t know if that’s advice you want to take: get it.

Maybe the advice is not to get it. Like maybe there’s better advice out there. Um, I’ve got this Shure wireless thing that I use. Um, I don’t know if you recall, a few months back, my microphone thing snapped and I had to get these clip-on things, but it’s been working great ever since.

Uh, and aside from that, I’ve just got a bunch of hot lights. I’ve got two, one on one, like, you know, photography type lights on either side of me. And then a couple of ring lights in front of me over here.

Uh, aside from that, um, it’s, it’s all held together by bubble gum and magic as far as I’m concerned. All right. First of all, thank you for everything you share.

Well, obviously we haven’t eaten dinner together. Uh, when it comes to SQL Server 2025, which new performance related feature do you believe has the greatest potential to become a real success?

Uh, quite frankly, none of them. Um, so there are features that I think are cool. Like I think optimized locking is cool. Um, I also thought the optimized Halloween protection thing was cool until it got pulled out cause there were bugs in it, but I was really looking forward to that one.

Um, aside from that, when you look at like performance stuff in 2025, there’s nothing really all that interesting in there. Um, there’s like the oppo thing, which is built on the pisspo thing, but the pisspo thing isn’t that great.

And they haven’t made the pisspo thing any better. So I don’t think the oppo thing is going to be all that great. Uh, and then like, like the few demos that I’ve seen about like, uh, batch mode improvements for some of the mathematical functions, like sum and min and max and average, all the, all the improvements are measured in microseconds.

So I’m like, come on. I remember when, what was it, JSON first came out in SQL Server 2016. There were actual Microsoft blog posts comparing JSON and XML for stuff.

And all of the demos ended with: and as you can see, there is a 300 microsecond difference. JSON is clearly superior.

And I’m like, get out of here, like, leave me alone. Yeah. Uh, so I don’t know. I’m not all that crazy about SQL Server 2025 for this stuff. Um, clearly this is the Fabric and AI release.

This is not a SQL Server release. Nothing good happening here. Uh, the vector indexes are laughable at the moment, right? They’re read-only. Um, I had posted a video about the code that currently runs when you create a vector index.

It’s like, who pushed that dead horse out the door? Oh, it’s amazing. Anyway, what was I going to say? Oh yeah. Bob Ward posted a picture the other day of the SQL Server 2025 book on a shelf with all of his other books, and the SQL Server 2025 book is like a pamphlet about this big.

All the other books are like a good solid ribeye thick, right? Maybe a healthy porterhouse thick. Those are steak measurements in case you’re a vegetarian. But it was just funny.

It was like, damn, like, uh, not much to say about that one. All right. Well, anyway, I dug a pretty good hole on that one. I’m going to have to send flowers.

Uh, you’ve already told us all the SQL books you like. What are the books about lifting, excuse you, that you like? Um, well, the two books that have made the biggest difference for me are the two Starting Strength books.

Um, there’s the blue book and the gray book. Um, one is about beginner barbell training. The other is about intermediate barbell training. Um, I suppose I could put links to those in the video description if you’re interested in picking those up.

Um, but they are excellent at describing how to do the lifts properly. I think it’s like 72 pages about how to properly do a squat, which, if you read and slowly absorb them as you’re doing those lifts, you can really feel why this is the right way of doing things, or at least, I think, a very optimal way of doing things.

Uh, and so it’s not only how to do the lifts, but also how to program. It takes a lot of the programming questioning out of there. Um, there are of course templates in the book on how to do things, but, uh, past a certain point, you really should just get yourself a proper coach who is good at programming and can make good calls on these things.

Uh, because, um, you know, you’re going to get certain rep and set schemes stuck in your head, and it’s going to be very difficult for you to figure out alternate ones and which ones would be appropriate at your stage of training.

So, uh, if you do find yourself, um, you know, going into the gym and getting quite advanced with the amount of weight that you can lift and all that other stuff, and you find yourself hitting plateaus and not being able to meaningfully add weight to the bar any further, then getting a coach really is the next logical step for you there.

Anyway, I think that just about covers everything here. Uh, been a pleasure as always. Uh, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video. I’ve got some interesting, at least I hope it’s interesting stuff lined up for this week. So, um, we’ll, we’ll slog our way through that as we are slogging our way through, uh, yet another Monday of corporate doldrums and all the other stuff.

Uh, you know, there’s a reason that people used to drink at lunch and it’s sad that we’ve lost that in our, in our society. Anyway, thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

New in SQL Server Management Studio 22: Open Execution Plans In a New Tab!


Video Summary

In this video, I share an exciting new feature in SQL Server Management Studio 22 that was recently added as part of the preview release. As someone who frequently deals with query plans and execution paths, I can attest to how frustrating it is when you lose a plan after doing something else within your query window. This feature allows you to open an execution plan in a new tab directly from the query results, ensuring that you don’t have to worry about accidentally losing or overwriting important plans. It’s a simple but incredibly useful addition that streamlines the process of tuning and comparing queries. Additionally, I highlight another issue I’ve opened for SSMS, suggesting the inclusion of an option to keep actual execution plans enabled by default across all tabs. This would be particularly beneficial for query tuners and presenters who rely heavily on these features during their work.

Full Transcript

Erik Darling here with Darling Data. And this is a very short video, because it’s Friday and we’re all in love, just to talk about a new feature that got added in the SQL Server Management Studio 22 preview. If you haven’t gotten a chance to download and try that out yet, I don’t blame you. It took me a little bit to get to it too. The SSMS team is working fast and furiously. Remember, the cadence at which you went from 20 to 21 to 22 has been whiplash speed. So this was actually a feature that I suggested, and I’d like to thank, of course, Erin Stellato and the whole crew who works on SSMS for noticing and taking care of this wonderful addition to SQL Server Management Studio. So what it is, is if you run a query and you get a query plan for it, now you can open that execution plan in a new tab so that you don’t lose it if you do something else within your query window. The number of times this has happened to me and I’ve had to rerun a query to get the query plan is absolutely infuriating. And so I just want to show you what it does and how it works. Now, I just got SQL Server Management Studio 22. I have not done any setup or anything on it.

So if it looks weird or anything like that, I guess that’s why. But, you know, I think it looks nice, right? Dark mode and all that other good stuff. So what I’m going to do is just run this query, and we’re going to see an actual factual execution plan show up down here, right? You can admire all its glory. Look at this. Oh, geez. We’ve got spills and look at all this stuff going on. Oh, boy. We’ve got to tune this thing. But what if we wanted to, you know, try some stuff and then, I don’t know, compare execution plans? What would we do? Well, we could go the clunky route and save it, or we could right click and say show execution plan in new tab. It’s beautiful. Look at that. And then we do this and it pops up in a brand new tab. And now we can do whatever we want in this window without losing our query plan. It’s fantastic. It’s wonderful. We don’t have to worry about accidentally hitting something, or doing something without remembering to save stuff. We can just pop something off to a new tab and move on with our day, right? Get back to tuning queries, all that good stuff.

Now, there is one other issue that I have open for SQL Server Management Studio. And I think it’s probably an important one for query tuners and people who present about query tuning generally. Because nothing is worse than forgetting to turn on actual execution plans when you’re going to do things, right? Because maybe you have a lot of different tabs that you’re dealing with, or, for whatever reason, let’s just say, maybe you’re just a fellow old person who works with SQL Server quite a bit.

And the old memory might lapse here and there. I have an open issue to add an option to keep actual execution plans enabled for all tabs. Now, obviously this wouldn’t be a default, right? Because not everyone wants execution plans on all the time. This could maybe even be something like a setting in presenter mode, because, you know, that’s maybe a little bit more geared towards the type of people who would like this button there.
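In the meantime, one session-level workaround (my suggestion here, not something shown in the video) is SET STATISTICS XML ON, which hands back the actual plan for every statement you run in that session, whether or not you remembered the toolbar button:

```sql
/* Capture actual execution plans for the whole session, regardless
   of whether the SSMS actual-plan button was clicked. */
SET STATISTICS XML ON;

SELECT COUNT(*) FROM sys.objects; /* any query */

/* The plan comes back as an extra XML result set you can click to open. */
SET STATISTICS XML OFF;
```

It only covers one query window at a time, so it doesn’t replace the all-tabs option, but it does survive forgetfulness within a session.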

But it is something that I think would be a useful addition to SQL Server Management Studio. So, I don’t know why I just refreshed that, I’m going to promote that issue in this video as well. The link for this will be down in the video description.

But SQL Server Management Studio 22. I don’t know why I keep forgetting the word Studio. This gives you some idea about the forgetful nature of the aged population working with SQL Server. But it’s looking good, looking real nice.

I like it. It’s all smooth and pretty, and it’s got all the familiar buttons that I’m used to. So I don’t get lost and have another grandpa episode. Where’s my damn buttons?

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will download SQL Server Management Studio 22 and test it out. Give it a little hoot and a holler and see how it goes.

And I don’t know, it’s Friday. We should not have to watch overly long videos on Fridays. We should be able to get back to enjoying ourselves: day drinking, optimizing our wine storage cabinets, whatever it is that we do to enjoy ourselves.

I don’t know. Smoke four packs of cigarettes and stare at the sky. That sounds like a nice time.

Anyway, thank you for watching.


Batch Mode Sorts and Row Mode Indexes in SQL Server


Video Summary

In this video, I dive into the intricacies of batch mode sorts and rowstore indexes in SQL Server, sharing insights from a recent client-facing issue that highlighted the challenges and nuances involved. We explore how forcing parallel plans can sometimes lead to unexpected results, such as the need for additional sorting even when an index perfectly aligns with query requirements. I also discuss the trade-offs between optimizing for speed versus memory usage, providing practical advice on when to use the `MAXDOP` hint to achieve better performance in specific scenarios.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk about batch mode sorts and rowstore indexes. And hopefully I said that right as I’m reading it from the teleprompter in front of me. Great. So this was an issue that a client of mine had: in various parts of their workload, you know, row mode execution just doesn’t cut it. You need to get batch mode stuff involved. But sometimes, especially with batch mode on rowstore, it’s a tricky thing to get happening in the way that you want it to happen. So we’re going to talk about a little bit of that. And we’re also going to talk about, of course, the little caveat with batch mode sorts and rowstore indexes. So we have much on our plate. Now, I did mention that this was a client-facing problem. And, you know, I do have clients. I do have nice people who pay me for my time to do things for them and help them with things. If you would like to be one of those lucky people, there is a link down in the video description where you can hire me to do SQL Server stuff. It’s a crazy scheme, I know, but it works. Trust me. You can also buy training, become a channel member, ask me office hours questions. And of course, if you find this channel content just so groovy and ghouly that you want to make sure that it spreads far and wide like the herpes simplex virus, you can, of course, like, subscribe, and tell a friend. Get me in that sweet YouTube algorithm or whatever.

PASS Data Community Summit. Oh, boy. Oh, it’s so close. Two days of T-SQL pre-cons, me and Kendra Little. The entire event is in Seattle, November 17th to 21st. So you should be there. Hopefully you bought tickets and all that stuff at this point, because we’re only a few weeks away, and sometimes planning this sort of travel takes some time. You’ve got to start early, right? My wife will tell you, when she books travel, it’s like a million years in advance. So, by the time this video publishes (right now, it’s still October), it will no longer be October. So I had asked ChatGPT: my birthday is in November, can you give me a birthday-themed drawing? Just iterate on the one that we’ve been using. And it left in some Halloween stuff, right? We still have a ghost and a vampire thing, but it is a festive birthday, and I am kind of a spooky, spooky person. So we’ve got a ghost and a vampire showing up to my party. So anyway, let’s go talk about the stuff that we wanted to talk about. See, I’m so spooky. I have a black background sometimes. All right. So I think I already did this, but let’s just make sure. Yeah. Wonderful. All right.

We’ve got an index. So what I want to show you first, and I don’t want to actually run this in real time because it takes a long time, is the estimated execution plan for this. Now, this is a query where I am forcing a parallel plan. And if you watched my video about what Microsoft is doing to create their vector DiskANN indexes, you should know that the ENABLE_PARALLEL_PLAN_PREFERENCE hint is used in that code.

So it’s safe for production now, right? Because Microsoft uses it in production, so you can use it in production. So now it’s safe. They’ve blessed it. So what I want to show you specifically here is what happens when this query executes in row mode, right? And if we buy ourselves a little bit more query plan real estate up here, we can generally visually infer that this plan ran in row mode.
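If you want to sketch that out yourself, a minimal version looks something like this. The table and windowing specification are stand-ins for the demo query, which isn’t shown in this transcript, but the hint name is the real one:

```sql
/* Sketch only: dbo.Comments and its columns are assumptions
   standing in for the demo query. */
SELECT
    c.UserId,
    ROW_NUMBER() OVER
        (PARTITION BY c.UserId ORDER BY c.Score DESC) AS n
FROM dbo.Comments AS c
/* Nudges the optimizer toward producing a parallel plan. */
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```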

Because, like, we have a repartition streams and another repartition streams there. None of the parallel exchanges support batch mode, and segment and sequence project aren’t used in batch mode. We get a window aggregate operator for windowing functions when we execute in batch mode.

The filter, of course, could be in batch mode. Actually, I mean, it probably would be if I didn’t disallow it. And then the gather streams is, of course, another parallel exchange operator, so that can’t be batch mode either. So this is a fully row mode plan. But the thing that I want to point out here is that we have an index, right? And if we scroll back up here, I should probably sell back a little bit of my query plan real estate. We have a nonclustered rowstore index that not only fully covers our query, but supports the window function specification exactly right.

So user ID and sort descending, which is what we’re asking for in here, right? Because we have this, we don’t have a sort operator in here where SQL Server has to basically rearrange data from the way it’s stored somewhere to the way that the window function needs it to create its row number, right? So we don’t have to do that there. Keep that thing about the sort in mind. That’s going to come in handy later. Now I’m like, okay, well, we want batch mode, right? We’re going to use our auxiliary columnstore helper table, and we’re going to do this left join to it. It’s got a clustered columnstore index, it’s got no rows in it, and we’re going to do this thing so that we get batch mode. And we’re going to use this now safe-for-production query hint in there. The thing is, though, if we get the estimated plan for this, it’s the same as last time, right? And if we zoom around a little bit, none of this stuff can be in batch mode: repartition streams, segment, sequence project, gather streams. But our filter operator is also still in row mode.
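The helper-table trick, sketched out; the table name and the always-false join condition are my assumptions about the shape of the demo, since the actual script isn’t shown:

```sql
/* An empty table with a clustered columnstore index. Left joining
   to it can make the optimizer consider batch mode for an
   otherwise rowstore-only query. */
CREATE TABLE dbo.BatchModeHelper
(
    i integer NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE
);

SELECT
    c.UserId,
    ROW_NUMBER() OVER
        (PARTITION BY c.UserId ORDER BY c.Score DESC) AS n
FROM dbo.Comments AS c
LEFT JOIN dbo.BatchModeHelper AS bmh
    ON 1 = 0 /* never matches, so it contributes no rows */
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```

The join costs essentially nothing at runtime because the table is empty; its only job is to get a columnstore object into the plan.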

Now this was kind of an interesting one. And what it comes down to for batch mode on rowstore, in this query plan specifically, is that batch mode on rowstore takes one look at the text column in the Comments table and says, hell no. The text column is a varchar(700), right? So if we remove that from the query select list, right, we’re going to specifically name our columns in here, and we’re going to say no text column; we’ve removed it from existence. And if we run this, well, let’s just get an estimated plan for this, now we see a much more batch mode plan. Right? But now we have a sort. We’re using that same nonclustered index, right? So the data from the index is in order. But when we read from this, we’re reading from it in batch mode. Okay. So let’s run the query, and let’s look at how this thing fares. So let’s start over here on the left. And we get a, do do do do, memory grant of 1418 megabytes. So that’s about 1.4 gigs, I think. Depending on whether you divide by 1000 or 1024, you might have a slightly different take on exactly what that is, but let’s just call it 1.4 gigs. All right, cool, we did the math. But this query finishes in about 1.3 seconds. The thing is, though: why is this sort here? And it’s kind of an interesting thing, because this only happens in parallel plans, right? So if we come down here, we’re going to do almost the same thing, except we’re going to force this query to run at MAXDOP 1, right? And just keep in mind, for this query, we didn’t even need this to get batch mode on rowstore.

If we check the estimated plan here, right, we’ll see that we have a very similar plan, but without the sort, right? So reading from the row mode index with a MAXDOP 1 plan, we don’t have to sort data here; we can actually rely on the order that this data came from.

But if we run this at MAXDOP 1, okay, this whole thing is a fully batch mode plan, so each operator only has the operator time for itself. So we have 5.2 seconds, and then another 137 milliseconds, and then another 35 milliseconds. So let’s just say it was about five and a half seconds total. But the memory grant for this is 1024 KB, which is like one meg. So this is an interesting thing. And this is specifically just a product limitation of SQL Server’s implementation of batch mode stuff. In a parallel plan, where batch mode is used like this, it cannot trust the order that things come out of the index. It’s not that the index is out of order, but the order that stuff gets read in, in batches, might not be true to the sort order the window function needs. So it might not maintain things correctly, which means we still have to sort data here. So the question for you becomes: what do you want to optimize for? If you want to optimize purely for query speed, well, it’s probably fine to have this thing run in parallel, have a sort in it, and use like 1.4 gigs of memory.

If you want to optimize for memory usage, and you’re willing to sacrifice some time for that, you can of course force the plan to run at MAXDOP 1, not sort data, and use less memory. But this was a very interesting thing, and I had to do a little bit of research on the batch mode sorting to make sure that I was correct in this. So I talked to some nice people who work with the product, and they said, yeah, that’s true. And I said, yeah, that’s great. So yeah: batch mode on rowstore, using windowing functions, when you get a parallel plan, right? Even if you have an index that perfectly puts your data in order for the windowing specification, the parallel plan sort of negates all that, and you will still have to sort things. With a serial plan, right, you do not; SQL Server can trust the order coming from the index, and you do not have to sort things.

Reading from big tables with a single thread, even in batch mode, does take more time, right? 5.2 seconds versus 829 milliseconds. So really, depending on what you want to optimize for, you might choose to hint MAXDOP 1, get rid of the memory grant, and take a little bit longer, or say, screw it, use the memory, use the extra threads, finish as fast as you can.
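Put side by side, the two choices look roughly like this; the object names are stand-ins again, but the hints are the real ones:

```sql
/* Option 1: parallel. Faster overall, but the batch mode sort
   shows up and needs a large memory grant. */
SELECT
    c.UserId,
    ROW_NUMBER() OVER
        (PARTITION BY c.UserId ORDER BY c.CreationDate DESC) AS n
FROM dbo.Comments AS c
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));

/* Option 2: serial. Slower single-threaded read, but the sort
   disappears, because a serial batch mode plan can trust the
   order the index delivers. */
SELECT
    c.UserId,
    ROW_NUMBER() OVER
        (PARTITION BY c.UserId ORDER BY c.CreationDate DESC) AS n
FROM dbo.Comments AS c
OPTION (MAXDOP 1);
```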

Anyway, that was about it there. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I’ll see you in tomorrow’s video, where we will talk about, I don’t know what. Maybe I’ll just make it up as I go, as usual. All right. Thanks for watching.
