A Little About Serializable Escalation In SQL Server



Video Summary

In this video, I delve into a fascinating aspect of SQL Server that often goes unnoticed: how certain actions can escalate to the serializable isolation level without explicitly setting it. I explain why you might not always see these escalations in standard monitoring tools like `sp_whoisactive` or deadlock XML reports and demonstrate with an example involving indexed views, which can trigger this behavior when updating referenced tables. By walking through the process of creating an indexed view and performing updates that cascade to multiple tables, I show how to identify these hints using my `What’s Up Locks` tool and trace flags, revealing the complexity behind SQL Server’s locking mechanisms.

Full Transcript

Erik Darling here with Darling Data. Today’s video is kind of a fun one, because it’s about something you’ll hear a lot about but can’t actually see very easily: how doing certain things in SQL Server will escalate to the serializable isolation level without telling you. All the normal stuff you’d look at to figure this out doesn’t show you, in any meaningful way, that the serializable escalation happened. If you’re looking at sp_WhoIsActive, well, there’s blocking going on, but it can’t tell you about this. If you look at the blocked process report or the deadlock XML report, they won’t tell you about this either. The reason why is that the serializable isolation level escalation happens as a hint — it’s not setting the transaction isolation level to serializable for the whole thing. When you SET TRANSACTION ISOLATION LEVEL to something, you can see that in sp_WhoIsActive, and you can see that in the blocked process report.

You can see that in the XML deadlock report too. But if you just ask for something different using a query hint, it’s not for the whole transaction, and so SQL Server, if you’re using all the defaults, will just report read committed. That’s what we’ll talk about in this video. Anyway, if you think this stuff is interesting and you have interesting SQL Server problems — or even really boring ones that you just want someone else to work on — you can hire me for consulting. If you want to learn more about SQL Server and stuff like this, you can buy my training. If you like this content enough to donate a bit to the channel, you can become a channel member. All of these things are down in the video description, along with the link to ask me Office Hours questions, which I try to answer five of every week. And of course, if you enjoy this content, do the kind thing: like, subscribe, tell a friend, all that good stuff. The only thing I have left on my speaking schedule through the end of the year is PASS Data Community Summit, taking place in Seattle, November 17th to 21st, where I will be banging out two days of T-SQL pre-cons with Kendra Little. They are undoubtedly the finest T-SQL content ever produced, so I will see you over there.

Anyway, let’s get this party started. This can happen for two reasons, and I’m going to show you one of them, because showing you both would be boring and redundant. I’ve created an indexed view, and this behavior will happen if the indexed view has more than one table referenced in it — there’s obviously a join between Users and Posts in here.

This will also happen if you have foreign keys with cascading updates or deletes. So there are two things, at least that I’m aware of, where this happens: modifying tables that are referenced by indexed views, assuming there is more than one table in the indexed view (otherwise you just get regular X lock hints), and cascading foreign keys. I’m going to show you the example with indexed views.
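The view and index definitions aren’t shown in the transcript, but a minimal sketch of the kind of two-table indexed view that triggers this behavior might look like the following. All names here are my guesses (Users and Posts come from the Stack Overflow schema; the view name UserPostScore comes up later in the transcript):

```sql
/* A schema-bound view joining two tables. Modifying either base
   table forces SQL Server to maintain the view, which it does
   under serializable semantics internally. */
CREATE OR ALTER VIEW dbo.UserPostScore
WITH SCHEMABINDING
AS
SELECT
    u.Id AS UserId,
    u.Reputation, /* updating this column means maintaining the view */
    p.Id AS PostId,
    p.Score
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
GO

/* The unique clustered index is what materializes the view. */
CREATE UNIQUE CLUSTERED INDEX ucx_UserPostScore
    ON dbo.UserPostScore (PostId);
```

Indexed views require SCHEMABINDING and two-part table names, which is why both appear above.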

The view is already created, and so is the index — if I run the creation script again, I’ll get an error. Great. One wrinkle here is that these types of locks don’t always show up when you modify a table referenced in the indexed view. I don’t have full details on why, but I can tell you that this update takes around four seconds.

If we use my little helper procedure called sp_WhatsUpLocks — available at my GitHub repo; the short link for that is code.erikdarling.com, which is also where you get my other stored procedures like sp_QuickieStore and sp_PressureDetector — we can run the update, get the locks, and then roll back the transaction, and we can see a RangeX-X lock.

Range locks like this are hints that serializable stuff is going on. So that’s the first thing — pretty obvious that we got some serializable locks that way.
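The demo pattern he describes — modify inside a transaction, dump the locks, roll back — can be sketched like this. The update itself is a stand-in, and sp_WhatsUpLocks may take parameters not shown here:

```sql
BEGIN TRANSACTION;

    /* Update a Users column that participates in the indexed view,
       so the view has to be maintained. */
    UPDATE u
        SET u.Reputation = u.Reputation + 1
    FROM dbo.Users AS u
    WHERE u.Id = 22656;

    /* Inspect locks held by the current session. RangeX-X locks
       are the tell: only serializable takes key-range locks. */
    EXEC dbo.sp_WhatsUpLocks;

/* Undo the change so the demo is repeatable. */
ROLLBACK TRANSACTION;
```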

The only way I’ve found to see the hints that SQL Server applies is to run the update with a RECOMPILE hint, so the information shows up, and then use a sort of smattering of trace flags to get other information out.

If we run this, it will also take about four seconds, and down in the Messages tab you’ll have all this crazy stuff. It goes on basically forever. But if we look through it enough, we’ll see some hints in here.

If we scroll down to where we start looking at tables and things, we’ll see the parts where SQL Server started applying hints. It’s really kind of a nightmare to find in all this output — you really have to go scrolling and looking, and it’s not a lot of fun.
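The trace flags themselves aren’t listed in the transcript. The combination commonly used to see the optimizer’s internal trees is something like the following — both flags are undocumented, so treat this as a test-system-only sketch (the update statement is a stand-in):

```sql
UPDATE u
    SET u.Reputation = u.Reputation + 1
FROM dbo.Users AS u
WHERE u.Id = 22656
OPTION
(
    RECOMPILE,         /* so the hint information shows up */
    QUERYTRACEON 3604, /* route trace output to the Messages tab */
    QUERYTRACEON 8607  /* print the optimizer's output tree */
);
```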

I’m not going to make you sit through all that, but this is what part of the output looks like. You’ll see something like PhyOp — a physical operator — and Range. And then we’ll see entries for the Posts table.

Remember, that was one of the tables in the indexed view. We’ll see some information in here, and then of course we will see hints applied. Those hints will be serializable internal and detect snapshot conflict.

Detect snapshot conflict is in there in case you have snapshot isolation enabled — with snapshot isolation enabled, SQL Server needs to figure out if your writes are going to conflict with another session’s. But the important thing is the serializable internal hint.

And then on the indexed view itself, UserPostScore: the previous hints were on the Posts table, but on UserPostScore we get some different ones. We get an UPD lock along with serializable, because we actually have to maintain the indexed view.

We’re not updating the Posts table in this query — remember, we were just updating the Users table. But the Posts table needs to be read from using the serializable isolation level to maintain the indexed view.

We take an UPD lock because we have to maintain the indexed view: the column in the Users table we’re updating is in the indexed view, so we have to update the indexed view too.

And of course we get the same serializable hint when reading from the indexed view to figure out which rows need to be maintained, and we get the detect snapshot conflict hint as well, just in case snapshot isolation is enabled. Like I said before, you have to go pretty deep into SQL Server internals to find where these hints get applied.

It takes a lot of trace flagging and other annoying stuff to get this information, and then looking at the trace flag output is also not fun. But once you find it, you can prove to people that doing certain things in SQL Server will escalate to serializable — primarily, creating an indexed view that spans more than one table

and then updating one of the tables in that indexed view. Foreign keys with cascading updates or deletes will also show this same stuff. So it does happen. This is the only way I’ve found to really prove that it happens, but it does.

And, uh, that’s my story and I’m sticking to it. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you next time.

Adios. I’m hitting the wrong button. There we go.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Why Partitioning Is Not A Performance Feature In SQL Server



Video Summary

In this video, I dive into why partitioning a table is not inherently a performance feature but rather a data management tool designed to facilitate quick swapping of partitions in and out. I demonstrate how setting up a table for partitioning can negatively impact query performance, especially when using aligned nonclustered indexes. Through practical examples with the Stack Overflow database, I show that queries on partitioned tables can significantly slow down compared to their non-partitioned counterparts, even when using clustered indexes. The video also explores the trade-offs between creating aligned and non-aligned indexes, highlighting scenarios where non-aligned indexes might be necessary but come at a cost in terms of performance.

Full Transcript

Erik Darling here with Darling Data, and today’s video is sure to be a video. We are going to talk about why partitioning is not a performance feature — it is a data management feature. When you partition a table, the primary objective is to be able to quickly swap partitions in and out. Anything else that you do with partitioning is not that much fun: splitting and merging partitions, managing partitions. Not fun, not performance. But this video is specifically about how, when you set up a table to facilitate data management with partitioning, the performance of some queries can quickly go south. What I’m not talking about in this one is mixing clustered columnstore indexes with partitioning, which can in some cases help performance by getting you additional segment elimination — or row group elimination, however you prefer to talk about it — because most people are not doing anything quite that exotic with their tables. Most people are in the world where they somehow got the idea into their heads that if they partition the table, performance will get better. Wrong. If you look down in the video description, you’ll see all sorts of helpful links. You can hire me for consulting, buy my training, or become a paid-up member of the channel if you want to support this high quality SQL Server content. You can also ask questions on Office Hours — those are free, and I answer five of them every week. And of course, if you enjoy this content, I do ask that you like, subscribe, and tell a friend, maybe even 10, 20, 30, 100 friends. If you have another thousand friends or something, just send them all the links.

Spam them. The only conference I have left on my schedule is PASS Data Community Summit, taking place in Seattle, Washington, November 17th to 21st, where me and Kendra Little are going to do two days of probably the most spectacular T-SQL pre-cons you’ve ever seen in your life. So I hope you’re prepared. Anyway, it is Halloween month, so we’re having a Halloween database party in here. Let’s get cooking with this partitioning stuff, why don’t we. What I’ve done in the Stack Overflow database: really the only table in Stack Overflow that lends itself well to partitioning, because it’s long and skinny, is the Votes table. And I partitioned it by a column in the Votes table called CreationDate.

CreationDate gives me the most even spread of partitioned data — even though it’s not perfectly symmetrical, it’s far more symmetrical than if I partitioned on something like VoteTypeId or UserId, because lots of people have only voted once. Even PostId doesn’t make much sense: lots of posts get no votes, or just one vote. So CreationDate makes the most sense. The partitioning setup itself doesn’t matter too much — it’s a partition range for every year that’s in the table, so 2008 to 2013. So if I say: give me the min CreationDate from the partitioned Votes table.
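As an aside, the partition function and scheme aren’t shown in full. For yearly ranges from 2008 through 2013 on CreationDate, the setup might look roughly like this — the object names and exact boundary points are my guesses:

```sql
/* One boundary per year; RANGE RIGHT means each boundary value
   starts a new partition. */
CREATE PARTITION FUNCTION pf_VotesYearly (datetime)
AS RANGE RIGHT FOR VALUES
('20090101', '20100101', '20110101', '20120101', '20130101');

CREATE PARTITION SCHEME ps_VotesYearly
AS PARTITION pf_VotesYearly
ALL TO ([PRIMARY]);

/* A unique clustered index on a partitioned table must include the
   partitioning column, which is why this clusters on (CreationDate, Id). */
CREATE TABLE dbo.Votes_Partitioned
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT pk_Votes_Partitioned
        PRIMARY KEY CLUSTERED (CreationDate, Id)
) ON ps_VotesYearly (CreationDate);
```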

This is very quick. This turns out just fine — I have no complaints with this. This is okay. Now, I’ve created some indexes below that we’ll talk about in a minute, but first I just want to show you a quick difference.

Now we’re not going to use the partitioning column anymore. I want to force SQL Server to use the clustered primary key on both the partitioned Votes table and the regular Votes table. We’re doing the same query — we’re just hitting the partitioned table versus the non-partitioned table, and using the clustered index for both queries. When I run these two, things start to get a little wonky. They both return the same row, of course, but the partitioned table takes nearly two and a half seconds to do the exact same thing the non-partitioned table did.

And again, these tables are identical aside from the partitioning. Well — the non-partitioned one is clustered only on the Id column, and the partitioned one is, of course, clustered on CreationDate and Id, the clustered primary key. But we’re selecting VoteTypeId, so that’s a little immaterial here. They’re both page compressed, so there’s nothing weird there.

Just immediately, aggregating VoteTypeId from the partitioned table takes about two and a half times as long — we go from one second to two and a half seconds. So already we’re at a bit of a detriment. Now, the nonclustered indexes I’m creating to show you some differences: we’re going to create one on VoteTypeId on the regular Votes table, which is not partitioned, and we’re going to create one on the partitioned Votes table that is aligned to the partitioning scheme.

So that one is aligned to the partitioning scheme on CreationDate. And then we’re going to create a non-aligned index on the partitioned Votes table — this one is on [PRIMARY] rather than on the partition scheme. The reason I want to show you this is that once you create a non-aligned index on a partitioned table, you lose the ability to swap partitions in and out. Having aligned nonclustered indexes means you can quickly switch partitions in and out of the table; having non-aligned indexes means you’ll get an error if you try.
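The three index definitions aren’t printed in the transcript. Assuming a partitioned copy of Votes on a scheme called ps_VotesYearly, they would look something like this (all names are mine):

```sql
/* Non-partitioned table: a plain nonclustered index. */
CREATE INDEX ix_VoteTypeId
    ON dbo.Votes (VoteTypeId);

/* Partitioned table: built on the partition scheme, so the index
   is aligned and partition switching keeps working. Omitting the
   ON clause entirely would also produce an aligned index by default. */
CREATE INDEX ix_VoteTypeId_Aligned
    ON dbo.Votes_Partitioned (VoteTypeId)
    ON ps_VotesYearly (CreationDate);

/* Partitioned table: built on a filegroup instead, so the index
   is NOT aligned -- SWITCH will now raise an error. */
CREATE INDEX ix_VoteTypeId_NonAligned
    ON dbo.Votes_Partitioned (VoteTypeId)
    ON [PRIMARY];
```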

Now, if it’s palatable to your workflows and environment, you could of course create non-aligned indexes to help queries, and then when you need to switch data in and out, drop them, switch the data, and recreate them. That still might be faster than whatever you were doing before — I don’t know. It all depends on whether it’s tolerable to your workflows or not.
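That drop-switch-recreate workflow can be sketched like this. The staging table and index names are hypothetical, and the staging table must exactly match the partitioned table’s structure:

```sql
/* 1. Drop the non-aligned index so partition switching is allowed. */
DROP INDEX ix_VoteTypeId_NonAligned ON dbo.Votes_Partitioned;

/* 2. Switch the oldest partition out into an empty staging table
      created with the same columns, clustered index, and compression. */
ALTER TABLE dbo.Votes_Partitioned
    SWITCH PARTITION 1 TO dbo.Votes_Staging;

/* 3. Recreate the non-aligned index for the queries that need it. */
CREATE INDEX ix_VoteTypeId_NonAligned
    ON dbo.Votes_Partitioned (VoteTypeId)
    ON [PRIMARY];
```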

I can’t tell you that unless you hire me. Don’t worry, my rates are reasonable. Anyway, we’re going to compare a few different queries: against the non-partitioned table, and then using the aligned and non-aligned indexes on the partitioned table. For the first one, we’re going to do the same thing as before, but now we’re going to tell SQL Server which index to use — this is really just for demo clarity.

It’s also because sometimes the optimizer will choose the non-aligned index anyway, because it would rather use that. So I’m forcing SQL Server to use these indexes just to show you what things might look like if you only have aligned indexes on your partitioned table. We’re going to run this one against the non-partitioned Votes table.

We’re telling it to use the VoteTypeId index, and this all finishes very quickly — zero milliseconds, yada yada. Fine. If we use the non-aligned index on the partitioned table now — the non-aligned nonclustered index — this also goes just fine; SQL Server finds the row very quickly. Then we use the aligned index on the partitioned table.

This is the one we created aligned to the partitioning scheme, and it’s not so instantaneous anymore — this takes nearly two seconds. That almost exactly reflects the scanning that happened when we used the clustered index for this query, which took two and a half seconds. So adding a nonclustered index shaves about half a second off. Let’s be fair: 1.898 is almost 1.9.

So let’s call it 600 milliseconds — we save 600 milliseconds by creating an index. Not great. Not moving the needle for me. So we can see that having aligned indexes on partitioned tables immediately slows some types of queries down. Now I’m going to show you another type of query that slows down. Let’s say we wanted to get the top five rows.

We’re going to get the top five from the regular Votes table, and we’re going to tell it to use the VoteTypeId index — again, just for demo clarity, we’re forcing the index here. This is very quick. The execution plan is just what you would expect: we get five rows, we loop join, we key lookup, we get the top five. Zero seconds across the board.
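The exact query isn’t shown; its shape, with the index forced for demo clarity, is presumably something like this (the predicate value and index name are guesses):

```sql
SELECT TOP (5)
    v.* /* selecting all columns forces key lookups
           back to the clustered index */
FROM dbo.Votes AS v WITH (INDEX (ix_VoteTypeId))
WHERE v.VoteTypeId = 8;
```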

If we use the non-aligned index on the partitioned table, it’s also very quick and does the same thing. We have a Compute Scalar in the plan now, but the whole thing takes zero milliseconds anyway. Okay — well, what if we do this with the aligned index? I could run it here. I could. But we’d be waiting a little while.

So I’ve run this ahead of time. You’re welcome — I try to save you some time, because Microsoft won’t. If we get the top five from the partitioned table using the aligned index, it takes a full minute and six seconds. This plan looks a little bit different, doesn’t it? If we look at what it does: we scan the whole index — in the plan we go from six seconds to almost 13 seconds, so that takes about seven seconds — and then we spend the rest of the time, about 45 seconds, doing a 52 million row key lookup.

And, you know, a nested loops join to facilitate the key lookup. One row comes out of here, we go look it up down here, and we do that just about 53 million times. Then it’s a minute and three seconds there, and another three or so seconds getting to the end of the plan. So that plan looks a whole lot different from the one we got using the non-aligned index, which just goes and finds the top five rows, looks up the extra columns we need, and we’re done.

We don’t do all that other stuff. We are not able to get down to just the five rows when we use the aligned index — again, the query plan for that is over here, and we’re going to end the video with it nicely framed up, because that’s a good way to end this video. So to summarize: partitioning is a data management feature. It’s there for you to quickly swap partitions in and out of your partitioned table.

If you combine partitioning with clustered columnstore, you may see some performance improvements from segment or row group elimination — we didn’t dig into that in this video; we’ll do that in another one, probably. But there are lots of queries in your workload that SQL Server might have a very hard time with if you decide to partition a table and not index things any differently. You might need to create non-aligned indexes to make those queries faster.

And you may need to drop and recreate those indexes around your partition swapping, assuming you’re going to do that sort of thing. If you’re not going to do that sort of thing, there is absolutely no sense in partitioning a table and then creating a bunch of non-aligned nonclustered indexes on it anyway. All right, I think that’s good enough here.

Thank you for watching. I hope you enjoyed yourselves, I hope you learned something, I hope you won’t partition your tables, and I will see you in the next video, where we will talk about something equally as interesting. All right, thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Lot About Multi-Statement Table Valued Functions In SQL Server



Video Summary

In this video, I delve into the intricacies of multi-statement table-valued functions in SQL Server, exploring their unique behaviors and performance implications. Starting with a practical example, I demonstrate how these functions can behave differently based on whether they receive literal values or columns as parameters, highlighting the impact on execution plans and performance. By walking through various scenarios, including interleaved execution, rebinds, and option recompile, I aim to provide insights that can help you make informed decisions when working with such functions in your database projects. Whether you’re a seasoned SQL Server professional or just starting out, there’s plenty of valuable information here to enhance your understanding of these often-overlooked features.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk about a multitude of interesting things around multi-statement table valued functions. That’s about as much as I can give you up front, because if I tried to explain it all here, the mind would boggle — you would implode, you would just have a bad day. You’d have a fried egg for a head. So thank you, PowerPoint, for finally responding to me. If you look down in the video description, there are all sorts of helpful links: you can hire me for consulting, buy my training, become a paid subscribed channel member, or ask me Office Hours questions. And of course, one of the most valuable things you can do if you enjoy this content is to like, subscribe, and tell a friend. Keeping it all to yourself is of course very selfish, and we want to avoid selfishness here at Darling Data. While you’re watching this, I will actually be in Utrecht. Amazing, isn’t it? I’m not going to be here where I’m recording this — I’m going to be way over there. I got on one of those things and flew. That means the only thing left for me to talk about is PASS Data Community Summit, taking place in Seattle, November 17th to 21st, where I will have two days of delicious T-SQL pre-cons with Kendra Little. And then I’m home for a bit, until someone else decides to pluck me from my domicile.

And send me spiraling out into the world to blather on about SQL Server stuff. Anyway, with that out of the way, let’s talk about multi-statement table valued functions and all the interesting things that can happen. I’ve got this function here, which is a multi-statement table valued function — you can tell because it returns a table variable. I’m not saying this is a good idea; usually I prescribe against doing this sort of thing. But there are some interesting things to talk about if you ever want to know why you shouldn’t do it. There are, of course, times when they’re fine and actually useful, but usually that’s in the hands of quite a skilled T-SQL practitioner. If you have not elevated yourself to that level, you should probably leave the dangerous stuff on the ground. Don’t pick it up.

This particular function has a thing we’ll talk about later — call this some ominous foreshadowing — where I’ve treated the two parameters it accepts as optional parameters. This is also something I generally prescribe against, so don’t look at this and aspire to it. This is not what you should be doing. Say: Erik said probably don’t do this, unless you’re a skilled T-SQL practitioner who understands the full ramifications and is making an informed choice about writing bad code for a demo in Minecraft.
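The function’s definition isn’t reproduced in the transcript. A sketch of a multi-statement table valued function with “optional” parameters in the style described might be the following — the name, columns, and predicates are my reconstruction, not his exact code:

```sql
CREATE OR ALTER FUNCTION dbo.multi_optional
(
    @CreationDate datetime,
    @LastAccessDate datetime
)
/* Returning a declared table variable is what makes this
   a multi-statement table valued function. */
RETURNS @Out TABLE
(
    OwnerUserId int NULL,
    PostCount bigint NOT NULL
)
AS
BEGIN
    INSERT @Out (OwnerUserId, PostCount)
    SELECT
        p.OwnerUserId,
        COUNT_BIG(*)
    FROM dbo.Posts AS p
    /* The "optional parameter" anti-pattern: each predicate is
       skipped entirely when its parameter is NULL. */
    WHERE (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND   (p.LastActivityDate >= @LastAccessDate OR @LastAccessDate IS NULL)
    GROUP BY p.OwnerUserId;

    RETURN;
END;
```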

What’s interesting here — actually, we should explain a little bit first. I need to do one thing just to make sure we don’t end up with anything funny; I don’t want to spoil any surprises. So we’re going to make sure I do that. What I’m doing is essentially selecting from the Users table and getting everyone with a reputation over 800,000. And let me just add a column to this — c equals COUNT_BIG — to make things a little clearer.

The reason I’m passing in the literal values you see here is because of the data I get back from this query. If I look at the users who have a reputation over 800,000, I want to know what their min and max creation date and last access date are, and — just for later — I want to show you how many of them there are.

The min and max for those are 2008-09-13 and 2018-09-02, and we get four of those rows back. Four. Remember the number four from this query. Now I’m going to execute this one, and we’re going to look at what happens.

This takes about seven seconds. We did it. And we get eight rows back — but the eight rows we get back are essentially for four users. These are the four users we get rows back for. I know it’s eight rows, but it’s four users that we get rows back for.
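The calling query’s shape is roughly this — qualifying Users rows, CROSS APPLY into the function, and the correlation handled outside the function. Again a reconstruction; the function name, dates, and predicate are guesses:

```sql
SELECT
    u.DisplayName,
    m.*
FROM dbo.Users AS u
CROSS APPLY dbo.multi_optional
(
    '20080913', /* min date from the earlier query */
    '20180902'  /* max date from the earlier query */
) AS m
WHERE u.Reputation > 800000
/* Correlation done outside the function -- this is what later
   shows up as a filter operator in the plan. */
AND   m.OwnerUserId = u.Id;
```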

Each of the four rows we produced from the Users table produces two rows from the function. Four times two: eight. Cool. Let’s go look at the query plan. And what you might think looking at this query plan is: Erik, this took seven seconds, but I don’t see seven seconds of time anywhere in here.

And you would be absolutely correct — there is not seven seconds of time displayed anywhere in here. This all looks like it took 256 milliseconds. Now, if I were me and I were looking at this, I might think: well, did that thing just take a long time to compile?

But no, the compilation on that was very low — four milliseconds. So that’s not it. What happened? Well, this function has a particular thing applied to it.

The particular thing applied to it is called interleaved execution, and we’re going to see that in the properties over here. I clicked on the wrong thing — sorry about that.

IsInterleavedExecuted is true. If it were not interleaved executed, we would simply not have this property — it would not say IsInterleavedExecuted false; it would just not be there.

Love how Microsoft does that. It makes you feel crazy. So this is an optimizer feature released in SQL Server 2017 called interleaved execution. What it means is that SQL Server ran the function, then kind of paused, and then tried to do some cardinality estimation based on what the function produced.

The way it does that is by using this thing called a sequence operator. The sequence operator ensures that things happen in a sequence — it has sequenced the events.

So the sequence is: it executes up here and tries to do some interleaved execution stuff, where it makes a cardinality estimate based on what ends up in the table variable inside the multi-statement table valued function. Of course, the guesses aren’t really all that great — but we’re using a table variable, so you get what you get and you get what you deserve.

Okay, cool. Why is all this interesting? Well, let’s run this again — let’s run it a second time.

And I apologize for making you wait another seven seconds — it’s not my fault, it’s Microsoft’s fault. So if you would like to blame anyone, or invoice anyone for your seven seconds, make sure it’s Microsoft. On a second execution of this query plan, once it’s in the cache, now we see the seven seconds.

Look, it showed up in here. This is no longer 200-something milliseconds from the stuff down here — now we see the 7.1 seconds.

So when you have a multi-statement table valued function that receives interleaved execution, only the second execution shows the time spent in the function. Isn’t that fun? Isn’t that just great?

It’s wonderful. Doesn’t drive you nuts at all, does it? But what’s interesting here, aside from that, is that because we passed in two literal static values, we have done something. What is the thing we have done?

Well, we have made sure that this function will only execute once. You can tell how many times the function has executed by the rebinds.

A rebind is something you see primarily with spools, where you might see rebinds and rewinds. A rebind means you have gone out and fetched data into something; a rewind means you have reused the data in that something.

So, primarily a spool: if you see a lazy spool or something like that, you’ll see rebinds and rewinds. The rebinds tell you how many times you went and put data into that spool — every time you truncate the spool and put new data into it.

And the rewinds are every time you reuse the data in the spool. It’s sort of like cache hit versus cache miss. Great.

What’s next? Well, let’s run this a little bit differently: instead of the two literal values we put in there, we’re going to use the CreationDate and LastAccessDate columns from the Users table.

We’re going to do everything the same except that. This is going to take a little bit longer — and by a little bit longer, I mean 30 seconds.

And what are we going to do for 30 seconds? I don’t know. I could ask you how your day was. Ask you how school was.

What did you get up to, kid? Stuff like that. We can bond a little bit — little father-whatever-you-are bonding while this query executes. And you could answer me politely and say, I don’t know, whatever.

Whatever. School was school. But 30 seconds later, our chat will be over, our bonding time will be done, and we’ll have a query plan to look at.

We get the same rows back — the same four people producing two rows apiece that we got before. All well and good. But our execution plan is much different now.

There’s no sequence operator now, no interleaved execution, and we get all the execution time up front. If we look at the properties of the multi-statement table valued function now, we have four rebinds, right? So we went from one rebind to four rebinds.

Because every time we correlated to this function, we had to pass in two new values, whatever the creation date and last access date were for that row of the Users table. So not a good time here.

And another thing that’s not a good time here is we have this filter operator. And this filter operator is saying where the owner user ID column coming from the function equals the ID column coming from the user’s table. So this isn’t a great strategy in general.
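The shape being described might look like this, assuming the hypothetical function sketched earlier. The point is that the correlation happens outside the function, so the join predicate can only be checked in a Filter operator after the function has already run.

```sql
/* Correlating on columns from Users: the function now rebinds once
   per distinct pair of values passed in, and the
   m.OwnerUserId = u.Id predicate lands in a Filter operator
   that runs AFTER the function has produced its rows */
SELECT u.Id, u.DisplayName, m.Score
FROM dbo.Users AS u
CROSS APPLY dbo.MultiOptional
            (
                u.CreationDate,
                u.LastAccessDate
            ) AS m
WHERE m.OwnerUserId = u.Id;
```

SQL Server cannot push that equality into the function, because the function body is opaque to the outer query's optimization.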

Well, you know, if you’re going to do this sort of correlation outside, you should probably think about pushing this correlation inside, right? It seems like a reasonable thing. So we’ll add a parameter to this, to a new version of the function called multi-optional pushed, right?

We’re going to add owner user ID, and we’re going to expand our bad-idea where clause to include owner user ID here, all right? And, again, more ominous foreshadowing: the option recompile here.
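The pushed version being described might look like this sketch. Again, the body is my reconstruction; only the idea of adding an `@OwnerUserId` parameter to the "optional parameter" where clause comes from the transcript.

```sql
/* Sketch: push the correlation into the function as a parameter,
   using the same "optional parameter" pattern as the other predicates */
CREATE OR ALTER FUNCTION dbo.MultiOptionalPushed
(
    @CreationDate datetime,
    @LastAccessDate datetime,
    @OwnerUserId integer
)
RETURNS @Out TABLE
(
    OwnerUserId integer,
    Score bigint
)
AS
BEGIN
    INSERT @Out (OwnerUserId, Score)
    SELECT p.OwnerUserId, SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE (p.CreationDate >= @CreationDate   OR @CreationDate   IS NULL)
    AND   (p.CreationDate <  @LastAccessDate OR @LastAccessDate IS NULL)
    AND   (p.OwnerUserId  =  @OwnerUserId    OR @OwnerUserId    IS NULL)
    GROUP BY p.OwnerUserId
    HAVING SUM(p.Score) > 0;
    RETURN;
END;
```

The caller can now pass `u.Id` as the third argument instead of filtering the function's output afterwards.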

So let’s create or alter this function. And now when we run this, right? So now we take the correlation out of here and we use the ID column here.

We’re going to start by passing in two literal values. Because what I want to show you is that as soon as you start passing in a column here, we have to start executing and rebinding the function more than once. This is where the extra time comes in, right?

So like when we had a situation earlier where the function took seven seconds, now we can sort of get that back by pushing the correlation in. And granted, we lose the multi-statement table valued function interleaved execution thing. We don’t have that sequence operator anymore.

But we do rebind this thing four times. Fine. Okay. Now we’re going to see the exact same thing here, right? And the annoying thing is not that this still takes seven seconds.

The annoying thing is that pushing that predicate in is not improving life for us, right? So we want to improve this somehow. What we’re messing up here is something you can’t see in here, right?

Because when you look at an actual execution plan for a multi-statement table-valued function, you don’t see what happens inside the multi-statement table-valued function. Like, up here, like, we can see what’s going on in the function. We select from posts.

We, you know, group by these things. We get a sum of this. We filter on the sum. Like, we can see what’s going on, right? If we get the estimated execution plan for this query, now we get an additional query plan back, right? The additional query plan that we get back is for the function.

We don’t get this back with the actual execution plan because then we’d see, like, four execution plans for the function, right? One for each row that comes out of the user’s table. If we’d returned a lot of rows from the user’s table, like 1,000 or 2,000, we’d kill SSMS.

But if we look at the scan over here, we can see that we’re doing the typical thing that happens when we use this sort of optional parameter query syntax, where we have all of these predicates expressed out fully: owner user ID equals this OR is null, creation date is greater than that OR is null.

Creation date is less than that OR is null. So this is why you don’t want to write queries that way. What’s particularly annoying is that if you add option recompile outside here, we still get the same problem, right? So this is where things get a bit sticky and tricky, right?

We still get the same seven seconds here, and we still get the same index scan in here, where SQL Server does not get the parameter embedding optimization involved inside of the function. Even though we’re like, hey, query, option recompile, when we don’t have that inside of the function, things break, right?

Or rather, things don’t go all the way we planned. So, I’m going to add option recompile inside of the function, right? And we’re going to recreate this thing.
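Inside the function body, that means tacking the hint onto the statement itself. Sketched against the hypothetical function from before:

```sql
/* Sketch: same function, with OPTION (RECOMPILE) on the statement
   INSIDE the body, which is what the transcript's fix does */
CREATE OR ALTER FUNCTION dbo.MultiOptionalPushed
(
    @CreationDate datetime,
    @LastAccessDate datetime,
    @OwnerUserId integer
)
RETURNS @Out TABLE
(
    OwnerUserId integer,
    Score bigint
)
AS
BEGIN
    INSERT @Out (OwnerUserId, Score)
    SELECT p.OwnerUserId, SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE (p.CreationDate >= @CreationDate   OR @CreationDate   IS NULL)
    AND   (p.CreationDate <  @LastAccessDate OR @LastAccessDate IS NULL)
    AND   (p.OwnerUserId  =  @OwnerUserId    OR @OwnerUserId    IS NULL)
    GROUP BY p.OwnerUserId
    HAVING SUM(p.Score) > 0
    OPTION (RECOMPILE); /* recompiles per execution, but gets the
                           parameter embedding optimization, so the
                           IS NULL branches simplify to seeks */
    RETURN;
END;
```

The trade-off is exactly what the transcript says next: you pay a compilation every time the function runs, in exchange for seek predicates on the actual values.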

And now, when we run this, it’s very fast, right? And now, even when we put in all of the columns, it’s very fast, right? So, now, rather than seven seconds, this takes 200 milliseconds.

The cost is, of course, recompiling the function every time that it runs. But when you look at the execution plan in here, rather than doing that big index scan with all of the IS NULL-ing on there, we do get seek predicates on the literal values in here.

So think of it two ways. If you’re not allowed to fully rewrite a multi-statement table valued function, or if your multi-statement table valued function is simply too long and involves too many if/then/else branches for you to turn into a single inline table valued function, there are some things that you can do under certain circumstances to get your multi-statement table valued function faster.

One of those things is, of course, option recompile, if you have that sort of is-null optional parameter thing going on. And the other is to start pushing predicates that you would normally have outside of the function inside of the function.

It’s two ways of getting around that stuff. So, like I said, this was going to be a whole bunch of stuff about multi-statement table-valued functions all at once. We talked about the rebinds and the rewinds.

We talked about interleaved execution. We talked about sequence operators. We talked about all sorts of things that were interesting and fun. But now I’m out of things that are interesting and fun to talk about. And so it’s time to go.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video. And don’t forget to look in the video description for helpful links. All right.

Should I say that every time? Probably not. There’s no one over there. Don’t worry. I’m talking to myself. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Optimizer Nudging In SQL Server


Video Summary

In this video, I delve into optimizer nudging—a technique where we rewrite queries to guide SQL Server’s query optimizer towards the execution plan that best suits our needs. We explore various methods, such as using `TOP` in subqueries and leveraging primary keys for faster performance, while also touching on how index alignment can significantly impact query efficiency. By separating relational elements from informational ones, you can craft more optimized queries. Additionally, I provide links to valuable resources in the video description—ranging from consulting services to training opportunities—that can help you master this technique further. Whether you’re looking for professional assistance or want to support my content, there are plenty of ways to get involved and improve your SQL skills.

Full Transcript

Hello again. Welcome. I’m glad you’re here because we’ve got a couple interesting things to talk about. We are going to discuss optimizer nudging. And of course, by optimizer nudging, I mean the process of rewriting queries in order to get the optimizer to do what we want. So this is just a brief video about that. We cannot go into depth about that because then you’d have to pay me and then you wouldn’t be watching this for free. If you look down in the video description, you’ll see all sorts of very helpful links. If you need help nudging the optimizer, you can hire me for consulting. If you want to learn how to nudge the optimizer further, you can buy my training. If you just want to support my continuing to provide free help with nudging the optimizer, you can become a channel member and contribute money to the channel. You can also ask me Office Hours questions. And of course, if you like this content, one of the most helpful things you can do is like, subscribe, and tell a friend, because if you keep it all to yourself, it’s very selfish. PASS on Tour: this will be one of the last opportunities for you to purchase a ticket for PASS on Tour Utrecht, October 1st and 2nd. So, I forget.

I forget which day this week. I have to stop saying that, but I’ll figure it out before I mess anything up. And of course, PASS Data Community Summit, Seattle, November 17th to 21st: two days of T-SQL pre-cons with Kendra Little. Kendra Little and I will be there teaching you amazing things about T-SQL. And then I’ll be home for a bit until, I don’t know, someone else decides to pluck me from my domicile and throw me out into the world and expose me to all of you people again. So anyway, let us begin by nudging the optimizer in various and sundry ways, or something like that. I’ve got an index created here on the Posts table on creation date and then score, and we’ll talk more about this later, but for now, just stick with me. We’ve also got this query, and we’re saying: give me the top thousand rows where creation date is between these two dates. Or betwixt, really, not exactly between, because between would be less than or equal to, too.

Right. Between is inclusive in that way, whereas this is exclusive. And if we run this query, we’ll get what I would call a sort of ugly query plan. We start by seeking into our index and we get kind of a lot of rows out of there. And then we do a key lookup, which is a row-by-row operation because it is a nested loops join. And then we end up sorting the data over here.

So we created an index, but we still have to sort data, which is a bit upsetting. And then at the very end, we top out with our thousand rows, and this all takes about 750 milliseconds. Now the situation can of course get worse if we hit upon one of SQL Server’s many tipping points.
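The query being described might look like this. The exact dates and the `ORDER BY` are my reconstruction from context (the index is on creation date, then score, and the later discussion implies the queries order score descending):

```sql
/* Hypothetical reconstruction of the demo query:
   an exclusive date range, a TOP, and a sort on Score */
SELECT TOP (1000) p.*
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20130316'
AND   p.CreationDate <  '20130317'
ORDER BY p.Score DESC;
```

The plan shape described is: index seek on the date range, key lookup for the remaining columns, then a Sort on score, then the Top.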

For example, if we say select the top 1000 and we increment this date by one, to 2013-03-18. Remember, this was 2013-03-17 up here. SQL Server will neglect to use our nonclustered index entirely, and this will all take about 900 milliseconds after scanning the Posts table and all that other stuff.

So let’s say that we’re unhappy with this and we want to nudge the optimizer in, again, various and sundry ways. We might decide to rewrite our query like this, where rather than allowing SQL Server to choose between doing a key lookup and a clustered index scan, we say: select the top thousand rows from Posts where p.Id is in yada, yada.

And we do all that same stuff. Well, we can get much faster performance doing this, because we’re taking advantage of some stuff that we know about SQL Server. One: it digs primary keys. Primary keys make things very easy. So we just say select star from the Posts table where the primary key, which is the clustered primary key, which is the Id column, is in these top 1000 rows.
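The rewrite being described might look like this, with the same assumed dates as before:

```sql
/* The nudge: isolate the TOP over the narrow index keys in a
   subquery, then fetch the wide rows by clustered primary key */
SELECT TOP (1000) p.*
FROM dbo.Posts AS p
WHERE p.Id IN
(
    SELECT TOP (1000) p2.Id
    FROM dbo.Posts AS p2
    WHERE p2.CreationDate >= '20130316'
    AND   p2.CreationDate <  '20130317'
    ORDER BY p2.Score DESC
)
ORDER BY p.Score DESC;
```

The inner query only touches the nonclustered index (a very narrow select list), and the outer query then does what amounts to a manual key lookup: a thousand seeks into the clustered primary key.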

Well, SQL Server doesn’t really have much to worry about here, because this is a very narrow select list for the top 1000, right? And even if we were to bump this up to that 18 date, the tipping point from before, well, SQL Server is still keen on doing what we want.

Right? So this is all a much better arrangement than just writing the query like this. Now, of course, when you are first writing queries, you should arrange those queries logically in a way that makes sense, right? A simple and logical way that attempts to get data the way that you want it.

And then if you are unhappy with the performance of those queries at the outset, you might experiment with different syntax. And this is kind of what this is all about. So, adding the select top 1000 to a subquery, we’re still almost doing a key lookup.

We’re just doing it a little differently. Right? Because we still have to look those columns up. We’re just writing the lookup in a manual way, right?

Cause we’re saying select star from posts up here. And then we’re saying, but you know, just where the ID is in this. And that’s what SQL Server does. It gets the thousand rows that we care about in here.

We still have to sort data, which we’ll talk about in a minute. Uh, and then we can see that this is where the top isolates those thousand rows we care about. And then this is where we go and do the lookup, right?

Because we still have the nested loops join. We do a lookup based on the thousand rows that we got from the stuff we cared about up there, and we seek into the index down there to get the columns we care about, just for those thousand rows. A very, very handy thing.

And this comes down to something that I’ve talked about a bunch of times in other videos, which is separating relational elements from informational elements. Informational elements are things that you’d want to show to people. And relational elements are things that you need to filter on and order by and other stuff like that.

So separating those two in your mind, and then manually writing the query with those separations in mind, can do very good things. Of course, one thing that is a little annoying, and I only say a little annoying because, you know, it’s just how SQL works.

But you couldn’t use exists for this. All right. So in a lot of videos I’m like: hey, exists and not exists, they’re pretty great. Problem is, you can’t put top in an exists like this, right?

So in the query plan for this query up here, we very clearly have a top operator limiting the rows to 1000. But down here in this exists query, oh, scrolled down a little too far, blew the whole gag. If we just hit Control-L, notice we don’t have a top here.

SQL Server completely throws the top away, right? SQL Server is like: no, you don’t get a top. You think you’re limiting rows in the exists?

Top is gone. All right. So we couldn’t do that here. We could use an outer top with exists, so top 1000 out here.
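A sketch of the two shapes being described, again with assumed dates. The point is that a `TOP` inside `EXISTS` is discarded, while an outer `TOP` survives but limits rows later in the plan:

```sql
/* A TOP inside EXISTS is thrown away by the optimizer:
   hit Ctrl+L and there is no Top operator in the plan */
SELECT p.*
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT TOP (1000) 1  /* this TOP disappears */
    FROM dbo.Posts AS p2
    WHERE p2.Id = p.Id
    AND   p2.CreationDate >= '20130316'
    AND   p2.CreationDate <  '20130317'
)
ORDER BY p.Score DESC;

/* An outer TOP works, but the row limit only happens
   after the (semi) join, not before it */
SELECT TOP (1000) p.*
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Posts AS p2
    WHERE p2.Id = p.Id
    AND   p2.CreationDate >= '20130316'
    AND   p2.CreationDate <  '20130317'
)
ORDER BY p.Score DESC;
```

The self-join on `Id` here is effectively a manual key lookup written as a semi join, which is why the plan ends up so close to the original one.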

And then this in here, the thing is it was like, I mean, I’m gonna compat level one 60 for this. So I get batch mode on rowstore. So with batch mode on rowstore, of course, the query plan looks like this and I get, you know, uh, I don’t know.

It takes about 600 milliseconds, and I get an adaptive join, and all these other batchy things happen in here, which is fine, but it’s still not the plan that we’re really after, right? Because we want to limit those thousand rows before we do anything else.

Right? So the adaptive join and batch mode here are just kind of like: oh look, we saved the day. If we were using an older compat level, we might see a plan that looks like this, with a loop join, right?

And we’re kind of just back to the original plan we had, the one that took about 750 milliseconds, and the 20-millisecond difference here isn’t interesting. It’s almost like we’re still doing the manual key lookup, right?

It’s just not called a key lookup here, because we’re just joining the two tables together, right? That’s what exists is doing.

It’s just joining, essentially, or semi-joining, depending on how you look at it. So we’re getting the top thousand rows from posts and saying p.* where exists. But the other way of writing this that I showed you before, with the in, where we could put the top 1000 in the in clause and just get those thousand rows?

We ended up way better off. It was like a hundred-something milliseconds versus 700-something milliseconds. Now, one thing that is worth talking about is that you could also consider this to be an index problem, right?

So a very important concept when you’re performance tuning queries is aligning queries to indexes, right? Remember, the index that I created up here was on creation date and score, and by default both of those key columns are in ascending order. All of the queries that we’re writing order score in descending order, right?

So that’s why we still had to sort data across all of these. If we were to think about this as a query-index alignment problem and took the descending sort off of score, we just said score ascending, then we would end up with a query that’s pretty fast naturally, on its own, right?

Like, this takes two milliseconds. Well, that’s even better, right? That’s pretty great. We don’t have to worry about getting more in depth with things, rewriting queries in funny ways, and pondering why exists doesn’t allow top, and all that other stuff.

So we might consider this just an index alignment problem. And if we create an index on creation date with score descending, right? So now score is explicitly in descending order in this index, and we go and run that original query.

Well, guess what? Our original query doesn’t need rewriting either. So if you’re in a situation where you can’t really change indexes, maybe because the tables are too big, or you’re on Standard Edition and worried about the blocking, sometimes you have to think about rewriting queries in a way that takes better advantage of the indexes that you do have.
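The index change being described might look like this. Index names are made up; the key order change is the point:

```sql
/* Original index: both key columns default to ascending */
CREATE INDEX p
ON dbo.Posts (CreationDate, Score);

/* Aligned index: Score explicitly descending, matching the
   ORDER BY Score DESC in the queries, so no Sort operator */
CREATE INDEX p_desc
ON dbo.Posts (CreationDate, Score DESC);
```

With the second index, the rows come back already in the order the query asks for, which is why the original, unrewritten query drops to a couple of milliseconds.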

If you are allowed to change indexes but not queries, which is a frequent thing for a lot of the third-party apps I see people using SQL Server with, then doing something like this can make queries way better. Of course, if you’re allowed to change both, then it’s really chef’s choice as to what temperature that steak gets cooked at. You can choose to either rewrite the query in a neat way, hopefully with some comments, or you can add or change an index that better conforms to the way that you have logically written your query.

But these are both forms of optimizer nudging, where we either have our queries conform to the indexes or have our indexes conform to the query. In either case, we end up with better performance. All right. I think that’s probably about good here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And do remember to visit the video description down below for all of the helpful links that I can possibly provide for you. All right. Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 35



Questions:

* I have a large table where 60% of the rows are defunct, the table is updated nightly by an ETL process that takes several hours. To improve query performance during the day, would you recommend using filtered indexes, or an archive table?

* Hello there! If a table that has a computed (not persisted column) that uses GETDATE(), is used in a scalar function, that function is never going to be inlineable no matter what, regardless if that column is not utilized. A covering index will not work either.

* What features would you like to see added to constraints in SQL Server?

* I often performance tune reasonably simple parameter sensitive queries that search date ranges. Before I was touched by your magnitude, I would approach this by dumping the sub-queries into a temp table. Now, I use dynamic SQL. Do you ever use temp tables for simple parameter sensitive queries? Why/why not?

* Why do so many people hate business logic in databases, and even stored procedures?

To ask your questions, head over here.

Video Summary

In this video, I dive into a series of community-submitted questions during my Office Hours session, providing insights and advice on topics ranging from managing large tables with defunct rows to the intricacies of computed columns in SQL Server. I also discuss the potential benefits of using temp tables for parameter-sensitive queries and address common misconceptions about business logic in databases. Whether you’re looking to optimize your database performance or just curious about best practices, there’s something here for everyone. If you have a question of your own that wasn’t covered, make sure to check out the video description where you can find links to submit your Office Hours questions directly.

Full Transcript

Well, through the magic of time travel, when you’re watching this, I will be in Europe. I will be at Redgate Netherlands stuff. So that’s fun to think about. All right. Next week at this time, where will I be? Anyway, it is time for Office Hours, in which I answer five community-submitted questions. Wonderful, right? Good stuff. If you want to submit a community question, look down in the video description. You’ll see all sorts of helpful links. One of them is to ask me Office Hours questions, and you can go get your question answered on this thing, if you have one that you deem worthy of my time. Of course, just about anything is worthy of my time, so don’t set the bar too high. The bar is somewhere in hell, probably, with some damned soul twisting on it. So submit your questions. Thank you. There are also other helpful links in there if you want to hire me or buy my training. I recently dropped my performance engineering course, and there is an on-sale link down there if you’re interested. And some other stuff too. Like I said, by the time you’re watching this, I will be in Europe for the Utrecht event.

So what you should do is, if you’re in the Utrecht area, maybe this is your last chance to buy a ticket, so check that out. And of course, after Utrecht, I will be home for a long time, relatively speaking, and then going to Seattle, November 17th to 21st, for the PASS Data Community Summit, where Kendra Little and I will be delivering, like a fastball, two days of T-SQL pre-cons. And then I will be home doing things. What will I be doing? I don’t know. I’ll figure it out when I get there. Anyway, let’s answer these questions, because that is what we do for Office Hours.

And try to do our best. Anyway, first question here is: I have a large table where 60% of the rows are defunct. What a nice word, defunct. The table is updated nightly by an ETL process that takes several hours.

Well, I mean, my rates are reasonable. Just saying. To improve query performance during the day. All right, so there’s a little mixed bag of information here, isn’t there? At night, this process takes several hours. During the day, though, would you recommend using filtered indexes or an archive table?

Well, if 60% of the rows are defunct, it seems like they’re just taking up useful space in this table. You could be doing other stuff with it. I mean, sure, filtered indexes are nice. They’re smaller, right? They’re more compact. And if your queries meet all of the prerequisites for filtered index matching, then they could be quite useful to screen out 60% of the rows there.

But in general, if I were you, I would probably be working on an archive process, because there are all sorts of rules for filtered indexes: what kind of columns you can build them on, determinism, whether queries match to them appropriately, stuff like that. So I would probably go with an archive process.

If you want to read about a really neat trick for that, I actually have a post on my site. If you search on erikdarling.com for simplifying archival processes, you’ll see a neat piece of nested DML query that will make archiving data a lot easier for you. If you’re too lazy, then sure, filtered indexes might help, depending on stuff.
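The nested DML trick referenced here is, as I understand it, composable DML: deleting rows and inserting them into an archive table in a single statement. A hedged sketch, with table and column names invented for illustration (real composable DML has restrictions, e.g. the insert target can’t have triggers or participate in foreign keys):

```sql
/* Sketch: archive defunct rows in batches with one statement,
   so a row is never deleted without landing in the archive */
INSERT dbo.Posts_Archive
    (Id, OwnerUserId, Score, CreationDate)
SELECT
    d.Id, d.OwnerUserId, d.Score, d.CreationDate
FROM
(
    DELETE TOP (5000)
    FROM dbo.Posts
    OUTPUT
        Deleted.Id, Deleted.OwnerUserId,
        Deleted.Score, Deleted.CreationDate
    WHERE IsDefunct = 1   /* hypothetical flag column */
) AS d;
```

Run it in a loop until it affects zero rows, and the batch size keeps transaction log usage and blocking reasonable.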

But generally, if I had more than 60% of the rows in a table not being useful to queries, I would probably just want to get rid of them. All right. Next one. I don’t see a question mark here, but that’s okay. Hello there. Hello back.

If a table has a computed, not persisted column (well, it can’t be persisted if it uses GETDATE, because that’s not deterministic) that is used in a scalar function, that function is never going to be inlineable, no matter what, regardless of whether that column is utilized. A covering index will not work either. Yes. One of the restrictions on UDF inlining is that you can’t have non-deterministic functions in there.

Things like GETDATE, SYSDATETIME, NEWID, RAND, things like that. Those make for non-deterministic results because they are decided at runtime, not stored forever and ever in a way that would make them deterministic. So since you can’t persist that computed column: Paul White has a wonderful post called Properly Persisted Computed Columns, in which he discusses trace flag 176 and how to get around the kind of thing you’re talking about, but the GETDATE in there is going to screw up the persistence.
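To make the determinism point concrete, here is a small sketch (column names invented). The non-persisted version is allowed; trying to persist it fails at create time because GETDATE() is non-deterministic:

```sql
/* Allowed: a non-persisted computed column may reference GETDATE() */
ALTER TABLE dbo.Users
ADD DaysSinceAccess
    AS DATEDIFF(DAY, LastAccessDate, GETDATE());

/* Fails: a PERSISTED computed column must be deterministic,
   and GETDATE() returns a different value on every call */
ALTER TABLE dbo.Users
ADD DaysSinceAccessPersisted
    AS DATEDIFF(DAY, LastAccessDate, GETDATE()) PERSISTED;
```

And since the column can’t be persisted, a covering index can’t materialize it either, which is why the questioner’s workarounds don’t pan out.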

So maybe don’t do that. If you need other options, my rates are reasonable. Here we go. Oh, this one has a question mark. What features would you like to see added to constraints in SQL Server?

I’m not that guy. I realize that there are all sorts of constraints that other databases have that SQL Server doesn’t. I just don’t spend a lot of time creating them. I realize check constraints have some deficiencies compared to other databases, particularly Postgres.

There are no real domain constraints, things like that. Really complex constraints often don’t work out well. There are even weird little shortcomings, like foreign key constraints can’t be filtered, or something like that. There are all sorts of things that might be nice to have, but I question the amount of effort that would go into adding them to the product versus how many people would actually use them, since I don’t see a lot of people generally using constraints all that well or all that wisely now. So what would I like to see added? I don’t know.

There might be some neat stuff in there, but in general, I don’t really see people using them anyway. Some temporal constraints might be interesting, like a really easy way to figure out the most recent active row, or something like that. But people are strange. People are strange. Sorry I don’t have a more explosive answer for you there.

I just don’t get excited about constrained data. It makes me think of tight underwear. Next: I often performance tune reasonably simple parameter-sensitive queries that search date ranges. Before I was touched by your magnitude (oh, well, speaking of tight underwear), I would approach this by dumping the subqueries into a temp table. Now I use dynamic SQL. Do you ever use temp tables for simple parameter-sensitive queries? Why, why not? The answer is absolutely yes. Where temp tables make for a good fix for a parameter sniffing sensitivity issue is, to make things simple, let’s say you have a stored procedure that accepts one parameter, that parameter is a sensitive one, and the table that you hit, depending on how many rows that parameter produces, could be a nightmare.

Right. And let’s say we’re not just selecting from that table. Let’s say there are some joins or exists or not exists or other stuff like that. It can absolutely help to use a temp table to dump the results of the select from that table into there, and then do the more complicated part of the query, the part that doesn’t have any parameters touching it, against the results of that temp table. Indexing that temp table could also be useful, depending on what is going on elsewhere in the query, but I can’t tell you that based on your question. The only thing you really have to be careful of is how sensitive that parameter is. If it’s the difference between a few hundred, a few thousand, or maybe even a couple million rows, a temp table can be perfectly fine. But if the magnitude of your sensitivity is in the tens of millions or hundreds of millions of rows, then you probably don’t want to dump that into a temp table. So be a little bit cautious there. Know your data, or something like that.
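The pattern being described might look like this sketch. The procedure, tables, and columns are invented for illustration; the structure (sensitive parameter isolated into a temp table, parameter-free work done afterwards) is the point:

```sql
/* Sketch: contain the parameter-sensitive part in one simple
   statement, then run the complicated part parameter-free */
CREATE OR ALTER PROCEDURE dbo.PostsByUser
    @OwnerUserId integer  /* the sniffing-sensitive parameter */
AS
BEGIN
    SET NOCOUNT ON;

    /* the only statement the sensitive parameter touches */
    SELECT p.Id, p.Score, p.CreationDate
    INTO #p
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId;

    /* optional, depending on what the rest of the query does */
    CREATE CLUSTERED INDEX c ON #p (Id);

    /* the complicated part, now free of sniffed parameters */
    SELECT p.Id, p.Score, p.CreationDate
    FROM #p AS p
    WHERE EXISTS
    (
        SELECT 1
        FROM dbo.Comments AS c
        WHERE c.PostId = p.Id
    )
    ORDER BY p.Score DESC;
END;
```

The second statement gets compiled against the temp table’s actual cardinality rather than a sniffed guess, which is where the stability comes from.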

Love your data. Let’s see here. Okay. This is question number, let’s make sure: one, two, three, four, five. So this is five questions. I did not screw that up this week. Lucky me. Why do so many people hate business logic in databases, and even stored procedures? Well, people tend to hate what they’re bad at. That’s it. I hate PowerShell.

I’m bad at PowerShell. I hate using it. I hate typing it. I even hate using an LLM to generate it, because I just hate the way it looks. I hate dealing with it. I am not good at figuring out what PowerShell is doing, and I have no interest in getting better at it. I frankly think it’s kind of a crappy language. So I hate PowerShell, so I don’t want to use PowerShell, so I don’t use PowerShell, and I hate PowerShell. So when people talk about things like business logic or stored procedures being bad in a very generic sense, it’s probably because they’re bad at them. Right? And if I were terrible at SQL and very good at C# or some other programming language, Java maybe, I don’t know, I could code in crayon with Python or something.

I’d probably be like, oh, screw that database. Oh, it’s hard to do all that stuff. I can do all this stuff in application code that I know well. So, you know, they’re probably just people who are bad at SQL. They’re bad at designing things within the database, and that’s about the end of it. I’ve never seen, sorry, I’ve never seen a valid argument against business logic in databases or stored procedures that didn’t also express what would seem like willful ignorance about SQL as a language, or rather the capabilities of SQL as a language. I’ve just never seen one. They’re all just like, oh, I did this one thing once and it was bad, it was slow, or long rambling things. And you’re like, well, if you were any good at this, you would have done it differently. Right. You tried this one thing and it sucked and you were like, never again.

Right. It’s like, I don’t know. It’s a weird take. It’s a very weird take. Like, don’t have your database do anything with data. Don’t put logic where the data is. Okay. Okay. Well, you know, I guess a little bit further to that: if you’re that type of person, you could very easily find teams that agree with your sentiment and go work there. But I think if you are managing that kind of team and you start making rules like that, that’s a bad idea. You should allow your team to work where they are most comfortable and most proficient. Do not set arbitrary rules about where certain logic should live, or which APIs or procedures (a procedure is just another API) should be used to interrogate the data in a database, because you’re just hamstringing them. If you say, no, you have to use an ORM now and you have to put all the logic in some other piece of code, then they’re not gonna do their job as well. So don’t be stupid. Anyway, that is five questions. I think we’re done here. Thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And once again, if you would like to ask a question here, uh, please look down in the video description where the, the link to ask questions is, because that’s, that’s the way to ask them. Otherwise I cannot, I cannot accept psychic questions here.

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Introducing My SQL Server Performance Engineering Course




A Little About Table Variable Deferred Compilation Depth In SQL Server



Video Summary

In this video, I delve into the fascinating world of table variable deferred compilation in SQL Server, specifically exploring how multiple inserts affect its behavior. I walk you through a stored procedure that demonstrates inserting rows into a table variable and then querying it to see if SQL Server defers compilation after each insert. The results are quite enlightening, revealing that while initial executions show promising deferred compilation, subsequent runs start behaving more like parameter sniffing, retaining cardinality estimates from previous queries. This video aims to shed light on this lesser-known aspect of table variables and might help you troubleshoot strange behaviors in your own SQL Server environments. Whether you’re using SQL Server 2019 or higher compatibility levels, understanding these nuances can be crucial for optimizing query performance.

Full Transcript

Erik Darling here with Darling Data. And we’ve spent some time this week talking about different sorts of cardinality estimates for things that don’t really get great cardinality estimates. And one of the things that I talked about earlier in the week, I think it was on Tuesday, was that up to a certain point, table variables behaved a lot like local variables, in that they would sort of compile with the batch but would not give the optimizer a lot of information, until table variable deferred compilation came along. And this made me think a little bit, because in the past I’ve definitely tested stored procedures just to be like: okay, if I run this once and the table variable gets a thousand rows in it, and the queries that touch it now can see that there are 1,000 rows in the table variable, and then I rerun this and I put a hundred thousand rows in the table variable, what happens? And it turns out that the behavior of table variables turns into what was essentially the behavior of table-valued parameters in the past, which was that they would sort of act like parameters, and the cardinality of the table-valued parameter would get sniffed and reused over the course of a query plan.

So like to me that was sort of interesting, but then it made me wonder like, well, what if I insert rows into a table variable multiple times? Will SQL Server defer compilation for multiple inserts? And so that is the question we are going to answer today. We are going to judge the depth of table variable deferred compilation. We are going to judge its depth like we are at a powerlifting meet and we want to make sure that it is squatted deep enough to get all white lights. We want to see the crease of the hip. We want to see the butt a little below parallel. We don’t want anyone cheating on their squats saying, oh, I squatted 1300 pounds.

Your butt didn’t even break parallel, pal. Anyway, if you look down in the video description, you will see many helpful links. Most of the help that those links will provide is in the form of giving me money. Hire me for consulting, buy my training, become a supporting channel member.

And if you want to ask me questions for my office hours episodes, you can do that. Otherwise, please do like, subscribe, tell a friend. Assuming that your friends are physically existent in the world and capable of subscribing to a YouTube channel.

But if your AI girlfriend has a YouTube account, I’m willing to accept AI girlfriends. They remain your girlfriends. They remain your girlfriends, but I’m willing to accept subscribers in the form of AI girlfriends. All right. Married man. Don’t mess with me.

All right. So leaving the house, of course, I get to go to Utrecht. I don’t know. I made what I thought was a funny joke, but I don’t know if anyone laughed at it. It was, I trekked, you trekked. We all trekked to Utrecht. All right. That was funny.

I don’t know. I’m a little deflated on that one. Sorry. Delivery was lacking a little life. October 1st to 2nd, I will be there with the nice folks from Red Gate who are bringing Pass on Tour. Well, they brought me to New York City. That was very nice. Dallas and now Utrecht.

And of course, I will be in Seattle for the Pass Data Community Summit, November 17th to 21st, with two rock’em sock’em days of pre-cons with Ms. Kendra Little about T-SQL and all of the wonderful things that happen when you use T-SQL.

All right. The life changer that is T-SQL. There we go, PowerPoint. Only took you five seconds to catch up with that. Click. Great. Wonderful. Let’s party. Let’s look at table variable deferred compilation depth.

So what I wanted to test with this stored procedure is, like I said, multiple inserts. So here we have a stored procedure that accepts one user input, right? So we are still going, we’re not using a table-valued parameter here.

We are just going to use a regular old table variable. But the table variable we are using will acquire the table variable deferred compilation intelligent query processing feature. And we will get on initial compilation a guess, right?

SQL Server will give us table cardinality for that. So if I run this and, you know, run all the code, we should see something interesting, shouldn’t we? So here we have user IDs, right?

We’re going to insert into that table. We’re going to select all of the user IDs that match our user ID first from the users table. And then do a count. So we got some cardinality to estimate.

Then we’ll do everything from the badges table. And then we’ll do everything from the comments table, right? So we got that here.

Then we’ll do everything from the post table. Oh, post, that’s a big one. And then since there are two columns in the post tables that have user IDs in them, I decided to do both of them, right? So we’ll do last editor user ID too.

And then finally, we will do votes. And what I want to see here is if after each insert, SQL Server actually defers compilation of the count query until all the rows are inserted, right? So that’s a good one.

So let’s run this. We have actual execution plans enabled. And when we run this, this will take a couple of seconds to run. No, because we don’t have any good indexes in place. That’s fine, though.

We got time to kill, right? You’re not busy. So we have every other query will be interesting to us. So let’s look at how these pair up. So we have the clustered insert from users, right, into our table variable.

That inserts one row. And we get a one row estimate when it’s selecting from the table variable. That is correct so far. That is not incorrect.

One might see a one row estimate from a table variable and think, my goodness, but the insert actually did only insert one row. So as we get down further and we insert user IDs, say, from now the badges table, we add another 9,363. And now look, our clustered index scan says 9,364.

This is great news. We have actually deferred compilation of the second select until later. Right?

This is great for us. Now when we hit comments, we add 46,7… Can you imagine leaving 46,737 comments? My Lord.

What are you doing all day? But this is interesting though because now our clustered index scan says 56,101. Which, you know, I’m not a math person, but I’m fairly sure that if you added that to that, you would get that.

So that makes sense to me. And if SSMS reframes this one more damn time, I am going to lose my entire mind. So now, let’s see.

Now we’re inserting into user IDs, selecting from posts where owner user ID fits up there. And we add 27,901. So this person has left almost 47,000 comments and posted almost 28,000 questions and answers. So that is insane to me.

But when we add the 27,000 to what we had before, look at our estimate. It holds up very nicely. This is great. And if we scroll all the way down here, we will see this pattern repeat where every time we add rows, we get the right sort of guess from our select from the table variable.

All right. Well, that’s good for a first execution, right? But now let’s do a second execution.

My favorite user in the Stack Overflow database is someone named Eggs McLaren. They’re a dummy account. And let’s see what happens now. So let’s do this.

We’ll enjoy ourselves here a little bit. And if we look at this execution plan now, well, this is still good, right? One of one, one of one. But if you notice down here, we seem to be retaining cardinality estimates from the past execution.

And so we’re not really deferring the compilation anymore, are we? We have sort of given up on deferring compilation because now we’re getting guesses of like 69 of 9,300. And if we scroll down a little further and we click here, so hopefully SSMS stops moving, then we’ll start getting, you know, like we don’t get a good guess here.

I mean, I realize this is the insert, but like we’re retaining cardinality estimates from like not only the insert queries, but also the select queries. We can only defer compilation so much.

So now we’re getting 91 of 56,000 and 65 of 27,901. And if we scroll down further, we will see this pattern repeat itself. So at least on the initial execution of a stored procedure, we do okay, right?

SQL Server is like, well, every time I do an insert, I’m going to defer compilation of the select from the table variable immediately afterwards until after that insert completes and we get table level cardinality. Golly and gosh, that’s wonderful.

I’ve never, never been so happy in my life. The problem becomes that on further executions, the table variable acts like a parameter. And now we sort of have table variable sniffing, which I don’t know if that’s good or bad, but it seems, it just seems like a fun new problem to be aware of and have to go solve for people.

So maybe this video will arm you to go do that. I don’t know. I don’t know if you’re using SQL Server 2019. I don’t know if you’re using compat level 150 or higher.

I don’t know if you have disabled the table variable deferred compilation database scope configuration. And moreover, I don’t know if you’re using table variables, but if you are and things are behaving strangely, this might just answer some questions for you.

Anyway, thank you for watching. Hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where, I don’t know. I don’t know what we’re going to do yet.

I believe it’s going to be a Friday though. So it will be necessarily amusing, right? Because Fridays have to be fun. If Fridays aren’t fun, we might as well all just…


Table Variable vs Local Variable Cardinality Estimates in SQL Server



Video Summary

In this video, I delve into the fascinating world of local and table variables in SQL Server, particularly focusing on how they behave differently when it comes to cardinality estimation. After exploring these concepts in yesterday’s video, I decided to dig deeper by comparing temp tables with table variables through a series of tests designed to reveal their unique behaviors. By running various queries that include equality and inequality predicates, we observe how the lack of statistics on table variables can lead to unexpected estimates. This exploration not only reinforces the importance of proper indexing but also highlights the subtle differences between local and table variables in cardinality estimation. Stay tuned for tomorrow’s video where I will address some lingering questions about table variable deferred compilation and its implications for query optimization.

Full Transcript

Erik Darling here with Darling Data. And if you watched yesterday’s video… I don’t have a great intro for this one, I’ll be honest with you. If you watched yesterday’s video, we asked and answered the question: why do local variables behave the way that they do with cardinality estimation? And hopefully the answer was sufficient for you. But, you know, looking at that kind of got me a little curious and thinking about table variables and local variables. And I wanted to mess around with that just a little bit more. Some details you just never think to look at in detail. If you look down in the video description, my beautiful, wonderful video description, you’ll see all sorts of helpful links for you and me. You can hire me for consulting, buy my training, become a paid, contributing member of the channel. If you don’t feel like doing any of that stuff, well, would you be so kind as to ask me an office hours question? I do like getting those. It’s a fantastic mailbag. Some of the questions in there have truly blown my mind. And of course, if you are titillated by this content, please do like, subscribe, tell a friend, family member. I don’t know.

Something like that. I will be out on the, well, I mean, by the time this gets published, I will be home from what was Pass on Tour Dallas, at least hopefully. All right. Fingers crossed. That means that the remaining travel for me will be Utrecht, the Netherlands, October 1st and 2nd, and then the Pass Data Community Summit in Seattle, November 17th to 21st, where Kendra Little and I will be presenting two days of probably the best T-SQL pre-cons you’ll see for the rest of your life. Set the bar impossibly high for anyone else ever. But with that out of the way, let’s address the matters at hand here. Now, I think that I’ve covered to a sufficient degree the differences between temp tables and table variables. In tomorrow’s video, we’ll talk about table-valued parameters.

Actually, maybe we’ll do that today. Yeah, you know what, we’ll do that today. So what I wanted to sort of go over a little bit first is just how table variable cardinality estimates compare to local variable estimates. And so I wrote what I think is a reasonable series of tests, one using a temp table. We all know that temp tables get statistics on them. And because we know that temp tables get statistics on them, I want to make sure that I update my temp table statistics with a full scan.

And I also want to use a table variable, which, I believe, at this point I have indelibly proved does not get a statistics histogram, even with an index on it, not for the index or for anything else. If I were to try to run a statistics update for a table variable down here, I would get an error. It would just yell at me. I suppose it would be funny if I tried to do it in some error handling code and had it print out something ridiculous on failure, but, you know, gotta save something for the paid material, I guess.
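The setup being described, as a sketch with hypothetical names: temp tables accept statistics maintenance, table variables refuse it.

```sql
CREATE TABLE #t (id integer NOT NULL);
INSERT #t (id) VALUES (0), (1), (2);

/* temp tables get statistics objects, and you can maintain them */
UPDATE STATISTICS #t WITH FULLSCAN;

DECLARE @t table (id integer NOT NULL, INDEX c CLUSTERED (id));

/* table variables never get a statistics histogram, even with an
   index; uncommenting this line just produces an error */
-- UPDATE STATISTICS @t;
```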

Uh, but then, uh, we’re, I’m going to run a series of queries, series of queries. Why have I never thought about that phrase before? It’s great. Wow. It’s like the other day, someone said the phrase benchmarking to me and I was like, it’s fantastic. I want to, I want to find a way to say that somewhere.

So, uh, we’ll just run a couple of full counts from both of the tables. Uh, we will run a count with an equality predicate. Uh, the equality predicate will be using a literal value for the table variable. And for the temp table, we will be using a local variable that we declared up here called zero, right? That’s that thing.

Uh, and that’s, that, that’s basically, that pattern is going to repeat for inequality predicates greater than, equal to, less than, equal to, uh, greater than, I don’t know. And, uh, of course, less than down here.
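The test pattern just described, sketched with hypothetical tables and a handful of placeholder rows: a literal against the table variable, a local variable against the temp table, and the pairing repeated across equality, inequality, and range predicates.

```sql
DECLARE @zero integer = 0;
DECLARE @t table (id integer NOT NULL, INDEX c CLUSTERED (id));
CREATE TABLE #t (id integer NOT NULL);

INSERT @t (id) VALUES (0), (1), (2);
INSERT #t (id) VALUES (0), (1), (2);
UPDATE STATISTICS #t WITH FULLSCAN;

/* equality: literal vs. table variable, local variable vs. temp table */
SELECT COUNT_BIG(*) AS records FROM @t AS t WHERE t.id = 0;
SELECT COUNT_BIG(*) AS records FROM #t AS t WHERE t.id = @zero;

/* inequality */
SELECT COUNT_BIG(*) AS records FROM @t AS t WHERE t.id <> 0;
SELECT COUNT_BIG(*) AS records FROM #t AS t WHERE t.id <> @zero;

/* range predicates (>, >=, <, <= all follow the same pairing) */
SELECT COUNT_BIG(*) AS records FROM @t AS t WHERE t.id >= 0;
SELECT COUNT_BIG(*) AS records FROM #t AS t WHERE t.id >= @zero;
```

Run with actual execution plans on, the interesting part is comparing each operator's estimated rows against the 10%, 51%/90%, and 30% stock guesses discussed below.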

So this will be our first test to kind of see how the two compare and we’ll run all this and we’ll get all these results back. Won’t life be grand. So we’ll have a couple uninteresting areas.

The uninteresting areas are of course the inserts. We don’t care so much about those. And then once we get down here, we will have some vaguely interesting areas. So because I am in, uh, I am on SQL Server 2022 and I am using the, um, 160 compatibility level.

And I have not, uh, disabled the database scope configuration for a table variable deferred compilation. I will get a 100 row estimate from the table variable. But because the table variable has no statistics histogram on it and the temp table does, uh, we, we, this is about where, um, things, things might start to get a little interesting.

So as soon as zoom, uh, we can look at some of the more interesting stuff. So selecting from the, uh, table variable with a literal value of zero and saying where ID equals zero. Well, we get zero rows back, but SQL Server thought that we would get 10 rows back.

This is a stock, this is a stock 10% estimate, uh, unknown estimate for the, what do you call it? Table variable. Cool.

Now, because our statistics histogram on the temp table, uh, accurately describes the data in here, we get a one row estimate back from the temp table when we use the local variable. SQL Server is like, hmm, I think we can only get one row here. Now we could absolutely get a one row estimate from the table variable as well.

If we made the clustered index unique. SQL Server would look at that and say, oh, well, uh, this is a unique index. Uh, I know that I’m only going to get one value back.

I don’t know what that value is going to be, but I know I pretty sure that I’ll get one back. So we could change that and get a little, something a little bit different on the second one, but I’d rather just say that and move on. Now, if we come down to where we start having the, uh, inequality predicates, and this is where I should probably start like actually clicking on the plan.

So SSMS stops resetting where I was before. Uh, if we look at the inequality predicates, uh, for the, uh, table variable, we get 51. And what I did prior to recording this was I tried with larger numbers because sometimes like, you know, like 51 out of a hundred, like if you did like a thousand, you might get five, one and some other numbers.

Or if you did like a million, you might get five, one and some different numbers, but it was always like five, one and however many zeros would have been 51%. So for the table variable, uh, with an inequality predicate, uh, we always get 51% using the default cardinality estimator. Remember I’m in compat level 160.

So I’m using the new cardinality estimator say new, but it’s like 2014. So screw whatever. Uh, I, I have a test with the legacy one and the next tab over, which we’ll look at. And then, uh, this one for this one, we get 90.

This was the only one that like really meaningfully changed between, uh, the legacy and default cardinality estimators. This used to, this will, this will get a slightly different number in the other tests. Everything else is just about the same.

And if we scroll down through the, uh, range predicates, so the greater than equal to less than equal to all that other stuff, these will all get 30%. 30% will be true for either the table variable, uh, where there’s no histogram or the local variable with the temp table where there is a histogram, right? Like, so the local variable works to screw up the histogram a bit, uh, for the, for the temp table and the lack of statistics on the table variable, uh, screws with the literal value.

Right. So kind of a, we’re just like dueling banjos of kind of like, ah, but why, why, why would you do that? Uh, so if we scroll down, this will be the same 30, 30, 30, all over there.

The next set of tests I did, and this is where I really had to do some scrounging to make it work in the same way. So if you remember the last one, when I did the temp table, I had an update statistics with full scan, doing the full scan statistics update. And this test was not getting me kind of what I wanted to show you.

So I had to create statistics on the temp table, sample zero rows, and tell SQL Server not to recompute the statistics ever. So, I think, a bit of an unlikely scenario in real life. So let’s run this and… we got execution.
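The statistics trick just mentioned, if you want to reproduce it (table and statistics names here are invented): create an intentionally empty, never-recomputed statistics object instead of doing the full scan update.

```sql
CREATE TABLE #t (id integer NOT NULL);
INSERT #t (id) VALUES (0), (1), (2);

/* sample zero rows and never recompute: the histogram stays empty,
   which makes the temp table behave as if it has no statistics */
CREATE STATISTICS s
ON #t (id)
WITH SAMPLE 0 ROWS, NORECOMPUTE;
```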

We didn’t get execution plans. Let’s do that one more time. There we go. Execution plans. So, uh, getting down to where things maybe start to get interesting, uh, because, uh, the table variable deferred compilation is still on. Um, the guesses up here for things like the count are still going to say a hundred.

And I wanted that. I wanted to maintain that because, uh, I want to have a table cardinality for the table variable to compare to the temp table. So, uh, let’s click on that so we don’t completely lose our spot.

And let’s come down here to the first sort of point of interest. And here is where we have, uh, the literal value for the table variable, uh, which is now guessing within the legacy cardinality estimator, 32 rows, uh, instead of 10 rows. And for, when we use the, uh, local variable for the temp table with no statistics on it, essentially, we also get 32 rows.

So a temp table with no statistics acts strangely like a table variable, huh? Weird, right? Bizarre.

It’s almost like I’ve, I’ve said that before and I’m just repeating myself at this juncture. Uh, and then we get to the inequality predicates. Remember I said this is where it was different with the legacy cardinality estimator? Uh, and the other one, it was 90 and this it’s 68, right?

So this was, uh, the same for either the table variable, uh, with the literal value or the temp table with the local variable. So that was a little weird, right? Like one from 90 to 68, just changing the cardinality estimation model.

I don’t know that that’s all that, uh, great of a different guess. I’m not sure why that changed, but you know, here we are. Uh, for the rest of these, for the, for the, all the range predicates.

So greater than equal to less than equal to yada, yada. These all get 30, right? So 30 all down here, right? Everything is 30 from there on out.

It doesn’t matter legacy or default cardinality estimator. They all get this, uh, guess of 30% of the table. So this, like that, like that 30%, right? Three zero 30 out of a hundred 30%.

Again, when I cranked the numbers up to like a thousand or 10,000 or a million, it was always just 30%. Now, of course, this leads to the question a little bit: is there a difference between temp tables, table variables, and table-valued parameters?

And the answer is, of course, yes. But before we do that, here is something that I wondered about a little bit. And maybe you would also wonder this, because you’re a naturally curious person filled with wonder for the world around you: what would it look like if we mixed the local variable with the table variable?

So, uh, I did that, right? So this is all table variable and local variable. And the answer is that because the table variable has no statistics on it anyway, the local variable really doesn’t do anything interesting, right?

We get all the same guesses that we would have gotten sort of between the two. So it’s like for the equality predicate, it’s 10%. For the inequality predicate, it’s 90%. For everything else, it’s 30%.

Okay. Well, um, great. All right. Wonderful. We’ve again, once again, sort of found a new way to say table variables don’t get statistics on them. But what about table valued parameters?

Now, table valued parameters used to be a way to get table variables to behave more like parameters, right? Because, uh, it’s sort of like the way table variables behave, uh, with the table variable deferred compilation, uh, intelligent query processing feature, uh, where table variables act a bit like parameters. But we’ll, we’ll talk a little bit about a little more about that tomorrow.

So, with table variables: if you’re using 2019 plus and you’re getting table variable deferred compilation, the first time your stored procedure compiles, you’ll get table cardinality for the table variable, and further executions will reuse that. Okay.

Uh, this generally works by deferring compilation of the execution plans for queries that access the table variable until after the table variable has been populated. Now, uh, when I wrote that sentence, I had some questions and that’s what tomorrow’s video will answer those questions.

So here is the table type that I created to have a table-valued parameter: owner user IDs. We have an owner user ID column in here with a clustered index on it. We’re not going to make this one unique here, even though we could; it wouldn’t really do much for us. But I’ve already pre-run this query because, if I didn’t, you and I would be sitting around for a while.

Now, I only put one row into our table-valued parameter, right? Or rather, our table variable that is declared with this table type. So it’s a table-valued parameter, magically, for us.
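The table type and its use might be sketched like this, with assumed names and a placeholder user id (the real demo runs against the Stack Overflow database):

```sql
/* a table type with a clustered, but not unique, index */
CREATE TYPE dbo.OwnerUserIds AS TABLE
(
    OwnerUserId integer NOT NULL,
    INDEX c CLUSTERED (OwnerUserId)
);
GO

DECLARE @OwnerUserIds dbo.OwnerUserIds;
INSERT @OwnerUserIds (OwnerUserId) VALUES (22656); /* placeholder id */

/* no statistics on the type means the join estimates downstream
   come from guesses, not histograms */
SELECT COUNT_BIG(*) AS records
FROM @OwnerUserIds AS o
JOIN dbo.Posts AS p
  ON p.OwnerUserId = o.OwnerUserId;
```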

And the execution plan for this just does not really reflect a situation where cardinality estimation was done statistically. You might notice here that we get a one row estimate coming from the clustered index scan of the table-valued parameter. And if I duck down a little bit here, right, this is going to show off the fact that I do in fact do squats in real life.

Uh, SQL Server estimated 12 rows would come out of the post table, but we get almost 28,000 rows. And then when we come over here, SQL Server is like, well, no, that’s still wrong. And this is obviously still wrong.

And then we get, when we get over to comments, well, we are, we are just very, very wrong. SQL Server is like 319, but really it’s a much larger number. And so this thing, whole thing runs for two minutes and 20 seconds.

So I don’t know. I thought that comparing the, uh, temp table and table variable with a local variable thing. And looking at the cardinality estimation would yield like some, some really interesting stuff.

But all I really seemed to do was just further prove that the lack of statistics on table variables, and by extension table-valued parameters, can really screw things up. All right. Well, hopefully tomorrow’s video will give us a little bit more to think about.

All right. Give us a little bit more to chew on in our lives. I’m going to talk a little bit about just how far table variable deferred compilation goes in a stored procedure. All right.

Thank you for watching.


A Little About *Why* Local Variables Give Weird Cardinality Estimates in SQL Server




SQL Server Performance Office Hours Episode 34




Questions:

* Why do parallel batch mode sorts only utilize 1 thread?

* With all of the recent content frequently mentioning NOLOCK and the read uncommitted isolation level, is there ever a situation where you would recommend using NOLOCK, or is it always avoidable?

* What’s required for my query to use batch-mode on my rowstore indexes? Like what version or compat level is required? I remember a while ago, a fake join to a columnstore index was required. Are there any hints I can use?

* Hi Erik, I admire all the work you do for the community, thanks a lot! You are one of the great minds of SQL Server.

* What happened to the old sqlperformance.com website? They just suddenly stopped posting, but the authors are still around and authoring in other places on the interwebs.

To ask your questions, head over here.

Video Summary

In this video, I dive into answering five user-submitted questions during my Monday office hours session. Whether you’re curious about parallel batch mode sorts or the nuances of using NOLOCK hints, there’s something for everyone. I also take a moment to thank those who support my work through likes, subscriptions, and sharing with friends—every bit helps! Additionally, I share some exciting upcoming events where you can catch me live in person, including the Redgate Pass-on-Tour in Dallas and Utrecht, as well as the Pass Data Community Summit in Seattle. If you have any questions or want to see me at one of these events, make sure to check out the links provided in the video description for more details on how to get involved.

Full Transcript

Hey, it’s me, Erik Darling here with Darling Data. And in today’s video, it is of course Monday, so we do office hours and I answer five of your wonderful user-submitted questions. And that’s fun. Before we do that, if you want to ask me a question for office hours, the link to do so is down in the video description. If you would like to hire me for consulting, buy my training, or become a paid channel member to support the effort that I put into all of this SQL Server content for you, all of those links are available there as well. If you are uninterested in a monetary exchange with me, you can do free things that make me just glow with happiness. You can like, you can subscribe, and you can tell a friend or two or 20 or a thousand, assuming you know that many people. I’m not sure how full your Rolodex is, but… hopefully it’s not just your mom. But your mom is important.

If you would like to see me out on the road, live and in person, I will be in Dallas, September 15th to 16th, and Utrecht, October 1st and 2nd. These are the PASS on tour dates that Redgate is putting on this year. Aside from that, I also have the PASS Data Community Summit in Seattle, from November 17th to 21st, where I will be doing not one, not three, but two days of T-SQL performance pre-conference. I will be doing the pre-cons with Kendra Little, and that’ll be great for everybody. Especially you. Because you’ll finally learn T-SQL, and that’ll be great. Anyway, let us office hours away here.

All right. First up: why do parallel batch mode sorts only utilize one thread? So there are two parts to this. Your question is correct about one part, but parallel batch mode sorts, of course, do the actual sorting work on multiple threads. They do not do the sorting on one thread. The sort only produces its output on one thread.

The best explanation that I’ve ever seen in detail of why is in a Paul White blog post that I will put in the show notes about why batch mode sort spills are so slow in SQL Server. It has to do with the sorting algorithm that SQL Server uses internally and the complexity of producing parallel output from a batch mode sort. There is, of course, one operator in all of SQL Server that currently does support receiving parallel threads from a batch mode sort, and that is the window aggregate operator that gets used when window functions are processed in batch mode.

So there’s that. But I’ll put the link to Paul’s post in the video description, because anything further that I said about why that is would be pure plagiarism. The technical details there are best left up to the best SQL Server consultancy in all of New Zealand. Outside of New Zealand, it’s, of course, me, but within New Zealand, that’s Paul White.

All right. So next. With all of the recent content frequently mentioning NOLOCK and the read uncommitted isolation level, is there ever a situation where you would recommend using NOLOCK, or is it always avoidable? Well, there are. If it’s a query that you don’t particularly care about, and it’s causing problems, and no one seems to really care about what the results of that query are, you are free to put NOLOCK hints on it.

You know, that would be the first place. A second way of thinking about it would be this: one interesting thing about the NOLOCK hint is that it allows SQL Server to use what’s called an allocation order scan, which reads pages in the order they’re allocated in the data files rather than in index key order. In a nutshell, there’s a little bit more to it, but I don’t have days to talk about these things. That kind of scan can be faster in some situations, especially if you are reading pages from disk. So if you have a data warehouse type workload, where a lot of data gets loaded at night or in the morning, and then the data is fairly static during the day, you could use NOLOCK hints on your big tables to encourage the optimizer to use an allocation order scan versus an index order scan, which can be faster.
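As a rough sketch of that use case, where dbo.FactSales is a hypothetical warehouse table name, a per-table NOLOCK hint looks like this:

```sql
/*
Hypothetical warehouse query. The NOLOCK hint reads uncommitted data,
but on a table that is static during the day it also permits an
allocation order scan, which can be faster than an index order scan.
*/
SELECT
    fs.SaleDate,
    SUM(fs.Amount) AS TotalAmount
FROM dbo.FactSales AS fs WITH (NOLOCK)
GROUP BY fs.SaleDate;
```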

So that would be one use case for it there. It would be very similar to if you changed your data warehouse to be read only once data loads were finished. That’s another way to achieve a similar result.
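A sketch of that read-only alternative, assuming a hypothetical database named WarehouseDB:

```sql
/* After the nightly load finishes, flip the database to read-only.
   Read-only databases take no locks, so you get similar scan behavior
   without sprinkling NOLOCK hints everywhere. */
ALTER DATABASE WarehouseDB SET READ_ONLY WITH ROLLBACK IMMEDIATE;

/* Before the next load starts, flip it back to read-write. */
ALTER DATABASE WarehouseDB SET READ_WRITE WITH ROLLBACK IMMEDIATE;
```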

So that would be when I might use no lock. Aside from that, you know, I will judge you for using it. What’s required for my query to use batch mode on my rowstore indexes?

Like what version or compat level is required? I remember a while ago, a fake join to a columnstore index was required. Are there any hints I can use?

So there are no hints that you can use that I’m aware of. And that extends to saying there are no supported hints that I’m currently aware of that would force batch mode to occur. The fake join to a columnstore index will still work sometimes.

So, the tricks you can play on SQL Server: one of them is, of course, the fake join to a columnstore index. The other would be creating a nonclustered columnstore index on one of the involved tables with a filter that cannot possibly contain a row, like, say, WHERE id = 1 AND id = -1. No id can be one and negative one at the same time.
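A sketch of that second trick, against a hypothetical dbo.Facts table; the contradictory filter guarantees the index stays empty:

```sql
/* The WHERE clause can never be true, so this index holds zero rows,
   but its mere existence can make the optimizer consider batch mode
   for queries that touch dbo.Facts. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_empty_batch_mode_trick
ON dbo.Facts (id)
WHERE id = 1 AND id = -1;
```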

So there would be no rows in the filtered index, but you would still have a columnstore index on the table that might encourage the optimizer to use batch mode in some places, but not others. The tricks you can use, I’ve found, are not baked as deeply into the optimizer as when batch mode on rowstore happens. Sometimes, when you use one of the tricks, you’ll still read from tables in row mode.

But when batch mode on rowstore happens, you can read from rowstore indexes using batch mode. So there are some subtle differences that can affect how effective the batch mode is in the execution plan. But to get to your actual question, you need to be on Enterprise Edition.

It needs to be SQL Server 2019 or better, and you need to be at compat level 150 or higher. If you have control of your queries and you meet most of those requirements, you can use OPTION (USE HINT).

You can say OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150')), or 160, or, if you’re in the cloud or something, 170, in order to get a higher compat level for the query than what your database is currently set to, which can allow additional batch mode stuff to kick in. Batch mode on rowstore itself has a bunch of heuristics. It looks at things like the size of the tables, the complexity of the query, the joins, and everything like that.
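For example, again using a hypothetical dbo.Facts table, the per-query hint looks like this:

```sql
/* Runs this one statement under the compat level 160 optimizer,
   regardless of the database's compatibility level setting. */
SELECT COUNT_BIG(*) AS row_count
FROM dbo.Facts AS f
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));
```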

And it makes a runtime decision about whether to use batch mode on rowstore for your query. Like I said, you can use various tricks to encourage it, but those tricks often do not get batch mode across the breadth of operators that batch mode on rowstore does. All right. Next up.

Hi, Erik. I admire all the work you do for the community. Thanks a lot. You are one of the great minds of SQL Server. That’s very kind of you.

This is not a question, I just want to point that out, but it is very kind of you to say, and I appreciate the sentiment there. So thank you. I think you are.

I think you’re also a great mind, but I’m not sure in what community you are a great mind. But I am sure someone benefits from you somewhere. So good job. All right.

Oh, here’s a fun one. What happened to the old SQL performance dot com website? They just suddenly stopped posting, but the authors are still around and authoring on other places on the interwebs. Well, this is another SolarWinds special.

You see, when SolarWinds purchased SQL Sentry, or SentryOne, or whatever, and the SQL Sentry monitoring tool was part of that package, of course, two things happened: both the SQL Performance site, which SQL Sentry had hosted and paid people to blog on for years, and the SQL Sentry monitoring tool suddenly stopped being a monetary priority. So again, as I’ve said before, someone at SolarWinds should be in jail for neglect of these resources for the SQL Server community.

But in a nutshell, that’s what happened. You know, I guess the authors there were not interested in blogging there for free and all sorts of things on the site have deteriorated. Code examples are no longer formatted well.

Everything is a nightmare. I do know that Aaron Bertrand has migrated… well, no, that was his posts from SQLBlog that he migrated.

I don’t know if Aaron has migrated any of his posts from SQL Performance elsewhere, but Paul White has moved all of his posts to his personal website, sql.kiwi. Any posts that Paul had on the SQL Performance site are hosted there with a very similar URL, just at the sql.kiwi domain.

And I believe, for some of them, he has updated the contents to reflect newer changes in SQL Server, where applicable. Anyway, that’s about what happened there. It is a shame.

You know, that’s no longer an up to date resource for things and it’s just sort of withering on the vine. But, you know, there’s still some good stuff there from back when people cared about things. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something, and I will see you in tomorrow’s video. I think. Can never be quite sure. I think that’ll happen.

But, you know, maybe I’ll just get drunk and decide to retire. Who knows? It’s still early.

All right. Thank you for watching.
