It’s Time to Get Rid of the Cost Threshold for Parallelism Setting.

Video Summary

In this video, I delve into the outdated and largely unnecessary `cost threshold for parallelism` setting in SQL Server. This setting has been around long enough to have outlived its usefulness and relevance in modern database management practices. I explain why relying on a fixed number for this setting is not only ineffective but also misleading, as it can lead to suboptimal query execution plans and resource overutilization. I argue that the focus should be on optimizing queries to run faster rather than obsessing over cost estimates, which are merely pre-execution metrics with no actual meaning in today’s computing landscape.

Full Transcript

It’s time to get rid of the cost threshold for parallelism setting. It is a setting that has long outlived its usefulness, and even longer outlived its meaningfulness, as far as how SQL Server should be considering the benefits of using parallelism when executing a query. There are far too many people out there in the world who think that there is some hidden perfect setting for their workload when there isn’t. They refuse to acknowledge that settings like this are just good general guardrails for things. They are not a fully bulletproof way of making sure that your workload always does the right thing. You might see people say you should start at 30 or 50 or 150 or 500, and maybe figure out what you should set it to based on the type of workload that’s running. But it really is all magic number thinking, and it’s really annoying to watch people chase their tails on this. Query cost only ever meant seconds on one computer, and it is not your computer. It is a unitless metric. If anyone ever called me a unitless metric, we’d probably end up in a fight. You shouldn’t be looking at unitless metrics to tell you how your modern workload should execute.
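For reference, this is the knob in question. It’s an advanced option, so it has to be made visible first. The value 50 below is just one of those magic numbers people repeat, not a recommendation:

```sql
-- cost threshold for parallelism is an advanced sp_configure option.
-- The value 50 here is one of the magic numbers people repeat, not advice.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- See the current value.
EXEC sys.sp_configure N'cost threshold for parallelism';

-- Change it; takes effect immediately, no restart required.
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```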

You might think that there is some magic number, but then find that really tragic optimizer costing issues leave important queries with a serial execution plan, and the options for forcing a parallel execution plan, either trace flags or USE hints, are all unsupported. That is something that desperately needs to change, but that’s a topic for another day. Remember that MAXDOP is not MINDOP. MAXDOP limits how many CPUs a query can use; it does not instruct SQL Server how many CPUs to use. So setting MAXDOP will not force a parallel query. Likewise, you might set cost threshold for parallelism in a way that prevents some very important queries from getting a parallel execution plan, because they no longer qualify for exploring the parallel plan space. Or you might set it in a way where too many queries end up getting a parallel execution plan.
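To make the MAXDOP point concrete, and to show what the unsupported forcing options mentioned above look like: the hint and trace flag below are both undocumented, so treat this purely as illustration, and `dbo.SomeTable` is a stand-in name:

```sql
-- MAXDOP is a ceiling, not a floor: this caps the query at 8 CPUs,
-- it does not make the query go parallel.
SELECT COUNT_BIG(*) FROM dbo.SomeTable AS st  -- stand-in table name
OPTION (MAXDOP 8);

-- The unsupported ways of actually coaxing out a parallel plan.
-- Both are undocumented; don't build production code on them.
SELECT COUNT_BIG(*) FROM dbo.SomeTable AS st
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));  -- undocumented hint

SELECT COUNT_BIG(*) FROM dbo.SomeTable AS st
OPTION (QUERYTRACEON 8649);  -- undocumented trace flag
```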

You may also set it “too low”, handy quotes there, and find that you’ve run out of worker threads because now everything is going parallel. And that six core server that you threw all the hardware in the world at is just completely overwhelmed by parallel queries all running at some DOP or another. One of the more frustrating things about it is that it puts way too much emphasis on a very, very flimsy data point. Something that a lot of people either refuse to acknowledge, or that just never occurs to them, or that they have never read or listened to anything I’ve said about, is maybe the most important thing in here.

Cost is an estimated pre-execution metric. When you look at your execution plans, there are no actual costs. There are estimated costs for everything.

There are estimates for a lot of things. And for some things, there is even an actual component that gets added after you execute a query. When you look at the execution plan, there will be things like actual rows and actual executions and other stuff like that.

There are no actual costs. So people start paying way, way more attention to operator and subtree cost when they’re trying to tune things. And it makes them think that a meaningful goal is reducing cost when it is not.

Reducing cost is not the goal of tuning a query. If it happens to be a byproduct of tuning a query, that’s fine. But it’s not a query tuning goal.

Your query tuning goal should be to have that query run much faster. Reducing cost is not a way to do that. You might have very high cost queries that run very quickly. And you might have very low cost queries that run very slowly.

There are a variety of reasons for that, of course. But cost is not an indicator of execution time. And in today’s world where compute is at a premium due to licensing, people will start paying attention to these cost things like they’re going to save them money.

That’s what’s going to drive costs down. You might as well be stuck in 2008 looking at logical reads or some other dinosaur metric. You might as well look at PLE or buffer cache hit ratio or, I don’t know, context switches and disk queue length and other stuff that has just gone the way of obsolescence.

One thing that I dislike about it, which is maybe not the biggest deal in the world, is that it tends to give people a very false sense of control. Sure, you can change this number whenever you want, right? It’s a knob that you have control over.

You can change it to 49, 50, 51, 62, 75. You can go as high or as low as you want with it. But you can’t change how the optimizer costs things internally, at least not in any supported way.

There are some DBCC commands that you can run, but most people aren’t going to start running those on their production servers just to see what happens if they change this or that. It can also be very confusing to the same people who are confused by a lot of things in SQL Server, who are just not experts, or who have never spent much time with it. Or maybe they’ve spent a lot of time with it, but they’ve been taking the backups and doing the index rebuilds for years.

And that’s their single point of experience. They’ll see parallel plans with a cost that’s lower than their cost threshold for parallelism setting and think that SQL Server is broken, right? But that’s not true.

What no one gets is that every plan starts out as a serial execution plan. If the estimated cost of that serial plan is higher than your cost threshold for parallelism setting, then SQL Server starts looking at parallel plans.

If there are no natural inhibitors to a parallel execution plan, like a non-inlineable scalar UDF or an insert into a table variable or something, then SQL Server will start looking at these candidate parallel execution plans. And if it finds a cheaper one, then it just might go with that.

After factoring in CPU cost reductions and whatnot, that is. But let’s say that your cost threshold for parallelism is 10. You might see a parallel execution plan with a cost of 8 or 9 or 1 or 5 or 0.

Because the parallel plan was cheaper than the serial plan. Well, while we’re on the topic of parallel plan costing: nested loops queries get an absolute screw job on parallel plan costing, because the costs for anything that happens on the inner side of the nested loops join don’t get any parallelism reduction applied.

But nested loops queries often benefit quite a bit from parallelism. Right? Serial nested loops queries, at a certain point, just fry your brain. We’ve come to the point with SQL Server as a, let’s call it, mature software product.

Where we have enough stuff going on that falls under the intelligent query processing umbrella of features. There are many things that will happen or not happen based on other heuristics. I think probably the easiest one to recognize in that category is batch mode on rowstore.

SQL Server will use various heuristics, about the type of query, the joins, the size of the tables, and things like that, to figure out if batch mode on rowstore might be a good thing for you. And batch mode on rowstore leads to all sorts of other neat things, like adaptive joins and whatnot. We’re at the point now where there is almost no sense in taking user input on what query cost should be before a query goes parallel.

There might be a good candidate setting to replace it with, but it might also, at this point, just be completely replaceable by some intelligent query processing feature that uses heuristics, similar to what batch mode on rowstore does, to implement or explore the parallel query plan space. And there are also many feedback mechanisms, where the optimizer could look at the query plan and then the engine could execute the execution plan.

And then we could look at things after the fact and say: well, you know, we didn’t think parallelism would be good there, but we ended up with a lot of rows on a single thread. And boy, howdy. Maybe more threads would help.

We actually already have a setting called DOP feedback, which is, I mean, plum useless the way it was designed. But, you know, I wasn’t the PM on that. So don’t blame me.
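For reference, DOP feedback is a SQL Server 2022 database scoped configuration, and turning it on looks like this. The design complaint above comes from the fact that it only ever lowers a query’s DOP after the fact; it never raises it:

```sql
-- DOP feedback (SQL Server 2022+). It watches repeating queries and lowers
-- their degree of parallelism when the extra threads aren't helping.
-- It never raises DOP, which is the limitation being griped about above.
ALTER DATABASE SCOPED CONFIGURATION SET DOP_FEEDBACK = ON;
```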

So there are things that would be better to do at this point, right? Like, A: leaving this cost threshold for parallelism setting out of the installer. Because there are so many other things in the installer now.

You can set MAXDOP in the installer. You can configure tempdb in the installer, right? You can even turn on, what’s it called, “perform volume maintenance tasks”, the instant file initialization thing. You can turn that on in the installer now.

But they leave cost threshold for parallelism out of the installer. And by they, I mean Microsoft. There’s nothing in the documentation that gives people real guidance on it. You can’t change it at all in Azure, right?

So who knows what Azure is doing, right? In Azure SQL Database, you can’t change it at all. I mean, you can change it in Managed Instance. You can change it if you have a VM. But in Azure SQL Database, you can’t change it.

So perhaps there is some scientific exploration going on about a better way to gauge the relative benefits or drawbacks of parallelism for queries there. I don’t know. I don’t have any information on that.

But I just have to hope that at this point we can finally either drop it, drop cost threshold for parallelism as a setting completely, or we can finally start to give people some meaningful thing to do with it. The thing is, I just don’t think that there is a routinely meaningful thing that you could tell people that they should do with it.

That would solve their problems. Anyway, I’m done here. It’s Friday.

I’m going to go think about this and, I don’t know, stare at some red stuff in a fancy glass. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Think Like A Batch (Mode) In SQL Server

Video Summary

In this video, I delve into the intricacies of batch mode friendliness in SQL Server queries, sharing practical examples and strategies to optimize your query plans. Starting with a common annoyance in a query plan, I demonstrate how certain operations can hinder batch mode execution, leading to suboptimal performance. By exploring alternative query structures and pre-aggregating data, we achieve more favorable execution plans that leverage batch mode effectively. The video also touches on the importance of understanding where batch mode excels and aligning your queries accordingly, providing valuable insights for anyone looking to improve their SQL Server performance tuning skills.

Full Transcript

Erik Darling here with Darling Data. And boy, I got a video for you today. We’re going to talk a little bit more about batch mode friendliness. I don’t know, these seem to do fairly well as far as piquing interest from people. So what the hell? Let’s do it some more. Before we get into that, the usual ol’ spiel schnitzel. If you want to support the content that I produce on this channel, you can sign up for a membership. It’s a good deal. Because then I’ll keep doing it and not just retreat into a cave and keep it all to myself on stone tablets. If you want to ask me questions for my Office Hours episodes, there’s a good link for doing that down in the video description. I have enhanced the link to give you more opportunities to add detail to your questions as well. If you need consulting, you know, someone looking at your SQL Server while you’re thinking, gosh darn, this thing is slow, I promise you, I can do all of these things and I can do them all well. And according to Beer Gut Magazine, I do them better than anyone outside of New Zealand. So you can hire me. And as always, my rates are reasonable. You can get my performance tuning training for about 150 US dollars.

That link and that discount code are also fully assembled for you down in the video description. And of course, my new T-SQL trainings. The beginner content is fully published. There is about 23 hours of it. If you’re going to PASS and attending Kendra Little’s and my pre-cons, you will of course get access to all this material. It is on pre-sale still at 250 bucks. But after the summer, when I have regained consciousness, the price will go up to 500. So I would really suggest that you get in on these purchases now. They will be far less negotiable than the Everything Bundle. And of course, this summer, I am also traveling a bit. The nice folks at Redgate have decided to pull me kicking and screaming from my home where I do these recordings. I will be in New York City, which I guess is not too far from home, August 18th and 19th. Dallas, Texas, which is moderately far from home, September 15th and 16th. And then Utrecht in the Netherlands, which is slightly more moderately far away from home than Dallas, I guess. And of course, you know what? Now that I think about it, Utrecht and Seattle are kind of about the same, flight-wise.

So I will also be slightly more moderately far away from home for PASS Data Community Summit in November, the 17th to 21st. With that out of the way, though, let’s talk about some batch mode stuff here. So the first query that I want to show you has kind of an annoyance in it, right? We’ve got... oh, oh, dear. I don’t know what button that was. Did ZoomIt have a problem? Are we not doing the ZoomIt show here? The ZoomIt show works now. Great. For some reason sp_who ran, and that just should never happen.

We’ve got a little bit of an annoyance in this query. Part of our join clause is looking for where user ID is null. Or rather, we’re telling SQL Server: I really want to join on this column, but if the user ID column is null on the Comments table, we can leave that row in. The thing is, I can only get an estimated plan for this, because when I’ve tried to run it and get an actual execution plan, it is horrible.
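The demo query itself isn’t reproduced in the transcript, but based on the description, its shape is roughly this. Table and column names come from the Stack Overflow sample database; the select list and aggregate are my guesses:

```sql
-- A sketch of the problem query's shape: an OR on a nullable column inside
-- the join clause. The aggregate and select list are assumptions; the join
-- predicate pattern is what the transcript describes.
SELECT p.PostTypeId, SUM(p.Score) AS TotalScore
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON  c.UserId = p.OwnerUserId
  OR  c.UserId IS NULL  -- "if the user ID column is null, we can leave that in"
GROUP BY p.PostTypeId
ORDER BY p.PostTypeId;
```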

So what we do in this query plan is nothing very batch mode friendly. We scan the clustered index of the Posts table and the Comments table. And we sort the Posts table. I mean, not the entire table; we sort the columns that come out of the Posts table. We’re selecting OwnerUserId, PostTypeId, and Score.

And we are ordering by PostTypeId ascending. And then we go into a nested loops join. Nested loops joins? No batch mode there. Lazy table spools? No batch mode there. Stream aggregates? No batch mode there.

We’re just rocking with rowstore, right? We are not getting anything batchy at all in this plan. Worse, if I try to tell SQL Server, I would prefer a hash join here, my friend, we will get an error back. You’ll get all this red text telling us that the query processor could not produce a query plan and that we should resubmit the query.

Well, okay. We could do that, or we could try some stuff, right? Now, what I find interesting about this particular query format is not so much that SQL Server can’t figure out how to use batch mode, or how to use a parallel plan, or how to use a hash join plan. What I find interesting is what we see if we look at the Posts table for where OwnerUserId is null.

And we look at the Comments table to see where UserId is null, which is effectively storing the same type of data, right? It’s whoever owned the comment or whoever owned the post. There are no nulls in the Posts table; we have a zero there. And we have 336,000 null UserIds in the Comments table.

Very interesting. So if we change the query to look like this instead, right? If we say where OwnerUserId is null, and where UserId is null.

We can at least achieve a hash join plan here. However, this query will run for a very long time. It is not a good time.

Was this the one? No, this is not the one. So, there is still no sign of batch mode in this plan until we get to the very end, where we have some batch mode on this final hash match aggregate. The thing is that we do all this other work in row mode, and that is not exactly what we want.

We want more batch mode happening in our plan, not just one operator. That’s not terribly helpful. So, what we can do... or actually, before we do that, what I want to show you, without the hash join hint, is potentially one of the most deeply offensive query plans that you might ever see in your life.

And this is all row mode. I can promise you that. There is a scan of both tables. Then we repartition streams. Again, no batch mode here.

And then SQL Server is like: oh, a merge join. Yeah, great. I have no sorted input, so I’ll just sort both of these inputs to use a merge join. Worse, worse, worse, worse.

This is a many-to-many merge join. And I tried running this one, too. This one ran for about thirty-something minutes before I was like: you know what?

I think I just need to go record something at this point. Because I’m getting tired. I’m starting to get exhausted. So we have all this row mode work.

And then of course we have our one lone hash match aggregate occurring in batch mode here. It’s not that none of the operators leading up to the merge join support batch mode. We just don’t get it.

Right. Merge join doesn’t support batch mode. And again, the parallel exchanges, the repartition streams and the gather streams over here, don’t support batch mode. But of course, with batch mode on rowstore, we do have support for reading from tables and sorting and computing scalars in batch mode. We’re just not getting it here.

SQL Server does not naturally choose a very batch-mode-ish plan here, which is not good. I’ll be honest with you. I’m not having a good time with that one. So, what we can do in order to make this query more batch mode friendly is some pre-aggregating of our own.

One of the big downfalls of even this plan with the hash join hint is that, if you notice, SQL Server does not make any attempt prior to the join to group data together. We have two columns, OwnerUserId and UserId, which are lousy with duplicates.

There are so many duplicates in there. Just an incredible amount of duplicates.

But SQL Server does not make any attempt to do a pre-aggregate. It’s just like: no, I feel like fully joining all these things together. I want all the rows joined together.

This is going to be great. It’s not great. It’s not a good time. Even the hash join plan runs for a very, very long time. Now what we can do is force SQL Server’s hand a little bit and do some pre-aggregating of our own.

We still need to do some outer aggregating, but that’s okay, because a little bit of pre-aggregating goes a long way. So, for example, we can select this stuff and do a little bit of pre-aggregation on some of our columns: we can do a GROUP BY on OwnerUserId and PostTypeId out here, then join that to another pre-aggregated result where we group by UserId, and then do our join.

And then finally, out here, group by PostTypeId. And when we do that (now why are you not properly terminated?) we get a much more favorable execution plan. I’m just going to show you the estimated plan first.
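Again, the exact query isn’t shown here, but the pre-aggregation pattern just described looks roughly like this. The grouping keys follow the transcript; the specific aggregates are my guesses:

```sql
-- Pre-aggregate each side before the join, then do a final outer aggregate.
-- Far fewer rows survive to the join, which is what makes this shape cheap.
SELECT p.PostTypeId, SUM(p.TotalScore) AS TotalScore
FROM
(
    SELECT OwnerUserId, PostTypeId, SUM(Score) AS TotalScore
    FROM dbo.Posts
    GROUP BY OwnerUserId, PostTypeId
) AS p
JOIN
(
    SELECT UserId
    FROM dbo.Comments
    GROUP BY UserId
) AS c
  ON c.UserId = p.OwnerUserId
GROUP BY p.PostTypeId
ORDER BY p.PostTypeId;
```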

Where we do get much more batch mode. We do not get full batch mode the way that we might want, but we get much more of it. Another thing that we get, of course, comes from using our smart brains. Or at least some of us are using some smart brains out there.

I mean, I don’t know. Sometimes I have good ideas. But when we look at this plan, where we do the pre-aggregation, SQL Server is like: oh yeah, I think I can use batch mode here.

And when I do this aggregate: oh yeah, I can do batch mode there. I know there’s still no batch mode for the repartition streams, but SQL Server still does some fairly batchy things leading up to all of this work.

Now, unfortunately, the hash join that we do is in row mode, but that’s not the end of the world here. We still see a really, really big performance increase, and this final hash join now has much less work to do.

Because we pre-aggregated a bunch of stuff before the join, we’re going to be leaning on this thing much, much less, since we have far fewer things to go and aggregate altogether. And finally, when we run this query, this one actually finishes.

It actually just completes without me having to do much. And you can see: sure, is there stuff we could do for this thing?

Yeah. Maybe we could add some indexes. Maybe we could tweak indexes a little bit. Maybe we could even add columnstore indexes. I don’t know. We could do stuff that might be useful here. But the point is just doing things that are more agreeable to batch mode, like doing the early GROUP BYs and joining the results of that.

And just things along those lines. Think of things that batch mode is into doing, which is: I want to group by this stuff and aggregate this stuff and get some sums and some counts and some other data warehousey, analytical things.

Batch mode is just like: oh yeah, no, I get it. I’m picking up what you’re putting down. I think I’d be useful here. Yeah. I’m going to get in on this game. So when we do this, like I said, a lot more stuff happens in batch mode.

The hash join does not happen in batch mode, but that’s okay. Because, going back to reading operator times in these plans: all of the batch mode operators are just going to show the time underneath the operator for the wall clock time of what they did.

So it’s like 416 milliseconds, 186 milliseconds. And then when we get to the repartition streams, which is in row mode, we get the total time spent, added up until you get to here.

And that’s the same thing with this side of the branch too, since this is batch mode and this is batch mode and this is batch mode. These things are all individual.

One thing I do want to repeat here is that looking at operator times for parallel exchanges is not something that you should spend a lot of time with. But the point is, we don’t spend 1.5 seconds in the hash join.

It’s more like 1.5 seconds minus 500, minus, I don’t know, 186 plus 416. So there’s some stuff in there that is okay. Right.

We didn’t spend 1.5 seconds here or here or here. It’s cumulative. But the whole query finishes in just about 1.6 seconds. So when you are trying to write batch mode friendly queries, again: the closer you write your query to align with things that batch mode is good at, the better the chance you have of getting batch mode operators in your query.

So just try to think like a batch, right? Think about where batch mode is useful, and then try to align your queries to that, rather than just writing your typical query and wondering why batch mode doesn’t show up.

Maybe you’re already a batch mode person by accident, and the way that you write queries is very batch mode friendly. But if you’re coming from row mode query tuning stuff, you might be really into doing APPLY and things like that, where you’re trying to find these navigational, seek-y strategies, maybe getting parallel nested loops pushed in there.

But when you’re trying to get batch mode happening in your query plans, what you really should focus on is truly thinking in terms of where batch mode excels and what batch mode is good at.

And then writing your queries to try to conform to that. Anyway, that’s it for here. I hope you enjoyed yourselves. I hope you learned something, and I will see you in tomorrow’s video, where I will say something that is, I don’t know.

I don’t think it should be all that weird, but maybe you’ll think it’s weird. Anyway, thank you for watching.


Spotting Batch Mode Opportunities in SQL Server Query Execution Plans

Video Summary

In this video, I delve into the world of batch mode opportunities in SQL Server, exploring how to identify and leverage these opportunities for performance gains. Starting off with a discussion of my upcoming consulting services and training courses, I share how you can support this channel by signing up for a membership or asking me questions during office hours. Moving on to the technical content, I analyze execution plans from previous videos where SQL Server missed out on using batch mode, even though it was clearly beneficial. By adding a simple columnstore index to a helper table and running trace flag 7418, I demonstrate how these changes can transform row-mode operations into batch mode, significantly reducing query times. Through detailed explanations of the execution plans and performance metrics, I aim to help you spot similar opportunities in your own queries.

Full Transcript

Erik Darling here with Darling Data, and we are going to talk in this video about batch mode opportunities. It’s kind of like the Clash song, except way less catchy and cool, right? Anyway, my life... what am I gonna... it started out so promising. If you would like to support this channel, you can do that. You can sign up for a membership. If you would like to ask me questions for office hours, you can do that at that link. Both of these links are down in the video description. If you would like help with your SQL Server, perhaps you need help with your batch mode opportunities, I am available for consulting. You can hire me. I’ll show up. I will wear this Adidas shirt. And I will be just as clean and kempt as I am in these videos. Not drunk, and it’ll be fun. And as always, my rates are reasonable. If you would like to buy my performance tuning training, there’s 24 hours of it, just for you. Aren’t you special? Look at you. Special little thing you are. You can get all of it for about 150 bucks for life. You go to that URL and plug in that discount code.

And this is also helpfully assembled for you down in the video description. My T-SQL course, Learn T-SQL with Erik (that’s me!) is also available. Just about 23 hours of the beginner content is fully published. If you are coming to PASS Data Community Summit and attending Kendra Little’s and my pre-cons there, you will, of course, get access to this material, as I consider it companion content to what will be going on there. The advanced stuff is being worked on currently. That’ll all start going up after the summer.

And the other thing going up after the summer is the price. It’ll go up from 250 bucks to 500 bucks. So you should buy that now while it is still 250 bucks. Speaking of the summer, gosh, how am I going to get all this done? Redgate is taking me on a partial world tour. You know, mostly small clubs and venues.

New York City, August 18th to 19th. Dallas, Texas, September 15th to 16th. And the hamlet of Utrecht in the Netherlands, October 1st and 2nd. And that all leads up to PASS Data Community Summit, where the aforementioned T-SQL pre-cons, plural, will be taking place.

But with that out of the way, let’s talk about spotting opportunities for batch mode. Now, this was the query that I ran in the last video where I said, no, that default cardinality estimator sure didn’t do so good. And if we look at the execution plan for it, we’ll come back to that.

I’ve already run it because golly and gosh, why sit through that 8.5 seconds last time? I guess, I don’t know, maybe Windows Update wasn’t doing something in the background when I ran this one. This was 300 milliseconds or so faster, so I don’t know.

We got a speed boost from something. Sure wasn’t Microsoft. But looking at this execution plan, there are things that I do not love about it. For example, all of this stuff happens in row mode. Like, come on, SQL Server.

It is the year 2025. We are in database compatibility level 160. I’m using Developer Edition, which is an Enterprise Edition equivalent SKU of SQL Server. The batch mode on rowstore feature is there.

It should... why wouldn’t you use it here? We are scanning just about 53 million rows. Why on earth would you leave this to row mode? What is on your mind?

SQL Server. Gosh. And it does it all throughout the plan. Another way that you can tell you’re not really getting batch mode on rowstore is because we still have these repartition streams operators. Now, sometimes these can still show up in mixed batch and row mode plans.

But since none of the parallel exchanges support batch mode, we know that these operators are not happening in batch mode. Neither are these compute scalars. If you squint really hard or I zoom in like a reasonable presenter should do, we will see that these occur in row mode, as does this big old whopping hash join here.

Gigantic hash join. Huge hash join. Right?

52 million rows come in from each side. And what do we do? Row mode. SQL Server. Smackity smackity smack. What is on your mind, buddy? Let’s try this again.

Let’s give SQL Server some ideas about itself. Let’s say, hey, SQL Server. What might be a good idea here? Now, this table columnstore helper is a completely empty table. I’m just going to type in a demo real quick.

dbo.columnstorehelper. Just because I want you to see the execution plan here. We return no rows from this. The execution plan shows that we have a zero row clustered columnstore object in our database.

You can do this with a temp table or whatever other kind of thing you want to slap a clustered columnstore index on. But all we’re going to do here is say left join to our columnstore helper on one equals zero. One can never equal zero.

But there is now an object with a columnstore index on it somewhere in or around our query, and so the optimizer is going to think somewhat differently about things. So if we run this, remember that was about eight and a half seconds, right?
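The trick described above can be sketched like this. The video only names dbo.ColumnstoreHelper and doesn’t show its definition or the full query, so the column name and the SELECT list below are assumptions, just stand-ins for the real thing:

```sql
/* Hypothetical definition of the empty helper table; the video only
   names dbo.ColumnstoreHelper, so the column here is an assumption. */
CREATE TABLE dbo.ColumnstoreHelper
(
    i integer NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE
);

/* Left join on an always-false predicate: no rows ever match, but the
   optimizer now sees a columnstore index in the query and can consider
   batch mode. The SELECT list is a stand-in for the real query. */
SELECT
    v.PostId,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
LEFT JOIN dbo.ColumnstoreHelper AS ch
    ON 1 = 0
GROUP BY
    v.PostId;
```

A temp table with a clustered columnstore index works the same way, as mentioned above.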

We’re going to just go with it there. Now, what are we down to? 2.4 seconds. Jeez, SQL Server.

I think batch mode might have been a good call here. What do you think? What do you think? How do you feel about that one, SQL Server? Should we have used batch mode? Was that a good idea?

Well, probably. So now we still have a scan of the votes table on both sides, because we don’t have a where clause on the votes table that we could use to filter rows out. But you might notice that this takes about 800 milliseconds.

Before, this was about 1.7 seconds. This one takes just about the same time, right? It’s off by 20 milliseconds here. Not that big of a deal.

But, you know, especially considering what these times were before. And all of these things are happening now in batch mode, right? We have a batch here. We have a batch here. Yeehaw.

Look at us. Good job. Even our compute scalars are happening in batch mode. And now that big giant hash join that we were doing before in row mode, with almost 53 million rows coming in from each side, is happening in batch mode. Even this top N sort is happening in batch mode.

Now, like I said before, the parallel exchanges do not support batch mode. Boo. Neither does the top operator.

So the plan timing in this one looks a little funny, because for batch mode operators, which is all this stuff, the times that you see in there are just the wall clock time spent in that specific operator. In a rowstore query, it’s cumulative going from right to left; the child operators build it up.

I’m going to show you a way to change that in a second. So these numbers, like these numbers are all just for the individual operators.

But by the time we get over here and we get to these row mode operators, these ones add up all the times for the stuff that happened before them. So the 2.4 seconds you see here and here is not 2.4 seconds a piece. You remember this whole query finished in about 2.4 seconds.

We can validate that by going to the properties and looking at query time stats, and seeing that there was about 18 seconds of CPU time and 2.4 seconds of elapsed time. So that’s one good sign that your queries could possibly do with some batch mode: when you have gigantic scans of tables, especially in parallel, and big old hash joins, but they’re happening in row mode.

It’s usually not what you want, right? Especially if you have any say over it. Batch mode really helps because the more rows get involved, you know, row mode just does exactly what it sounds like. It processes a row.

Even though we’re not like using a cursor or a loop or something like that, iteratively, like inside, like this is why query plan operators are often called iterators. Because they are iterating over rows. And SQL Server in row mode pipelines all this stuff, so it’s like one row and one row and one row.

Granted, that happens pretty quickly, because the people who made SQL Server were pretty good programmers or something like that. But batch mode is much faster here, because batch mode processes up to 900 rows at a time, depending on the size of those rows. It sticks all those rows on a CPU register and uses something called SIMD, single instruction multiple data, to run CPU instructions over batches of rows at a time.

Which, when you have many millions of rows, is typically a good idea, because it removes all the CPU-boundness from your queries. So let’s look at another example that builds on this one. I’m using a trace flag here, 7418.

This one came out in SQL Server 2022. Now, this trace flag is technically undocumented and unsupported. So don’t go messing around in production with this one, because who knows, right?

I can’t tell you everything that it does in effects or even if like that might cause stack dumps or assertion errors and, you know, whatever else product failures. I can’t tell you. So for, for demonstration and testing purposes only, we are going to run this query, which builds on the query that we were just running.

So what this is trying to do is add in some more information about missing IDs in the votes table. Let’s say that we wanted to summarize all of the missing ranges in here. We want to find the start of the range where things go missing and the end of the range where things go missing.

This is a query that will do that. We have our ID plus one. We have our min ID minus one.

And we have our not exists query here in order to find the non-matching rows, with our terribly non-SARGable predicate. And then we have this final predicate on our query in order to exclude the last value in the table, because that one is not actually missing; we just don’t need that last bit.
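Assembled from the description above, the gaps query probably looks something like this sketch. The video doesn’t show the full text, so the column aliases and temp table name are assumptions:

```sql
/* A sketch of the gaps query described above, assuming the
   StackOverflow2013 database. Aliases and #missing_ranges
   are illustrative; the video doesn't show the full query. */
SELECT
    gaps.Id + 1 AS range_start,       /* our ID plus one */
    (
        SELECT MIN(v2.Id) - 1         /* our min ID minus one */
        FROM dbo.Votes AS v2
        WHERE v2.Id > gaps.Id
    ) AS range_end
INTO #missing_ranges
FROM dbo.Votes AS gaps
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Votes AS v
    WHERE v.Id - 1 = gaps.Id          /* the non-SARGable predicate */
)
AND gaps.Id < (SELECT MAX(v3.Id) FROM dbo.Votes AS v3); /* skip the last value */
```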

So I’m going to run this whole thing at once. And the two things that I want to show you are one, I mean, the query plan is the most important part. We’re dumping this into a temp table.

So it doesn’t matter much what else we’re doing with it, right? We’re not, not doing anything else terribly interesting, right? We put 5.4 million rows into a temp table, but this execution plan also happens entirely in row mode. I know it’s a little hard to see here, but you know, when, like, like if you, if you just kind of understand, the pattern of the, like the, like what, what you’re going to see in these tooltips, everything is happening in row mode.

Even, once again, this gigantic hash join between two tables, and all the work that gets done in here. So this is very similar. The important thing here is that most of the operators in this query plan are not eligible for batch mode, but a small segment of them in this section of the query plan are. Granted, any of the data acquisition operators, like clustered index scans, are absolutely eligible, but things like top and stream aggregate, and the parallel exchanges, like repartition streams and distribute streams and gather streams over here, aren’t.

Aren’t nested loops joined sure. Isn’t, but I wish it was, it’d be so cool if it was, boy, I wish we had batch mode nested loops. I don’t know.

Maybe that’s just really hard to do, but this whole thing once again takes, uh, well, this one takes a little bit longer, right? This one actually, if we go over to the very end here, let’s go look at the properties and let’s go look at the query time stats.

We’ve actually been lied to a little bit. The elapsed time on this was actually almost 14 seconds, 13.7. Why that doesn’t show up here, uh, appropriately?

Well, like I said, the wall clock time on parallel exchanges is bonkers. Bonkers. Not in a good way. Not in a this-is-going-to-be-a-fun-night way.

More like an oh-cool-I’m-getting-arrested way. So you might have a query plan where a lot of the operators are not eligible for batch mode, but you might still spot the pattern. The last one I showed you was very simplified.

This is like that same section of the query plan, but with a bunch of stuff around it, because I want to teach you what to focus in on, which is this pattern in here. Thank you, tooltip, for showing up uninvited.

So it’s like this pattern in here, like we spend, like, if we think about like the amount of wall clock time that we spend in this plan, a lot of it is right in here. Right. Like there’s a lot going on in here.

So let’s do what we did before. Right. So I have trace flags 74 18 on. So this, this query plan is showing all of the operators is only having the wall clock time of themselves. Right.

Even the row mode ones are only showing the wall clock time that they consumed. So... did I say that? I think I forgot to say that about the trace flag. That’s what this trace flag does.

It makes it so that all of the operators in a query plan will only show the wall clock time that they are responsible for. So it makes row mode plans act like batch mode plans in that timing regard. All the stuff that you see in here, even though it’s not happening in batch mode, just uses that timing.
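Enabling the trace flag for testing might look like this. As the warning above says, 7418 is undocumented and unsupported, so treat this strictly as a demo-only sketch:

```sql
/* Demonstration and testing purposes only: trace flag 7418 is
   undocumented and unsupported. */
DBCC TRACEON(7418);  /* session-level: each operator in the actual plan
                        reports only its own wall clock time, even in
                        row mode */

/* ... run the query with the actual execution plan enabled ... */

DBCC TRACEOFF(7418); /* turn it back off when done testing */
```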

So let’s, let’s put this in now. Right. So we have like just a hair under 14 seconds for this. Right.

Remember, this lied to us: when we looked at the query time stats, the wall clock time was almost 14 seconds. So let’s put this in, right? We’re going to tag in our columnstore helper friend here, and we’re going to run this and see what happens.

So this no longer takes almost 14 seconds. If you look at the query time stats for this, now we are down to six and a half seconds. And the sort of annoying thing is that the plan that we get reuses a lot of the operators from the last plan.

Right. Like we still have like this whole section is still identical and this whole section is still identical, but this section in here now is all batch mode. And we can tell it’s all batch mode because the repartition streams that used to be in here are gone.

So it’s a little annoying that we were like, Hey, SQL Server batch mode would be really cool to do here. Wouldn’t it? And SQL Server was like, I gotcha.

But then the stream aggregate does not support batch mode, right? This thing: no batch mode. I don’t know. It could get batch mode, right?

Like it’s, it’s possible, but this, this thing, no, it just uses row mode. But up here, this scan of the votes table uses batch mode. Right.

This scan of the votes table uses batch mode, even though the storage is rowstore. Don’t get confused there. The compute scalars are both in batch mode, and this hash join happens in batch mode. So we were able to at least affect part of the plan with that columnstore index being in there.

We didn’t get a fully batch, full batch mode on rowstore plan because batch mode on rowstore, of course, goes much deeper into query, like the, into like the query optimizer. Optimizer than just sort of tricking SQL Server into like, Ooh, you tripped and fell and landed on some batch mode. So like, like we got at least partial batch mode in here, which improve things, but we don’t get like the full batch mode experience.

There are, of course, ways that we could change this query to probably make it a bit more batch mode friendly. But that sounds like the subject for another video. That sounds like a great video, Erik.

So we’re going to call this one here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video where I’m, I have not decided what, what I’m going to do next. So it’ll surprise you as much as it surprises me.

All right. Thank you for watching. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About: Old vs New Cardinality Estimators In SQL Server



Video Summary

In this video, I delve into the reasons why I often prefer the legacy cardinality estimator over the default one in SQL Server. Using a practical example involving an identity column backfill process, I demonstrate how the legacy CE can provide more efficient execution plans compared to the new CE. By running the same query with both estimators and comparing their performance and execution plans, I highlight key differences that might influence your choice of cardinality estimator during query tuning. Whether you’re looking for a straightforward explanation or want to see real-world implications, this video offers valuable insights into when and how to leverage the legacy CE for better query optimization.

Full Transcript

Erik Darling here with Darling Data. I recently answered an Office Hours question where someone asked why I generally prefer the old cardinality estimator, old or legacy, compared to the new cardinality estimator, or default cardinality estimator, as Microsoft in all its blue Azure-y hubris calls it. And I’m just going to show you an example today of a query. Now, I know this is an example of one query, but it’s a good example of generally why I tend to prefer the legacy cardinality estimator, and why, when I am tuning queries on a version of SQL Server and a database whose compatibility level dictates that we are using the default cardinality estimator, I will try the query using both the default and the legacy one. This is not, of course, a thorough undoing of everything that the new cardinality estimation model attempts to do differently than the legacy model, but it’s just an example of why I tend to prefer it and why I will always try it out. There are, of course, times in my query tuning life when I’m maybe using the legacy one and I might say, hey, let’s give the newer one a shot. Let’s give it a try. Let’s see how it goes. The worst thing that happens is that the query either finishes in the same amount of time or takes longer, and we can say that didn’t work. Let’s try something else. So here we go with that. But before we do, of course, I mentioned Office Hours. If you want to ask me questions privately that I answer publicly, that link right there is how you do it. It’s down in the video description. There’s also a link where, if you think that this channel is worth as little as $4 a month leaving your bank account and going into my bank account, you can sign up for a channel membership to support all of this wonderful material that I produce. I am also an active SQL Server consultant. Maybe I play one on TV, play one on YouTube.
If you need help with your SQL Server, you can hire me. And as always, my rates are reasonable. Hooray for reasonable rates. Anyway, my performance tuning training, if you want it for about 150 bucks for the rest of your life, that’s the link, that’s the discount code. The forming pattern here is that it is also in the video description.

My T-SQL course, Learn T-SQL with Erik, is also available, currently at a pre-sale price of $250. I recently finished recording all of the beginner material and am hard at work on the advanced material now. So that will get done after the summer. The price of the video course will go up to $500 when that is complete, so I suggest you do that sooner rather than later. I am speaking a lot all over the place this summer: New York City, Dallas, and Utrecht, August 18th to 19th, September 15th to 16th, and October 1st to 2nd. Of course, all that is right before I go to Seattle for PASS Data Community Summit. Kendra Little and I will be delivering two T-SQL pre-cons at PASS Summit together over two days. So I hope to see you at both of those. But with that out of the way, let’s talk about this turkey here.

Now, let’s say that we have a table and that table and it has an identity column. Let’s pretend it’s called the votes table and let’s pretend it’s in the database called Stack Overflow 2013. Think that’s reasonable, right? The reasonable set of things that we can pretend. And we realize one day that our ID column is not as contiguous as we would like. And maybe we would like to go and backfill it. So we start designing a process to find all of the missing rows in the votes table. In this case, our job is to find the first, the lowest value that is missing from the votes table and then assign that to something and then do an insert to start backfilling rows in there.

Maybe that sounds a little silly, but I’ve seen plenty of places start needing to backfill their identity columns. And, you know, depending on various local factors, this might be a reasonable way of doing it. So if we just select the top 10 from the votes table and we look at the ID column, when compared to the row numbering column, right, this is not a column from the table.

This is just the row numbering that comes back from SQL Server Management Studio. We’ll notice right about here that things go a little amok on us. We are clearly missing ID 8, right?

The row numbering goes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The ID column goes 1, 2, 3, 4, 5, 6, 7, 9. Ah, boy, we’re missing ID 8. All right, so our goal is to write a query that will find us the earliest missing ID.

Sounds pretty easy. One way of doing that would be to write a query that looks about like this. You would say I want to select the top 1 missing ID, which in this case, because of what we’re trying to accomplish, we would need to add 1 to the ID column here.

And we’re going to say where not exists, select doesn’t matter from the votes table, where v2.id minus 1 equals v.id. And we will order our results by v.id to make sure that they stay perfectly deterministic.

Since ID is already the clustered primary key of the votes table, it is guaranteed to be unique. And since it is indexed, it is presented to us in order. Now, you might look at this and scream a lot about sargability and whatnot, and I hear you.

I hear you. There’s a lot to be said for sargability. But we’re going to run this query in two ways. Right now, my database is in compatibility level 160.

160 for SQL Server 2022. So I’m going to run this query once using compatibility level 160 and all its attendant properties. And then we’re going to run the exact same query the exact same way down here, except we’re going to add in this use hint to force legacy cardinality estimation.

All right? So that’s the only difference between these two things is one is using the default cardinality estimator, and the other one is using the legacy cardinality estimator.

You might notice that it’s been a little while since things started running. You would be a very observant person if you picked up on that. These both return ID 8, right?

They both return the correct result, but the execution plans are quite different. So this top plan is using the default cardinality estimator. You’ll notice that it took 8.6 seconds right here.

And if we scroll over here a little bit, yeah, my head’s not in the way. We’re off to a great start, aren’t we? We spent 1.7 seconds fully scanning the votes table. We spent 2.2 seconds fully scanning the votes table here.

We’re going to ignore the timing on the repartition streams operators for now because the wall clock timing on parallel exchanges is a complete disaster. So we’re going to ignore that for the time being.

And we’re going to look at this. So this is just about this whole section in here is where 8.5 seconds winds up, right? It’s not like something weird happened over here.

Like, we have this top N sort, but this top N sort wasn’t spilling a bajillion, 52 million rows to disk. And we were like, ah, God, we can’t take it. It’s tempdb.

We broke tempdb. It’s not that. If we look down here at this query where we use the default legacy cardinality estimator, oh, dear old me, this query chose a completely different execution plan, right?

One, it’s single threaded, right? I mean, first off, you might want to know it’s right here. It takes one millisecond.

This is a single threaded execution plan. There is no parallelism at all in any of this whole entire thing. But there are some funny looking numbers.

Like, for example, 900 of 14,546,000. So that’s an eight-digit number. That’s 14 million rows, 14 and a half million rows.

It only took 900 rows for us to find it. So the big difference here is if we look at the properties of the clustered index scan here, there will be this row, estimated rows without row goal.

So SQL Server estimated that it would have to read this many rows to get stuff out of there. But the number of rows that it actually took was 900, right? So the actual number: we only needed 900.

SQL Server was like, it might take a while, but we only actually ever needed 900. But the important thing here is that this exists here, right? So we have estimated rows without row goal here.

We have estimated rows without row goal for the second time we touch the votes table. But if we click on these up here, that estimated rows without row goal thing disappears, right? It’s not in here, right?

Even though we have a top in there, SQL Server used a top N sort up here, while this one just used a regular top. So: some slight visual differences in the execution plans. But this is in general why, again, when I’m tuning queries and I am using the default cardinality estimator and I get a rather suspicious looking plan, I say to myself, Erik, we should check in on that legacy cardinality estimator.

We should see how Legacy Cardinality Estimator is doing today. Let’s see. Maybe we can bring it some snacks or, you know, just go give it a call. Have a little chat with it.

See how it’s hanging in there. Because a lot of the times, even if you don’t see like a performance difference this drastic from like eight and a half seconds to one millisecond, you can at least, you know, get some feedback from it and see if there are any differences.

And, you know, sometimes you do see something this dramatic, just like I did. Again, you know, you can go on and on about sargability and subtracting one from something that you’re comparing here. But, you know, Legacy Cardinality Estimator just does a better job here.

So you might find this in your queries as well as you are going through and tuning things. And like I said earlier, you may also find the opposite is true sometimes. You may find that the Legacy Cardinality Estimator does a rather rotten job of things occasionally.

And you might find that testing out the new Cardinality Estimator will do a better job. If you want to test out the new one, what you can do is say force default Cardinality Estimation. And you can use that use hint to test your queries out using the default Cardinality Estimator.
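For example, against a database whose compatibility level defaults to the legacy CE, a single query can opt into the newer model like so (the query itself is just a placeholder):

```sql
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
/* force the default (new) CE for this query only */
OPTION (USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));
```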

But anyway, that’s enough for now. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you over in the next video where I’m going to talk about… So I did a couple of videos sort of recently about writing batch mode friendly queries. And I realized one thing that might be useful for people would be a small bit of education on how to recognize query patterns where batch mode may be useful.

So we will do that. And one of these queries might even make an appearance in there. You might even see this exact same starting point.

So that’ll be fun for us anyway, won’t it? All right. Thank you for watching. All right.


SQL Server Performance Office Hours Episode 22




To ask your questions, head over here.

In retrospect, what was the best thing about SQL Server 2022? For me, it’s Query Store hints.
Why does index cleanup replace unique constraints with unique indexes?
Why do table-valued parameters get a different estimate than table variables?
Why do you not care about logical reads?
I missed you at SQLBits this year. Will you be there next year?

Video Summary

In this video, I dive into answering five user-submitted questions during an office hours session, providing insights and solutions to common SQL Server challenges. We cover topics ranging from the best features of SQL Server 2022 to why unique constraints might be replaced with unique indexes in certain scenarios, as well as the differences between table-valued parameters and table variables when it comes to query estimates. Additionally, I explain my reasoning for not focusing on logical reads when identifying slow queries, emphasizing that duration and CPU usage are more telling indicators of performance issues. The session also includes a bit of personal reflection on upcoming SQL Server events and community summits, including PASS Data Community Summit in Seattle, where Kendra Little and I will be delivering T-SQL pre-cons. Whether you’re a seasoned DBA or just starting out, there’s something for everyone in this episode!

Full Transcript

Erik Darling here with Darling Data. Nice to see you too. Fancy meeting you here, all that good stuff. It is time for an Office Hours, in which I answer five user-submitted questions. I don’t know how many users actually submit these. It could all be one person, or it could be five different people. Anyway, I hope that I answer your question this week so you don’t feel left out. But before we do, the usual old song and dance: if you would like to support this channel, memberships are available. I have an unlimited supply of those. If you enjoy this content and you want to support my efforts to keep caring enough about doing it, you can sign up for a membership. Otherwise, you know, all the other stuff. If you want to ask me questions that I answer on these episodes, I have a slightly different URL up here now. This one goes to my website rather than directly to the Google form, because there is some additional information on the website about what to do if you need to ask questions about code or execution plans and you need to share them. So I’ve changed the link there a bit, and I suppose I’ll fix it in the YouTube videos as well. Or at least the ones that I... I don’t know. We’ll figure it out. Anyway, I’m available for consulting as well. I have an unlimited supply of that. Never seem to run out of consulting: health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and of course, training your developers so that you don’t have any more performance emergencies. Good Lord, that’s quite a bit of service. And as always, my rates are reasonable. My performance tuning content, all 24 hours of it, is available for 75% off, which means about 150 US dollars. You can of course go to that link and plug in that discount code to get the Everything Bundle over there.
And if you want to pick up my new T-SQL course, which has all 23 ish or so hours of beginner content currently published, you can do that now for the pre-sale price of 250 bucks. That’ll be going up to 500 bucks after the summer once the advanced material lands.

And speaking of summer, boy, is it hot out. New York City, Dallas, and Utrecht will all be graced with my presence over the summerish months with the PASS on tour events. Redgate has decided that they’re going to smuggle me to various places to talk about SQL Server stuff. So that’ll be fun. Especially fun for you, I hope. And of course, PASS Data Community Summit will be in Seattle, November 17th to 21st, where Kendra Little and I are delivering not one, but two T-SQL pre-cons. So we’ll have a great lot of fun there. But with that out of the way, let’s do this whole Office Hours shindig. Let’s have some fun here. All right. Our first question. Let me... where is ZoomIt? There you are. Where’s my little pink dot buddy? There we go. All right. In retrospect, what was the best thing about SQL Server 2022? For me, it’s Query Store hints.

I wish I had the same love and affection for Query Store hints and plan forcing. I suppose they’re great when they work, but it’s less fun when they suddenly stop working and you’re like, wait, what happened to the thing that I just told it to do? It worked for a while. Why is it not working now? And then you have to go do it again and kick plans out. It’s not fun. As far as SQL Server 2022 features, let’s see. I don’t know. Was that 2019? No, well, that’s going to be in 2025. Gosh, you got me. SQL Server 2022. I suppose there were some decent linguistic improvements to window functions.

But like features, I don’t know. Let’s just let’s just throw it out there for for again. It’s cool when it works, but probably the parameter sensitive plan optimization is a nice, as they say, down payment on, you know, fixing quite a quite a pernicious issue in databases generally. So that’s that’s that’s about it there. 2022. Kind of a kind of a bummer. Kind of 2014 ish, kind of 2017 ish in that it’s it’s not very interesting generally 2025. I don’t know. All right. Here’s a good one. Why do index cleanup replace unique constraints with unique indexes?

Well, my friend, you’re you’re you’re referring to my store procedure SP underscore index cleanup. And the reason why it replaces unique constraints with unique indexes, which is only sometimes is if you have a unique constraint on, let’s say, column a to get, you know, real, real, real worldy there. And you have, let’s say, an either a unique or non unique nonclustered index on column a, maybe with other key columns or actually, no, not with other key columns. I lied. Other key columns would mess it all up on column a with like other included columns.

Then sp_IndexCleanup will give you a script to get rid of the unique constraint, because unique constraints are backed by an index anyway. So if you already have a unique nonclustered index on that column with some includes, then you don’t really need the unique constraint. But if you have a non-unique nonclustered index, what it’ll do is give you a script to make the non-unique nonclustered index unique and also get rid of the unique constraint, because it’s sort of a duplicative facility at that point.

There is, I guess, a question that sometimes comes up: should I use unique constraints or unique indexes? And I do prefer the unique index, because you have a bit more flexibility with a unique index than you do with a unique constraint, as far as included columns and some other options go. So that’s about that there.
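A sketch of the scenario, with hypothetical table and column names in the StackOverflow mold. This is not sp_IndexCleanup’s actual output, just the shape of the rewrite it suggests:

```sql
/* Starting point: a unique constraint on one column, plus a non-unique
   nonclustered index on the same column with included columns. */
ALTER TABLE dbo.Users
    ADD CONSTRAINT uq_users_account UNIQUE (AccountId);

CREATE NONCLUSTERED INDEX ix_users_account
    ON dbo.Users (AccountId)
    INCLUDE (DisplayName, Reputation);

/* The suggested cleanup: make the index unique in place... */
CREATE UNIQUE NONCLUSTERED INDEX ix_users_account
    ON dbo.Users (AccountId)
    INCLUDE (DisplayName, Reputation)
    WITH (DROP_EXISTING = ON);

/* ...and drop the now-duplicative constraint (and its backing index). */
ALTER TABLE dbo.Users
    DROP CONSTRAINT uq_users_account;
```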

All right. Next up. Let’s see here. Got it — same person, right? This one: why do table-valued parameters get a different estimate than table variables? Well, I’m going to guess you mean compared to table variables.

So, table-valued parameters are of course backed by table variables — or rather, presented to stored procedures with table variables. But since they are presented to stored procedures as parameters, table-valued parameters tend to get parameter sniffed the way that other parameters do. It’s really only for the table-level cardinality, though.

So you might find that you execute a stored procedure with a table-valued parameter, and when you pass it in with, let’s say, 10,000 rows, you get a 10,000 row table-level cardinality estimate from it. And it’ll keep that until recompilation occurs for whatever reason.

But if you were instead to pass in a table-valued parameter with a thousand rows in it for the first compilation, it would just use that thousand rows over and over again. So table-valued parameters are a little bit different in that they tend to get sniffed like parameters, rather than being treated like table variables, where the cardinality estimates you get depend a bit on version, edition, and database compatibility level, among some other things. All right.
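A minimal sketch of that sniffing behavior — the table type and procedure names here are mine, invented for illustration:

```sql
-- Hypothetical table type and procedure to demonstrate TVP cardinality sniffing.
CREATE TYPE dbo.IdList AS TABLE (Id integer NOT NULL PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.GetPosts (@Ids dbo.IdList READONLY)
AS
BEGIN
    -- The table-level cardinality of @Ids gets sniffed at first compilation,
    -- the way scalar parameter values do, and reused until a recompile.
    SELECT p.*
    FROM dbo.Posts AS p
    JOIN @Ids AS i
        ON i.Id = p.Id;
END;
GO

-- If the first call passes in 10,000 rows, the cached plan estimates 10,000
-- rows for @Ids, and later calls with 10 rows reuse that same estimate.
```

That is the difference from a plain table variable, where the estimate comes from version, edition, and compatibility level behavior rather than sniffing.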

Hey, let’s look at this well-formed, well-structured question: why do you not care about logical reads? Well, because logical reads don’t tell me whether a query was fast or slow. They don’t indicate that.

I want to find queries with performance problems, so I go looking for queries that have a high duration and/or high CPU. So that’s it.

Logical reads don’t tell you if a query was slow or not. I want to find slow queries, so I find queries that use a lot of some mixture of wall clock or CPU time.

That seems fairly straightforward to me. Anyway, let’s go on. Oh, how sweet are you? Hey, look at you. Some lucky person.

Hey, wait a minute. Come on, zoom it. Some lucky person out there got to go to SQLBits. “I missed you at SQLBits this year. Will you be there next year?” Another well-structured question. Good job out there.

I don’t know if I’ll be there next year. SQLBits has changed the way they do their pre-cons. They are now curated — they curate the speakers.

I was not curated for this past SQLBits. So if you would like to see me curated for SQLBits, you are of course welcome to express that opinion to the SQLBits organizers.

I don’t know how much good it will do. I don’t know what their curation process is. But maybe, maybe there’s a cure for it.

Anyway, that gets us through five questions here. They’re short ones this time around, I guess — easy for me, then. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you… soon. Somewhere. Somehow. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Carry Over Sort vs Batch Mode Window Functions



Video Summary

In this video, I delve into an interesting query optimization topic that has been relevant for years but is now less pertinent due to the advent of batch mode in SQL Server. I explore why certain complex query syntaxes were necessary before batch mode existed and how they can be replaced with simpler, more efficient methods today. Specifically, I demonstrate the carry-over sort method—a technique often used when window functions weren’t available—and contrast it with modern approaches using window functions. By running sample queries in SQL Server Management Studio, I illustrate why the old method is not only slower but also less accurate for certain scenarios. The video aims to provide insights into query optimization and help viewers understand how leveraging batch mode can significantly improve performance without resorting to outdated techniques.

Full Transcript

Over the years, there’s been a lot of very interesting and intricate query syntax ginned up to deal with row mode performance issues that don’t really exist anymore in a world where batch mode exists. One of them is called the carry over sort. I don’t know if you’ve maybe ever run into a problem like this, or whether you’ve ever stumbled upon this technique. I admit that the name is a little outlandish, but it comes up a lot when you’re dealing with data analysis type work. Prior to there being batch mode, there was one specific way that you typically wanted to write queries if you wanted to sort of emulate what a windowing function would give you for finding the max value for a row — not per column, but per row — that batch mode largely solves. So I’m going to talk about that today. Once we get over to SQL Server Management Studio, I’m going to show you why the query you think you want to write is probably not right, just how slow the process of doing this using row mode is, and how batch mode improves that. But of course, we’re going to contrast that with the carry over sort method, the way that it has been presented. This is not my syntax; it’s something that I’ve run across, and I just wanted to make sure that I was comparing it accurately to modern versions of SQL Server. Of course, you may still need the carry over sort method of doing this if you are using Standard Edition of SQL Server. The reason for that is because, at least as of the recording of this video, Microsoft still does not think that you have paid them enough to be your friend, and they have deliberately hobbled batch mode accordingly.

If you want to contribute to this channel, you can sign up for a membership — link down in the video description. It’s just a show of appreciation for all this cool content that I write and record for free. You can also like, comment, subscribe, and ask me questions for my office hours episodes. If you need SQL Server help, consulting help — boy howdy, I do all this stuff, and as always, my rates are reasonable. If you want all my performance tuning training, I have 24 hours of it for about 150 US dollars, and that lasts you for life — you get it forever and always. Just go to that URL, plug in that discount code, which is also a fully assembled link down yonder, and you can start your learning today. My new T-SQL course: the beginner content is all done and recorded. It’s about 23 hours over 69 modules. If you are going to attend Kendra Little and I’s T-SQL PASS pre-cons, you will get access to this material. Right now the course is on presale for 250 bucks; it will go up to 500 bucks when the course is fully recorded after the summer. I am hard at work on all the advanced material now — isn’t that spectacular for you, how hard I work? If you would like to see me live and in person, I am going on tour with the Redgate Roadshow, the PASS on tour dates: New York, August 18th and 19th; Dallas, September 15th and 16th; and Utrecht — not just an art supply store, it’s a hamlet in the Netherlands — October 1st and 2nd. And then, of course, PASS Data Community Summit in Seattle, November 17th to 21st.
With that out of the way, let’s party, or pratty, whatever it is. So I have pre-run a couple things here. The first two things that I have pre-run are the version of this query that a lot of people will write — or maybe tried to write at some point in the far distant past and were unhappy with the performance of — which is basically to select some stuff within a CTE, and most importantly, in that CTE, generate a row number. The goal of the row number is: for each post type id, because that’s what we’re partitioning by, we want to order by the creation date converted to just a date (it’s a datetime; we’re just converting it to a date), then by owner user id descending, and then by id descending, with id descending acting as a bit of a tiebreaker, because id is unique and these other things are not guaranteed to be unique, either individually or in concert. This query is hinted to use optimizer compatibility level 140, because I do not want batch mode on rowstore to kick in and be enabled for it.

The second thing I’ve included with that first running of things is the query that a lot of people think could replace this. But this is not the right query to write for this, because what it’s doing is getting you the max value for each column, grouped by post type id. That is not correct, because the real sort of algorithm, if you want to call it that, is that we are ordered first by creation date, then by owner user id, then by id. This query is getting the max for each of those individually.

The first thing I want to show you is the results, because the results show you where these two methods no longer agree as far as the included data goes. For post type id one, the owner user id is different between the two. It’s also a different owner user id for post type id two, and post type id three, of course, right there. The ids are different between the two as well. So really, the max method just does not give you the correct results. There’s just too much different in here.

Focusing over on the execution plan: we can ignore that one, because it does not give us the correct results, so let’s just get it way out of the way — we’re not thinking about it at all. This is the row mode version of the query, and you’ll see, if you follow along the operator times here, that this thing runs for a little over 14 seconds. I like to say 15 seconds; that feels good to me. So let’s just say this thing ran for like 15 seconds — nice fizz-buzzy number there. And really, there’s just not a whole lot to say about this generally. In row mode, queries like this are quite painful. Even if you have a reasonable index for SQL Server to use to make the window function go faster, it’s often just a terribly inefficient way of writing and running the query.

What I want to show you next is the carry over sort method. What the carry over sort method aims to do is get the max — you see the max starts here and the max ends way down here — but what the max is doing is basically assembling a string based on the max of all three of the columns that we care about. The normal carry over sort thing does not include as much complexity for the second column as mine does. The problem I was running into is that there are negative owner user ids in the Posts table, and sure, I could have filtered them out, but that’s cheating a little bit. We want to maintain all user ids — we want to make sure we even include the negative ones — because when the max is a negative number, we need to consider that, don’t we? We can’t just not return a result for a row because we didn’t feel like dealing with some potential abnormalities in the data.

So I have a case expression here, and the case expression just says: when owner user id is less than zero, I add some x’s to the left, which is different from what we do when owner user id is greater than or equal to zero — with that, we are right-padding the number, adding zeros to the right of it. The reason I did the x’s is because if I zero-padded it, things would have gotten messed up down in the select query. In the select query, we are basically asking for a substring and converting the substring to the correct data type. When you do that with the id column and you have a zero-padded number, converting a zero-padded number to an integer is no problem — you just remove the zeros from the front of it and you get the rest of the number. The problem with the negative number is that you essentially have a string, and if I added zeros in, I wouldn’t know if there were naturally occurring zeros in the number. So I used x’s, and I replaced the x’s, to avoid confusion there.

But the carry over sort method just does this: we still group by post type id, and we use the max function across three different columns — creation date, owner user id, and id — assembling the max across all three. The max encapsulates all three of those columns. And if I run this query — and we’ve got a little bit of highlighting to do here, don’t we — it returns correct results, at least compared to the first query that we ran with the window function. All the results here match what we get from the window function version, but we get them much faster.

What’s cool here is that none of this query, even though we’re in compat level 160, uses batch mode. The scan of the Posts table happens in row mode — you can see that just sort of over, right next to my big head there. This compute scalar also happens in row mode. The hash match aggregate, which is totally eligible for batch mode, still goes in row mode. And of course, parallel exchanges like gather streams don’t support batch mode. So this whole thing finishes in about two seconds without using any batch mode whatsoever. Back before batch mode was really a cool, useful thing, this was a good method to get the max value per row.

Like I said earlier, this can still be very useful if you’re on Standard Edition, because if you’re on Standard Edition, Microsoft doesn’t think you paid them enough to have your queries perform well. Even if you got batch mode to happen here, you would be limited to a DOP of two for your batch mode queries. I have max DOP set to eight for this, and this thing will have used a DOP of eight for the query — you can see degree of parallelism right there: eight. So this thing used eight cores, and spreading the workload of 17 million rows out across eight cores is pretty efficient, for row mode and for batch mode. But 17 million rows across two threads, even in batch mode, and you’re likely going to see some performance fall off there. I don’t know — maybe you’ll get real lucky and the trade-off won’t be too terrible, but the Standard Edition limitations there are really quite a pain.

The next thing I want to show you is the window function version of this again, but without that compat level 140 restriction on it down here preventing batch mode on rowstore from kicking in. If we run this query, it takes about two seconds now, and we get back the correct results. It finishes in 1.8 seconds, just like the carry over sort method that I showed you above, and we don’t have to write that crazy max syntax where we convert dates and numbers to strings and pad things and all that other stuff. This is Developer Edition, which is the Enterprise Edition equivalent SKU — I’m not using 2025 Standard Developer Edition, because why would I hurt myself that way? We are just using regular Developer Edition here. So this runs at a degree of parallelism of eight, and it runs nice and quickly and efficiently.

So really, the idea of this window function is that we just want to get the top row for each post type id — I’m filtering to where row number equals one for all of these. If your goal is to find the max value of something for a row based on whatever criteria — I just kind of picked three columns at random from the Posts table that seemed to make sense; I guess I could have thrown score in there if we felt like it, but I still would have had to deal with potentially negative numbers, because scores can be negative in the Posts table, so maybe that wouldn’t have saved me too much time or trouble with the carry over sort syntax — if your goal is to find data like this, and you’re filtering to where row number equals one, batch mode can make these queries crazy fast. You don’t even have to add indexes. Just let batch mode on rowstore kick in, read from your table, and process all the data in batch mode. It’s a way better way of running big, data-crunchy queries like this.

But if you’re not in a situation where batch mode on rowstore can kick in for you — if you’re not on SQL Server 2019 plus, you’re not on Enterprise Edition, your database isn’t in compatibility level 150 or better, and there’s no batch mode on rowstore naturally occurring — then, depending on whether you can change stuff, you can mess around with columnstore indexes. You can mess around with having an empty table in your database with a clustered columnstore index, or creating a temp table with a clustered columnstore index and left joining to that thing. You can do all sorts of stuff to get partial batch mode, but it does not go as deep into your query plans as batch mode on rowstore — the intelligent query processing optimizer feature — does. Just kind of weird, I think, but it’s something you learn to live with when you are tuning queries across a variety of strange environments.

Anyway, this is just something that caught my interest and I felt like talking about. I hope you enjoyed yourselves. I hope you learned something. I hope maybe there was some good educational point in this video that you are able to take away from it, even if it’s not memorizing the crazy syntax in here. I feel like perhaps there were a few good educational moments aside from that. But anyway, thank you for watching, and I will see you over in the next video. Adios.
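To put the demo in code: here is a rough sketch of both approaches, assuming the Stack Overflow Posts schema (Id, PostTypeId, CreationDate, OwnerUserId). This is my reconstruction, not the exact on-screen queries, and the naive zero padding here deliberately skips the negative owner user id handling discussed in the video.

```sql
-- Window function version: top row per PostTypeId by the three-column ordering.
WITH x AS
(
    SELECT
        p.PostTypeId, p.CreationDate, p.OwnerUserId, p.Id,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY p.PostTypeId
                ORDER BY CONVERT(date, p.CreationDate) DESC,
                         p.OwnerUserId DESC,
                         p.Id DESC
            )
    FROM dbo.Posts AS p
)
SELECT x.* FROM x WHERE x.n = 1;

-- Carry over sort version: one MAX over a string that concatenates all three
-- columns in priority order, then substrings pulled back out and converted.
SELECT
    p.PostTypeId,
    CreationDate = CONVERT(date, SUBSTRING(MAX(s.v), 1, 8)),
    OwnerUserId  = CONVERT(integer, SUBSTRING(MAX(s.v), 9, 10)),
    Id           = CONVERT(integer, SUBSTRING(MAX(s.v), 19, 10))
FROM dbo.Posts AS p
CROSS APPLY
(
    SELECT v =
        CONVERT(char(8), p.CreationDate, 112) +                 -- yyyymmdd
        RIGHT(REPLICATE('0', 10) +
              CONVERT(varchar(10), p.OwnerUserId), 10) +        -- naive padding
        RIGHT(REPLICATE('0', 10) +
              CONVERT(varchar(10), p.Id), 10)
    -- Note: this naive zero padding mishandles negative OwnerUserIds, which is
    -- exactly the complication the x-padding case expression solves.
) AS s
GROUP BY p.PostTypeId;
```

The string concatenation works because lexicographic order on the padded string matches the (date, owner, id) ordering, so a single MAX carries all three columns through at once.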


Learn T-SQL With Erik: Window Function Tricks for SQL Server 2022+



Video Summary

In this video, I delve into the latest enhancements to T-SQL in SQL Server 2025 and reflect on what Microsoft has or hasn’t done to improve it. While there are some minor additions like regex support in WHERE clauses, much of my focus is on the improvements from SQL Server 2022 that make working with window functions more efficient. I explore new features such as `ignore nulls` and `respect nulls` in `lag` and `lead` functions, which significantly simplify finding last non-null values. Additionally, I discuss shared window clauses, demonstrating how they can be used to reduce redundancy and improve query readability by allowing common window specifications to be defined once and reused across multiple window functions. Despite these useful additions, the overall sentiment remains that Microsoft has largely neglected T-SQL enhancements in recent releases, leaving SQL Server users with a sense of being left behind compared to other database systems.

Full Transcript

All right, so we are going to continue with a little bit of teaser material from the Learn T-SQL with Erik course. Again, that is still at the presale price until after the summer, when the advanced material drops. There’s a link to purchase it down below. It’s 250 bucks right now, and it’ll last you for the rest of your life. And of course, if you are attending PASS Data Community Summit and you’re coming to Kendra Little and I’s T-SQL pre-cons, you will get access to this material for free, because it is completely free — it’s companion material to what we will be teaching. So: SQL Server 2025. There are release notes all over the place for it, and there is not a single worthwhile enhancement to T-SQL to talk about. Sure, we got regex. Okay. You have any idea how many people that’s going to screw up? Regex in a WHERE clause. I mean, cool. As a consultant, yeah. But as far as things I’m excited about? Nothing. I think one way you can sort of judge how much Microsoft cared about a specific SQL Server release is by how much T-SQL has been alleviated of the many things it has been missing for many years that are in the SQL standard. And this one is rather laughable. I guess Microsoft is busy trying to get Fabric to catch up with Databricks.

So they have ceased trying to get SQL Server to catch up with, like, every other database on the planet. So, cool. Anyway, the only T-SQL enhancements that I have thought were kind of neat were back in SQL Server 2022, when window functions got a couple neat new things. Window functions got nothing in SQL Server 2025. We have once again been left in the dustbin. We are on the shelf. We are not having a good time.

So, if you ever read T-SQL blogs for fun, or SQL Server blogs for fun, you may have found a particular brand of problem across posts over the years called the last non-null value. This, of course, did get easier with window functions. Before window functions, it was: forget it, queries would never finish. But even with window functions, it takes multi-step queries to get the last non-null value for something.

So, if we run a query like this, where the last commenter is lag of user id, one, over order by creation date, you’ll notice that there are a lot of nulls in here. If we wanted to find the last non-null value, we would essentially have to run this query and then run another query to get those other values. It gets very complicated very quickly.

What SQL Server 2022 added is a couple things that you can stick into the lag and lead window functions to either ignore nulls or respect nulls. Now, make of it what you will that SQL Server Management Studio 21’s parser has a bunch of red squiggles in this query, because it does not recognize the syntax from SQL Server 2022. All right.

So, we have SQL Server Management Studio 21, which became GA, like, I don’t know, a couple months ago at this point. And we have SQL Server 2022, which came out, like, three years ago at this point. And the parser is still like, I don’t know what that means.

So, you know, we got dark mode. Okay. But I promise you that this query will run successfully.

What I’ve added to this query are the lines ignore nulls for this one and respect nulls for this one. So, ignore nulls makes finding the last non-null thing a lot easier because this will give you the last non-null value in the column. Right.

So, this ignore nulls just gives us the value that we want over and over again. Granted, this isn’t a very interesting data set, but the respect nulls, we get all of this stuff back. Right.

Now, forever, we have had the ability to pass in a third input to, like, lag and lead and stuff. I’m just going to spread this syntax out a little bit so it’s a little bit more obvious what I’m doing in here and why there are some rows that have a very strange big number in them.

And that is because I am adding a third optional input to the lag and lead functions, which gives you a default value for anything that would have produced a null because of the function. So, in the results here, you’ll notice that both of these lines have the integer maximum for them.
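A sketch of the variations discussed here — the column and table names assume the Stack Overflow Comments table, not the exact demo query:

```sql
SELECT
    c.Id,
    c.CreationDate,
    -- Plain LAG: returns NULL when the previous row's UserId is NULL.
    last_commenter = LAG(c.UserId, 1) OVER (ORDER BY c.CreationDate),
    -- SQL Server 2022+: skip over NULLs to find the last non-NULL value.
    last_non_null  = LAG(c.UserId, 1) IGNORE NULLS
                         OVER (ORDER BY c.CreationDate),
    -- RESPECT NULLS is the default behavior, just spelled out.
    last_any       = LAG(c.UserId, 1) RESPECT NULLS
                         OVER (ORDER BY c.CreationDate),
    -- The optional third argument supplies a default instead of NULL,
    -- here the integer maximum, as in the demo results.
    last_or_max    = LAG(c.UserId, 1, 2147483647)
                         OVER (ORDER BY c.CreationDate)
FROM dbo.Comments AS c;
```

The IGNORE NULLS / RESPECT NULLS clause goes between the function and OVER, which is the syntax Management Studio’s parser still squiggles.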

That just, that’s because we filled in a blank with that optional third parameter. Now, there’s other neat stuff that came out in SQL Server 2022 for window functions as well. Like, you can now have shared window clauses.

So, like, if you were writing window functions with, like, similar, like, window specifications in them, you would have to, like, write that over and over again and your queries could get very, very big with window function specifications. But now what you can do is you can say something like this, right?

Notice we’re just saying over x here, right? And typically, over x would be like, huh, what is x? Well, x is what we have defined down here.

This window x, right? It almost looks like a CTE for your window function. Isn’t that scary? Window x as: partition by owner user ID, order by, rows between unbounded preceding and current row.

So, both of these window functions, sum and average, can share a common window clause. Now, would that Microsoft were so kind as to give us more neat enhancements to T-SQL like this in SQL Server 2025, you and I could be talking about much newer cool stuff.
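As a sketch of the shared window clause just described — assuming the Posts table, my reconstruction rather than the on-screen query, and requiring database compatibility level 160:

```sql
-- SQL Server 2022+: define the window specification once, reuse it by name.
SELECT
    p.OwnerUserId,
    p.Score,
    running_sum = SUM(p.Score)       OVER x,
    running_avg = AVG(p.Score * 1.0) OVER x  -- * 1.0 avoids integer averaging
FROM dbo.Posts AS p
WINDOW x AS
(
    PARTITION BY p.OwnerUserId
    ORDER BY p.CreationDate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
);
```

Both window functions share one specification, so the select list stays short even as the window definition grows.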

Here we are, though. But what’s even neater, I think, about the common window clause is that you can actually stack them so that they inherit window clause specifications from higher up ones.

So, it almost looks like stacked CTEs when you read them. Here, notice that we’re not using x anymore. We’re using ts and av, right?

So, this is the window specification for sum, and this is the window specification for average. And if we look down here, this is where I’m doing the magic work for this one. We have window x as partition by owner user ID, right?

And then we’re saying, comma, ts as order by score, rows between unbounded preceding and current row. And then, just for a little bit of texture in the demo, av is ordering by score descending, rows between unbounded preceding and unbounded following.

So, they’re both going to partition by owner user ID, but then they’re both going to do something slightly different with the order by. So, score for this one is ascending.

Score for this one is descending. This one is going from the beginning of the results to the current row. And this one is going over the entire result set — unbounded preceding and unbounded following. So it’s the entire thing that we’re getting the average of.
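A sketch of those stacked window clauses — again assuming the Posts table and my own names matching the demo’s ts and av, not the exact on-screen query:

```sql
-- SQL Server 2022+: windows can inherit from one another; ts and av both
-- build on x, so the PARTITION BY is written only once.
SELECT
    p.OwnerUserId,
    p.Score,
    score_sum = SUM(p.Score)       OVER ts,
    score_avg = AVG(p.Score * 1.0) OVER av
FROM dbo.Posts AS p
WINDOW
    x  AS (PARTITION BY p.OwnerUserId),
    ts AS (x ORDER BY p.Score
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    av AS (x ORDER BY p.Score DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
```

Referencing x inside the definitions of ts and av is what makes them read like stacked CTEs.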

And now, we can allow our window functions to not only share a window clause, but to inherit and share window clauses. So we can, I don’t know, make much more interesting queries — though I guess that’s actually still kind of a lot of typing, now that I think about it.

But, it saves you some space up here in the select list. It makes that cleaner and tidier. I’ll give it that much. So, that’s just a couple cool things from 2022. Hey, we got Regex.

Ding. Like and subscribe. All right. Cool. Thanks for watching. I hope you enjoyed yourselves. I hope you learned something. And I’ll see you over in the next video where we’re going to talk about some stuff that batch mode makes a whole lot faster.

So, we’re going to take a little break from being depressed about Microsoft’s abandonment of SQL Server generally. And we’ll talk about some stuff from back when they cared.

That’ll be a good time. Anyway, thank you for watching.


Learn T-SQL With Erik: Window Functions and Aggregates

Learn T-SQL With Erik: Window Functions and Aggregates


Video Summary

In this video, I delve into a fascinating aspect of window functions that isn’t always highlighted in discussions about these powerful SQL Server features. I demonstrate how to use window functions not just for simple column aggregations but also for more complex scenarios where you need to preserve and order aggregated data within a temporary table. By incorporating row numbers based on aggregate values, you can ensure that when querying the results later, they are ordered by the most impactful findings first—something many overlook in their SQL practices. I walk through an example using Stack Overflow data, showing how to calculate averages of counts over different time spans and order them effectively. This technique is particularly useful for troubleshooting and reporting purposes where you need to prioritize high-impact issues. Additionally, I share insights on the flexibility of window functions, illustrating that they can be used in a variety of creative ways beyond just row numbering, making your SQL queries more powerful and versatile.

Full Transcript

Erik Darling here with Darling Data. To continue the teaser material for Learn T-SQL with Erik, for which all of the beginner content is now published: there’s about 23 hours of it across an entire 69 modules. Do with that information what you will, but again, the presale price is still $250 until the advanced stuff drops after the summer. So, I’m going to show you something that I think is very neat about window functions, and that not a lot of people pick up on about them. You may have noticed that I have a couple of stored procedures that try to help people troubleshoot various aspects of their SQL Server.

Some of those stored procedures have sort of a roll-up of findings — an aggregated roll-up of all the findings in there. And what I found was that, of course, it’s very easy to produce the summarized output. What was not easy to do was to order that summarized output later.

So, what I ended up learning to do to make that easier is when I insert the data into the table, I have not only the aggregations of the things, but also a row number that gets produced based on the aggregations of the things. So, for example, like if we were just going to query the stack overflow database, the aggregation plus the ordering would look something like this. We’re going to get a count of all the posts, right?

So, we’re getting the post type ID up here, and we’re producing some text. And because we’re producing some text, if you ordered by the output, it would just be texty ordering.

We’re not ordering by what had the most of something, which is what I would want to do: prioritize the high impact stuff first. But what I learned to do was put that count into the ORDER BY clause of the ROW_NUMBER function, which looks like this, so that I can preserve the numbered output for when I want to order it coming out of the findings temp table.
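The pattern being described looks roughly like this. This is a sketch, not the actual query from the video: the finding text, column names, and the use of the Stack Overflow `dbo.Posts` table are assumptions.

```sql
/* Sketch only: the real findings query isn't shown here, so the
   finding text and column names are illustrative. */
SELECT
    p.PostTypeId,
    finding = N'post type with this many posts',
    total_posts = COUNT_BIG(*),
    sort_order =
        ROW_NUMBER() OVER
            (ORDER BY COUNT_BIG(*) DESC) /* preserved for later ordering */
INTO #findings
FROM dbo.Posts AS p
GROUP BY p.PostTypeId;

/* Later, selecting out of the temp table in impact order: */
SELECT f.finding, f.total_posts
FROM #findings AS f
ORDER BY f.sort_order;
```

Because the row number is computed at insert time, the intended ordering survives the trip through the temp table.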

Doing this in one query would be trivial; doing it when you’re putting data into a temp table and then you want to select it out later with a specific order becomes a little bit trickier. But the results look like this. And of course, I’m not connected to SQL Server, which would help.

But now, when I run this query, I’m not ordered by the text output. I’m ordered by which posts had the most rows. Now, I know that for this standalone query, I could have just said ORDER BY COUNT_BIG descending.

That’s no problem. The point is that now I have this row number column. So when I want to select data out of my findings table later, I’m going to go to the next one. So I can say ORDER BY finding ID and then this row number column.

So I have the top stuff at the top. So, you know, post type ID two having 11 million posts and post type ID one having 6 million posts, those are up at the top of whatever thing I want to show you.

But, you know, this is something that not a lot of people understand about window functions: you can partition by, you can order by, and you can pass aggregates into them, just like you would another aggregate. Like, you could do a sum divided by a count or something like that.

You can have these things intermix and sort of live together. An easy example of that would be doing averages of counts. I’ve seen a lot of queries written where someone goes and gets the count in one CTE or something.

And then does the averages after they’ve done the counts, which is kind of silly, because you can just do it all in one place. The only thing you have to be a little bit aware of, of course, is how you choose to set up your window range and row specification to deal with that. So what I’m doing in this query is I am hitting the Posts table again.

I’m doing a little bit of fanciness in here in order to give myself consistent, accessible aliases for these two expressions across my whole query. And what I’m doing is asking for an average of the count over different spans of time. Now, since I want three, six, and 12 month averages: for the three month average, I have to say between two preceding and current row, which feels a little funny.

Because you’re like, I want a three month average, I should put three preceding and current row. But that would give you four rows, right? You want two preceding and the current row.

That’s three total. And it’s the same thing for the six month average and the 12 month average: for the six month average, you want five preceding rows, and for the 12 month average, you want 11 preceding rows. Now, granted, for the 12 month average, it wouldn’t matter as much, because it may be the last row anyway.

So you could just say between unbounded preceding and current row if you wanted. But if there’s more to it than that, then you would need to make sure that you are very specific about it. But now when I run this query, I’m able to, all in one fell swoop, get the average number of posts across three, six, and 12 months for all 12 months in the year 2013.
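A sketch of that query, assuming monthly post counts from the Stack Overflow `dbo.Posts` table for 2013; the aliasing tricks mentioned in the video aren’t shown, so a CTE stands in for them here.

```sql
WITH monthly AS
(
    SELECT
        post_month = DATEPART(MONTH, p.CreationDate),
        posts = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= '20130101'
    AND   p.CreationDate <  '20140101'
    GROUP BY DATEPART(MONTH, p.CreationDate)
)
SELECT
    m.post_month,
    m.posts,
    /* 2 preceding + the current row = a 3 row (3 month) window */
    avg_3mo = AVG(1. * m.posts) OVER
        (ORDER BY m.post_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
    /* 5 preceding + the current row = 6 months */
    avg_6mo = AVG(1. * m.posts) OVER
        (ORDER BY m.post_month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),
    /* 11 preceding + the current row = 12 months */
    avg_12mo = AVG(1. * m.posts) OVER
        (ORDER BY m.post_month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
FROM monthly AS m
ORDER BY m.post_month;
```

The `1. *` is only there to avoid integer averaging on the bigint counts.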

If we were doing multiple years, that’s where you have to be really careful with the 12 month one. But what you’ll see here is that when we look specifically at these three columns that produce the averages, the three month average is the same across all three of these. But then for the six month average, this number changes from this number.

So this is just another three month average here, whereas this is really producing the six month average across these, right? It’s the same with this 12 month average: it will match across the first six months, but where the three month one resets every three months and gives us a new set of averages, the six month one gives us two window frames of averages, the first six months and the second six months.

And then this one, which agrees up to the six month part, really departs here, and then we have the full 12 month average across all 12. So the main message here is that window functions are not just limited to, you know, a column in your table.

You can assign all sorts of stuff to them. You saw in the first query that I was ordering by a COUNT_BIG, and in this one, I’m getting the average of COUNT_BIGs over different spans of time. So there’s a lot of neat stuff you can do with window functions that may not be immediately obvious and apparent to you, based on the way you see a lot of window functions out in the world written.

But there are some really cool things you can do with them that often get overlooked. You’ll hear people talk about how cool and powerful they are, but they kind of just give you the same examples: here’s a row number, over and over again. Here’s the different things you can do with row number. And you’re like, okay, row number. Great.

But think of all the other neat stuff that you can do for your neat analytical queries. Anyway, that’s enough here. Again, this is all Tickler material from Learn T-SQL with Erik, still on sale for 250 bucks. There’s a link down in the video description if you want to see much, much more material like this.

Otherwise, I don’t know. Go live in the T-SQL dark for the rest of your life. See if I care. All right. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Common Table Expression Fork Bombs



Video Summary

In this video, I dive into a fascinating concept known as a CTE fork bomb, exploring how recursive Common Table Expressions (CTEs) can lead to exponential growth in query execution. I start by setting the stage with simple sample data and gradually build up complexity, illustrating how nested loops joins within CTEs can multiply the number of rows, leading to significant performance impacts. By breaking down each step of the execution plan, I highlight the importance of understanding these patterns for optimizing queries and avoiding potential performance pitfalls. The video is packed with detailed explanations and visual aids, making it a must-watch for anyone interested in deepening their knowledge of SQL Server query plans and optimization techniques.

Full Transcript

Erik Darling here with Darling Data. Feeling very high energy today, very… very just pumped up. Let’s go. Let’s go get them. Today I want to talk about a CTE fork bomb. If you’re not familiar with what a fork bomb is, you can sort of consider it to be like viral replication, where one cell becomes two cells, and two cells become four, and so on. It just keeps getting bigger, right? And that’s what we’re gonna do today. If you would like to support this channel, you can do so. There’s a link to become a member down in the video description below. If you want to ask me questions during my Office Hours episodes, you can do that. Otherwise, the usual like, comment, subscribe stuff is all available to you, should you feel so encouraged to do something. If you need SQL Server consulting help, well, that’s me. Health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and training your developers to not write fork bombs on your servers. All good and worthwhile things there. You can get all of my performance tuning content, about 24 hours of it, for 75% off. Again, link down in the video description. That brings it down to about 150 bucks, and you get that for the rest of your life.

The T-SQL course is now half done. All of the beginner content is online and published. There is about 23 hours of it across, last count, 69 modules. So that’s fun there. Of course, past pre-con attendees will get free access to all of this companion material. It is on sale right now for the pre-sale price of $250. That price will go up in the fall to $500 as soon as everything is said and done. I am doing a lot of outside the house stuff this summer. I will be in New York City, shockingly, August 18th and 19th. I will be in Dallas, Texas, September 15th and 16th.

And I will be in Utrecht, that old Netherlands thing, October 1st and 2nd. And of course, I will be at PASS Data Community Summit from November 17th to 21st in Seattle, Washington, assuming that Seattle is still a city at that time. But with that out of the way, let us talk about this CTE fork bomb thing. Because, you know, making fun of CTEs never gets old. At least not for me anyway.

So we’re going to create some sample data here, some simple sample data, because we don’t want to create overly complex sample data that will confound and confuse the masses out there, do we? We want very simple, straightforward demonstrations so that everyone can understand everything. So before I get to the actual fork bomb, there are some agreements that you and I must come to.

We must agree on these concepts so that by the time we get to the fork bomb, you understand fundamentally what is happening. So if we run this query that joins together the two tables that I just created and populated with data, we will get back 255 rows. And if we look at the execution plan, there will be one scan of the table T1, one scan of the table T0, and one merge join in order to produce those 255 rows.
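The sample data script isn’t shown on screen here, but something like this matches the row counts in the demo (255 rows in t0, 32,767 rows in t1, every row matching); the table and column names are assumptions.

```sql
/* Sketch of sample data matching the video's row counts;
   the actual script isn't shown, so names are assumptions. */
CREATE TABLE #t0 (id integer PRIMARY KEY);
CREATE TABLE #t1 (id integer NOT NULL, INDEX c CLUSTERED (id));

INSERT #t0 (id)
SELECT TOP (255)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.messages AS m;

INSERT #t1 (id)
SELECT TOP (32767)
    (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 255) + 1
FROM sys.messages AS m;

/* The joined-and-aggregated query that returns 255 rows: */
SELECT t0.id, c = COUNT_BIG(*)
FROM #t0 AS t0
JOIN #t1 AS t1
  ON t1.id = t0.id
GROUP BY t0.id;
```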

If we look at the details here, there is one number of executions for that scan. And there is one number of executions for this scan. Good stuff.

We can agree on these things. If this query were to use a nested loops join, say like this, where I’m going to force some things to happen, the query plan would change. We would no longer have a merge join.
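The exact hints used in the video aren’t shown; one way to force that plan shape, assuming the sample tables are named t0 and t1, is a join hint:

```sql
/* A LOOP join hint forces nested loops; FORCE ORDER keeps t0 on
   the outer side, so the seek into t1 runs once per row from t0. */
SELECT t0.id, c = COUNT_BIG(*)
FROM #t0 AS t0
INNER LOOP JOIN #t1 AS t1
  ON t1.id = t0.id
GROUP BY t0.id
OPTION (FORCE ORDER);
```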

We would have a nested loops join. 255 rows would be read from T0, and 32,767 rows would be read from T1.

What changes, aside from the nested loops join, is that we had a merge join before, if you remember back that long ago, you little goldfish. This scan still has one execution. But now we have something different on the inner side of the nested loops join.

Now we have an index seek, and we have 255 executions of the index seek. All right, 255 right there. So when you have a nested loops join, the thing on the inner side of the nested loops join will execute once for every row on the outer side of the join.

That’s this part, going to get rows. I don’t know how the pops are going to sound on the new microphone, since it’s my first one.

So lucky us, we get to experiment together. Every time a row comes out of here, since the way that the data is designed, every row will match. Right.

So all 32,000 rows in this table have a match here, which means all 255 rows in this table have a match here. So we do this seek 255 times, we find 32,767 matches, and then we aggregate those down to 255 based on the number of unique IDs that came out of t0. Now, if we were to put that query into a CTE, and we were to join that CTE to itself, the query plan would change yet again.

Now the query plan. Well, I mean, A, we have a hash join up here, but now we have two nested loops joins. We actually have two copies of that query that run, because, of course, if you are a frequent watcher of my videos, you will know that Microsoft SQL Server at this point in time does not offer a mechanism to materialize the results of a CTE.

That lack of materialization means that every time you reference a CTE in an outer scope, you will have to rerun the query in the CTE. So we actually have the same plan run twice, right? There’s the first copy of the plan of the execution of the query, if that’s easier for you to deal with.
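Sketched out, the self-joined CTE looks something like this (again assuming the t0/t1 sample tables); since nothing is materialized, the query inside the CTE shows up twice in the plan:

```sql
WITH c AS
(
    SELECT t0.id, c = COUNT_BIG(*)
    FROM #t0 AS t0
    JOIN #t1 AS t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT c1.id, c1.c, c2.c
FROM c AS c1
JOIN c AS c2
  ON c2.id = c1.id; /* second reference = second execution */
```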

And down here is the second one. Now, what this means is that we have two scans of the table t0, right? And see the number of executions.

Oh, you know what? That is hiding behind my head. Let’s try that again. We have one scan of the table here, and another scan count of one for this table here, for a grand total of two.

Now, this index seek into t1 has 255 executions, and so does this one. So we have two total scans of the table t0, and we have 255 seeks apiece, which is, I’m going to guess, around 510 seeks into t1 total. Now, since there’s a hash join up here, right?

We have a hash join that brings these two results together. Remember when I said from the CTE, join the CTE to itself on the ID column. So this is how SQL Server chose to join those two CTE queries together with a hash join.

To simplify things quite a bit, let’s just say, for the sake of making sure that we stay in agreement: in the first query plan up there, the uppermost plan above my head, the outer side of the join ran, did all its work, went to the hash join, built a hash table, and then the inner side of the query ran, and the hash join did its thing to compare rows in the hash table and all that other stuff.

So let’s just say the outer side of the query ran, got to the hash join, then the inner side of the query ran and got to the hash join, and comparisons were made, and we decided which rows matched and which didn’t at the join. So you really do have two executions of this. Another easy way to see that is when you look at the operator times: this part of the plan executes and takes about four milliseconds of accumulated time across all the operators here.

Then you have this part of the plan and there’s four milliseconds of time across all of the accumulated operators. And then you have the hash join up here, which is happening in row mode. So this is the four milliseconds here plus the four milliseconds here plus one millisecond of time spent in the hash join.

All right. So that’s how that looks. Ergo, which I’m told is a word, which is just great, I guess.

If we combine the CTE join situation with a nested loops join, the query inside the CTE will be executed, not just once, but once per row that goes into the loop join. To see what I mean, instead of just doing a join, we’re going to do a CROSS APPLY with a TOP (1). And I’m not saying CROSS APPLY is bad.

It’s just there for a little bit of convenience, because CROSS APPLY does often get optimized to a nested loops join. So we’re going to use it for convenience, so I can show you this execution plan.
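The CROSS APPLY version, sketched from the description; the TOP (1) and the correlation are the parts called out in the video, and the rest (names, the ORDER BY inside the apply) is an assumption:

```sql
WITH c AS
(
    SELECT t0.id, c = COUNT_BIG(*)
    FROM #t0 AS t0
    JOIN #t1 AS t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT c1.id, c1.c, x.c
FROM c AS c1
CROSS APPLY
(
    SELECT TOP (1) c2.c
    FROM c AS c2
    WHERE c2.id = c1.id
    ORDER BY c2.id
) AS x; /* often optimized to a nested loops join */
```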

Notice the top part of the plan, right? Everything looks the same up there. We still have basically the same two copies of the plan that run, right?

That looks very similar to the original one. And then down here, we have that second copy of the query that ran, just preceded by a top operator here, because we had that TOP (1). Right.

So now we’re going to see the one scan here, and we’re going to see the 255 seeks here. What changes is that instead of having one scan here, this is now an index seek, and we did 255 seeks into it.

Well, the estimated CPU cost is 255, too. That’s amazing. I was like, huh?

Okay. Number of executions, 255. Okay, cool. So, for every row that came up out of here, right? We aggregated everything down to 255 rows here.

All 255 rows went into the nested loops join, let’s say one at a time. And every time this nested loops join got a row, it went down here and said, hey, seek into here. And so we did that 255 times.

From here, we went into a nested loops join, and we hit this thing 255 times. Okay, this is the same, right? We still have 255 here, and we still have 255 here.

This number didn’t really multiply here, because we still have a nested loops join here that’s sort of protecting us. So let’s have a little fun with this. Let’s add some more work.

Let’s further amuse ourselves, because we are nothing if we cannot amuse at least ourselves. If we can’t amuse ourselves, what have we got? So we’re going to add some more work to the initial CTE.

We’re going to add some window functions in: AVG and ROW_NUMBER and COUNT_BIG. Right.

So we’re going to make SQL Server do some more work in the initial part of the query. And this is where things, I think, get kind of interesting. So if we run this whole thing now, we’re going to have to do some multiplication math. Right.
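Something like this, with AVG, ROW_NUMBER, and COUNT_BIG added to the CTE; the exact window specifications aren’t shown in the transcript, so these are assumptions:

```sql
WITH c AS
(
    SELECT
        t0.id,
        c = COUNT_BIG(*),
        a = AVG(COUNT_BIG(*)) OVER
            (ORDER BY t0.id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
        n = ROW_NUMBER() OVER (ORDER BY COUNT_BIG(*) DESC)
    FROM #t0 AS t0
    JOIN #t1 AS t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT c1.id, c1.c, x.c
FROM c AS c1
CROSS APPLY
(
    SELECT TOP (1) c2.c
    FROM c AS c2
    WHERE c2.id = c1.id
    ORDER BY c2.id
) AS x; /* the inner reference now re-runs all the window work */
```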

So let’s zoom out a little bit here. My head is going to be between two operators, but that’s okay. So if we look at this top part of the plan, it will be a mirror image of the bottom part of the plan up to a degree.

Up to a point rather. Let’s move this way over here. That’s probably about good.

Oh, tooltip. You just had to do it to me, didn’t you? So if we look at this part of the plan, starting up here, you’ll see the index scan on t0 (255 rows) and the index scan on t1 (32,767 rows). And then going over, there’s a merge join and a segment and a sequence project doing the row numbery stuff.

And if we go over here a little bit further, if you just kind of keep your eye on the bottom part of the plan, you’ll notice that they are essentially mirror images of each other, aside from a couple of extra operators here. But they both do the exact same thing. Where things get interesting here, I think, is that we still have the same pattern, where 255 rows go into this loop join that brings this reference to the CTE and joins it to this reference to the CTE.

We still have 255 rows that go in there, but way down over here, things really start to multiply. So if we look at this part of the plan specifically, notice that this isn’t 255 anymore. This is 65,025, which, if you don’t have a calculator handy or just a lot of fingers, is 255 times 255.

If you look at this number, this is 8.3 million and change. That’s 255 times 32,767, which is that number up there. So now we have sort of fork bombed our CTE with the nested loops join, because every time this nested loops join runs, we end up multiplying the number of rows in the table by the number of rows that come out of the loop join.

So if you sort of compare the numbers going across: this merge join up here has 255 rows come out, and this merge join down here has 65,025 rows come out, because you have the 255 rows go into each.

Right. And that happened because we aggregate this down over here. Right.

This gets squished, and this gets squished to 65,000. So the 8.3 million gets aggregated to 65,025, and here, the 32,000 gets aggregated down to 255. So now, instead of having 255 rows go out here, we have 65,000 rows go across.

And you can see that the number of rows because we have fork bombed ourselves with the nested loops join, the number of rows that go across here are going to be much larger for the bottom part of the query. And you can actually see far more time end up across all those operators. If you look up here before we go into this nested loops join, we have only used five milliseconds of wall clock time.

If you look down here, look at how the time builds up. My head’s going to be sort of in the way, but it’s 518 milliseconds in here, nine milliseconds in here.

We get up over 800 milliseconds by the time we get to here. So all of these accumulated operator times get to 800 right there. And then as we go across SQL Server dealing with the number of rows on a single thread, like we just add more and more time to this.

So there’s a window spool here. We get up to 1.1 seconds. And then after this segment, we have a table spool, and we get to 1.2 seconds.

And then we do all this stuff, and we get up to 1.242 seconds. So, usually when I talk about CTEs, I’ll say something like: your CTE will run once for every reference you make to it. But depending on the query plan that SQL Server chooses, your CTE might run way more times than that.

Right. So if your CTE joins to itself, or just joins repeatedly, say you throw a third table in the mix, and you have to join your CTE to one table, and then you have to join your CTE to another column in that table, or to a different table.

Depending on the join choice, your CTE might end up executing way, way more times than just once per reference. If you choose a nested loops join, it’ll execute once for every row that goes into the nested loops join and then has to run your reference to that CTE. So isn’t that fun? Isn’t there just so much fun in query plans?

Isn’t there just so much interesting, exciting stuff that just makes your day? Mine too. All right. Cool.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will undoubtedly talk about more fun and exciting execution plan stuff. All right.

Cool. Thank you for watching.


SQL Server Performance Office Hours Episode 21




To ask your questions, head over here.

What’s the biggest T-SQL re-write that you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains for minimum code change (that isn’t option recompile or index tuning).
I have a non tech question, kinda. You clearly have a talent for taking a complex topic and breaking it down to its simplest form to show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks!
Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored proc using the stored proc name, but it returned no information; when I used text which is part of the stored proc, it returned the information I needed. The stored proc has multiple (3) plans because of PSO. Could that be the reason it failed when I searched based on the stored proc name?
Hi Erik! When we update our PMS app to the latest version, the database update involves re-running ALL procedures and triggers (modified or not) with DROP and CREATE (not my decision – I have to stick with that though). What is the downside of the above operation? Does it make any sense to run the sp_updatestats after that? Thanx!
Why haven’t you been talking about SQL Server 2025?

Video Summary

In this video, I dive into some of your most pressing questions about SQL Server and T-SQL tuning. We tackle topics ranging from the biggest T-SQL rewrites I’ve undertaken to the most effective query tuning tricks that yield maximum gains with minimal effort. You’ll also get tips on simplifying complex concepts for better understanding, whether you’re writing blog posts or just explaining things in general. Additionally, we discuss the nuances of using stored procedure names versus text when querying the Query Store and explore the downsides of dropping and recreating procedures during database updates. Lastly, I share my thoughts on SQL Server 2025, highlighting both its potential and the areas where it falls short. Whether you’re a seasoned DBA or just starting out, there’s something here for everyone to learn from.

Full Transcript

All right, you heathens. It’s time to give in to our darkest desires and do office hours. This is where I answer five user-submitted questions at a time and try to give the best semblance of an answer that I can provide. The usual stuff here: if you like this channel and you feel that it’s worth your wallet, you can become a member of the channel to support my efforts to bring you the highest quality SQL Server content known to humankind. That’s down in the video description. If you want to ask questions that appear on Office Hours, this link is also down in the video description. It’s very easy and anonymous. You hardly have to do any work. Other things that are useful to me: liking, commenting, and subscribing, because that’s, you know, I guess cool too. If you need a consultant for SQL Server, and you would like to hire me to come work personally with your deepest, darkest data, you can hire me for all of these things. Health checks, performance analysis, hands-on tuning of the worst of your worst, dealing with your performance emergencies, and training your developers so you do not have performance emergencies anymore. I do all that stuff and more. And as always, my rates are reasonable.

All right, come on. Next slide. There we go. I clicked. If you would like to buy my performance tuning content, you can get all 24 hours of that for about $150 US buckaroos. No tariffs added with that discount code, and that will last you for life. If you would like to get in on the pre-sale prices for my T-SQL course, Learn T-SQL with Erik — that’s me. Almost all of the beginner material is now out and publicly available. Many hours of content. The price will be going up to $500 once the advanced material is done after the summer. And if you are attending PASS Data Community Summit in Seattle, and you’re coming to the T-SQL pre-cons that Kendra Little and I are teaching, you will, of course, get complimentary access, because this is companion material to the course.

That means it is not the same material, but it is a good companion to the material there. So if you’re attending the pre-cons, you’ll get this stuff. Ain’t that your lucky day? And, of course, this speaking schedule is going to be grand. The Red Gate Roadshow is taking me on tour. I will be in New York City. Surprise!

August 18th and 19th. Dallas, September 15th and 16th. And Utrecht, which rolls right off the tongue, October 1st and 2nd. And, of course, PASS Data Community Summit, taking place in Seattle, November 17th to 21st. So I will be live and in person in all of these places, I don’t know, to answer your questions, give you hugs and high fives, tell you you’re awesome at your job, whatever you need me to do.

Anything for a buck. But with that out of the way, let’s party. Let’s do these office hours-y questions. And let’s zoom in here and make sure that we are nicely framed up, make sure everything is legible above my gigantic head.

And we’ll start with this handsome devil up here. What’s the biggest T-SQL rewrite you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains from minimum code change that isn’t option recompile or index tuning?

So I have rewritten entire applications for people. I mean, maybe not every single stored procedure, because they’ll be like, hey, we don’t use this stored procedure anymore. Which, you know, that’s cool with me.

But pretty much every stored procedure that was currently in use, I’ve done rewrites for. Or, depending on the development team a little bit, there are some times when I can rewrite a handful of stored procedures and just be like: follow this pattern generally, and if you get stuck on anything, let me know, and we can work on it together.

But, you know, really, hundreds of stored procedures and functions. There was one client where, if I remember the final count in the rewrite folder, it was something like 56 scalar UDFs that I hand rewrote.

And that was just the UDFs. It wasn’t even the stored procedures and other stuff. So that’s that answer.

Sure. But of course, the biggest query tuning trick is probably just getting batch mode involved when it’s appropriate. Not even adding a columnstore index, but just playing some trick on SQL Server so that batch mode gets involved somewhere opens up a lot of doors.

You know, obviously it’s better with columnstore as a data source for a lot of these things, but generally, getting batch mode involved quickly solves a lot of problems that would otherwise take a lot of index tuning and consolidation and query rewrites and trying 50 million things to nudge the optimizer towards some specific pattern or path that I care about. But batch mode is probably the easiest one to do there.

If I had to pick a second place, as far as bang for the buck, it’s breaking up queries that are miles of CTEs, and using temp tables at certain logical breaking points to materialize results. Everyone thinks that a CTE materializes a result, but it doesn’t.
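The shape of that rewrite is simple: pick a logical breaking point, dump it into a temp table once, and join to that instead of re-deriving it per reference. A sketch, with illustrative Stack Overflow-style names:

```sql
/* Instead of a CTE that gets re-executed per reference,
   materialize the intermediate result once: */
SELECT p.OwnerUserId, c = COUNT_BIG(*)
INTO #post_counts
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;

/* Every later reference reads the temp table, not dbo.Posts: */
SELECT u.DisplayName, pc.c
FROM #post_counts AS pc
JOIN dbo.Users AS u
  ON u.Id = pc.OwnerUserId;
```

As a side benefit, temp tables get statistics and can be indexed, which CTE references can’t.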

And so using temp tables in place of that is often very valuable as well. All right. Next up, what do we have here?

Oh boy. I have a non-tech question, kind of. All right. You clearly have a talent for taking a complex topic and breaking it down to its simplest form. Oh, thank you.

That’s what I’m known for. Being simple. To show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks.

So the way that I teach is the way that I learn. If I don’t break things down for myself, step by step, I get lost and don’t learn things. So I need to break things down into very simple terms that fit into my brick head and make sense to me there.

If I had to give you advice about how to do that: when you’re writing a blog post, in your head it’s really easy to logically jump from one thing to another to get the words out. But when you speak it out loud, and you see the stuff on the screen, something catches your eye and you’re like, wait. You go to talk about it and try to explain something.

And if you get stuck on something, that's something else you need to put in the post. That's another thing you need to add in to further break the thing down and make it explainable. A lot of blog posts, even mine, and I'm not going to pretend I'm not guilty of this, gloss over some stuff and leave some details out, either because it's a whole other blog post to explain it or because it's too much of a detour.

But if you want to be able to do that, don't just write your post. Read it out loud, or rehearse the material out loud, so you have a better idea of what you want to say about it. You really find yourself getting deeper into the nooks and crannies when you have to talk about stuff out loud. You don't even have to record it, though you can if you want.

I don't know. But having that extra added step where you catch yourself, oh, but is this the... well, no, let's not, okay, never mind. So that would be my advice: speak your content out loud, because that will force you to think more about everything that you are looking at and everything that you are saying.

And if you hit one of those unexplainables, that is often a good sign that you need to break things down a little bit further. All right. We've got quite a thing here. Oh, dear. That didn't work out well. Let's try that little rectangle again.

Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored procedure using the stored procedure name, but it returned no information. When I used text that is part of the stored procedure, it returned the information I needed. The stored procedure has multiple plans because of PSPO.

Could that be the reason it failed when I searched based on the stored procedure name? So, yes, what you guessed is most likely correct. With parameter sensitive plan optimization, there is an object ID in the XML, but the way that the plan is expressed is a lot like dynamic SQL, where it's almost completely detached from the object ID of the thing that called it.

So in Query Store, there's an object ID for the procedure. But if your procedure doesn't do anything meaningful, or anything that Query Store captures (which depends on your capture settings), then nothing gets tied to that object ID there. Other stuff that has messed me up trying to find procedures in Query Store: non-defaults, like a non-dbo schema.
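When the name-based search comes up empty, it can help to check what Query Store actually tied to the procedure's object ID. A rough sketch against the catalog views (procedure name hypothetical):

```sql
-- Everything Query Store associated with the procedure's object ID directly.
-- If this comes back empty, a name-based search has nothing to find.
SELECT
    qsq.query_id,
    qsq.object_id,
    qsqt.query_sql_text
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
WHERE qsq.object_id = OBJECT_ID(N'dbo.YourProcedure'); -- hypothetical name
```
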

There is a procedure schema and a procedure name parameter for it, so if it's not in dbo, that'd be another thing to try. I also don't think that sp_QuickieStore handles square brackets gracefully.

I did some work to try to make the procedure name parameter sort of overloaded, so that if you put in something like a bracketed dbo dot procedure name, it would use PARSENAME to break that stuff out.

But I forget how far I took it. So making sure that you don't put the procedure schema and name in square brackets would be another thing to try. But I think for you specifically, you are generally correct that that would be why it didn't show up.
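If memory serves, the calls look something like the following; parameter names here are from memory, so check the header of your sp_QuickieStore release before relying on them:

```sql
-- Search by procedure identity: pass schema and name separately, unbracketed.
EXECUTE dbo.sp_QuickieStore
    @database_name    = N'YourDatabase',  -- hypothetical database
    @procedure_schema = N'app',           -- not in dbo? say so explicitly
    @procedure_name   = N'ProcessOrders'; -- no square brackets

-- Fallback when the object ID link is broken (e.g. PSPO plan variants):
EXECUTE dbo.sp_QuickieStore
    @database_name     = N'YourDatabase',
    @query_text_search = N'some distinctive text from the procedure';
```
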

It is kind of a pain in the butt. But at the same time, I really want to avoid XML parsing in sp_QuickieStore, because querying the underlying Query Store views is painful enough. Working on sp_BlitzCache was a lot different, because getting data out of the plan cache, aside from the XML parsing bit, was generally pretty quick.

Of course, that depends on the size of stuff and a million other details. But the XML parsing was what really took up time in there. Querying the Query Store stuff is such a misery that I don't want to add XML parsing in there to go look for that hidden object ID in the plan XML to detect the parameter sensitive plan optimization stuff. There might be some shortcuts around that; I just haven't looked much at it yet. That's sort of why the query text search thing is in there.

And I understand that the query text search part of it is not as fast, because we're wildcard searching a bunch of Query Store data for some query text. But it's something I'll think about. I don't know that I'd really get to it very quickly, at least at this point. All right. One more question here. Hi, Eric. Hey, how's it going? That's me.

When we update our PMS app to the latest version, the database update involves rerunning all procedures and triggers, modified or not, with drop and create. Not my decision; I have to stick with that, though. What is the downside of the above option? Does it make any sense to run update stats after that?

So, answering your questions kind of backwards: I don't think it makes sense to run update stats after that. The one thing that annoys me about drop and create is that it will create new object IDs for everything that gets dropped and created.

That matters for me and my analysis procedures, sp_QuickieStore being one of them, sp_HumanEventsBlockViewer another.

There are a lot of different ones where there isn't always a procedure name or an object name in the data. Sometimes it's just an object ID, and I have to decode objects in the database by object ID and database ID.

And if you've dropped and created your objects and they get new object IDs, I can't resolve those names, and that messes me up. So that's the real downside: you hurt me, wound me terribly.
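If you can ever influence the deployment tooling, CREATE OR ALTER is the usual fix here: when the procedure already exists it behaves like ALTER, so the object ID survives, while drop and create assigns a new one. A quick sketch that shows the difference (procedure name hypothetical):

```sql
SELECT OBJECT_ID(N'dbo.SomeProcedure') AS original_id;        -- hypothetical proc
GO
DROP PROCEDURE dbo.SomeProcedure;
GO
CREATE PROCEDURE dbo.SomeProcedure AS SELECT 1 AS x;
GO
SELECT OBJECT_ID(N'dbo.SomeProcedure') AS new_id;             -- changed by drop/create
GO
CREATE OR ALTER PROCEDURE dbo.SomeProcedure AS SELECT 2 AS x;
GO
SELECT OBJECT_ID(N'dbo.SomeProcedure') AS same_id;            -- unchanged: altered in place
```
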

Aside from that, I can't really think of anything that would be all that annoying about these practices. You're going to lose query plans, and you can have a bunch of recompiling when you start creating query plans again. But the plan cache sucks anyway, so I don't know.

Not a whole lot to go on there. All right. Question number five: why haven't you been talking about SQL Server 2025? So I'll be very honest with you.

I don't find anything all that compelling with it. All that stuff is just dumb to me. Vector search? OK, great. Cool. Fine. It's there. I care about T-SQL enhancements. I care about performance enhancements.

There are a few neat things in 2025 that I do want to talk about: the optional parameter thing, the optimized Halloween protection using accelerated database recovery, and the optimized locking stuff.
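For anyone who hasn't seen it, the optional parameter feature is aimed at the classic catch-all search pattern, where one cached plan historically had to cover every combination of supplied and NULL parameters. A sketch, with hypothetical names:

```sql
CREATE OR ALTER PROCEDURE dbo.SearchOrders  -- hypothetical procedure
    @CustomerId integer = NULL,
    @OrderDate  date    = NULL
AS
BEGIN
    SELECT
        o.OrderId,
        o.CustomerId,
        o.OrderDate
    FROM dbo.Orders AS o                    -- hypothetical table
    -- Catch-all predicates: each parameter is either a filter or ignored.
    WHERE (@CustomerId IS NULL OR o.CustomerId = @CustomerId)
      AND (@OrderDate  IS NULL OR o.OrderDate  = @OrderDate);
END;
```

Before this feature, the usual workarounds were OPTION (RECOMPILE) or dynamic SQL that only includes the predicates actually supplied.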

Even though the optimized locking stuff has kind of been around for a little bit in Azure, there are a few things in there that I think are cool and that I want to talk about. But here's the thing: Microsoft has been so heavily invested in screwing up Fabric that they didn't take a lot of money and time out to screw up stuff in SQL Server 2025.

So a lot of it is just... there's just not a lot in there aside from the dumb AI stuff. And it was all, oh, it's ground to cloud to Fabric.

Who? Come on. It's like ground to cloud to nowhere, right? Who cares?

Anyway, those are my five answers to these five questions. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video sometime soon, where I will most likely be continuing to try to peddle my course, Learn T-SQL with Eric. Because it's a good one.

Paul White tech reviewed it. So at the very worst, it is entirely technically accurate. So I've got that going for me.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.