Carry Over Sort vs Batch Mode Window Functions



Video Summary

In this video, I delve into an interesting query optimization topic that has been relevant for years but is now less pertinent due to the advent of batch mode in SQL Server. I explore why certain complex query syntaxes were necessary before batch mode existed and how they can be replaced with simpler, more efficient methods today. Specifically, I demonstrate the carry-over sort method—a technique often used when window functions weren’t available—and contrast it with modern approaches using window functions. By running sample queries in SQL Server Management Studio, I illustrate why the old method is not only slower but also less accurate for certain scenarios. The video aims to provide insights into query optimization and help viewers understand how leveraging batch mode can significantly improve performance without resorting to outdated techniques.

Full Transcript

Over the years, there’s been a lot of very interesting and intricate query syntax ginned up to deal with row mode performance issues that don’t really exist anymore in a world where batch mode exists. One of them is called the carry over sort. I don’t know if you’ve ever run into or stumbled upon this problem. I admit that the name is a little outlandish, but it comes up a lot when you’re dealing with data analysis type work. Prior to there being batch mode, there was one specific way you typically wanted to write queries if you wanted to emulate what a windowing function would give you for finding the max value per row (not per column, but per row), and batch mode largely solves that. So I’m going to talk about that today. Once we get over to SQL Server Management Studio, I’m going to show you why the query you probably think you want to write is not right, just how slow the process of doing this in row mode is, and how batch mode improves that. But of course, we’re going to contrast that with the carry over sort method, the way that it has been presented. This is not my syntax; it’s something that I’ve run across, and I just wanted to make sure that I was comparing it accurately to modern versions of SQL Server. Of course, you may still need the carry over sort method if you are using Standard Edition of SQL Server. The reason for that is, at least as of the recording of this video, Microsoft still does not think Standard Edition customers have paid them enough to be their friend, and they have deliberately hobbled anything involving batch mode there.

If you want to contribute to this channel, you can sign up for a membership; the link is down in the video description. It’s just a show of appreciation for all this cool content that I write and record for free. You can also like, comment, subscribe, and ask me questions for my office hours episodes. If you need SQL Server help or consulting help, boy howdy, I do all this stuff, and as always my rates are reasonable. If you want to get all my performance tuning training, I have 24 hours of it for about 150 US dollars, and it lasts you for life; you get it forever and always. Just go to that URL and plug in that discount code, which is also a fully assembled link down yonder, and you can start your learning today. My new T-SQL course: the beginner content is all done and recorded, about 23 hours over 69 modules. If you are going to attend Kendra Little’s and my T-SQL PASS pre-cons, you will get access to this material. Right now the course is on presale for 250 bucks; it will go up to 500 bucks when the course is fully recorded after the summer. I am hard at work on all the advanced material now. Isn’t that spectacular for you, how hard I work? If you would like to see me live and in person, I am going on tour with the Redgate roadshow, the PASS on tour dates: New York, August 18th and 19th; Dallas, September 15th and 16th; and Utrecht (not just an art supply store, it’s a hamlet in the Netherlands), October 1st and 2nd. And then of course, PASS Data Community Summit in Seattle, November 17th to 21st.
So, with that out of the way, let’s party, or whatever it is. I have pre-run a couple things here. The first thing I have pre-run is the version of this query that a lot of people will write, or maybe tried to write at some point in the far distant past and were unhappy with the performance of, which is basically to select some stuff within a CTE, and most importantly, in that CTE, generate a row number. The goal of the row number is, for each post type id (because that’s what we’re partitioning by), to order by the creation date converted to just a date (it’s a datetime, we’re just converting it to a date), then by owner user id descending, and then by id descending, with id acting as a bit of a tiebreaker, because id is unique and these other things are not guaranteed to be unique, either individually or in concert. This query is hinted to use optimizer compatibility level 140, because I do not want batch mode on rowstore to kick in and be enabled for it.

The second thing that I’ve included with the first running of things is the query that a lot of people think could replace this. But this is not the right query to write, because what it’s doing is getting you the max value for each column grouped by post type id. That is not correct, because the real sort of algorithm, if you want to call it that, is that we are ordered first by creation date, then by owner user id, then by id. This is getting the max for each of those individually.

The first thing I want to show you is the results, because the results show you where these two methods no longer agree as far as the included data goes. For post type id one, the owner user id is different between the two. It’s also a different owner user id for post type ids two and three, and the ids are different as well. So really, the max method just does not give you the correct results; there’s just too much different in here.

Focusing on the execution plan, we can ignore the max version, because it does not give us correct results, so let’s get that way out of the way and not think about it at all. This is the row mode version of the query, and if you follow along with the operator times, this thing runs for a little over 14 seconds. I like to say 15 seconds, that feels good to me, so let’s just say this thing ran for like 15 seconds. Nice fizz-buzzy number. There’s just not a whole lot to say about this generally: in row mode, queries like this are quite painful, even if you have a reasonable index for SQL Server to use to make the window function go faster. It’s often just a terribly inefficient way of writing and running the query.

What I want to show you next is the carry over sort method. What the carry over sort method aims to do is get the max: you see the MAX starts here and ends way down here, and what it’s doing is basically assembling a string from the max of all three of the columns that we care about. The normal carry over sort syntax does not include as much complexity for the second column as mine does. The problem I was running into is that there are negative owner user ids in the Posts table, and sure, I could have filtered them out, but that’s cheating a little bit. We want to maintain all user ids, even the negative ones, because when the max is a negative number, we need to consider that, don’t we? We can’t just not return a result for a row because we didn’t feel like dealing with some potential abnormalities in the data.

So I have a case expression, and the case expression says that when owner user id is less than zero, I pad with x’s on the left, which is different from what we do when owner user id is greater than or equal to zero, where we pad the number with zeros. The reason I used x’s is that if I zero padded the negative values, things would have gotten messed up down in the select query. In the select query, we are basically asking for a substring and converting the substring back to the correct data type. When you do that with the id column and you have a zero padded number, converting it to an integer is no problem: the leading zeros just fall away and you get the rest of the number. The problem with a negative number is that you essentially have a string, and if I padded with zeros, I wouldn’t know whether there were naturally occurring zeros in the number. So I used x’s and replaced the x’s later to avoid confusion.

The carry over sort method just does this: we still group by post type id, and we use the MAX function across three different columns (creation date, owner user id, and id), assembling the max across all three of them, so a single MAX encapsulates all three columns. If I run this query (and we have a little bit of highlighting to do here, don’t we), it returns correct results, at least compared to the first query that we ran with the window function. All the results here match what we get from the window function version, but we get them much faster.

What’s cool here is that none of this query uses batch mode, even though we’re in compat level 160. The scan of the Posts table happens in row mode; you can see that right next to my big head there. This compute scalar also happens in row mode. The hash match aggregate, which is totally eligible for batch mode, still goes in row mode. And of course, parallel exchanges like gather streams don’t support batch mode. So this whole thing finishes in about two seconds without using any batch mode whatsoever. Back before batch mode was really a cool, useful thing, this was a good method to get the max value per row.

Like I said earlier, this can still be very useful if you’re on Standard Edition, because Microsoft doesn’t think Standard Edition customers have paid them enough to have their queries perform well. Even if you got batch mode to happen here, you would be limited to a DOP of two for your batch mode queries. I have MAXDOP set to eight for this, and this thing used a DOP of eight for the query; you can see degree of parallelism right there, eight. So this thing used eight cores and spread the workload of 17 million rows out across them, which is pretty efficient for row mode, and for batch mode. But 17 million rows across two threads, even in batch mode, is likely going to show some performance fall-off. Maybe you’ll get real lucky and the trade-off won’t be too terrible, but the Standard Edition limitations there are really quite a pain.

The next thing I want to show you is the window function version again, but without that compat level 140 restriction on it to prevent batch mode on rowstore from kicking in. If we run this query, it takes about two seconds now. We get back the correct results, and this finishes in 1.8 seconds, just like the carry over sort method that I showed you above, and we don’t have to write that crazy MAX syntax where we convert dates and numbers to strings and pad things and all that other stuff. This is with Developer Edition, which is the Enterprise Edition equivalent SKU. I’m not using 2025 Standard Developer Edition, because why would I hurt myself that way? We are just using regular Developer Edition here, so this runs at a degree of parallelism of eight, nice and quickly and efficiently.

Really, the idea of this window function is that we just want to get the top row for each post type id, so I’m filtering to where row number equals one. I just picked three columns from the Posts table that seemed to make sense. I could have thrown score in there if we felt like it, though I still would have had to deal with potentially negative numbers, because scores can be negative in the Posts table, so maybe that wouldn’t have saved me much time or trouble with the carry over sort syntax. But if your goal is to find data like this, and you’re filtering to where row number equals one, batch mode can make these queries crazy fast. You don’t even have to add indexes: just let batch mode on rowstore kick in, read from your table, and process all the data in batch mode. It’s a way better way of running big, data-crunchy queries like this.

But if you’re not in a situation where batch mode on rowstore can kick in for you, because you’re not on SQL Server 2019 or later, or you’re not on Enterprise Edition, or your database isn’t in compatibility level 150 or better, and there’s no batch mode on rowstore naturally occurring, then depending on whether you can change stuff, you can mess around with columnstore indexes: having an empty table in your database with a clustered columnstore index, or creating a temp table with a clustered columnstore index and left joining to that thing. You can do all sorts of stuff to get partial batch mode, but it does not go as deep into your query plans as batch mode on rowstore, the intelligent query processing optimizer feature, does. Kind of weird, I think, but it’s just something that you learn to live with when you are tuning queries across a variety of strange environments.

Anyway, this is just something that caught my interest and I felt like talking about. I hope you enjoyed yourselves, I hope you learned something, and maybe there was a good educational point in this video that you can take away, even if it’s not memorizing the crazy syntax in here. Thank you for watching, and I will see you over in the next video. Adios.
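The on-screen queries aren’t reproduced in the transcript, so here’s a rough sketch of the window function version described above, plus the tempting-but-wrong MAX version. This assumes the Stack Overflow `dbo.Posts` table and its usual column names; the exact queries in the video may differ.

```sql
/* Window function version: one row per PostTypeId, ordered by date,
   then owner, then id, with the unique id acting as tiebreaker. */
WITH ranked AS
(
    SELECT
        p.PostTypeId,
        CreationDate = CONVERT(date, p.CreationDate),
        p.OwnerUserId,
        p.Id,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.PostTypeId
                ORDER BY
                    CONVERT(date, p.CreationDate) DESC,
                    p.OwnerUserId DESC,
                    p.Id DESC
            )
    FROM dbo.Posts AS p
)
SELECT
    r.PostTypeId,
    r.CreationDate,
    r.OwnerUserId,
    r.Id
FROM ranked AS r
WHERE r.n = 1;

/* The incorrect version: each MAX is taken independently,
   so the three values may come from three different rows. */
SELECT
    p.PostTypeId,
    MAX(CONVERT(date, p.CreationDate)) AS CreationDate,
    MAX(p.OwnerUserId) AS OwnerUserId,
    MAX(p.Id) AS Id
FROM dbo.Posts AS p
GROUP BY
    p.PostTypeId;
```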

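For comparison, here’s a simplified sketch of the carry over sort shape, assuming non-negative OwnerUserId values; the version in the video adds a CASE expression with ‘x’ padding to handle negative ids. The trick is that fixed-width, left-zero-padded pieces make string comparison agree with (date, owner, id) ordering, so one MAX over the concatenated string carries all three values together.

```sql
/* Carry over sort sketch: assemble a sortable string per row, take MAX,
   then slice the winning string back apart in the outer query. */
SELECT
    x.PostTypeId,
    CreationDate = CONVERT(date, SUBSTRING(x.carry, 1, 8)),
    OwnerUserId  = CONVERT(integer, SUBSTRING(x.carry, 9, 10)),
    Id           = CONVERT(integer, SUBSTRING(x.carry, 19, 10))
FROM
(
    SELECT
        p.PostTypeId,
        carry =
            MAX
            (
                CONVERT(char(8), p.CreationDate, 112) + /* yyyymmdd */
                RIGHT(REPLICATE('0', 10)
                    + CONVERT(varchar(10), p.OwnerUserId), 10) +
                RIGHT(REPLICATE('0', 10)
                    + CONVERT(varchar(10), p.Id), 10)
            )
    FROM dbo.Posts AS p
    GROUP BY
        p.PostTypeId
) AS x;
```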
Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Window Function Tricks for SQL Server 2022+



Video Summary

In this video, I delve into the latest enhancements to T-SQL in SQL Server 2025 and reflect on what Microsoft has or hasn’t done to improve it. While there are some minor additions like regex support in WHERE clauses, much of my focus is on the improvements from SQL Server 2022 that make working with window functions more efficient. I explore new features such as `ignore nulls` and `respect nulls` in `lag` and `lead` functions, which significantly simplify finding last non-null values. Additionally, I discuss shared window clauses, demonstrating how they can be used to reduce redundancy and improve query readability by allowing common window specifications to be defined once and reused across multiple window functions. Despite these useful additions, the overall sentiment remains that Microsoft has largely neglected T-SQL enhancements in recent releases, leaving SQL Server users with a sense of being left behind compared to other database systems.

Full Transcript

All right, so we are going to continue with a little bit of teaser material from the Learn T-SQL with Erik course. Again, that is still at the presale price until after the summer, when the advanced material drops. There’s a link to purchase it down below. It’s 250 bucks right now, and it’ll last you for the rest of your life. And of course, if you are attending PASS Data Community Summit and you’re coming to Kendra Little’s and my T-SQL pre-cons, you will get access to this material for free; it’s companion material to what we will be teaching. So, you know, SQL Server 2025. There are release notes all over the place for it, and there is not a single worthwhile enhancement to T-SQL to talk about. Sure, we got regex. Okay. Do you have any idea how many people that’s going to screw up? Regex in a WHERE clause. I mean, cool. As a consultant, yeah. But as far as things I’m excited about? Nothing. I think one way you can judge how much Microsoft cared about a specific SQL Server release is by how much T-SQL has been alleviated of the many things it has been missing for many years that are in the SQL standard. And this one is rather laughable. I guess Microsoft is busy trying to get Fabric to catch up with Databricks.

So they have ceased trying to get SQL Server to catch up with, like, every other database on the planet. So, cool. Anyway, the only T-SQL enhancements that I have thought were kind of neat were back in SQL Server 2022, when window functions got a couple neat new things. Window functions got nothing in SQL Server 2025. We have once again been left in the dustbin. We are on the shelf. We are not having a good time.

So, if you ever read T-SQL blogs for fun, or SQL Server blogs for fun, you may have found a particular brand of problem across posts over the years called the last non-null value. This, of course, did get easier with window functions. Before window functions, it was, forget it, queries would never finish. But even with window functions, it takes multi-step queries to get the last non-null value for something.

So, if we run a query like this, where we say the last commenter is lag of user ID, one, over order by creation date, you’ll notice that there are a lot of nulls in here. So, if we wanted to find the last non-null value, we would have to essentially run this query and then run another query to get those other values. It gets very complicated very quickly.

What SQL Server 2022 added is a couple of things that you can stick into the lag and lead window functions to either ignore nulls or respect nulls. Now, make of it what you will that SQL Server Management Studio 21’s parser puts a bunch of red squiggles in this query, because it does not recognize the syntax from SQL Server 2022. All right.

So, we have SQL Server Management Studio 21, which became GA, like, I don’t know, a couple months ago at this point. And we have SQL Server 2022, which came out, like, three years ago at this point. And the parser is still like, I don’t know what that means.

So, you know, we got dark mode. Okay. But I promise you that this query will run successfully.

What I’ve added to this query are the lines ignore nulls for this one and respect nulls for this one. So, ignore nulls makes finding the last non-null thing a lot easier because this will give you the last non-null value in the column. Right.

So, this ignore nulls just gives us the value that we want over and over again. Granted, this isn’t a very interesting data set, but the respect nulls, we get all of this stuff back. Right.
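The syntax being described might look like this; a sketch assuming a `dbo.Comments` table with a nullable `UserId` column (the table and column names here are illustrative, not necessarily what’s on screen):

```sql
/* SQL Server 2022: IGNORE NULLS reaches back past nulls to the last
   non-null value; RESPECT NULLS is the long-standing default behavior. */
SELECT
    c.CreationDate,
    c.UserId,
    last_commenter_ignore =
        LAG(c.UserId, 1) IGNORE NULLS
            OVER (ORDER BY c.CreationDate),
    last_commenter_respect =
        LAG(c.UserId, 1) RESPECT NULLS
            OVER (ORDER BY c.CreationDate)
FROM dbo.Comments AS c;
```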

Now, forever, we have had the ability to pass in a third input to, like, lag and lead and stuff. I’m just going to spread this syntax out a little bit so it’s a little bit more obvious what I’m doing in here and why there are some rows that have a very strange big number in them.

And that is because I am adding a third optional input to the lag and lead functions, which gives you a default value for anything that would have produced a null because of the function. So, in the results here, you’ll notice that both of these lines have the integer maximum in them.
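That third argument might look like this; again a sketch with illustrative names, using the integer maximum as a deliberately visible sentinel rather than a meaningful value:

```sql
/* The long-standing third argument to LAG/LEAD fills in a default
   anywhere the function would otherwise return NULL. */
SELECT
    c.CreationDate,
    last_commenter =
        LAG(c.UserId, 1, 2147483647)
            OVER (ORDER BY c.CreationDate)
FROM dbo.Comments AS c;
```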

That just, that’s because we filled in a blank with that optional third parameter. Now, there’s other neat stuff that came out in SQL Server 2022 for window functions as well. Like, you can now have shared window clauses.

So, like, if you were writing window functions with, like, similar, like, window specifications in them, you would have to, like, write that over and over again and your queries could get very, very big with window function specifications. But now what you can do is you can say something like this, right?

Notice we’re just saying over x here, right? And typically, over x would be like, huh, what is x? Well, x is what we have defined down here.

This window x, right? It almost looks like a CTE for your window function. Isn’t that scary? Right? Window x as: partition by owner user ID, order by, rows between unbounded preceding and current row.

So, both of these window functions, sum and average, can share a common window clause. Now, would that Microsoft were so kind as to give us more neat enhancements to T-SQL like this in SQL Server 2025, you and I could be talking about much newer cool stuff.
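The shape being described is roughly this; a sketch assuming the `dbo.Posts` table and a database at compatibility level 160, where the SQL Server 2022 `WINDOW` clause is available:

```sql
/* Both window functions reuse the named window specification x,
   instead of repeating the full OVER (...) twice. */
SELECT
    p.OwnerUserId,
    p.Score,
    running_sum = SUM(p.Score) OVER x,
    running_avg = AVG(p.Score * 1.) OVER x
FROM dbo.Posts AS p
WINDOW
    x AS
    (
        PARTITION BY p.OwnerUserId
        ORDER BY p.Score
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    );
```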

Here we are, though. But what’s even neater, I think, about the common window clause is that you can actually stack them so that they inherit window clause specifications from higher up ones.

So, it almost looks like stacked CTEs when you read them. So, here, notice that we’re not using x anymore. We’re using ts and av, right?

So, this is the window specification for sum, and this is the window specification for average. And if we look down here, this is where I’m doing the magic work for this one. We have window x as partitioned by owner user ID, right?

And then we’re saying, comma, ts as order by score, rows between unbounded preceding and current row. And then, just for, you know, a little bit of texture in the demo, av is ordering by score descending, rows between unbounded preceding and unbounded following.

So, they’re both going to partition by owner user ID, but then they’re both going to do something slightly different with the order by. So, score for this one is ascending.

Score for this one is descending. This one is going from the beginning of the results to the current row. And this one is going for the entire result set: unbounded preceding and unbounded following. So, it’s like the entire thing is what we’re getting the average of.
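Stacked window clauses might look like this; a sketch with illustrative names, where ts and av both inherit the partition from x and add their own order and frame:

```sql
/* Window name inheritance: ts and av each start from x's
   PARTITION BY and layer on their own ORDER BY and frame. */
SELECT
    p.OwnerUserId,
    p.Score,
    s = SUM(p.Score) OVER ts,
    a = AVG(p.Score * 1.) OVER av
FROM dbo.Posts AS p
WINDOW
    x  AS (PARTITION BY p.OwnerUserId),
    ts AS (x ORDER BY p.Score
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    av AS (x ORDER BY p.Score DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
```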

And now, we can allow our window functions to not only share a window clause, but to inherit and share window clauses. So, we can get back even, we can, I don’t know, make much more interesting queries without, I don’t know, I guess that’s actually still kind of a lot of typing now that I think about it.

But, it saves you some space up here in the select list. It makes that cleaner and tidier. I’ll give it that much. So, that’s just a couple cool things from 2022. Hey, we got Regex.

Ding. Like and subscribe. All right. Cool. Thanks for watching. I hope you enjoyed yourselves. I hope you learned something. And I’ll see you over in the next video where we’re going to talk about some stuff that batch mode makes a whole lot faster.

So, we’re going to take a little break from being depressed about Microsoft’s abandonment of SQL Server generally. And we’ll talk about some stuff from back when they cared.

That’ll be a good time. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Window Functions and Aggregates



Video Summary

In this video, I delve into a fascinating aspect of window functions that isn’t always highlighted in discussions about these powerful SQL Server features. I demonstrate how to use window functions not just for simple column aggregations but also for more complex scenarios where you need to preserve and order aggregated data within a temporary table. By incorporating row numbers based on aggregate values, you can ensure that when querying the results later, they are ordered by the most impactful findings first—something many overlook in their SQL practices. I walk through an example using Stack Overflow data, showing how to calculate averages of counts over different time spans and order them effectively. This technique is particularly useful for troubleshooting and reporting purposes where you need to prioritize high-impact issues. Additionally, I share insights on the flexibility of window functions, illustrating that they can be used in a variety of creative ways beyond just row numbering, making your SQL queries more powerful and versatile.

Full Transcript

Erik Darling here with Darling Data. To continue the teaser material for Learn T-SQL with Erik, for which all of the beginner content is now published: there’s about 23 hours of it across an entire 69 modules. Do with that information what you will, but again, the presale price is still $250 until the advanced stuff drops after the summer. So, I’m going to show you something that I think is very neat about window functions that not a lot of people pick up on. You may have noticed that I have a couple of stored procedures that try to help people troubleshoot various aspects of their SQL Server.

Some of those stored procedures have a sort of roll-up of findings, an aggregated roll-up of all the findings in there. And what I found was that, of course, it’s very easy to produce that summarized output. What was not easy to do was to order that summarized output later.

So, what I ended up learning to do to make that easier is when I insert the data into the table, I have not only the aggregations of the things, but also a row number that gets produced based on the aggregations of the things. So, for example, like if we were just going to query the stack overflow database, the aggregation plus the ordering would look something like this. We’re going to get a count of all the posts, right?

So, we’re getting the post type ID up here and we’re producing some text. And because we’re producing some text, if you order by that column in the output, it’s just going to be texty, alphabetical ordering.

We’re not ordering by what had the most of something, which is what I would want to do, to prioritize the high impact stuff first. But what I learned to do was put that count into the order by clause of the row number function, which looks like this, so that I can preserve the numbered output for when I want to order it coming out of the findings temp table.
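The shape is roughly this; a sketch assuming the Stack Overflow `dbo.Posts` table, with the finding text and column names invented for illustration:

```sql
/* The ROW_NUMBER is computed over the same aggregate as the count,
   so the ranking survives being inserted into a findings temp table
   and can be used in an ORDER BY later. */
SELECT
    p.PostTypeId,
    finding = N'post type '
            + CONVERT(nvarchar(11), p.PostTypeId)
            + N' has this many posts: '
            + CONVERT(nvarchar(20), COUNT_BIG(*)),
    sort_order = ROW_NUMBER() OVER (ORDER BY COUNT_BIG(*) DESC)
FROM dbo.Posts AS p
GROUP BY
    p.PostTypeId;
```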

Just doing this in one query would be trivial; doing it when you’re putting data into a temp table and then want to select it out later with a specific order becomes a little bit trickier. But the results look like this. And of course, I’m not connected to SQL Server, which would help.

But now, when I run this query, I’m not ordered by the text output. I’m ordered by which posts had the most rows. Now, I know that for this standalone query, I could have just said order by count big descending.

That’s no problem. The point is that now I have this row number column, so when I want to select data out of my findings table later (I’m going to go to the next one), I can say order by finding ID and then this row number column.

So I have the top stuff at the top. So, you know, post type ID two having 11 million posts and post type ID one having 6 million posts, those are up at the top of whatever thing I want to show you.

But, you know, this is something that not a lot of people understand about window functions: you can partition by, you can order by, and you can pass aggregates into them, just like you’d pass in another aggregate, something like sum divided by count.

You can have these things intermix and sort of live together. So an easy example of that would be doing averages of counts. I’ve seen a lot of queries written where someone goes and gets the count in one CTE or something.

And then does the averages after they’ve done the counts, which is kind of silly, because you can just do it all in one place. The only thing you have to be a little bit aware of, of course, is how you choose to set up your window range and row specification to deal with that. So what I’m doing in this query is hitting the Posts table again.

I’m doing a little bit of fanciness in here to give myself consistent, accessible aliases for these two expressions across my whole query. And what I’m doing is asking for an average of the count over different spans of time. Now, since I want three, six, and 12 month averages, for the three month average I have to say between two preceding and current row, which feels a little funny.

Because you’re like, I want a three month average, I should put three preceding and current row. But that would give you four rows, right? You want two preceding and the current row.

That’s three total. And it’s the same thing for the six month average and the 12 month average: for the six month average, you want five rows preceding, and for the 12 month average, you want 11 rows preceding. Now, granted, for the 12 month average it wouldn’t matter as much, because it may be the last row anyway.

So you could just say between unbounded preceding and current row if you wanted. But if there’s more to it than that, then you would need to, you know, obviously make sure that you are very specific about it. But now when I run this query, I’m able to, all in one fell swoop, get the three, six, and 12 month average number of posts across all 12 months of the year 2013.
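Put together, the query might look something like this; a sketch assuming the `dbo.Posts` table, with monthly counts in a derived table and the rolling averages layered on top (the on-screen aliases will differ):

```sql
/* Counts and rolling averages in one query: an N-month average uses
   ROWS BETWEEN (N - 1) PRECEDING AND CURRENT ROW. */
SELECT
    m.month_start,
    m.posts,
    avg_3mo  = AVG(m.posts) OVER (ORDER BY m.month_start
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
    avg_6mo  = AVG(m.posts) OVER (ORDER BY m.month_start
                   ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),
    avg_12mo = AVG(m.posts) OVER (ORDER BY m.month_start
                   ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
FROM
(
    SELECT
        month_start =
            DATEFROMPARTS(YEAR(p.CreationDate), MONTH(p.CreationDate), 1),
        posts = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= '20130101'
    AND   p.CreationDate <  '20140101'
    GROUP BY
        DATEFROMPARTS(YEAR(p.CreationDate), MONTH(p.CreationDate), 1)
) AS m
ORDER BY
    m.month_start;
```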

If we were doing multiple years, that’s where you have to be really careful with the 12 month one. But what you’ll see here is that when we look specifically at these three columns that produce the averages, the three month average is the same across all three of them. But then for the six month average, this number changes from this number.

So this is just another three month average here, whereas this is really producing the six month average across these, right? The 12 month average here, too, will have the same thing across the first six months. But where the three month average resets every three months and gives us a new set of averages, the six month average gives us two window frames of averages: the first six months and the second six months.

And then this one, where it agrees up to the six month part, really departs here, and then we have the full 12 month average across all 12. So the main message here is that window functions are not just limited to, you know, a column in your table.

You can assign all sorts of stuff to them. You saw in the first query where I was ordering by a COUNT_BIG, and in this one, I’m getting the average of COUNT_BIGs over different spans of time. So there’s a lot of neat stuff you can do with window functions that may not be immediately obvious and apparent to you based on the way you see a lot of window functions out in the world written.

But there are some really cool things you can do with them that often get overlooked. Like you’ll hear people talk about how cool and powerful they are, but they kind of just give you the same examples, like, here’s a row number, over and over again. Here’s this different thing you can do with row number, and you’re like, okay, row number. Great.

But think of all the other neat stuff that you can do for your neat analytical queries. Anyway, that’s enough here. Again, this is all tickler material from Learn T-SQL with Erik, still on sale for 250 bucks. There’s a link down in the video description if you want to see much, much more material like this.

Otherwise, I don’t know, go live in the T-SQL dark for the rest of your life, see if I care. All right. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Common Table Expression Fork Bombs



Video Summary

In this video, I dive into a fascinating concept known as a CTE fork bomb, exploring how recursive Common Table Expressions (CTEs) can lead to exponential growth in query execution. I start by setting the stage with simple sample data and gradually build up complexity, illustrating how nested loops joins within CTEs can multiply the number of rows, leading to significant performance impacts. By breaking down each step of the execution plan, I highlight the importance of understanding these patterns for optimizing queries and avoiding potential performance pitfalls. The video is packed with detailed explanations and visual aids, making it a must-watch for anyone interested in deepening their knowledge of SQL Server query plans and optimization techniques.

Full Transcript

Erik Darling here with Darling Data. Feeling very high energy today, very… very just pumped up. Let’s go. Let’s go get them. Today I want to talk about a CTE fork bomb. If you’re not familiar with what a fork bomb is, you can sort of consider it to be like viral replication, where one cell becomes two cells, two cells become four, and four become eight. It just keeps, like, getting bigger, right? And that’s what we’re gonna do today. If you would like to support this channel, you can do so. There’s a link to become a member down in the video description below. If you want to ask me questions during my Office Hours episodes, you can do that. Otherwise, the usual like, comment, subscribe stuff is all available to you, should you feel so encouraged to do something. If you need SQL Server consulting help, well, that’s me. Health checks, performance analysis, hands-on tuning, dealing with performance emergencies and training your developers to not write fork bombs on your servers. All good and worthwhile things there. You can get all of my performance tuning content, about 24 hours of it, for 75% off. Again, link down in the video description. That brings it down to about 150 bucks, and you get that for the rest of your life.

The T-SQL course is now half done. All of the beginner content is online and published. There is about 23 hours of it across, last count, 69 modules. So that’s fun there. Of course, past pre-con attendees will get free access to all of this companion material. It is on sale right now for the pre-sale price of $250. That price will go up in the fall to $500 as soon as everything is said and done. I am doing a lot of outside the house stuff this summer. I will be in New York City, shockingly, August 18th and 19th. I will be in Dallas, Texas, September 15th and 16th.

And I will be in Utrecht, that old Netherlands thing, October 1st and 2nd. And of course, I will be at PASS Data Community Summit from November 17th to 21st in Seattle, Washington, assuming that Seattle is still a city at that time. But with that out of the way, let us talk about this CTE fork bomb thing. Because, you know, making fun of CTEs never gets old. At least not for me anyway.

So we’re going to create some sample data here, some simple sample data, because we don’t want to create overly complex sample data that will confound and confuse the masses out there, do we? We want very simple, straightforward demonstrations so that everyone can understand everything. So before I get to the actual fork bomb, there are some agreements that you and I must come to.

We must agree on these concepts so that by the time we get to the fork bomb, you understand fundamentally what is happening. So if we run this query that joins together the two tables that I just created and populated with data, we will get back 255 rows. And if we look at the execution plan, there will be one scan of the table T1, one scan of the table T0, and one merge join in order to produce those 255 rows.

If we look at the details here, there is one number of executions for that scan. And there is one number of executions for this scan. Good stuff.
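The setup script isn’t shown in the post, but based on the row counts mentioned, it looks something like this sketch. Table names, index names, and the data-generation trick are my guesses, arranged only so the numbers line up with what the video describes:

```sql
/*
    Guessed sample data: 255 rows in t0, 32,767 rows in t1,
    with t1.id values drawn from t0.id so that every row matches.
*/
CREATE TABLE dbo.t0 (id integer NOT NULL PRIMARY KEY);
CREATE TABLE dbo.t1 (id integer NOT NULL, INDEX c CLUSTERED (id));

INSERT dbo.t0 WITH (TABLOCK) (id)
SELECT TOP (255)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.messages AS m;

INSERT dbo.t1 WITH (TABLOCK) (id)
SELECT TOP (32767)
    (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 255) + 1
FROM sys.messages AS m;

/* the "agreement" query: one scan of each table, one join, 255 rows out */
SELECT
    t0.id,
    matches = COUNT_BIG(*)
FROM dbo.t0
JOIN dbo.t1
    ON t1.id = t0.id
GROUP BY t0.id;
```

With data shaped like this, every id in t0 has matches in t1, which is the property the rest of the demo relies on.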

We can agree on these things. If this query were to use a nested loops join, say like this, where I’m going to force some things to happen, the query plan would change. We would no longer have a merge join.

We would have a nested loops join. And 255 rows would be read from T0. 32,767 rows would be read from T1.

What changes, aside from the merge join becoming a nested loops join? If you remember back that long ago, you little goldfish, this scan still has one execution. But now we have something different on the inner side of the nested loops join.

Now we have an index seek, and we have 255 executions of the index seek. All right, 255 right there. So when you have a nested loops join, the thing on the inner side of the nested loops join will execute once for every row on the outer side of the join.

That’s this part going to get rows. Since this is my first recording with the new microphone, I don’t know how things are going to sound, so lucky us, we get to experiment together.

Every time a row comes out of here, since the way that the data is designed, every row will match. Right.

So all 32,000 rows in this table have a match here, which means all 255 rows in this table have a match here. So we do this seek 255 times, we find 32,767 matches, and then we aggregate those down to 255 based on the number of unique IDs that came out of t0. Now, if we were to put that query into a CTE, and we were to join that CTE to itself, the query plan would change yet again.

Now the query plan. Well, I mean, A, we have a hash join up here, but now we have two nested loops joins. We actually have two copies of that query that runs because of course, if you are a frequent watcher of my videos, you will know that Microsoft SQL Server at this point in time does not offer a mechanism to materialize the results of a CTE.

That lack of materialization means that every time you reference a CTE in an outer scope, you will have to rerun the query in the CTE. So we actually have the same plan run twice, right? There’s the first copy of the plan, or of the execution of the query, if that’s easier for you to deal with.

And down here is the second one. Now, what this means is that we have two scans of the table t0, right? And see a number of executions.

Oh, you know what? That is hiding behind my head. Let’s try that again. So we have two scans of the table here, right? Or rather, one scan of the table here, and another scan count of one for this table here, for a grand total of two.

Now, this index seek into t1 has 255 executions, and so does this one. So we have two total scans of the table t0, and we have 255 seeks apiece, which is 510 seeks into t1 total. Now, since there’s a hash join up here, right?

We have a hash join that brings these two results together. Remember when I said from the CTE, join the CTE to itself on the ID column. So this is how SQL Server chose to join those two CTE queries together with a hash join.
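In query form, the self-join being described looks roughly like this. The CTE body reuses my guessed t0/t1 sample tables from earlier, since the video’s exact script isn’t shown:

```sql
/*
    Each reference to c re-runs the query inside it, because SQL Server
    does not materialize CTE results: the plan contains two full copies
    of the scan + seek + aggregate, stitched together by a hash join.
*/
WITH
    c AS
(
    SELECT
        t0.id,
        matches = COUNT_BIG(*)
    FROM dbo.t0
    JOIN dbo.t1
        ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT
    c1.id
FROM c AS c1
JOIN c AS c2
    ON c2.id = c1.id;
```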

To simplify things quite a bit, let’s just say, you know, for the sake of making sure that we stay in agreement, that the first query plan up there, the one, the uppermost plan above my head, the outer side of the join ran, did all its work, went to the hash join, built a hash table, and then the inner side of the query ran and the hash join did its thing to, you know, compare rows in the hash table and all that other stuff.

So, let’s just say the outer side of the query ran, got to the hash join, then the inner side of the query ran and got to the hash join, and comparisons were made, and we decided which rows matched and which didn’t at the join. So you really do have two executions of this. Another sort of easy way to see that is when you look at the operator times: this part of the plan executes, and it takes about four milliseconds of accumulated time across all the operators here.

Then you have this part of the plan and there’s four milliseconds of time across all of the accumulated operators. And then you have the hash join up here, which is happening in row mode. So this is the four milliseconds here plus the four milliseconds here plus one millisecond of time spent in the hash join.

All right. So that’s how that looks. Ergo, which I’m told is a word, which is just great, I guess.

If we combine the CTE join situation with a nested loops join, the query inside the CTE will be executed not just once, but once per row that goes into the loop join. To see what I mean, what we’re going to do is, instead of just doing a join, we’re going to do a cross apply with a top one. And I’m not saying cross apply is bad.

It’s just there for a little bit of convenience, because cross apply does often get optimized to a nested loops join. So we’re going to use this for convenience, so I can show you this execution plan.
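The CROSS APPLY variant, again as a sketch with my guessed table names rather than the video’s exact script:

```sql
/*
    CROSS APPLY with TOP (1) typically optimizes to a nested loops join,
    so the CTE reference inside the apply runs once per outer row,
    not just once per reference.
*/
WITH
    c AS
(
    SELECT
        t0.id,
        matches = COUNT_BIG(*)
    FROM dbo.t0
    JOIN dbo.t1
        ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT
    c1.id,
    applied_id = ca.id
FROM c AS c1
CROSS APPLY
(
    SELECT TOP (1)
        c2.id
    FROM c AS c2
    WHERE c2.id = c1.id
    ORDER BY c2.id
) AS ca;
```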

Notice the top part of the plan, right? Everything looks the same up there. We still have basically the same two copies of the plan that run, right?

There’s like that looks very similar to the original one. And then down here, we have that second copy of the query that ran just preceded by a top operator here because we had that top one. Right.

So now we’re going to see the one scan here and we’re going to see the 255 seeks here. What’s going to change is that instead of having one scan here. Now, this is an index seek now, and we did 255 seeks into this.

Well, the estimated CPU cost is 255, too. That’s amazing. I was like, huh?

Okay. Number of executions, 255. Okay, cool. So for every row that came up out of here, right? We aggregated everything down to 255 rows here.

All 255 rows went into the nested loops join, let’s say one at a time. And every time this nested loops join got a row, it went down here and said, hey, seek into here. And so we did that 255 times.

From here, we went into a nested loops join, and we hit this thing 255 times. Okay, this is the same, right? We still have 255 here and we still have 255 here.

This number didn’t really multiply here, because we still have a nested loops join here that’s sort of protecting us. So let’s have a little fun with this. Let’s add some more work.

Let’s further amuse ourselves. Because we are nothing if we cannot amuse at least ourselves. If we can’t amuse ourselves, what have we got? So we’re going to add some more work to the initial CTE.

We’re going to add some window functions in. We’re going to add average and row number and count big. Right.
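Adding the extra window functions might look something like this sketch. The specific expressions are my guesses at “average, row number, and count big,” not the video’s exact code, and the table names come from my guessed setup script:

```sql
/*
    Extra window functions pile more work into the CTE body. Since the
    CTE re-runs per reference (and per outer row, with the loop join),
    all of this work multiplies along with it.
*/
WITH
    c AS
(
    SELECT
        t0.id,
        a  = AVG(t1.id * 1.) OVER (PARTITION BY t0.id),
        r  = ROW_NUMBER() OVER (PARTITION BY t0.id ORDER BY t1.id),
        cb = COUNT_BIG(*) OVER (PARTITION BY t0.id)
    FROM dbo.t0
    JOIN dbo.t1
        ON t1.id = t0.id
)
SELECT
    c1.id
FROM c AS c1
CROSS APPLY
(
    SELECT TOP (1)
        c2.id
    FROM c AS c2
    WHERE c2.id = c1.id
    ORDER BY c2.id
) AS ca;
```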

So we’re going to make SQL Server do some more work in the initial part of the query. And this is where things, I think, get kind of interesting. So if we run this whole thing now, we’re going to have to do some multiplication math. Right.

So let’s zoom out a little bit here. I’m actually going to. Oh, let’s see. Can I get that? My head is going to be between two operators, but that’s okay. So if we look at this top part of the plan, it will be a mirror image of the bottom part of the plan, up to a degree.

Up to a point rather. Let’s move this way over here. That’s probably about good.

Oh, tooltip. You just had to do it to me, didn’t you? So if we look at this part of the plan, right, if we start up here, you’ll see, you know, the index scan on T0, 255, the index scan on T1, 32,767. And then going over, there’s a merge join and a segment and a sequence project, like that’s doing some of the row numbery stuff.

And if we go over here a little bit further, and if you just kind of keep your eye on the bottom part of the plan, you’ll notice that they are essentially mirror images of each other, aside from a couple extra operators here. But they both do the exact same thing. Where things get interesting here, I think, is that we still have the same pattern, where the 255 rows that go into this loop join bring this reference to the CTE and join it to this reference to the CTE.

We still have 255 rows that go in there, but way down over here, things really start to multiply out. So if we look at this part of the plan specifically, notice that this isn’t 255 anymore. This is 65,025, which, if you don’t have a calculator handy or just a lot of fingers, is 255 times 255.

If you look at this number, this is 8.3 million and change. That’s 255 times 32,767, which is that number up there. So now we have sort of fork bombed our CTE with the nested loops join, because every time this nested loops join runs, we end up multiplying the number of rows in the table by the number of rows that come out of the loop join.

So if you sort of compare the numbers going across, like this merge join up here, this has 255 rows go in. This merge join down here has 65,025 rows go in. Or rather, like come out, sorry, because you have the 255 rows go into each.

Right. And that happened because we aggregate this down over here. Right.

This gets squished, and this gets squished to 65,025. So the 8.3 million gets aggregated to 65,025. Here, the 32,000 gets aggregated down to 255. So now, instead of having 255 rows go out here, we have 65,000 rows go across.

And because we have fork bombed ourselves with the nested loops join, the number of rows that go across here is going to be much larger for the bottom part of the query. And you can actually see far more time end up across all those operators. If you look up here, before we go into this nested loops join, we have only used five milliseconds of wall clock time.

If you look down here, look at how the time builds up. I don’t know, my head’s going to be sort of in the way. It’s 518 milliseconds in here, nine milliseconds in here.

We get up over 800 milliseconds by the time we get to here. So all of these accumulated operator times get to 800 right there. And then as we go across, with SQL Server dealing with the number of rows on a single thread, we just add more and more time to this.

So there’s a window spool here. We get up to 1.1 seconds. And then after this segment, we have a table spool, and we get to 1.2 seconds.

And then we do all this stuff and we get up to 1.242 seconds. So, usually when I talk about CTEs, I’ll say something like, you know, your CTE will run once for every reference you make to it. But depending on the query plan that SQL Server chooses, your CTE might run way more times than that.

Right. So if your CTE joins to itself, or joins repeatedly, say you, I don’t know, throw a third table in the mix. And let’s say you have to join your CTE to one table, and then you have to join your CTE maybe to another column in that table, or to a different table.

Depending on the join choice, your CTE might end up executing way, way more times than just once per reference. If you choose a nested loops join, it’ll execute once for every row that goes into the nested loops join and then has to run your reference to that CTE. So isn’t that fun? Isn’t there just so much fun in query plans?

Isn’t there just so much interesting, exciting stuff that just makes your day? Mine too. All right. Cool.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will undoubtedly talk about more fun and exciting execution plan stuff. All right.

Cool. Thank you for watching.


SQL Server Performance Office Hours Episode 21




To ask your questions, head over here.

What’s the biggest T-SQL re-write that you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains for minimum code change (that isn’t option recompile or index tuning).
I have a non tech question, kinda. You clearly have a talent for taking a complex topic and breaking it down to its simplest form to show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks!
Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored proc using the stored proc name, but it returned no information; when I used text which is part of the stored proc, it returned the information I needed. The stored proc has multiple (3) plans because of PSO. Could that be the reason it failed when I searched based on the stored proc name?
Hi Erik! When we update our PMS app to the latest version, the database update involves re-running ALL procedures and triggers (modified or not) with DROP and CREATE (not my decision – I have to stick with that though). What is the downside of the above operation? Does it make any sense to run the sp_updatestats after that? Thanx!
Why haven’t you been talking about SQL Server 2025?

Video Summary

In this video, I dive into some of your most pressing questions about SQL Server and T-SQL tuning. We tackle topics ranging from the biggest T-SQL rewrites I’ve undertaken to the most effective query tuning tricks that yield maximum gains with minimal effort. You’ll also get tips on simplifying complex concepts for better understanding, whether you’re writing blog posts or just explaining things in general. Additionally, we discuss the nuances of using stored procedure names versus text when querying the Query Store and explore the downsides of dropping and recreating procedures during database updates. Lastly, I share my thoughts on SQL Server 2025, highlighting both its potential and the areas where it falls short. Whether you’re a seasoned DBA or just starting out, there’s something here for everyone to learn from.

Full Transcript

All right, you heathens. It’s time to give in to our darkest desires, and we do office hours. This is where I answer five user-submitted questions at a time and try to give the best semblance of an answer that I can provide. The usual stuff here: if you like this channel and you feel that it’s worth your wallet, you can become a member of the channel to support my efforts to bring you the highest quality SQL Server content known to humankind. That’s down in the video description. If you want to ask questions that appear on Office Hours, that link is also down in the video description. It’s very easy and anonymous. You hardly have to do any work. Other things that are useful to me: liking, commenting, and subscribing, because that’s, you know, I guess cool too. If you need a consultant for SQL Server, if you would like to hire me to come work personally with your deepest, darkest data, you can hire me for all of these things. Health checks, performance analysis, hands-on tuning of the worst of your worst, dealing with your performance emergencies, and training your developers so you do not have performance emergencies anymore. I do all that stuff and more. And as always, my rates are reasonable.

All right, come on. Next slide. There we go. I clicked. If you would like to buy my performance tuning content, you can get all 24 hours of that for about $150 US buckaroos. No tariffs added with that discount code, and that will last you for life. If you would like to get in on the pre-sale prices for my T-SQL course, Learn T-SQL with Erik (that’s me), almost all of the beginner material is now out and publicly available. Many hours of content. The price will be going up to $500 once the advanced material is done after the summer. And if you are attending PASS Data Community Summit in Seattle, and you’re coming to the T-SQL pre-cons that Kendra Little and I are teaching, you will, of course, get complimentary access to the course, because this is companion material to the course.

That means it is not the same material, but it is a good companion to the material there. So if you’re attending the pre-cons, you’ll get this stuff. Ain’t that your lucky day? And, of course, this speaking schedule is going to be grand. The Red Gate Roadshow is taking me on tour. I will be in New York City. Surprise!

August 18th and 19th. Dallas, September 15th and 16th. And Utrecht, which rolls right off the tongue, October 1st and 2nd. And, of course, PASS Data Community Summit taking place in Seattle, November 17th to 21st. So I will be live and in person in all of these places, I don’t know, to answer your questions, give you hugs and high fives, tell you you’re awesome at your job, whatever you need me to do.

Anything for a buck. But with that sort of the way, let’s party. Let’s do these office hours-y questions. And let’s zoom in here and make sure that we are nicely framed up, make sure everything is legible above my gigantic head.

And we’ll start with this handsome devil up here. What’s the biggest T-SQL rewrite you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains from minimum code change that isn’t option recompile or index tuning?

So I have rewritten entire applications for people. Like, I mean, maybe not every single stored procedure, because they’ll be like, hey, we don’t use this stored procedure anymore. Which, you know, that’s cool with me.

But, you know, pretty much every stored procedure that was currently in use, I’ve done rewrites for. Or, depending on the development team a little bit, there are some times when I can rewrite a handful of stored procedures and just be like, follow this pattern generally, and if you get stuck on anything, let me know, we can work on it together.

But, you know, I’ve really done, like, hundreds of stored procedures and functions. There was one client where, if I remember the final count in the rewrite folder, it was something like 56 scalar UDFs that I hand rewrote.

And that was just the UDFs. It wasn’t even the stored procedures and other stuff. So that’s that answer.

Sure. But of course, the biggest query tuning trick is probably just getting batch mode involved when it’s appropriate. Not even adding a columnstore index, but just playing some trick on SQL Server so that batch mode gets involved somewhere. That opens up a lot of doors.

You know, obviously it’s better with columnstore as a data source for a lot of these things, but generally, getting batch mode involved solves a lot of problems really quickly. Problems that would otherwise take a lot of index tuning and consolidation and query rewrites, and trying 50 million things to nudge the optimizer towards some specific pattern or path that I care about. But, you know, batch mode is probably the easiest one to do there.

If I had to pick a second place, as far as bang for the buck goes, it’s breaking up queries that are miles of CTEs, and using temp tables at certain logical breaking points to materialize results. Everyone thinks that a CTE materializes a result, but it doesn’t.
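The temp-table pattern being described, sketched with hypothetical table and column names:

```sql
/*
    Materialize the intermediate result once, then reference the temp
    table as many times as you like: unlike a CTE, it will not re-run
    the underlying query per reference.
    (dbo.some_table, some_column, #intermediate are hypothetical names.)
*/
SELECT
    s.some_column,
    things = COUNT_BIG(*)
INTO #intermediate
FROM dbo.some_table AS s
GROUP BY s.some_column;

/* both references below read the stored rows; nothing re-executes */
SELECT
    i1.some_column
FROM #intermediate AS i1
JOIN #intermediate AS i2
    ON i2.some_column = i1.some_column;
```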

And so using temp tables in place of that is often very valuable as well. All right. Next up, what do we have here?

Oh boy. I have a non-tech question, kind of. All right. Shortly. You clearly have a talent for taking a complex topic and breaking it down to its simplest form. Oh, thank you.

That’s what I’m known for. Being simple. To show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks.

So the way that I teach is the way that I learn. If I don’t break things down, like for myself step by step, I get lost and don’t learn things. So I need to break things down into very simple terms that fit into my brick head and make sense to me there.

If I had to give you advice about how to do that: when you’re writing a blog post, in your head it’s really easy to logically jump from one thing to another to get the words out. But when you speak it out loud, and you see the stuff on the screen, something catches your eye and you’re like, wait. You go to talk about it and try to explain something.

And if you get stuck on something, that’s something else that you need to put in the post. That’s another thing that you need to add in to further break this thing down to make it explainable. A lot of blog posts, even mine, and I’m not going to pretend I’m not guilty of it, gloss over some stuff and leave some details out, either because it’s a whole other blog post to explain it, or it’s too much of a detour.

But if you want to be able to do that, don’t just write your post. Read it out loud, or rehearse the material out loud, so you have a better idea of not only what you want to say about it, but, like, you really find yourself getting deeper into the nooks and crannies when you have to talk about stuff out loud. You don’t even have to, I mean, you can record it if you want.

I don’t know. But having that sort of extra added thing where you’re like, oh, but is this the… well, no, let’s not. Okay, never mind. So that would be my advice: speak your content out loud, because that will force you to think more about everything that you are looking at and everything that you are saying.

And if you hit one of those unexplainables, that is often a good sign that you need to break things down a little bit further. All right. We’ve got quite a thing here. Oh, dear. That didn’t work out well. Let’s try that little rectangle again.

Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored procedure using the stored procedure name, but it returned no information. But when I used text which is part of the stored procedure, it returned the information I needed. The stored procedure has multiple plans because of PSO.

Could that be the reason it failed when I searched based on the stored procedure name? So, yes. What you guessed is most likely correct, because with the parameter sensitive plan optimization, there is an object ID in the XML, but the way that the plan is expressed is a lot like dynamic SQL, where it’s almost completely detached from the object ID of the thing that called it.

So in Query Store, there’s an object ID for the procedure. But if your procedure doesn’t do anything really meaningful, or anything that Query Store captures, depending on your capture settings, that puts it into Query Store, then it’s not going to show up. Other stuff that has messed me up trying to find procedures in Query Store: non-defaults, like a non-dbo schema.

There is a procedure schema and a procedure name parameter for it. So if it’s not in dbo, that’d be another thing to try. I don’t think that sp_QuickieStore handles square brackets gracefully.

I did some work to try to make it so that the procedure name parameter was sort of overloaded, so if you put in, say, square-bracketed dbo.procedure_name, it would use PARSENAME to break that stuff out.

But I forget how far I took it. So making sure that you don’t put the procedure schema and name in square brackets would be another thing to try. But I think for you specifically, you are generally correct: that would be why it didn’t show up.
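As a hedged example of the two search paths being discussed here (parameter names are from sp_QuickieStore’s interface as I understand it, so check the documented parameters for the version you have installed, and the database and procedure names are placeholders):

```sql
/* search by procedure name; set the schema if yours isn't dbo */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @procedure_schema = N'dbo',
    @procedure_name = N'YourProcedure';

/* fall back to a wildcard text search when the name turns up nothing */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @query_text_search = N'some distinctive query text';
```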

It is kind of a pain in the butt. But at the same time, I really want to avoid XML parsing in sp_QuickieStore, because querying the underlying Query Store views is kind of painful enough. You know, working on sp_BlitzCache was a lot different, because getting data out of the plan cache, aside from the XML parsing bit, was generally pretty quick.

You know, of course, depending on the size of stuff and a million other details. But the XML parsing was what really took up time in there. Querying the Query Store stuff is such a misery as it is.

I don’t want to add XML parsing on top of that to go look for the hidden object ID in the plan XML to detect the parameter sensitive plan optimization stuff. There might be some shortcuts around that; I just haven’t looked much at it yet. That’s sort of why the query text search thing is in there.

And I understand that the query text search part of it is not as fast, because we’re wildcard searching a bunch of Query Store data for some query text. It’s something I’ll think about, but I don’t know that I’d get to it very quickly, at least at this point. All right, one more question here. “Hi, Erik.” Hey, how’s it going? That’s me.

When we update our PMS app to the latest version, the database update involves rerunning all procedures and triggers — modified or not — with drop and create. Not my decision; I have to stick with it. What is the downside of that approach? Does it make any sense to run update stats after it?

I’ll answer your questions kind of backwards: no, I don’t think it makes sense to run update stats after that. The one thing that annoys me about drop and create is that it creates new object IDs for everything that gets dropped and created.

That matters for my analysis procedures — sp_QuickieStore being one of them, sp_HumanEventsBlockViewer another.

There are a lot of different ones where there’s not always a procedure name or an object name in the data. Sometimes it’s just an object ID, and I have to decode object names in the database by object ID and database ID.

And if you’ve dropped and created your objects and they got new object IDs, I can’t resolve those names anymore, and that messes me up. So that’s the real downside: you hurt me. You wound me terribly.
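A quick sketch of why this matters, assuming a throwaway demo procedure: DROP/CREATE assigns a new object_id, while CREATE OR ALTER (available since SQL Server 2016 SP1) modifies the object in place and keeps the existing one.

```sql
/* DROP/CREATE gives the procedure a brand new object_id. */
CREATE OR ALTER PROCEDURE dbo.demo_proc AS SELECT x = 1;
GO
SELECT original_object_id = OBJECT_ID(N'dbo.demo_proc');
GO
DROP PROCEDURE dbo.demo_proc;
GO
CREATE PROCEDURE dbo.demo_proc AS SELECT x = 1;
GO
/* Different value from the first query. */
SELECT new_object_id = OBJECT_ID(N'dbo.demo_proc');
GO
/* CREATE OR ALTER changes the body but keeps the object_id. */
CREATE OR ALTER PROCEDURE dbo.demo_proc AS SELECT x = 2;
GO
SELECT unchanged_object_id = OBJECT_ID(N'dbo.demo_proc');
```

Tools that captured the old object_id (in Query Store, blocking logs, and so on) can no longer resolve it back to a name after the drop.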

Aside from that, I can’t really think of anything that would be all that annoying about these practices. You’re going to lose query plans.

But the plan cache sucks anyway. You can have a bunch of recompiling when those query plans get recreated — but again, the plan cache sucks anyway.

So, not a whole lot to go on there. All right, question number five: why haven’t you been talking about SQL Server 2025? I’ll be very honest with you.

I don’t find anything all that compelling in it. A lot of the stuff is just dumb to me. Vector? Okay, great.

Vector search. Cool. Okay, fine. It’s there. I care about T-SQL enhancements. I care about performance enhancements.

There are a few neat things in 2025 that I do want to talk about: the optional parameter thing, the optimized Halloween protection using accelerated database recovery, the optimized locking stuff.

Even though the optimized locking stuff has kind of been around for a little bit in Azure. There are a few things in there that I think are cool and that I want to talk about. But here’s the thing: Microsoft has been so heavily invested in screwing up Fabric that they didn’t take a lot of money

or time out to screw up stuff in SQL Server 2025. So there’s just not a lot in there aside from the dumb AI stuff. And the pitch was like, oh, it’s ground to cloud to Fabric.

Who? Come on. It’s more like ground to cloud to nowhere, right?

Who cares? Anyway, those are my five answers to these five questions. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video sometime soon, where I will most likely be continuing to try to peddle my course, Learn T-SQL With Erik. Because it’s a good one.

Paul White tech reviewed it. So at the very worst — at the very worst — it is entirely technically accurate. I’ve got that going for me.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Grouping Sets vs Union All



Video Summary

In this video, I delve into the lesser-known world of `GROUPING SETS` in SQL Server and explore its alternatives, including `CUBE` and `ROLLUP`. While these features are powerful for complex aggregations, they often face performance challenges due to how the SQL Server optimizer handles them. I demonstrate a real-world example using Stack Overflow data, showing that even with a columnstore index, `GROUPING SETS` queries struggle to utilize batch mode efficiently, leading to slower execution times compared to equivalent queries without these features. By comparing the query plans and performance metrics, you’ll see firsthand why sometimes sticking to simpler methods can yield better results in practical scenarios.

Full Transcript

All right, we’re gonna talk about something I don’t see very often, to be honest with you. Maybe I don’t see it very often because a lot of people don’t really understand how this stuff works, so they get afraid of it. We’re gonna talk about grouping sets and the alternatives. The alternatives are important, because there’s also the distinct possibility that someone wrote a grouping sets query once upon a time and, gosh, it just didn’t perform terribly well — which is something that can happen. Boy, howdy. We’re gonna look at that.

All of this content is preview material for my course, Learn T-SQL With Erik. It’s at the pre-sale price of 250 US dollars right now — that’ll last you for the rest of your year — and it will go up to 500 US dollars once the advanced material is published after the summer. The beginner material is on the very cusp of being completely published, so you’ll have many hours of content to get through.

This is all companion content — meaning not the exact content, but companion content — to the material that Kendra Little and I will be presenting at PASS Data Community Summit in Seattle. So if you attend our pre-cons there, you will get access to this material with your price of admission. I think that’s about it, so we’ll just get at this thing. Well, I mean, there’s a whole slide here. Now you can see all my secrets, right? All this stuff that I do when you’re not watching. So, we’re gonna get into the old Stack Overflow database here.

And I’m gonna show you what a grouping sets query looks like. There are of course other grouping-set clauses — let’s just call them that — that you can use, like CUBE and ROLLUP. CUBE will create distinct grouping sets out of all possible column combinations. ROLLUP is a little more complicated to explain: you define your rollup clause, and SQL Server takes whatever column you put first and matches it to the other columns in order. So it doesn’t do all unique sets; it does a subset of unique sets — a hierarchy based on the order you list the columns in. I cover it in way more detail in the actual material.
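As a rough illustration of the difference (column names are borrowed from the Stack Overflow database he uses later; the exact queries in the video may differ):

```sql
/* CUBE: every combination of the listed columns, plus the
   grand total — 2^3 = 8 grouping sets for three columns. */
SELECT v.VoteTypeId, v.PostId,
       VoteYear = DATEPART(YEAR, v.CreationDate),
       records  = COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY CUBE(v.VoteTypeId, v.PostId, DATEPART(YEAR, v.CreationDate));

/* ROLLUP: a hierarchy based on column order —
   (VoteTypeId, PostId, VoteYear), (VoteTypeId, PostId),
   (VoteTypeId), and the grand total. Only 4 grouping sets. */
SELECT v.VoteTypeId, v.PostId,
       VoteYear = DATEPART(YEAR, v.CreationDate),
       records  = COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY ROLLUP(v.VoteTypeId, v.PostId, DATEPART(YEAR, v.CreationDate));
```

Columns that aren’t part of a given grouping set come back as NULL in that set’s rows, which is how you tell the aggregation levels apart.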

So if you’re really fascinated by what CUBE and ROLLUP do, you can buy the material and watch it — or just watch it if you’ve already bought it. But with GROUPING SETS, you can define whatever sets of columns you want to group things by. I have chosen to do a few levels of grouping here: one by all three columns that I’m selecting,

and then one grouping set for each of the columns individually. I could of course add stuff in here — another line that says vote type ID and post ID, another that says vote type ID and vote year, another that says post ID and vote year. But just to keep things relatively simple, I’m going to do the three together and then the three individually.

There’s also this empty parentheses at the very end, which is going to be the global aggregate. So I’m going to start this thing running, and you’ll notice that I have a columnstore index up here. I have not created that yet; we’re going to save that for later.
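The shape of the query he’s describing would look something like this — all three columns together, each column alone, and the empty set for the global aggregate (names borrowed from the Stack Overflow database; the video’s exact query may differ):

```sql
SELECT
    v.VoteTypeId,
    v.PostId,
    VoteYear = DATEPART(YEAR, v.CreationDate),
    records  = COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY GROUPING SETS
(
    (v.VoteTypeId, v.PostId, DATEPART(YEAR, v.CreationDate)), /* all three */
    (v.VoteTypeId),                                           /* each alone */
    (v.PostId),
    (DATEPART(YEAR, v.CreationDate)),
    ()                                                        /* global aggregate */
);
```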

What I want to show you here is that SQL Server’s optimizer fights real good and hard against using any sort of batch mode with these grouping sets queries — even, as we’ll see in a second, with a columnstore index. Right about 30 or so seconds for this thing. And grouping sets queries tend to need to process a lot of data.

You’re talking about doing big aggregations of stuff. It would seem a fairly natural fit for the optimizer to want to use batch mode for these things, but we just don’t get any. You can kind of figure that out from the query plan by looking at the choice of operators — not always, but in some of these.

So first we have parallel exchanges: there’s one here, repartition streams, repartition streams, repartition streams, distribute streams. Well, that one’s a filter — that’s not quite it.

Gather streams over here. None of the parallel exchanges support batch mode. There are operators in the plan that do support batch mode, but — at least the last time I ran this — they all ran in row mode. Like, there’s a sort at the end.

There’s a filter that runs in row mode; filters do support batch mode. Stream aggregates here: one up there, one down there. Those don’t support batch mode.

The hashes do support batch mode, of course, but they all run in row mode. So this whole thing is kind of cooked. This whole thing takes 30 — well, actually, we should go look.

Because that looks pretty funny: 43 seconds. And then I got 20 seconds, 43 seconds, 32 seconds. SQL Server, what happened? Let’s go look at the actual tale of the tape over here in the query time stats. Hopefully my big head won’t be in the way.

So that one parallel operator lied to us. We used 166 seconds of CPU time to get 32 seconds of wall clock time. We appreciate SQL Server’s willingness to use a parallel execution plan here.

We do not appreciate the lack of batch mode, though. Look at the number of rows we’re selecting and aggregating. Gosh darn it, SQL Server.

Why won’t you use batch mode? So what I’m going to do is kick off creating this columnstore index. And just so I don’t get yelled at later, I’m going to make sure that semicolon’s in there.

We don’t want a lack of termination here, do we? I’m going to get that started, and I’m going to show you what an equivalent query without the grouping sets stuff looks like. We’re going to have one query at the beginning where we select and group by all three columns.

That’s the vote type ID, post ID, and vote year. I’m using a little trick that I showed off in one of my earlier videos, where I’m doing this little CROSS APPLY VALUES trick to have one source for this DATEPART(YEAR, CreationDate) thing,

so that I can reference it a few times throughout the query without having to keep rewriting DATEPART(YEAR, CreationDate). Now, of course, you have to do more typing, right?

So we have one query that does all three columns and groups by all three. We have another query that groups by just vote type ID. Another that groups by just post ID.

Another that groups by just vote year. And then we have one query that essentially doesn’t group by any of them. Now, what I want to do is run all these together. Remember that I just created a columnstore index here.
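A sketch of the UNION ALL rewrite he’s describing, with the CROSS APPLY VALUES trick to avoid retyping the year expression (names borrowed from the Stack Overflow database; NULLs stand in for the columns each branch doesn’t group by, mimicking what GROUPING SETS returns):

```sql
SELECT v.VoteTypeId, v.PostId, y.VoteYear, records = COUNT_BIG(*)
FROM dbo.Votes AS v
CROSS APPLY (VALUES (DATEPART(YEAR, v.CreationDate))) AS y (VoteYear)
GROUP BY v.VoteTypeId, v.PostId, y.VoteYear

UNION ALL

SELECT v.VoteTypeId, NULL, NULL, COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId

UNION ALL

SELECT NULL, v.PostId, NULL, COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY v.PostId

UNION ALL

SELECT NULL, NULL, y.VoteYear, COUNT_BIG(*)
FROM dbo.Votes AS v
CROSS APPLY (VALUES (DATEPART(YEAR, v.CreationDate))) AS y (VoteYear)
GROUP BY y.VoteYear

UNION ALL

SELECT NULL, NULL, NULL, COUNT_BIG(*) /* global aggregate */
FROM dbo.Votes AS v;
```

It’s more typing, but each branch is a plain aggregation that the optimizer is happy to run in batch mode.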

We wouldn’t need the columnstore index for this thing to run in batch mode — SQL Server can choose batch mode very naturally for this. I’ll show you that at the end. But the important thing is that when you look at this plan,

there are a couple of things about it that are much, much different from the plan we just saw using the grouping sets method. One, there are no parallel exchanges in this until the very end — there’s one necessary gather streams at the end.

There are also no stream aggregates, and we don’t have all the repartition streams stuff in here. And if we look at the execution modes for all of these operators, they’re all going to say batch mode, batch mode, batch mode. So this whole thing runs a lot faster.

If you look at the query time stats for this one — because it’s maybe not terribly obvious from the query plan as a whole — we spend 13 seconds of CPU time for 1.8 seconds of elapsed time. So that’s a huge improvement using batch mode over row mode execution for these queries.

Let’s go back up to the grouping sets query, and let’s run it with the columnstore index in place. If we let this go for a little bit —

and it does still go for a little bit — it’s already slower than the union all method of doing this. This thing will take about 12 or 13 seconds in total. And the query plan is different, right?

With the columnstore index in place, you don’t see a lot of the repartition streams stuff, but there is still some real junk in the query plan. Like this whole section here, where, very early in the plan, we gather streams. That means we end a parallel zone right here, and then we restart a parallel zone right here.

And inside of that serial zone — no more parallel, then back to parallel — we do a stream aggregate. This is just something that grouping sets has to do for the global aggregate.

I’ve tried all sorts of tricks to get rid of this. I tried an OPTION (HASH GROUP) hint. I tried a QUERYRULEOFF hint to prevent the stream aggregate, but it’s still stuck in there. I don’t want to start getting into all that stuff in this video, but man, this thing was very, very persistent.

And if we look at the query time stats for the grouping sets one — we go into properties and look at query time stats — we were at 1.7-something seconds of elapsed time and about 13 seconds of CPU time for the union all format; here we’re up to 30 seconds of CPU time and 10 seconds of elapsed time. So when I say I don’t see this stuff very often —

I kind of get why. It’s not just that the syntax is kind of weird and difficult to remember — or to even occur to you if you need to do this sort of thing. It’s that even if you’re a steadfast rememberer of syntax, which I am generally not —

I need to refer back to stuff that I wrote before to remember half the things that I want to do — even if you are really good at remembering this stuff, you might try this and just be like: holy cow, this is slow. So let’s get rid of the columnstore index that I created.

And I just want to show you that, kind of naturally, the union all version of this does a lot better on its own, just sort of naturally using batch mode. SQL Server is like: I’m not afraid of batch mode here, right?

It’s not going to be as good as when we had the columnstore index to read from. If you’re doing this kind of big analytical aggregation stuff, there’s a very good chance you’re just going to want to use columnstore as the source of your data anyway. But even just getting some batch mode, this runs pretty well — better, even without the columnstore index, than the grouping sets version, which was like 30 seconds or something.

This is like twice as fast without even having columnstore as the source to read from. The main thing that slows it down is that we have to read from the clustered index over and over again. We do a lot of stuff in there, but even that’s happening with batch mode on rowstore.

We just don’t have that really nicely compressed columnstore index to speed things up as we’re reading data. So we still kind of have this IO-bound portion of the query that doesn’t have the good columnstore source to read from, but it still happens in batch mode, so it’s still an improvement. And for all of these branches, notice that we don’t have that break — the serial zone with the stream aggregate.

We don’t have all the repartition streams in here. This all pretty cleanly runs in batch mode, does everything in batch mode, and is a lot faster just naturally. Of course, like I said, I would absolutely prefer to have the columnstore index as a data source for this type of thing. But even without that — even just getting batch mode on rowstore — this ends up a lot faster than the grouping sets alternative.

So when you’re writing these types of queries, of course what you’re reading from is very important. Rowstore indexes are just not quite the same jam as columnstore indexes for these big analytical, aggregate-heavy types of queries. But often it just comes down to how you write the query, and I’m going to say something unfortunate:

in general, the more you type, the better off you are, right? Taking shortcuts like this — you might learn this in some shifty Microsoft DP exam and think you’re the hottest cake on the block for knowing how to use grouping sets. But get out there in the real world, where you have to do anything practical on a meaningful set of data — not WideWorldImporters or whatever —

and you’re going to want to abandon that ship pretty quickly. So anyway, thank you for watching. I hope you learned something.

I hope you enjoyed yourselves, and I will see you in the next video. I think this is going to be a Friday video, and then the next one will be an office hours episode. And boy, we just have so many exciting things.

So many exciting things coming up, don’t we? All right. Anyway, thank you for watching.


Erik Tests a New Microphone



Video Summary

In this video, I’m Erik Darling from Darling Data, and I wanted to share a bit of behind-the-scenes content as I test out my new microphone. The old one met its untimely end after countless hours of discussing databases, but fear not—my audio quality remains top-notch! This video serves multiple purposes: it allows me to ensure the sound is clear across various devices so you can enjoy my content without any hiccups, and it’s a direct way for you to support my endeavors. If you’d like to contribute towards better equipment, consider signing up for a channel membership starting at just $4 a month. Your support not only helps me improve but also gives you access to exclusive perks and opportunities to ask questions during office hours episodes.

Additionally, I’m taking this opportunity to promote some of the services and courses I offer. Whether you’re looking to enhance your SQL Server performance with my consulting services or want to dive into T-SQL training, there are plenty of ways to get involved. The pre-sale price for my 24-hour T-SQL course is currently at $250, but don’t miss out on the limited-time offer! And if you’re attending Pass Data Community Summit in Seattle, you’ll receive free access to this companion content with your admission. Mark your calendars for my upcoming live appearances in New York City and Dallas, as well as Utrecht, where I’ll be sharing more insights and expertise.

Full Transcript

Erik Darling here with Darling Data. If the first thing you’re noticing and admiring in this video is my new clip-on mic — my eyes are up here, buddy — then you are wise and you are a dedicated follower, because I was talking the other day about how my old microphone, back when I was wearing a big headset, snapped in half from listening to me talk about databases. Hopefully you will not snap in half from listening to me talk about databases. We can consider the entire point of this video to be: Erik tests a new microphone. The reason I’m doing it this way is that it’s very easy for me to upload this to YouTube and then listen to it across a variety of devices, so I can ensure that I sound okay on a variety of devices. So let’s shill ourselves here. If you would like me to buy a better microphone — if you have complaints about audio quality — well, you can do something about that. You can sign up for a channel membership, and for as few as $4 a month, you can support my endeavors to purchase nice microphones. There’s a link down in the video description that allows you to do that. If you don’t care about my microphone, well, I don’t know what to tell you. You can like, you can comment, you can subscribe. You can also ask me questions for free, privately, that I will answer publicly during my office hours episodes.

Isn’t that nice? Isn’t that nice? These nice things I do for you. If you would really like me to get a new microphone, you can hire me as a consultant — it’s a great way for me to buy new microphones. I am available to perform all sorts of miracles upon your SQL Servers: health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and training your developers, so that henceforth you have no more performance emergencies. I do all of these things, and as always, my rates are reasonable. Anyway, if you would like to buy some performance tuning training from me, I’ve got 24 whole entire hours of it. There’s a link down here that assembles all this stuff for you. You can get the Everything Bundle there for about 150 bucks with a 75% off code, and that will last you for life — no subscription required. If you want to get in on my new T-SQL course while it’s at the pre-sale price of 250 US dollars, you can do that now. Videos are dropping and being recorded — I’m going to go do some after I do this. The price, once the advanced material is fully published after the summer, will go up to $500.

So please do save yourself 250 bucks, unless you’re really itching to donate to the mic fund. This is all companion content to the pre-cons that Kendra Little and I are teaching at PASS Data Community Summit in Seattle, November 17th to 21st. So if you are attending PASS — and have hopefully chosen wisely and are attending our pre-cons — well, guess what?

You will get free access to this companion content with your admission there. If you would like to see me live and in person: Red Gate’s little road show. I’m being taken on tour sort of around the world — a limited world tour, I guess.

Some small clubs and venues: New York City, August 18th and 19th; Dallas, September 15th and 16th — that’s the one in Texas; and Utrecht — the one near Amsterdam, in the Netherlands. It’s a hamlet and it’s beautiful.

October 1st and 2nd. But let’s do a short video here for me to test my microphone with. All right. A lot of the time when I’m teaching about dynamic SQL, I like to say things like: if you want to properly parameterize your dynamic SQL, you have to use sp_executesql, and you have to feed it some parameters, and you have to feed it some values to substitute those parameters with.

That’s a great way to avoid SQL injection, because everyone should be trying to avoid SQL injection. It’s unpleasant. It gets you fired. It might make your company go out of business.

There are all sorts of terrible things that can happen when you are SQL injected, so I do recommend avoiding that. Eschew SQL injection at all costs.
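The properly parameterized pattern he’s describing looks roughly like this — the user-supplied value travels as a typed parameter rather than being concatenated into the string (the table comes from the Stack Overflow database he queries later; the exact query in the video may differ):

```sql
/* Parameterized dynamic SQL with sp_executesql: the value is
   bound to @Id at execution time, never glued into the string. */
DECLARE
    @sql nvarchar(MAX) = N'
SELECT pt.Id, pt.Type
FROM dbo.PostTypes AS pt
WHERE pt.Id = @Id;',
    @Id integer = 8;

EXECUTE sys.sp_executesql
    @sql,
    N'@Id integer',
    @Id;
```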

But there is kind of one funny circumstance where you can not use sp_executesql and still somewhat parameterize your dynamic SQL. What I’ve done already — and this only works with linked servers, at least that I’ve ever seen; there might be some really insane workaround where it works otherwise, but hey, what do I know?

I know T-SQL With Erik. It’s apparently what I know. But I’ve already added a loopback linked server pointing at my very own server. I am, for the most part, SQL Server monogamous.

I only work with one version and edition of SQL Server at a time. Though I guess these days I am philandering a bit with SQL Server 2025 — but we don’t need to talk about that publicly. We can save me a little shame and embarrassment here. But this linked server will allow me to do something kind of funny.

This linked server will allow me to declare some SQL. I’ve got my string that I want to execute being set and assigned right here between these two things, and I’ve got this ID local variable set to eight. Now, in the WHERE clause of my dynamic SQL, I’ve got a question mark, right?

It’s sort of like that crappy stored procedure sp_MSforeachdb, where the question mark is the database name and you have to say USE [?] to get into different databases. It’s kind of like that.

But what I can do is use the less safe version of executing dynamic SQL. I can say EXECUTE (@sql, …) and pass in the ID as a second thing, and that second thing is going to act as a parameter replacement for the question mark.

But I have to use EXEC … AT — I don’t know why ZoomIt has forsaken me like that — I have to execute this at the linked server. When I do that, and I go through great pains to use the right database and everything, this server talks to itself and returns the thing I wanted: post type ID eight from the post types table.
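A sketch of the trick, assuming a loopback linked server named `loop` and a database named `StackOverflow` (both names are illustrative; the video’s setup may differ):

```sql
/* Pass-through execution at a linked server: the question mark
   is a positional placeholder, replaced by @Id when the string
   runs at the linked server. No sp_executesql involved. */
DECLARE
    @sql nvarchar(MAX) = N'
SELECT pt.Id, pt.Type
FROM StackOverflow.dbo.PostTypes AS pt
WHERE pt.Id = ?;',
    @Id integer = 8;

EXECUTE (@sql, @Id) AT loop;
```

This only works with the `EXECUTE (…) AT linked_server` form; plain `EXEC (@sql, @Id)` without the `AT` clause doesn’t accept the parameter list.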

So anyway, thank you for joining me for Erik Tests a New Microphone. I hope you enjoyed yourselves. I hope you learned something.

I hope you like my new microphone, because I don’t know what to get if you don’t like it. Perhaps you can recommend one to me. It just has to plug into one of these things, and it has to have a very special connector like this. So if you’ve got recommendations along those lines — well, I’m two ears and one mouth, but that expression doesn’t really resonate much,

does it? Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Spurious Left Join Logic




Stubborn NOT EXISTS Ordering



Video Summary

In this video, I delve into the fascinating world of stubborn `NOT EXISTS` ordering in SQL Server queries. You might be wondering why I’m wearing a headset that’s making me look like an air traffic controller — well, it’s because my fancy wireless microphone setup broke, and I’m waiting for replacements to arrive. This quirky situation has led to some humorous moments, but it hasn’t stopped us from diving deep into SQL Server optimization techniques. I use dynamic SQL and XQuery to generate random `NOT EXISTS` predicates and analyze their execution order in the query plan. By running this process multiple times, we can observe how the optimizer handles these clauses and whether reordering them might improve performance. So, grab your headphones (if you have any) and join me as we explore this intriguing aspect of SQL Server’s query optimization!

Full Transcript

Erik Darling here with Darling Data, and you might be wondering why I’m wearing a headset. Well, it’s a funny story. It turns out when you spend like 800 bucks on a fancy wireless microphone setup — this thing is fine — but the microphone that plugged in here, they don’t send you a backup of one of those. And it turns out that the little wire thing you usually saw on my shirt about right here is very fragile. If you try to adjust it, it’ll just snap. So I’m waiting for my replacements to show up. Unfortunately, my scheduled YouTube videos are going to run out before they arrive tomorrow. So we’re going to do a few of these with the old headset, and I’m going to look like a goofy air traffic controller while I talk to you about SQL Server. I hope you can survive these trying times, my friends, where Erik Darling is wearing a headset, because I’m having a tough time with it. Honestly, I feel like I look stupid in this thing. Anyway, let’s talk in this video about stubborn NOT EXISTS ordering. And by stubborn NOT EXISTS ordering, well, of course, what I mean is the order — SQL Server’s optimizer is famous

for being able to take a query and do all sorts of stuff with it, play all sorts of tricks, like reordering joins. But one thing that it doesn’t do — at least, that I can’t ever get it to do — is reorder NOT EXISTS predicates, or NOT EXISTS subqueries, whatever you want to call them, in a query. And I think it’s kind of amusing the way it doesn’t do this. If there’s a message here, it’s that if you’re going to write a query with multiple NOT EXISTS checks, you may want to spend a little time figuring out whether the order of them improves query performance at all.
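To make the idea concrete, here’s a hypothetical query with two NOT EXISTS checks (Stack Overflow table names; not the query from the video). Since the optimizer tends to keep the anti-joins in written order, swapping the two subqueries by hand — putting the more selective one first — is the kind of thing worth testing:

```sql
/* If the Comments check eliminates far more rows than the Votes
   check, writing it first may reduce the work done by the second
   anti-join. Try both orders and compare the plans. */
SELECT u.Id, u.DisplayName
FROM dbo.Users AS u
WHERE NOT EXISTS
(
    SELECT 1/0 /* never evaluated inside EXISTS */
    FROM dbo.Comments AS c
    WHERE c.UserId = u.Id
)
AND   NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Votes AS v
    WHERE v.UserId = u.Id
);
```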

I’m gonna give you a little spoiler here: for our query, it doesn’t make much of a difference. It’s gonna run in two to three seconds anyway, because I did many things right along the way. But for you out there in real user-space land, it might actually make a difference. So before we do that, we need to shill ourselves, shill all over ourselves. If you appreciate this SQL Server content, even when I’m wearing a headset, which I know is hard:

You can sign up to become a member of the channel, and for as little as $4 a month, support a starving SQL Server consultant. If you are also starving for some reason, maybe AI took your job already, I don’t know, you can do all sorts of other fun stuff to help this channel thrive and survive. It won’t put food in my stomach, but it’ll put a smile on my face: you can like, you can comment, you can subscribe. And if you want to ask questions privately that I will answer publicly during my office hours episodes, you can do that at this link. All of the things that I talked about are very conveniently linked for you down in the video description. You don’t even have to think much about it. You can just click randomly on things until something works, just like with SQL Server. It’s a good time. If you need consulting from a guy in a headset:

I am available as a SQL Server consultant. I’m not just a pretty face on YouTube; I could be a pretty face on a Zoom call for you, too: health checks, performance analysis, hands-on tuning of whatever you need tuned, fixing your SQL Server performance emergencies, and of course training your developers so that you don’t have those emergencies anymore.

It’ll be a nice, good time for you. I promise everything will go your way. You will be endowed with the luck of the luckiest civilization out there. If you would like to get some training from me, I have 24 hours of performance tuning training.

You can get it all for about 150 US dollars. No tariffs on that; I promise we’re staying tariff-free here in the Darling Data world. Go to that link, put in that discount code, and you’ll get the Everything Bundle for $150.

It’s nice. I also have a new T-SQL course that is currently publishing as we speak. I have finished the read query portion and the modification query portion. Next up will be isolation levels, and then programmability.

So it is half done, at least the beginner portion of it is. The advanced portion will come out after the summer. The pre-sale price until all the advanced material is out is $250. It will go up to $500 once the course content is fully released.

Just so you know, if you are attending PASS Data Community Summit in Seattle this November, and you are attending the pre-cons that Kendra Little and I are doing on T-SQL, this is companion material to that content.

So if you attend the pre-cons, you will get access to this content as part of your admission to those. Speaking of speaking. Ho-wee.

Boy, are my arms tired. PASS is going on tour. It’s the Red Gate Roadshow, and they have cordially invited me to all of these things. New York City, August 18th and 19th.

Dallas, September 15th and 16th. And Utrecht, which is a hamlet near Amsterdam, October 1st and 2nd. And of course, this is all leading up to PASS Data Community Summit taking place in lovely Seattle, Washington, November 17th to 21st, where Microsoft has indefinitely canceled their Build conferences and relocated them to sunny Las Vegas.

So, I don’t know, maybe PASS will move to Vegas too. Who knows? Anyway, with that out of the way, what I want to show you is, A, I’m going to prove my point.

But B, I’m going to show you how I prove points like this to myself. It’s often quite a process. It’s quite an endeavor, quite a chore to do these things.

But they help me understand things, and they help me see things how they really are. So, what I did in order to prove this out was use my best friend, dynamic SQL, and my other best friend, XQuery.

And what I did was build up dynamic SQL in a way that I could grab the query plan for the query that ran, have that query generate NOT EXISTS clauses in random orders, and then show me the query that ran, along with the order that the clustered index operations happened in and on which tables, so that I could match up the order of the NOT EXISTS subquery predicates with the order that things happened in the query plan.

It was all very tedious. But when I got it right, it was very exciting. So, I’m going to walk you through the Dynamic SQL portion, and then I’m going to run this a few times, and then I’m going to show you the results.

All right? Cool. I hope you enjoy it. So, we have some variables up here, some normal local variables that we’re going to use to hold various things. This will hold the dynamic SQL we’re going to execute.

This is going to hold the parameters for the dynamic SQL. This is going to hold an output parameter for the dynamic SQL. Why did ZoomIt just diss me like that, live on YouTube? I don’t know.

These are the parameters that we are going to pass into the dynamic SQL. This is going to be the output thing for this, and this is going to be the output thing for this down here.

Maybe I could have organized those slightly better. And we’re going to use this replacement thing here. This replacement thing is going to come in handy in a moment; we’ll get to it. Just remember, there is a local variable called replacement.

This is going to be our general dynamic SQL setup. We are going to set this every time, and we have this thing over here that says replacement.

This is going to act as our token. This is the thing in this batch that we are going to replace with our varying NOT EXISTS predicates. We also have this lovely piece of SQL in here, and this lovely piece of SQL is going to get the execution plan for the session that’s running here, which works.
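For reference, the usual way to grab the plan for the currently executing batch from inside the batch itself is a DMV query along these lines. This is a minimal sketch of the trick, not the exact script from the video, and the variable name is mine:

```sql
/*
Sketch: capture the execution plan for the batch currently
running on this session, from inside that batch. The DMVs are
standard; @query_plan is a placeholder name of mine.
*/
DECLARE @query_plan xml;

SELECT
    @query_plan = deqp.query_plan
FROM sys.dm_exec_requests AS der
CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) AS deqp
WHERE der.session_id = @@SPID;
```

In the demo, a statement like this is presumably appended to the dynamic SQL so the batch can hand its own plan back through an OUTPUT parameter.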

I assure you, this is correct. As crazy as it looks and sounds, this is correct. So what we do is set that. Remember that replacement variable I told you was very, very important?

We use this replacement variable and we set this, and we’re going to use the brand spanking new, fresh off the factory lot STRING_AGG function that came out in SQL Server 2017. And we’re going to use this instead of more XML.

There’s XML down below, so don’t worry, all of your fetishes will be satisfied. We are going to aggregate this column with an empty space, and we are going to say WITHIN GROUP (ORDER BY v.o).

You’re probably asking yourselves, what are v.c and v.o? Those are fantastic thoughts to have, and they are wonderful questions to have answered.

So this VALUES clause generates two columns. One of them is this NCHAR(10) with a NOT EXISTS, and you’ll notice that each one of these NOT EXISTS checks has a different table in it.

We have Badges, Comments, Posts, and Votes. And then we have this other thing that it generates: NEWID. NEWID is how we get the random ordering.

So this VALUES clause, which is aliased as v, has the c column, which is where we’re holding the NCHAR(10) and NOT EXISTS, and then o is where we’re holding the NEWID.

So when we run this all together, we’re going to concatenate all of these NOT EXISTS clauses, ordered by the NEWIDs that get generated in here, and we are going to get randomly ordered NOT EXISTS subquery predicates.
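The randomized concatenation being described can be sketched like this. I’m assuming the Stack Overflow demo database, and the predicate text is my reconstruction, not the exact strings from the video:

```sql
/*
Sketch: concatenate NOT EXISTS predicates in random order.
v.c holds the predicate text, prefixed with NCHAR(10) for a
line break; v.o holds a NEWID() per row, which STRING_AGG
orders by. Stack Overflow table names are assumed.
*/
DECLARE @replacement nvarchar(max);

SELECT
    @replacement =
        STRING_AGG(CONVERT(nvarchar(max), v.c), N' ')
        WITHIN GROUP (ORDER BY v.o)
FROM
(
    VALUES
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Badges AS b WHERE b.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Comments AS c WHERE c.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Votes AS v2 WHERE v2.UserId = u.Id)', NEWID())
) AS v (c, o);
```

The CONVERT to nvarchar(max) matters: without it, STRING_AGG inherits the input's non-max type and can truncate at 4,000 characters.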

Okay, cool. After that, we are going to replace, in the dynamic SQL portion, the replacement token with the string that we just generated and assigned to the replacement variable.

We’re going to execute our query, and we are going to output the c and the query plan. All right. The c up here is just, of course, the result of the COUNT_BIG.
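Put together, the replace-and-execute step looks roughly like this. The token name, the query shape, and the parameter names are my assumptions, not the actual demo script:

```sql
/*
Sketch: swap the {replacement} token for the generated
predicates, then run the batch with sp_executesql, capturing
the COUNT_BIG through an OUTPUT parameter. All names and the
query shape here are assumed.
*/
DECLARE
    @sql nvarchar(max) = N'
SELECT @count_big = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE 1 = 1
{replacement};',
    @replacement nvarchar(max) =
        N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Badges AS b WHERE b.UserId = u.Id)',
    @count_big bigint;

SET @sql = REPLACE(@sql, N'{replacement}', @replacement);

EXECUTE sys.sp_executesql
    @sql,
    N'@count_big bigint OUTPUT',
    @count_big = @count_big OUTPUT;
```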

We don’t actually do anything with this. We just throw it away. But the query plan, we do something very exciting with. You’re ready for some XML.

You look like you’re ready for some XML. So I’m going to give you some XML here, some X query for you. We’re going to print the query that runs. All right.

And then we’re going to delete some XML. Have you ever seen this before? Have you ever seen a delete from XML? We’re going to use this thing.

So, back story on why this is necessary. When the query up here runs, there are two query plans that get generated. There’s a query plan for this query, which has the NOT EXISTS stuff in it, and then there’s a query plan for getting the query plan.

Couldn’t make this stuff up, right? Could not make these things up. So what we need to do is preserve the first query plan and delete the last query plan. And apparently this works to do that.
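The XML delete being described uses the xml data type’s modify() method. A minimal sketch, assuming the batch plan already landed in an xml variable; the variable name and the choice of StmtSimple as the node to trim are my assumptions about the demo:

```sql
/*
Sketch: the captured batch plan holds two statements -- the
NOT EXISTS query and the plan-grabbing query itself -- so we
delete the last statement node to keep only the first. The
showplan namespace URI is the real one; @query_plan is a
placeholder assumed to already hold the batch plan.
*/
DECLARE @query_plan xml; /* assumed populated upstream */

SET @query_plan.modify
('
    declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    delete (//p:StmtSimple)[last()]
');
```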

Isn’t that insane? The second thing we need to do with XML is shred our query plan. Remember that little output thing called query plan that we assigned the query plan to? Well, we’ve got something interesting to do here.

We have to select from a variable the XML nodes for all of the clustered index scans. What we’re going to pull out of that is the node ID, which is going to tell us the order that things happened in the query, and the table name, which is the thing that got clustered index scanned.

And then I’m also going to select the query plan here so that we can validate our results. Okay.

All good. This is great. This is wonderful. I am very happy with all of this. We’re not going to do anything with this; this was me just making sure that I was right about things. I ran this quite a bit to see if I could find any real big outliers that would say, hey, look, this is a big performance thing, but there really weren’t any with these.

Anyway, what we need to do now is come back up to the top and run SET NOCOUNT ON once we connect. And look how nice this is, look how nice this new SQL Server Management Studio 21 connect dialog is. We have this futuristic thing.

We can even tell it what database we want to connect into, and we can do that. No one look at my password; of course, this is highly confidential information. Don’t look at the password. But we can tell it exactly which database to go to, so I don’t accidentally connect to master and hit an error the first time.

Fantastic. Thanks, Aaron. Anyway, we are all connected there, and now let’s run this once and just see what happens. My VM may have restarted last night.

So this first run might take a couple of seconds, because we might have to read some stuff from disk into memory. Remember, kids: reading stuff from disk into memory is terribly slow. Keep your data in memory and your queries will always be faster.

So, now that we are finished, now that we’re down here, right? We don’t want to hang around up there; we want to hang out down here, because this is the query that ran and gave us some stuff.

So we have the node ID, and I’m going to prove all of this out to you. The node ID is the order that tables were accessed in.

We have the table name, and here’s the query plan. Over in the messages tab, we have the query that ran. All of these queries start with the Users table, right? So in all of these, Users ends up being first.

Then we have our NOT EXISTS subquery predicates, and we see the order that these happened in here. So we have Badges, Comments, Votes, Posts.

All right. B, C, V, P: Badges, Comments, Votes, Posts. If we come back over to our results, we have Users first, because that was the FROM clause, and then we have Badges, Comments, Votes, Posts.

Okay. Well, how do we know these node IDs aren’t lying to us? Let’s look at the query plan. We have Users.

Ta-da. We have badges. Ta-da. We have comments. Ta-da. We have votes. Ta-da. And we have posts.

Again, ta-da. If we hover over any of these, we’ll see the node ID down here in this little tooltip. So this is node ID 14. This is node ID number 12.

This is node ID number 10. And this is node ID number eight. Ah, we got it.

And this is node ID number six. All right. So I think we’ve now kind of proven that this setup is a valid test. Here’s our six, eight, 10, 12, 14, right?

Just like we saw up there. Let’s run this a few times and see stuff in some different orders. Now we’ve got a completely different order on this one: Posts, Badges, Votes, and Comments.

And now these are no longer even numbers; now these are odd numbers. We have seven, nine, 11, and 13. So remember: Users, Posts, Badges, Votes, Comments. If you look over here, here’s, oh, go away, thing.

Here we have Users, Posts, Badges, Votes, Comments. So the order matches again. We can keep running this, and I’ve done this hundreds and hundreds of times; the number of loops I’ve written around this to do it over and over again is absurd.

And every time, we validate this output. Here’s Users, Votes, Posts, Badges, Comments. So V, P, B, C over here. What do we have in the messages tab?

V, P, B, C. Every single time. So coming back to the main point: when you are writing queries with NOT EXISTS, SQL Server’s optimizer will do, let’s just say, I won’t say nothing.

Cause who knows, maybe like it almost does something, but then changes its mind and is just like, nah, I don’t think so. Hmm.

It’s not for me. SQL Server’s optimizer does not really make any attempt to reorder NOT EXISTS predicates. So when you are writing queries that have NOT EXISTS in them, some tables might be cheaper to access and do a NOT EXISTS check against, and might narrow rows down better, than other tables.

So always be very careful and cautious about the order that you write your NOT EXISTS checks in, because you might be able to get your queries to run much faster just by reordering them. SQL Server will not be able to do it for you.
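The practical takeaway, sketched as a query: since the written order is the order the checks happen in, lead with the check you believe eliminates the most rows most cheaply. The Stack Overflow schema and the particular ordering here are my assumptions for illustration:

```sql
/*
Sketch of the takeaway: the optimizer keeps NOT EXISTS
predicates in the order you write them, so put the most
selective/cheapest check first yourself. Schema assumed.
*/
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE NOT EXISTS (SELECT 1/0 FROM dbo.Badges AS b WHERE b.UserId = u.Id) /* cheapest/most selective first */
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Comments AS c WHERE c.UserId = u.Id)
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Votes AS v WHERE v.UserId = u.Id)
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id);
```

Which check belongs first depends on your data; the point of the video's randomized test harness is exactly to measure that rather than guess.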

Anyway, that’s all I have for this one. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where I will still be wearing this atrocious headset until my little clippy thing shows up.

It’s not an actual clip, like a tie clip, or a Clippy, like the paper clip guy. It’s just a little clippy microphone thing, which, I don’t know, I might start wearing for formal occasions as well with my Adidas tuxedo, because that’s how I roll.

All right. Anyway, I guess that’s good for this one. Again, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video. All right.

It’s magic. Ta-da! Ta-da!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

My Upcoming Speaking Schedule

Busy Summer


The nice folks at Red Gate have decided to put me to work.

That means I’m going on tour, and maybe getting some socks and a Hawaiian shirt.

No word on a “Lego Erik” yet.

PASS On Tour Events:

PASS Data Community Summit:

Of course, Kendra Little and I are back in action to teach back-to-back T-SQL precons.

See you out there!
