Learn T-SQL With Erik: Window Functions and Aggregates



Video Summary

In this video, I delve into a fascinating aspect of window functions that isn’t always highlighted in discussions about these powerful SQL Server features. I demonstrate how to use window functions not just for simple column aggregations but also for more complex scenarios where you need to preserve and order aggregated data within a temporary table. By incorporating row numbers based on aggregate values, you can ensure that when querying the results later, they are ordered by the most impactful findings first—something many overlook in their SQL practices. I walk through an example using Stack Overflow data, showing how to calculate averages of counts over different time spans and order them effectively. This technique is particularly useful for troubleshooting and reporting purposes where you need to prioritize high-impact issues. Additionally, I share insights on the flexibility of window functions, illustrating that they can be used in a variety of creative ways beyond just row numbering, making your SQL queries more powerful and versatile.

Full Transcript

Erik Darling here with Darling Data, continuing the teaser material for Learn T-SQL with Erik, for which all of the beginner content is now published. There's about 23 hours of it across 69 modules. Do with that information what you will, but again, the presale price is still $250 until the advanced stuff drops after the summer. So, I'm going to show you something that I think is very neat about window functions, something not a lot of people pick up on. You may have noticed that I have a couple of stored procedures that try to help people troubleshoot various aspects of their SQL Server.

Some of those stored procedures have a sort of roll-up of findings, an aggregated roll-up of all the findings in there. And what I found was that, of course, it's very easy to produce that sort of summarized output. What was not easy was ordering that summarized output later.

So, what I learned to do to make that easier is, when I insert the data into the table, I store not only the aggregations of things, but also a row number that gets produced based on those aggregations. For example, if we were just going to query the Stack Overflow database, the aggregation plus the ordering would look something like this. We're going to get a count of all the posts, right?

So, we're getting the post type ID up here and we're producing some text. And because we're producing some text, you couldn't usefully order by the text in the output. It's just going to be texty ordering.

We're not ordering by what had the most of something, which is what I would want to do — I want to prioritize the high-impact stuff first. What I learned to do was put that count into the ORDER BY clause of the ROW_NUMBER function, which looks like this, so that I can preserve the numbered output for when I want to order it coming out of the findings temp table.
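The demo script isn't reproduced in this post, but here's a minimal sketch of the pattern, assuming the Stack Overflow Posts table — the important bit is that the aggregate goes straight into the ROW_NUMBER ordering:

```sql
/* The aggregate can be used directly in the window function's ORDER BY */
SELECT
    p.PostTypeId,
    total_posts = COUNT_BIG(*),
    sort_order = ROW_NUMBER() OVER (ORDER BY COUNT_BIG(*) DESC)
FROM dbo.Posts AS p
GROUP BY p.PostTypeId;
```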

Doing this in one query would be trivial. Doing it when you're putting data into a temp table and then you want to select it out later with a specific order — that becomes a little bit trickier. But the results look like this. And of course, I'm not connected to SQL Server, which would help.

But now, when I run this query, I'm not ordered by the text output. I'm ordered by which posts had the most rows. Now, I know that for this standalone query, I could have just said ORDER BY the COUNT_BIG descending.

That's no problem. The point is that now I have this row number column. So when I want to select data out of my findings table later — I'm going to go to the next one — I can say ORDER BY finding ID and then this row number column.

So I have the top stuff at the top. Post type ID 2 having 11 million posts and post type ID 1 having 6 million posts — those are up at the top of whatever thing I want to show you.
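A hedged sketch of the temp table pattern — the table name and columns here are made up for illustration, not lifted from any of my actual procedures:

```sql
CREATE TABLE #findings
(
    finding_id integer NOT NULL,
    finding nvarchar(200) NOT NULL,
    total_rows bigint NOT NULL,
    sort_order bigint NOT NULL
);

/* The row number rides along with the aggregated data */
INSERT #findings (finding_id, finding, total_rows, sort_order)
SELECT
    finding_id = 1,
    finding = N'post type ' + CONVERT(nvarchar(11), p.PostTypeId),
    total_rows = COUNT_BIG(*),
    sort_order = ROW_NUMBER() OVER (ORDER BY COUNT_BIG(*) DESC)
FROM dbo.Posts AS p
GROUP BY p.PostTypeId;

/* Later, the high-impact stuff comes out first */
SELECT f.finding, f.total_rows
FROM #findings AS f
ORDER BY f.finding_id, f.sort_order;
```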

But, you know, this is something that not a lot of people understand about window functions: you can partition by, you can order by, and you can pass aggregates into them — just like you can mix aggregates elsewhere, like a sum divided by a count or something like that.

You can have these things intermixed, living together. An easy example of that would be doing averages of counts. I've seen a lot of queries written where someone goes and gets the count in one CTE or something.

And then does the averages after they've done the counts, which is kind of silly, because you can just do it all in one place. The only thing you have to be a little bit aware of, of course, is how you choose to set up your window range and row specification to deal with that. So what I'm doing in this query is hitting the Posts table again.

I'm doing a little bit of fanciness in here in order to give myself consistent, accessible aliases for these two expressions across my whole query. And what I'm doing is asking for an average of the count over different spans of time. Now, since I want three, six, and 12 month averages: for the three month average, I have to say BETWEEN 2 PRECEDING AND CURRENT ROW, which feels a little funny.

Because you're thinking: I want a three month average, I should put 3 PRECEDING AND CURRENT ROW. But that would give you four rows, right? You want 2 PRECEDING and the current row.

That's three total. And it's the same thing for the six month average and the 12 month average: for the six month average, you want 5 PRECEDING, and for the 12 month average, you want 11 PRECEDING. Now, granted, for the 12 month average it wouldn't matter as much, because it may be the last row anyway.

So you could just say BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you wanted. But if there's more data than that, then you would need to make sure that you are very specific about it. But now when I run this query, I'm able to get, all in one fell swoop, the average number of posts across three, six, and 12 month spans for all 12 months in the year 2013.
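Here's a hedged sketch of that shape. The cross apply values aliasing is the trick from my earlier video; the exact column list is an assumption, but the window frames are the important part:

```sql
SELECT
    d.post_month,
    monthly_posts = COUNT_BIG(*),
    /* Frames count rows, so a 3-month rolling average is 2 PRECEDING + current.
       These are integer averages; multiply by 1.0 first if you want decimals. */
    three_month_avg =
        AVG(COUNT_BIG(*)) OVER
        (ORDER BY d.post_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
    six_month_avg =
        AVG(COUNT_BIG(*)) OVER
        (ORDER BY d.post_month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),
    twelve_month_avg =
        AVG(COUNT_BIG(*)) OVER
        (ORDER BY d.post_month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
FROM dbo.Posts AS p
CROSS APPLY
(
    VALUES (DATEFROMPARTS(YEAR(p.CreationDate), MONTH(p.CreationDate), 1))
) AS d (post_month)
WHERE p.CreationDate >= '20130101'
AND   p.CreationDate <  '20140101'
GROUP BY d.post_month
ORDER BY d.post_month;
```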

If we were doing multiple years, that's where you have to be really careful with the 12 month one. But what you'll see here is that when we look specifically at these three columns that produce the averages, the three month average is the same across all three of them at first. But then for the six month average, this number changes from this number.

So this is just another three month average here, whereas this is really producing the six month average across these rows, right? And it's the same with this 12 month average: it'll match across the first six months, but then, where the three month one gives us a new set of averages every three months, this one gives us two window frames' worth of averages — six months and six months.

And then this one — where it agrees up to the six month part, it really departs here, and then we have the full 12 month average across all 12. So the main message here is that window functions are not just limited to, you know, a column in your table.

You can pass all sorts of stuff to them. You saw in the first query where I was ordering by a COUNT_BIG, and in this one I'm getting the average of COUNT_BIGs over different spans of time. So there's a lot of neat stuff you can do with window functions that may not be immediately obvious and apparent to you, based on the way you see most window functions out in the world written.

But there are some really cool things you can do with them that often get overlooked. You'll hear people talk about how cool and powerful they are, but they kind of just give you the same examples — here's a ROW_NUMBER, over and over again. Here are the different things you can do with ROW_NUMBER, and you're like: okay, ROW_NUMBER. Great.

But think of all the other neat stuff that you can do for your neat analytical queries. Anyway, that's enough here. Again, this is all teaser material from Learn T-SQL with Erik, still on sale for 250 bucks. There's a link down in the video description if you want to see much, much more material like this.

Otherwise, I don't know — go live in the T-SQL dark for the rest of your life. See if I care. All right. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Common Table Expression Fork Bombs



Video Summary

In this video, I dive into a fascinating concept known as a CTE fork bomb, exploring how recursive Common Table Expressions (CTEs) can lead to exponential growth in query execution. I start by setting the stage with simple sample data and gradually build up complexity, illustrating how nested loops joins within CTEs can multiply the number of rows, leading to significant performance impacts. By breaking down each step of the execution plan, I highlight the importance of understanding these patterns for optimizing queries and avoiding potential performance pitfalls. The video is packed with detailed explanations and visual aids, making it a must-watch for anyone interested in deepening their knowledge of SQL Server query plans and optimization techniques.

Full Transcript

Erik Darling here with Darling Data. Feeling very high energy today, very… very just pumped up. Let's go. Let's go get them. Today I want to talk about a CTE fork bomb. If you're not familiar with what a fork bomb is, you can consider it to be like viral replication, where one cell becomes two cells, two cells become four, and it just keeps getting bigger, right? And that's what we're gonna do today. If you would like to support this channel, you can do so — there's a link to become a member down in the video description below. If you want to ask me questions during my Office Hours episodes, you can do that. Otherwise, the usual like, comment, subscribe stuff is all available to you, should you feel so encouraged. If you need SQL Server consulting help, well, that's me. Health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and training your developers to not write fork bombs on your servers. All good and worthwhile things there. You can get all of my performance tuning content, about 24 hours of it, for 75% off. Again, link down in the video description. That brings it down to about 150 bucks, and you get that for the rest of your life.

The T-SQL course is now half done. All of the beginner content is online and published — about 23 hours of it across, at last count, 69 modules. So that's fun there. Of course, past pre-con attendees will get free access to all of this companion material. It is on sale right now for the pre-sale price of $250. That price will go up to $500 in the fall, as soon as everything is said and done. I am doing a lot of outside-the-house stuff this summer. I will be in New York City, shockingly, August 18th and 19th. I will be in Dallas, Texas, September 15th and 16th.

And I will be in Utrecht, that old Netherlands thing, October 1st and 2nd. And of course, I will be at PASS Data Community Summit from November 17th to 21st in Seattle, Washington, assuming that Seattle is still a city at that time. But with that out of the way, let us talk about this CTE fork bomb thing. Because, you know, making fun of CTEs never gets old. At least not for me, anyway.

So we’re going to create some sample data here, some simple sample data, because we don’t want to create overly complex sample data that will confound and confuse the masses out there, do we? We want very simple, straightforward demonstrations so that everyone can understand everything. So before I get to the actual fork bomb, there are some agreements that you and I must come to.

We must agree on these concepts so that by the time we get to the fork bomb, you understand fundamentally what is happening. So if we run this query that joins together the two tables that I just created and populated with data, we will get back 255 rows. And if we look at the execution plan, there will be one scan of the table t1, one scan of the table t0, and one merge join to produce those 255 rows.
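The setup script isn't in this post, so here's a hypothetical version of it — the table layouts and population queries are my assumptions, just sized to match the row counts in the plans:

```sql
CREATE TABLE dbo.t0 (id integer PRIMARY KEY CLUSTERED);
CREATE TABLE dbo.t1 (id integer NOT NULL, INDEX t1_id CLUSTERED (id));

/* 255 ids in t0 */
INSERT dbo.t0 (id)
SELECT TOP (255)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.messages AS m;

/* 32,767 rows in t1, every one of them matching an id in t0 */
INSERT dbo.t1 (id)
SELECT TOP (32767)
    (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 255) + 1
FROM sys.messages AS m;

/* The first agreement query: one scan of each table, 255 rows out.
   The plan in the video shows a merge join here. */
SELECT t0.id, records = COUNT_BIG(*)
FROM dbo.t0
JOIN dbo.t1
  ON t1.id = t0.id
GROUP BY t0.id;
```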

If we look at the details here, there is one execution for that scan, and one execution for this scan. Good stuff.

We can agree on these things. If this query were to use a nested loops join, say like this, where I’m going to force some things to happen, the query plan would change. We would no longer have a merge join.

We would have a nested loops join. 255 rows would be read from t0, and 32,767 rows would be read from t1.

What changes, aside from the nested loops join — we had a merge join before, if you remember back that long ago, you little goldfish — is that this scan still has one execution, but now we have something different on the inner side of the nested loops join.

Now we have an index seek, and we have 255 executions of the index seek. All right, 255 right there. So when you have a nested loops join, the thing on the inner side of the nested loops join will execute once for every row on the outer side of the join.

That's this part, going to get rows. I don't know how the pops are going to sound on the new microphone, since it's my first one.

So lucky us — we get to experiment together. Every time a row comes out of here, because of the way the data is designed, every row will match, right?

So all 32,000 rows in this table have a match here, which means all 255 rows in this table have a match here. So we do this seek 255 times, we find 32,767 matches, and then we aggregate those down to 255 based on the number of unique IDs that came out of t0. Now, if we were to put that query into a CTE, and we were to join that CTE to itself, the query plan would change yet again.
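Putting that into a CTE and joining it to itself looks roughly like this, give or take the exact demo script:

```sql
WITH c AS
(
    SELECT t0.id, records = COUNT_BIG(*)
    FROM dbo.t0
    JOIN dbo.t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
/* Each reference to c re-runs the query inside it */
SELECT c.id, c.records, c2.records
FROM c
JOIN c AS c2
  ON c2.id = c.id;
```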

Now the query plan — well, I mean, A, we have a hash join up here, but now we have two nested loops joins. We actually have two copies of that query that run, because of course, if you are a frequent watcher of my videos, you will know that Microsoft SQL Server, at this point in time, does not offer a mechanism to materialize the results of a CTE.

That lack of materialization means that every time you reference a CTE in an outer scope, you have to rerun the query in the CTE. So we actually have the same plan run twice, right? There's the first copy of the plan — of the execution of the query, if that's easier for you to deal with.

And down here is the second one. Now, what this means is that we have two scans of the table t0, right? And see the number of executions.

Oh, you know what? That is hiding behind my head. Let's try that again. So we have one scan of the table here, right? And another scan count of one for this table here, for a grand total of two.

Now, this index seek into t1 has 255 executions, and so does this one. So we have two total scans of the table t0, and we have 255 seeks apiece — which is, I'm going to guess, around 510 seeks into t1 total. Now, since there's a hash join up here, right?

We have a hash join that brings these two results together. Remember when I said from the CTE, join the CTE to itself on the ID column. So this is how SQL Server chose to join those two CTE queries together with a hash join.

To simplify things quite a bit — for the sake of making sure that we stay in agreement — let's just say that in the first query plan up there, the uppermost plan above my head, the outer side of the join ran, did all its work, went to the hash join, and built a hash table; then the inner side of the query ran, and the hash join did its thing to compare rows in the hash table and all that other stuff.

So let's just say the outer side of the query ran, got to the hash join, then the inner side of the query ran and got to the hash join, and comparisons were made and we decided which rows matched and which didn't at the join. So you really do have two executions of this. Another easy way to see that is by looking at the operator times: this part of the plan executes and takes about four milliseconds of accumulated time across all the operators here.

Then you have this part of the plan, and there's four milliseconds of time across all of its accumulated operators. And then you have the hash join up here, which is happening in row mode. So this is the four milliseconds here, plus the four milliseconds here, plus one millisecond of time spent in the hash join.

All right. So that's how that looks. Ergo — which I'm told is a word — which is just great, I guess.

If we combine the CTE join situation with a nested loops join, the query inside the CTE will be executed not just once, but once per row that goes into the loops join. To see what I mean, instead of just doing a join, we're going to do a cross apply with a TOP (1). And I'm not saying cross apply is bad.

It's just there for a little bit of convenience, because cross apply does often get optimized to a nested loops join. So we're going to use it for convenience, so I can show you this execution plan.
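Something like this, assuming the same CTE as before — again, my sketch, not the exact demo script:

```sql
WITH c AS
(
    SELECT t0.id, records = COUNT_BIG(*)
    FROM dbo.t0
    JOIN dbo.t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT c.id, ca.records
FROM c
/* Cross apply often gets optimized to a nested loops join,
   so the inner CTE reference runs once per outer row */
CROSS APPLY
(
    SELECT TOP (1) c2.records
    FROM c AS c2
    WHERE c2.id = c.id
    ORDER BY c2.records DESC
) AS ca;
```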

Notice the top part of the plan, right? Everything looks the same up there. We still have basically the same two copies of the plan that run, right?

That looks very similar to the original one. And then down here, we have that second copy of the query that ran, just preceded by a top operator, because we had that TOP (1). Right.

So now we're going to see the one scan here, and we're going to see the 255 seeks here. What's going to change is that instead of having one scan here, this is an index seek now, and we did 255 seeks into it.

Well, the estimated CPU cost is 255, too. That's amazing. I was like, huh?

Okay. Number of executions: 255. Okay, cool. So for every row that came up out of here, right — we aggregated everything down to 255 rows here.

All 255 rows went into the nested loops join, let’s say one at a time. And every time this nested loops join got a row, it went down here and said, hey, seek into here. And so we did that 255 times.

From here, we went into a nested loops join and we hit this thing 255 times. Okay — this is the same, right? We still have 255 here and we still have 255 here.

This number didn't really multiply here, because we have a nested loops join here that's sort of protecting us. So let's have a little fun with this. Let's add some more work.

Let's further amuse ourselves. Because we are nothing if we cannot amuse at least ourselves. If we can't amuse ourselves, what have we got? So we're going to add some more work to the initial CTE.

We're going to add some window functions in: an average, a row number, and a COUNT_BIG. Right.

So we're going to make SQL Server do some more work in the initial part of the query. And this is where things, I think, get kind of interesting. So if we run this whole thing now, we're going to have to do some multiplication math. Right.
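My guess at the shape of that beefed-up CTE — the exact window function arguments in the demo may differ:

```sql
WITH c AS
(
    SELECT
        t0.id,
        records = COUNT_BIG(*),
        /* The default RANGE frame here is what produces the window spool
           mentioned later in the plan walkthrough */
        avg_records = AVG(COUNT_BIG(*)) OVER (ORDER BY t0.id),
        n = ROW_NUMBER() OVER (ORDER BY COUNT_BIG(*) DESC)
    FROM dbo.t0
    JOIN dbo.t1
      ON t1.id = t0.id
    GROUP BY t0.id
)
SELECT c.id, ca.records
FROM c
CROSS APPLY
(
    SELECT TOP (1) c2.records
    FROM c AS c2
    WHERE c2.id = c.id
    ORDER BY c2.records DESC
) AS ca;
```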

So let's zoom out a little bit here. I'm actually going to — oh, let's see. Can I get that? My head is going to be between two operators, but that's okay. So if we look at this top part of the plan, it will be a mirror image of the bottom part of the plan, up to a degree.

Up to a point rather. Let’s move this way over here. That’s probably about good.

Oh, tooltip. You just had to do it to me, didn't you? So if we look at this part of the plan, starting up here, you'll see the index scan on t0 — 255 — and the index scan on t1 — 32,767. And then going over, there's a merge join, and a segment and a sequence project — that's doing the row-numbery stuff.

And if we go over here a little bit further, and if you just keep your eye on the bottom part of the plan, you'll notice that they are essentially mirror images of each other, aside from a couple of extra operators here. But they both do the exact same thing. Where things get interesting here, I think, is that we still have the same pattern, where the 255 rows go into this loops join that brings this reference to the CTE and joins it to this reference to the CTE.

We still have 255 rows that go in there, but way down over here, things really start to multiply. So if we look at this part of the plan specifically, notice that this isn't 255 anymore. This is 65,025, which — if you don't have a calculator handy, or just a lot of fingers — is 255 times 255.

If you look at this number, this is 8.3 million and change. That's 255 times 32,767, which is that number up there. So now we have fork bombed our CTE with the nested loops join, because every time this nested loops join runs, we end up multiplying the number of rows in the table by the number of rows that come out of the loops join.

So if you compare the numbers going across: this merge join up here has 255 rows come out. This merge join down here has 65,025 rows come out — because you have the 255 rows going into each.

Right. And that happened because we aggregate this down over here. Right.

This gets squished, and this gets squished to 65,000. So the 8.3 million gets aggregated to 65,025. Here, the 32,000 gets aggregated down to 255. So now, instead of having 255 rows go out here, we have 65,000 rows go across.

And you can see that, because we have fork bombed ourselves with the nested loops join, the numbers of rows that go across here are going to be much larger for the bottom part of the query. And you can actually see far more time end up across all those operators. If you look up here, before we go into this nested loops join, we have only used five milliseconds of wall clock time.

If you look down here, look at how the time builds up. My head's going to be sort of in the way, but it's 518 milliseconds in here, nine milliseconds in here.

We get up over 800 milliseconds by the time we get to here. So all of these accumulated operator times get to 800 right there. And then as we go across, with SQL Server dealing with that number of rows on a single thread, we just add more and more time to this.

So there's a window spool here — we get up to 1.1 seconds. And then after this segment, we have a table spool, and we get to 1.2 seconds.

And then we do all this stuff and we get up to 1.242 seconds. So, usually when I talk about CTEs, I'll say something like: your CTE will run once for every reference you make to it. But depending on the query plan that SQL Server chooses, your CTE might run way more times than that.

Right. So if your CTE joins to itself, or joins repeatedly — say you throw a third table in the mix, and you have to join your CTE to one table, and then you have to join your CTE to another column in that table, or to a different table.

Depending on the join choice, your CTE might end up executing way, way more times than just once per reference. If SQL Server chooses a nested loops join, it'll execute once for every row that goes into the nested loops join and then has to run your reference to that CTE. So isn't that fun? Isn't there just so much fun in query plans?

Isn’t there just so much interesting, exciting stuff that just makes your day? Mine too. All right. Cool.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will undoubtedly talk about more fun and exciting execution plan stuff. All right.

Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 21




To ask your questions, head over here.

What's the biggest T-SQL re-write that you've done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains for minimum code change (that isn't option recompile or index tuning)?
I have a non-tech question, kinda. You clearly have a talent for taking a complex topic and breaking it down to its simplest form to show the underlying theory. Any tips for this? It's the number one thing I struggle with when writing blog posts. Thanks!
Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored proc using the stored proc name, but it returned no information; when I used text which is part of the stored proc, it returned the information I needed. The stored proc has multiple (3) plans because of PSO — could that be the reason it failed when I searched based on the stored proc name?
Hi Erik! When we update our PMS app to the latest version, the database update involves re-running ALL procedures and triggers (modified or not) with DROP and CREATE (not my decision – I have to stick with that though). What is the downside of the above operation? Does it make any sense to run the sp_updatestats after that? Thanx!
Why haven't you been talking about SQL Server 2025?

Video Summary

In this video, I dive into some of your most pressing questions about SQL Server and T-SQL tuning. We tackle topics ranging from the biggest T-SQL rewrites I’ve undertaken to the most effective query tuning tricks that yield maximum gains with minimal effort. You’ll also get tips on simplifying complex concepts for better understanding, whether you’re writing blog posts or just explaining things in general. Additionally, we discuss the nuances of using stored procedure names versus text when querying the Query Store and explore the downsides of dropping and recreating procedures during database updates. Lastly, I share my thoughts on SQL Server 2025, highlighting both its potential and the areas where it falls short. Whether you’re a seasoned DBA or just starting out, there’s something here for everyone to learn from.

Full Transcript

All right, you heathens. It's time to give in to our darkest desires, and we do office hours. This is where I answer five user-submitted questions at a time and try to give the best semblance of an answer that I can provide. The usual stuff here: if you like this channel and you feel that it's worth your wallet, you can become a member of the channel to support my efforts to bring you the highest quality SQL Server content known to humankind. That's down in the video description. If you want to ask questions that appear on Office Hours, that link is also down in the video description. It's very easy and anonymous — you hardly have to do any work. Other things that are useful to me: liking, commenting, and subscribing, because that's, you know, I guess cool too. If you need a consultant for SQL Server — you would like to hire me to come work personally with your deepest, darkest data — you can hire me for all of these things. Health checks, performance analysis, hands-on tuning of the worst of your worst, dealing with your performance emergencies, and training your developers so you do not have performance emergencies anymore. I do all that stuff and more. And as always, my rates are reasonable.

All right, come on. Next slide. There we go. I clicked. If you would like to buy my performance tuning content, you can get all 24 hours of that for about $150 US buckaroos — no tariffs added — with that discount code, and that will last you for life. If you would like to get in on the pre-sale prices for my T-SQL course, Learn T-SQL with Erik — that's me — almost all of the beginner material is now out and publicly available. Many hours of content. The price will be going up to $500 once the advanced material is done after the summer. And if you are attending PASS Data Community Summit in Seattle, and you're coming to the T-SQL pre-cons that Kendra Little and I are teaching, you will, of course, get complimentary access to the course, because this is companion material to the course.

That means it is not the same material, but it is a good companion to the material there. So if you're attending the pre-cons, you'll get this stuff. Ain't that your lucky day? And, of course, this speaking schedule is going to be grand. The Redgate Roadshow is taking me on tour. I will be in New York City. Surprise!

August 18th and 19th. Dallas, September 15th and 16th. And Utrecht — rolls right off the tongue — October 1st and 2nd. And, of course, PASS Data Community Summit, taking place in Seattle, November 17th to 21st. So I will be live and in person in all of these places, I don't know, to answer your questions, give you hugs and high fives, tell you you're awesome at your job, whatever you need me to do.

Anything for a buck. But with that out of the way, let's party. Let's do these office hours-y questions. And let's zoom in here and make sure that we are nicely framed up, make sure everything is legible above my gigantic head.

And we'll start with this handsome devil up here. What's the biggest T-SQL rewrite you've done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains for minimum code change, that isn't option recompile or index tuning?

So I have rewritten entire applications for people. I mean, maybe not every single stored procedure, because, you know, they'll be like: hey, we don't use this stored procedure anymore. Which, you know, is cool with me.

But pretty much every stored procedure that was currently in use, I've done rewrites for. Or, depending on the development team a little bit, there are some times when I can rewrite a handful of stored procedures and just say: follow this pattern generally, and if you get stuck on anything, let me know and we can work on it together.

But, you know, really — hundreds of stored procedures and functions. There was one client where, if I remember the final count in the rewrite folder, it was something like 56 scalar UDFs that I hand-rewrote.

And that was just the UDFs. It wasn't even the stored procedures and other stuff. So that's that answer.

Sure. But of course, the biggest query tuning trick is probably just getting batch mode involved when it's appropriate. Not even adding a columnstore index — just playing some trick on SQL Server so that batch mode gets involved somewhere opens up a lot of doors.

Obviously it's better with columnstore as a data source for a lot of these things, but generally, getting batch mode involved solves a lot of problems really quickly that would otherwise take a lot of index tuning and consolidation, query rewrites, and trying 50 million things to nudge the optimizer towards some specific pattern or path that I care about. But batch mode is probably the easiest one to do there.
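For context, one common version of that trick — not necessarily what I do on every engagement, and all the names here are made up — is an empty table with a columnstore index no-op joined into the query, which makes batch mode available on otherwise rowstore-only plans (mostly relevant before SQL Server 2019's batch mode on rowstore):

```sql
/* An empty columnstore table that exists only to invite batch mode */
CREATE TABLE #batch_mode_bait
(
    i integer NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

SELECT
    o.some_column,           /* hypothetical table and column names */
    records = COUNT_BIG(*)
FROM dbo.some_big_table AS o
LEFT JOIN #batch_mode_bait AS b
  ON 1 = 0                   /* never matches; returns no extra rows */
GROUP BY o.some_column;
```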

If I had to pick a second place, as far as bang for the buck: breaking up queries that are miles of CTEs, and using temp tables at certain logical breaking points to materialize results. Everyone thinks that a CTE materializes a result, but it doesn't.
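The pattern, sketched with made-up names:

```sql
/* Instead of CTE #5 feeding CTE #6 feeding CTE #7,
   materialize at a logical breaking point... */
SELECT
    s.customer_id,
    total = SUM(s.amount)
INTO #rollup
FROM dbo.sales AS s
GROUP BY s.customer_id;

/* ...optionally index it to help downstream joins... */
CREATE CLUSTERED INDEX c ON #rollup (customer_id);

/* ...so the rest of the query reads the stored rows once */
SELECT c.customer_name, r.total
FROM #rollup AS r
JOIN dbo.customers AS c
  ON c.customer_id = r.customer_id;
```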

And so using temp tables in place of that is often very valuable as well. All right. Next up, what do we have here?

Oh boy. I have a non-tech question, kind of. All right. You clearly have a talent for taking a complex topic and breaking it down to its simplest form. Oh, thank you.

That’s what I’m known for. Being simple. To show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks.

So the way that I teach is the way that I learn. If I don’t break things down, like for myself step by step, I get lost and don’t learn things. So I need to break things down into very simple terms that fit into my brick head and make sense to me there.

If I had to give you advice about how to do that: when you're writing a blog post, in your head it's really easy to logically jump from one thing to another to get the words out. But when you speak it out loud, and you see the stuff on the screen, something catches your eye and you're like — wait. You go to talk about it and try to explain something.

And if you get stuck on something, that's something else that you need to put in the post. That's another thing that you need to add in to further break the thing down and make it explainable. A lot of blog posts — even mine, I'm not going to pretend I'm not guilty of it — gloss over some stuff and leave some details out, either because it's a whole other blog post to explain it, or it's too much of a detour.

But if you want to be able to do that, don't just write your post — read it out loud, or rehearse the material out loud. You'll have a better idea of not only what you want to say about it, but you'll really find yourself getting deeper into the nooks and crannies when you have to talk about stuff out loud. You can record it if you want.

I don't know. But having that extra added thing where you're like — oh, but is this the… well, no, let's not. Okay, never mind. So that would be my advice: speak your content out loud, because that will force you to think more about everything that you are looking at and everything that you are saying.

And if you hit one of those unexplainables, that is often a good sign that you need to break things down a little bit further. All right. We've got quite a thing here. Oh, dear. That didn't work out well. Let's try that little rectangle again.

Thank you for your handy tool, sp_QuickieStore. I was recently trying to get more information about a stored procedure using the stored procedure name, but it returned no information. But when I used text which is part of the stored procedure, it returned the information I needed. The stored procedure has multiple plans because of PSO.

Could that be the reason it failed when I searched based on the stored procedure name? So, yes — what you guessed is most likely correct, because with the parameter sensitive plan optimization stuff, there is an object ID in the XML, but the way that the plan is expressed is a lot like dynamic SQL, where it's almost completely detached from the object ID of the thing that called it.

So in Query Store, there's an object ID for the procedure. But if your procedure doesn't do anything meaningful, or anything that Query Store captures — which depends on your capture settings — then the search is going to come up empty. Other stuff that has messed me up trying to find procedures in Query Store: non-defaults, like a non-dbo schema.

There is a procedure schema and a procedure name parameter for it. So if it's not in dbo, that'd be another thing to try. I don't think that sp_QuickieStore handles square brackets gracefully.

I did some work to try to make it so that the procedure name parameter was sort of overloaded — so if you put in, say, [dbo].[procedure name], it would use PARSENAME to break that stuff out.

But I forget how far I took it. So making sure that you don't put the procedure schema and name in square brackets would be another thing to try. But I think for you specifically, you are generally correct that that would be why it didn't show up.
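For reference, the parameters in question look like this — swap in your own database and procedure names:

```sql
/* Search by object name; @procedure_schema covers non-dbo procedures */
EXEC dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @procedure_schema = N'not_dbo',
    @procedure_name = N'YourProcedure';

/* When the name search comes up empty, text search can still find it */
EXEC dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @query_text_search = N'some distinctive text from the proc';
```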

It is kind of a pain in the butt. But at the same time, I really want to avoid XML parsing in sp_QuickieStore, because querying the underlying Query Store views is painful enough. Working on sp_BlitzCache was a lot different, because getting data out of the plan cache — aside from the XML parsing bit — was generally pretty quick.

Of course, that depends on the size of stuff and a million other details. But the XML parsing was what really took up time in there. Querying the Query Store stuff is such a misery already.

I don't want to add XML parsing in there to go look for that hidden object ID in the plan XML to detect the parameter sensitive plan optimization stuff. There might be some shortcuts around that — I just haven't looked much at it yet. That's sort of why the query text search thing is in there.

And I understand that the query text search part of it is not as fast, because we're wildcard searching a bunch of Query Store data for some query text. But yeah, I don't know. It's something I'll think about, but I don't know that I'd really get to it very quickly, at least at this point. All right. One more question here. Hi, Erik. Hey, how's it going? That's me.

When we update our PMS app to the latest version, the database update involves rerunning all procedures and triggers, modified or not, with DROP and CREATE. Not my decision — I have to stick with that, though. What is the downside of the above operation? Does it make any sense to run sp_updatestats after that?

So — answering your questions kind of backwards — I don't think it makes sense to run sp_updatestats after that. The one thing that annoys me about drop and create is that it will create new object IDs for everything that gets dropped and created.

And that matters for me and my analysis procedures — sp_QuickieStore being one of them, sp_HumanEventsBlockViewer being another.

There are a lot of different ones where there's not always a procedure name or an object name in there. Sometimes it's just based on an object ID, and I have to decode objects in the database by object ID and database ID.

And if you've dropped and created your objects and they get new object IDs, I can't resolve those names. And so that messes me up. So that's the real downside there: you hurt me. You wound me terribly with these practices.

Aside from that, I can't really think of anything that would be all that annoying with it. You're going to lose query plans.

You know, the plan cache sucks anyway. You can have a bunch of recompiling stuff when you start re-creating query plans. But again, the plan cache sucks anyway. So I don't know.

Not a whole lot to go on there. All right. Question number five: why haven't you been talking about SQL Server 2025? So I'll be very honest with you.

I don't find anything all that compelling in it. All the headline stuff is just dumb to me. You know, like vector — okay, great.

Vector search. Cool. Okay, fine. It's there. You know, I care about T-SQL enhancements. I care about performance enhancements.

There are a few neat things in 2025 that I do want to talk about: the optional parameter thing, the optimized Halloween protection using accelerated database recovery, the optimized locking stuff.

Even though the optimized locking stuff has kind of been around for a little bit in Azure, there are a few things in there that I think are cool and that I want to talk about. But here's the thing: Microsoft has been so heavily invested in screwing up Fabric that they didn't take a lot of time out to screw up stuff in SQL Server 2025.

So a lot of it is just kind of — there's just not a lot in there aside from the dumb AI stuff. And it was like: oh, it's ground to cloud to Fabric.

Ooh! Come on. It's like ground to cloud to nowhere. Right.

Who cares? Anyway, those are my five answers to these five questions. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video sometime soon, where I will most likely be continuing to try to peddle my course, Learn T-SQL with Erik. Because it's a good one.

Paul White tech reviewed it. So at the very worst — at the very worst — it is entirely technically accurate. So I've got that going for me.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Grouping Sets vs Union All



Video Summary

In this video, I delve into the lesser-known world of `GROUPING SETS` in SQL Server and explore its alternatives, including `CUBE` and `ROLLUP`. While these features are powerful for complex aggregations, they often face performance challenges due to how the SQL Server optimizer handles them. I demonstrate a real-world example using Stack Overflow data, showing that even with a columnstore index, `GROUPING SETS` queries struggle to utilize batch mode efficiently, leading to slower execution times compared to equivalent queries without these features. By comparing the query plans and performance metrics, you’ll see firsthand why sometimes sticking to simpler methods can yield better results in practical scenarios.

Full Transcript

All right, we're gonna talk about — boy — something I don't see very often, to be honest with you. Maybe I don't see it very often because a lot of people don't really understand how this stuff works, and so they get afraid of it. But we're gonna talk about grouping sets, and alternatives. And the alternatives are important, because there's also the distinct possibility that someone wrote a grouping sets query once upon a time and, gosh, it just didn't perform terribly well — which is something that can happen. Boy, howdy. We're gonna look at that. All of this content is preview material for my course, Learn T-SQL with Erik. It is at the pre-sale price right now of 250 US dollars. That'll last you for the rest of your life. It will go up to 500 US dollars once the advanced material is published after the summer. The beginner material is on the very cusp of being completely published, so you'll have many hours of content to get through. This is all companion content — meaning not the exact content, but a good companion to the material that Kendra Little and I will be presenting at PASS Data Community Summit in Seattle. So if you attend our pre-cons there, you will get access to this material with your price of admission. I think that's about it. So we'll just get at this thing. And, well, I mean, there's a whole slide here. Now you can see all my secrets, right? You can see all this stuff that I do when you're not watching. So, we're gonna get into the old Stack Overflow database here.

And I'm gonna show you what a grouping sets query looks like. There are of course other grouping set — let's just call them clauses — that you can use, like CUBE and ROLLUP. CUBE will create distinct grouping sets out of all possible column combinations, and ROLLUP is a little bit more complicated in what it does — at least it's more complicated to explain. You define your rollup clause, and then SQL Server will take whatever column you put first in the rollup clause and match that to the other columns. So it doesn't do all unique sets; it just does a subset of unique sets based on whatever you put in first. I cover it in way more detail in the actual material.

So if you're really fascinated by what CUBE and ROLLUP do, you can buy the material and watch it — or just watch it if you've already bought it. But with GROUPING SETS, you can define whatever sets of columns you want to group things by. I have chosen to do a few levels of grouping here: one by all three columns that I'm selecting.

And then one grouping set for each of the columns individually. I could of course add stuff in here — another line that says vote type ID and post ID, another that says vote type ID and vote year, and another that says post ID and vote year. But just to keep things relatively simple, I'm going to do the three together and then the three individually.

There's also this empty set of parentheses at the very end, which is going to be the global aggregate for this. So I'm going to start this thing running, and you'll notice that I have a columnstore index up here. I have not created that yet — we're going to save that for later.
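Reconstructed from the description — the real demo may differ in details — the grouping sets query looks something like this:

```sql
SELECT
    v.VoteTypeId,
    v.PostId,
    vote_year = DATEPART(YEAR, v.CreationDate),
    records = COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY GROUPING SETS
(
    (v.VoteTypeId, v.PostId, DATEPART(YEAR, v.CreationDate)),
    (v.VoteTypeId),
    (v.PostId),
    (DATEPART(YEAR, v.CreationDate)),
    () /* the empty parens: the global aggregate */
);
```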

What I want to show you here is that SQL Server's optimizer fights really hard against using any sort of batch mode stuff with these grouping sets queries — even, as we'll see in a second, with the columnstore index. It's right about 30 or so seconds for this thing. And you know, grouping sets queries tend to need to process a lot of data.

You're talking about doing big aggregations of stuff. It would seem a fairly natural fit for the optimizer to want to use batch mode for these things, but we just don't get any. You can kind of figure that out from the query plan by looking at the choice of operators — not always, but in some of these things.

So first we have parallel exchanges — there's one here — repartition streams, repartition streams, repartition streams, distribute streams. Well, that's a filter. That's not quite it.

Gather streams over here. None of the parallel exchanges support batch mode. There are operators in the plan that support batch mode, but — at least the last time I ran this — they all ran in row mode. Like, there's a sort at the end.

There's a filter that runs in row mode — filters support batch mode. Stream aggregates here: one up there, one down there. Those don't support batch mode.

The hashes do support batch mode, of course, but they all run in row mode. So this whole thing is kind of cooked. This whole thing takes 30 — well, actually, we should go look.

Because that looks pretty funny. 43 seconds. And then I got 20 seconds, 43 seconds, 32 seconds — SQL Server, what happened? Let's go look at the actual tale of the tape over here in the query time stats. Hopefully my big head won't be in the way.

So that one parallel operator lied to us. We used 166 seconds of CPU time to get 32 seconds of wall clock time. So while we appreciate SQL Server's willingness to use a parallel execution plan here…

…we do not appreciate the lack of batch mode. Look at the number of rows that we are selecting and aggregating. Gosh darn it, SQL Server.

Why won't you use batch mode? So what I'm going to do is kick off creating this columnstore index. And just so I don't get yelled at later, I'm going to make sure that the semicolon's in there. We don't want improper…

…we don't want a lack of termination here, do we? I'm going to get that started, and I'm going to show you what an equivalent query without the grouping sets stuff looks like. So we're going to have one query at the beginning where we select and group by all three columns.

That's the vote type ID, post ID, and vote year. I'm using a little trick that I showed off in one of my earlier videos, where I'm doing the cross apply values trick to have one source for this DATEPART(YEAR, CreationDate) expression.

So I can reference that a few times throughout the query without having to keep rewriting DATEPART(YEAR, CreationDate). Now, of course, you have to do more typing, right?

So we have one query that does all three columns and groups by all three. We have another query that groups by just vote type ID. We have another query that groups by just post ID.

We have another query that groups by just vote year. And then we have one query that essentially doesn't group by any of them. Now, what I want to do is run all these together. And remember that I just created a columnstore index here.
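Here's the general shape of the UNION ALL rewrite, with the cross apply values aliasing — my reconstruction, give or take the exact demo script:

```sql
SELECT v.VoteTypeId, v.PostId, y.vote_year, records = COUNT_BIG(*)
FROM dbo.Votes AS v
CROSS APPLY (VALUES (DATEPART(YEAR, v.CreationDate))) AS y (vote_year)
GROUP BY v.VoteTypeId, v.PostId, y.vote_year

UNION ALL

SELECT v.VoteTypeId, NULL, NULL, COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId

UNION ALL

SELECT NULL, v.PostId, NULL, COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY v.PostId

UNION ALL

SELECT NULL, NULL, y.vote_year, COUNT_BIG(*)
FROM dbo.Votes AS v
CROSS APPLY (VALUES (DATEPART(YEAR, v.CreationDate))) AS y (vote_year)
GROUP BY y.vote_year

UNION ALL

/* the global aggregate, no grouping at all */
SELECT NULL, NULL, NULL, COUNT_BIG(*)
FROM dbo.Votes AS v;
```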

We wouldn't need the columnstore index for this thing to run in batch mode — SQL Server can choose batch mode very naturally for this. I'll show you that at the end. But the important thing is what you see when you look at this plan.

There are a few things about this that are much, much different from the plan that we just saw using the grouping sets method. One: there are no parallel exchanges in this until the very end, right? There's one necessary one at the end where we do gather streams.

But there are also no stream aggregates, and we don't have all the repartition streams stuff in here. And if we look at the execution modes for all of these operators, these are all going to say batch mode, batch mode, batch mode. So this whole thing runs a lot faster.

If you look at the query time stats for this one — just because it's maybe not terribly obvious from the query plan as a whole — we spend 1.13 seconds of CPU time for 1.8 seconds of elapsed time, right? So that's a huge improvement, using batch mode over row mode execution for these queries.

Let's go back up to the grouping sets query, and let's run it with the columnstore index in place. If we let this go for a little bit, right?

It does still go for a little bit. It's already slower than the UNION ALL method of doing this. This thing will take about 12 or 13 seconds in total. And the query plan is different, right?

With the columnstore index in place, you don't see a lot of the repartition streams stuff, but there is still some real junk in the query plan. Like this whole section here, where, very early in the plan, we gather streams. That means we end a parallel zone right here, and then we restart a parallel zone right here.

But inside of that serial zone — where it's no more parallel, and then back to parallel in here — we do a stream aggregate. Because this is just something that grouping sets has to do for the global aggregate.

I've tried all sorts of tricks to get rid of this. I tried an OPTION (HASH GROUP) hint. I tried a QUERYRULEOFF hint to prevent the stream aggregate, but it's still stuck in there. I don't want to start getting into all that stuff in this video, but man, this thing was just very, very persistent.

And if we look at the query time stats for the grouping sets one — we go into properties and we look at query time stats here — we were at 1.7-something seconds for the UNION ALL format, and here we're up to 30 seconds of CPU time and 10 seconds of elapsed time. So when I say I don't see this stuff very often…

…I kind of get why. It's not just that the syntax is kind of weird and difficult to remember — or even to occur to you, if you need to do this sort of thing. It's that even if you are a steadfast rememberer of syntax, which I am generally not…

…I need to refer back to stuff that I wrote before to remember half the things that I want to do. Even if you are really good at remembering this stuff, you might try this and just be like: holy cow, this is slow. So let's get rid of the columnstore index that I created.

And I just want to show you that, kind of naturally, the UNION ALL version of this does a lot better on its own, just naturally using batch mode. SQL Server is like: I'm not afraid of batch mode here, right?

It's not going to be as good as when we had the columnstore index to read from. If you're doing this kind of big analytical aggregation stuff, there's a very, very good chance that you're just going to want to use columnstore as the source of your data anyway. But even just getting some batch mode stuff in — this runs pretty well, or runs better, even without the columnstore index, compared to the grouping sets version. The grouping sets version was 30 seconds or something.

This is twice as fast without even having columnstore as the source to read from. The main thing that slows it down is that we have to read from the clustered index over and over again, right? We do a lot of stuff in there, but even that's happening with batch mode on rowstore.

We just don't have that really nicely compressed columnstore index to really speed things up as we're reading data. So we still kind of have this IO-bound portion of the query that doesn't have the good columnstore source to read from — but it still happens in batch mode. So it's still an improvement. And for all of these branches, notice that we don't have that break where we have the serial zone with the stream aggregate.

We don't have all the repartition streams in here. This all pretty cleanly runs in batch mode, does everything in batch mode, and is a lot faster just naturally. Of course, like I said, I would absolutely prefer to have the columnstore index as a data source for this type of thing. But even without that, even just getting batch mode on rowstore, this ends up a lot faster than the grouping sets alternative.

So like when you’re, when you’re writing these types of queries, you know, of course, like what you’re reading from is very important. Um, you know, rowstore indexes, just not, not quite the same jam as columnstore indexes for these big analytical agro ag, aggregative, uh, types of queries. But, uh, like, you know, depending, like often just how you write the query and I’m going to, I’m going to say something unfortunate generally.

In general, the more you type, the better off you are, right? Taking shortcuts like this — you might learn this in some shifty Microsoft DP exam and think that you're the hottest cake on the block for knowing how to use grouping sets. But get out there in the real world, where you have to do anything practical on a meaningful set of data, not WideWorldImporters or whatever —

you're going to want to abandon that ship pretty quickly. So anyway, thank you for watching. I hope you learned something.

I hope you enjoyed yourselves. And I will see you in the next video, which I think is going to be a Friday video. And then the next one will be an office hours episode. And boy, we just have so many exciting things.

So many exciting things coming up, don't we? All right. Anyway, thank you for watching. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Erik Tests a New Microphone

Erik Tests a New Microphone


Video Summary

In this video, I’m Erik Darling from Darling Data, and I wanted to share a bit of behind-the-scenes content as I test out my new microphone. The old one met its untimely end after countless hours of discussing databases, but fear not—my audio quality remains top-notch! This video serves multiple purposes: it allows me to ensure the sound is clear across various devices so you can enjoy my content without any hiccups, and it’s a direct way for you to support my endeavors. If you’d like to contribute towards better equipment, consider signing up for a channel membership starting at just $4 a month. Your support not only helps me improve but also gives you access to exclusive perks and opportunities to ask questions during office hours episodes.

Additionally, I’m taking this opportunity to promote some of the services and courses I offer. Whether you’re looking to enhance your SQL Server performance with my consulting services or want to dive into T-SQL training, there are plenty of ways to get involved. The pre-sale price for my 24-hour T-SQL course is currently at $250, but don’t miss out on the limited-time offer! And if you’re attending Pass Data Community Summit in Seattle, you’ll receive free access to this companion content with your admission. Mark your calendars for my upcoming live appearances in New York City and Dallas, as well as Utrecht, where I’ll be sharing more insights and expertise.

Full Transcript

Erik Darling here with Darling Data. And if the first thing you're noticing and admiring in this video is my new clip-on mic — my eyes are up here, buddy — then you are a wise and dedicated follower, because I was talking the other day about how my old microphone (I was wearing a big headset at the time) snapped in half from listening to me talk about databases. Hopefully you will not snap in half from listening to me talk about databases. But we can consider the entire point of this video to be: Erik tests a new microphone. And the reason I'm doing it this way is because it's very easy for me to upload this to YouTube and then listen to it across a variety of devices, so I can ensure that I sound okay on a variety of devices. So let's shill ourselves here. If you would like me to buy a better microphone, or if you have complaints about audio quality, well, you can do something about that. You can sign up for a channel membership, and for as few as $4 a month you can support my endeavors to purchase nice microphones. There's a link down in the video description that allows you to do that. If you don't care about my microphone, well, I don't know what to tell you. You can like, you can comment, you can subscribe. You can also ask me questions for free privately that I will answer publicly during my office hours episodes.

Isn't that nice? Isn't that nice? These nice things I do for you. If you would really like me to get a new microphone, you can hire me as a consultant. It's a great way for me to buy new microphones. And I am available to perform all sorts of miracles upon your SQL Servers: health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and training your developers, so that henceforth you have no more performance emergencies. I do all of these things. And as always, my rates are reasonable. Anyway, if you would like to buy some performance tuning training from me, I've got 24 whole entire hours of it. There's a link down here that assembles all this stuff for you. You can get the Everything Bundle there for about 150 bucks with a 75% off code, and that will last you for life. No subscription required. If you want to get in on my new T-SQL course while it is at the pre-sale price of 250 US dollars, you can do that now. Videos are dropping and being recorded — I'm going to go record some after I do this. And the price, once the advanced material is fully published after the summer, will go up to $500.

So please do save yourself 250 bucks, unless you're really itching to donate to the mic fund. This is all companion content to the pre-cons that Kendra Little and I are teaching at PASS Data Community Summit in Seattle, November 17th to 21st. So if you are attending PASS — and have hopefully chosen wisely and are attending our pre-cons — well, guess what?

You will get free access to this companion content with your admission there. If you would like to see me live and in person, there's Red Gate's little road show. I'm being taken on tour sort of around the world — a limited world tour, I guess.

Some small clubs and venues: New York City, August 18th and 19th; Dallas, September 15th and 16th — that's the one in Texas — and Utrecht, the one near Amsterdam in the Netherlands. It's a hamlet and it's beautiful.

October 1st and 2nd. But let's do a short video here for me to test my microphone with. All right. So, a lot of the time when I'm teaching about dynamic SQL, I like to say things like: if you want to properly parameterize your dynamic SQL, you have to use sp_executesql, and you have to feed it some parameters, and you have to feed it some values to substitute those parameters with.

And that's a great way to avoid SQL injection, because everyone should be trying to avoid SQL injection. It's unpleasant. It gets you fired. It might make your company go out of business.

There are all sorts of terrible things that can happen when you are SQL injected. So I do recommend avoiding that. Eschew SQL injection at all costs.
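
For reference, the properly parameterized version looks something like this — a minimal sketch, assuming the StackOverflow dbo.Posts table:

```sql
DECLARE
    @sql nvarchar(max) = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = @PostTypeId;',
    @PostTypeId integer = 8;

/*Parameters are declared and passed in, never concatenated into the string.*/
EXECUTE sys.sp_executesql
    @sql,
  N'@PostTypeId integer',
    @PostTypeId;
```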

But there is kind of one funny circumstance where you can not use sp_executesql and still somewhat parameterize your dynamic SQL. So what I'm going to do — well, what I've done already — is this, and this only works with linked servers, at least that I've ever seen. There might be some really insane workaround where it works otherwise, but hey, what do I know?

I know T-SQL with Erik. That's apparently what I know. But I've already added a loopback linked server to my very own server. I am, for the most part, SQL Server monogamous.

I only work with one version and edition of SQL Server at a time. Though I guess these days I am philandering a bit with SQL Server 2025, but we don't need to talk about that publicly. We can save myself a little shame and embarrassment here. But this linked server will allow me to do something kind of funny.

This linked server will allow me to declare some SQL, and I've got my string that I want to execute being set and assigned right here between these two things. And I've got this ID local variable set to eight. Now, in the where clause of my dynamic SQL, I've got a question mark, right?

And it's sort of like that crappy stored procedure, sp_MSforeachdb, where the question mark is the database name and you have to say USE [?] to get into different databases. It's kind of like that.

But what I can do is use the less safe version of executing dynamic SQL. I can say EXEC with the SQL string, and I can pass in the ID as a second thing. And that second thing is going to act as a parameter replacement for that question mark.

But I have to use EXEC ... AT — I don't know why ZoomIt has forsaken me like that — I have to execute this at the linked server. And when I do that, and I go through great pains to use the right database and everything, when I run all this, the server talks to itself and returns the thing that I wanted, which is post type ID eight from the post types table.
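
Here's roughly what that looks like end to end — a sketch, assuming a loopback linked server named loopback already exists and the StackOverflow database is in play:

```sql
/*Assumes a linked server named loopback pointing back at this instance,
  created beforehand with sys.sp_addlinkedserver.*/
DECLARE
    @sql nvarchar(max) = N'
SELECT
    pt.*
FROM StackOverflow.dbo.PostTypes AS pt
WHERE pt.Id = ?;', /*the ? is the parameter placeholder*/
    @id integer = 8;

/*Pass-through execution: the values substitute the ? markers, in order.*/
EXECUTE (@sql, @id) AT loopback;
```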

So anyway, thank you for joining me for Erik Tests a New Microphone. I hope you enjoyed yourselves. I hope you learned something.

I hope you like my new microphone, because I don't know what to get if you don't like it. Perhaps you can recommend one to me. It just has to plug into one of these things, and it has to have a very special connector like this. So if you've got recommendations along those lines, well, I'm two ears and one mouth — but that expression doesn't really resonate much.

Does it? Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Spurious Left Join Logic

Spurious Left Join Logic


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Stubborn NOT EXISTS Ordering

Stubborn NOT EXISTS Ordering


Video Summary

In this video, I delve into the fascinating world of stubborn not exists ordering in SQL Server queries. You might be wondering why I’m wearing a headset that’s making me look like an air traffic controller—well, it’s because my fancy wireless microphone setup broke, and I’m waiting for replacements to arrive. This quirky situation has led to some humorous moments, but it hasn’t stopped us from diving deep into SQL Server optimization techniques. I use dynamic SQL and XQuery to generate random not exists predicates and analyze their execution order in the query plan. By running this process multiple times, we can observe how the optimizer handles these clauses and whether reordering them might improve performance. So, grab your headphones (if you have any) and join me as we explore this intriguing aspect of SQL Server’s query optimization!

Full Transcript

Erik Darling here with Darling Data, and you might be wondering why I'm wearing a headset. Well, it's a funny story. It turns out that when you spend like 800 bucks on a fancy wireless microphone setup, they don't send you a backup of the microphone that plugged in here — this thing is fine, but that part isn't. And it turns out that the little wire thing that you usually saw on my shirt about right here is very fragile. If you try to adjust it, it'll just snap. So I'm waiting for my replacement ones to show up. Unfortunately, my scheduled YouTube videos are going to run out before they arrive tomorrow. So we're going to do a few of these with the old headset, and I'm going to look like a goofy air traffic controller while I talk to you about SQL Server. I hope you can survive these trying times, my friends, where Erik Darling is wearing a headset, because I'm having a tough time with it. Honestly, I feel like I look stupid in this thing. Anyway, let's talk in this video about stubborn NOT EXISTS ordering. And by stubborn NOT EXISTS ordering, what I mean, of course, is the order that SQL Server's optimizer processes NOT EXISTS predicates in.

SQL Server's optimizer is famous for being able to take a query and do all sorts of stuff with it, play all sorts of tricks, do fun things like reorder joins. But one thing that it doesn't do — at least, that I can't ever get it to do — is reorder NOT EXISTS predicates, or NOT EXISTS subqueries, whatever you want to call them, in a query. And I think it's kind of amusing the way it doesn't do this. If there's a message here, it's that if you're going to write a query with multiple NOT EXISTS checks, you may want to spend a little time figuring out if the order of them improves query performance at all.
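
For reference, the kind of query I'm talking about looks something like this — a sketch, assuming the StackOverflow database:

```sql
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE NOT EXISTS (SELECT 1/0 FROM dbo.Badges   AS b WHERE b.UserId = u.Id)
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Comments AS c WHERE c.UserId = u.Id)
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Posts    AS p WHERE p.OwnerUserId = u.Id)
AND   NOT EXISTS (SELECT 1/0 FROM dbo.Votes    AS v WHERE v.UserId = u.Id);
```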

I'm going to give you a little spoiler here for our query: it doesn't make much of a difference. It's going to run in two to three seconds anyway, because I did many things right along the way. But for you out there in real user-space land, it might actually make a difference. So before we do that — haha — we need to shill ourselves, shill all over ourselves. If you appreciate this SQL Server content, even when I'm wearing a headset, which I know is hard:

You can sign up to become a member of the channel, and for as few as $4 a month, support a starving SQL Server consultant. If you are also starving for some reason — maybe AI took your job already, I don't know — you can do all sorts of other fun stuff to help this channel thrive and survive. It won't put food in my stomach, but it'll put a smile on my face. You can like, you can comment, you can subscribe. And if you want to ask questions privately that I will answer publicly during my office hours episodes, you can do that at this link. All of the things that I talked about are very conveniently linked for you down in the video description. You don't even have to think much about it. You can just click randomly on things until something works. Just like with SQL Server. It's a good time. If you need consulting from a guy in a headset:

I am available as a SQL Server consultant — not just a pretty face on YouTube, I could be a pretty face on a Zoom call for you too: health checks, performance analysis, hands-on tuning of whatever you need tuned, fixing your SQL Server performance emergencies, and of course training your developers so that you don't have those emergencies anymore.

It'll be a nice good time for you. I promise everything will go your way. You will be endowed with the luck of the luckiest civilization out there. If you would like to get some training from me, I have 24 hours of performance tuning training.

You can get it all for about 150 US dollars. No tariffs on that. I promise we're staying tariff-free here in the Darling Data world. Go to that link, put in that discount code, and you'll get the Everything Bundle for $150.

It's nice. I also have a new T-SQL course that is publishing as we speak. I have finished the read query portion and the modification query portion. Next up will be isolation levels, and then programmability.

So it is half done — at least the beginner portion of it is; the advanced portion will come out after the summer. The pre-sale price, until all the advanced material is out, is $250. It will go up to $500 once the course content is fully released.

Just so you know, if you are attending PASS Data Community Summit in Seattle this November, and you are attending the pre-cons that Kendra Little and I are doing on T-SQL, this is companion material to that content.

So if you attend the pre-cons, you will get access to this content as part of your admission to those. Speaking of speaking — ho-wee.

Boy, are my arms tired. PASS is going on tour. It's the Red Gate Roadshow, and they have cordially invited me to go to all of these things. New York City, August 18th and 19th.

Dallas, September 15th and 16th. And Utrecht, which is a hamlet near Amsterdam, October 1st and 2nd. And of course, this is all leading up to PASS Data Community Summit taking place in lovely Seattle, Washington, November 17th to 21st, where Microsoft has indefinitely canceled their Build conferences and relocated them to sunny Las Vegas.

So, I don't know, maybe PASS will move to Vegas too. Who knows? Anyway, with that out of the way, what I want to show you is: A, I'm going to prove my point.

But B, I'm going to show you how I prove points like this to myself. It's often quite a process. It's quite an endeavor, quite a chore to do these things.

But they help me. They help me understand things, and they help me see things how they really are. So, what I did in order to prove this out was use my best friend, dynamic SQL, and my other best friend, XQuery.

What I did was build up dynamic SQL in a way that grabs the query plan for the query that ran, has that query generate NOT EXISTS clauses in random orders, and then shows me the query that ran alongside the order that the clustered index operations happened in, and on which tables, so that I could match up the order of the NOT EXISTS subquery predicates with the order that things happened in the query plan.

It was all very tedious. But when I got it right, it was very exciting. So, I'm going to walk you through the dynamic SQL portion, and then I'm going to run this a few times, and then I'm going to show you the results.

All right? Cool. I hope you enjoy it. So, we have some local variables up here. These are local variables, not formal parameters, that we're going to use to hold various things. This will hold the dynamic SQL we're going to execute.

This is going to hold the parameters for the dynamic SQL. This is going to hold an output parameter for the dynamic SQL. Why ZoomIt just dissed me like that live on YouTube, I don't know.

These are the parameters that we are going to pass into the dynamic SQL. Again, this is going to be the output thing for this, and this is going to be the output thing for this down here.

Maybe I could have organized those slightly better. And we're going to use this replacement thing here. This replacement thing is going to come in handy in a moment, which we'll get to. Just remember, there is a local variable called replacement.

This is going to be our general dynamic SQL setup. We are going to set this every time, and we have this thing over here that says replacement.

This is going to act as our token. This is the thing in this batch that we are going to replace with our varying NOT EXISTS predicates. We also have this lovely piece of SQL in here, and this lovely piece of SQL is going to get the execution plan for the session that's running it, which works.
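
The exact plan-grabbing SQL isn't visible in this transcript, but the usual trick for capturing the in-flight plan for your own session looks something like this — a sketch of the general idea, not necessarily the exact implementation:

```sql
DECLARE @query_plan xml;

/*Grab the plan for whatever this session is currently executing.*/
SELECT
    @query_plan = deqp.query_plan
FROM sys.dm_exec_requests AS der
CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) AS deqp
WHERE der.session_id = @@SPID;
```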

I assure you, the real thing is correct. As crazy as it looks and sounds, it is correct. So what we do is we set that — remember that replacement variable I told you was very, very important?

We use this replacement variable, and we set it using this brand new, brand spanking new, fresh off the factory lot STRING_AGG function that came out in SQL Server 2017. We're going to use this instead of more XML.

There's still XML down below, so don't worry, all of your fetishes will be satisfied. We are going to aggregate this column, separated by a space, and we are going to say WITHIN GROUP (ORDER BY v.o).

You're probably asking yourselves, what are v.c and v.o? Those are fantastic thoughts to have, and they are wonderful questions to have answered.

So this values clause generates two columns. One of them is this NCHAR(10) newline plus a NOT EXISTS. And you'll notice that each one of these NOT EXISTS has a different table in it.

We have badges, comments, posts, and votes. And then we have this other thing that it generates: a NEWID. NEWID is how we get the random ordering.

So this values clause, which is aliased as v, has the c column, which is where we're holding the NCHAR(10) and the NOT EXISTS, and then o is where we're holding the NEWID.

So when we run this all together, we're going to concatenate all of these NOT EXISTS clauses, ordered by the NEWIDs that get generated in here, and we are going to get randomly ordered NOT EXISTS subquery clause predicates.
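
Pieced together, that portion looks something like this — a sketch reconstructed from the description above, so treat the details as assumptions:

```sql
DECLARE @replacement nvarchar(max) = N'';

SELECT
    @replacement =
        STRING_AGG(v.c, N' ')
            WITHIN GROUP (ORDER BY v.o) /*NEWID gives a random order per run*/
FROM
(
    VALUES
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Badges   AS b WHERE b.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Comments AS c WHERE c.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Posts    AS p WHERE p.OwnerUserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1/0 FROM dbo.Votes    AS v WHERE v.UserId = u.Id)', NEWID())
) AS v (c, o);
```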

Okay, cool. After that, we are going to replace, in the dynamic SQL portion, the replacement token with that string we just generated and assigned to the replacement variable.

We're going to execute our query, and we are going to output the c and the query plan. All right. The c up here is just, of course, the result of the COUNT_BIG.

We don't actually do anything with this. We just throw it away. But the query plan, we do something very exciting with. You're ready for some XML? You look like you're ready for some XML.

So I'm going to give you some XML here, some XQuery. We're going to print the query that runs. All right.

And then we're going to delete some XML. Have you ever seen this before? Have you ever seen a delete from XML? We're going to use this thing.

So, backstory on why this is necessary: when the query up here runs, there are two query plans that get generated. There's a query plan for this query, which has the NOT EXISTS stuff in it, and then there's a query plan for getting the query plan.

Couldn't make this stuff up, right? Could not make these things up. So what we need to do is preserve the first query plan and delete the last query plan. And apparently this works to do that.
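
The delete looks something like this — a hedged sketch of XML DML against the showplan, assuming the plan landed in an xml variable called @query_plan; the exact path is my guess:

```sql
/*Showplan XML lives in its own namespace, so declare it in the XQuery prolog.
  Deleting the last statement-level plan keeps the NOT EXISTS query's plan
  and throws away the plan for getting the plan.*/
SET @query_plan.modify('
    declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    delete (//p:StmtSimple)[last()]
');
```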

Isn't that insane? The second thing we need to do with XML is shred our query plan. Remember that output thing called query plan that we assigned the query plan to? Well, we've got something interesting to do here.

We have to select, from a variable, the XML nodes for all of the clustered index scans. What we're going to pull out of that is the node ID, which is going to tell us the order that things happened in the query, and the table name.

So we have the node ID, which is the order that things happened in, and the table name, which is the thing that got clustered index scanned. And then I'm also going to select the query plan here so that we can validate our results. Okay.
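
That shredding step looks something like this — a sketch of the general pattern, with the XPath details as my assumptions:

```sql
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    node_id = q.c.value('@NodeId', 'integer'),
    table_name = q.c.value('(*/Object/@Table)[1]', 'sysname'),
    query_plan = @query_plan /*kept around to validate the results*/
FROM @query_plan.nodes('//RelOp[@PhysicalOp = "Clustered Index Scan"]') AS q (c)
ORDER BY node_id;
```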

All good. This is great. This is wonderful. I am very happy with all of this. We're not going to do anything with this part — this was me just making sure that I was right about things. I ran this quite a bit to see if I could find any real big outliers that would say, hey, look, this is a big performance thing, but there really weren't any with these.

Anyway, what we need to do now is come back up to the top, and we're going to run SET NOCOUNT ON once so that we connect. And look how nice this is. Look how nice this new SQL Server Management Studio 21 connect dialog is. We have this futuristic thing.

We can even tell it what database we want to connect into. No one look at my password. Of course, this is highly, highly confidential information. Don't look at the password. But we can tell it exactly which database to go to, so I don't accidentally connect to master and hit an error the first time.

Fantastic. Thanks, Aaron. Anyway, we are all connected there. Now let's run this once and see what happens. My VM may have restarted last night.

So this first run might take a couple of seconds, because we might have to read some stuff from disk into memory. Remember, kids: reading stuff from disk into memory is terribly slow. Keep your data in memory and your queries will always be faster.

So, now that we are finished, now that we're down here — we don't want to hang around up there, we want to hang out down here — this is the query that ran and gave us some stuff.

So we have the node ID, and I'm going to prove all this out to you. The node ID is the order that tables were accessed in.

We have the table name, and here's the query plan. Over in the messages tab, we have the query that ran. All of these queries start with the users table, right? Users always ends up being first.

Then we have our NOT EXISTS predicate subqueries, and we see the order that these happened in here. So we have badges, comments, votes, posts.

All right: B, C, V, P — badges, comments, votes, posts. If we come back over to our results, we have users first, because that was the from clause, and then we have badges, comments, votes, posts.

Okay. Well, how do we know these node IDs aren't lying to us? Well, let's look at the query plan. We have users.

Ta-da. We have badges. Ta-da. We have comments. Ta-da. We have votes. Ta-da. And we have posts.

Again, ta-da. If we hover over any of these, we’ll see the node ID down here in this little tooltip. So this is node ID 14. This is node ID number 12.

This is node ID number 10. And this is node ID number eight. Ah, we got it.

And this is node ID number six. All right. So now that we've kind of proven that the setup is a valid test — well, here's our six, eight, 10, 12, 14, right?

Just like we saw up there. Let's run this a few times and see stuff in some different orders. Now we've got a completely different order on this one: posts, badges, votes, and comments.

And now these are no longer even numbers. Now these are odd numbers: seven, nine, 11, and 13. So remember: users, posts, badges, votes, comments. If you look over here — oh, go away, thing —

here we have users, posts, badges, votes, comments. So the order matches again. If we keep running this — and I've done this hundreds and hundreds of times; the number of loops I've written around this to do this over and over again is absurd —

every time, we validate this output. Here's users, votes, posts, badges, comments — so V, P, B, C over here. What do we have in the messages tab?

V, P, B, C. Every single time. So, coming back to the main point: when you are writing queries with NOT EXISTS, SQL Server's optimizer will do — let's just say, I won't say nothing.

Because who knows, maybe it almost does something, but then changes its mind and is just like, nah, I don't think so. Hmm.

It's not for me. SQL Server's optimizer does not really make any attempt to reorder NOT EXISTS predicates. So when you are writing queries that have NOT EXISTS in them, some tables might be cheaper to access and do a NOT EXISTS against than others, and some might narrow rows down more than others.

So always be very, very careful and cautious about the order that you write your NOT EXISTS in, because you might be able to get your queries to run much faster just by reordering them — SQL Server won't be able to do it for you.

Anyway, that's all I have for this one. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video, where I will still be wearing this atrocious headset until my little clippy thing shows up.

It's not an actual clip, like a tie clip, or like Clippy, the paper clip guy. It's just a little clippy microphone thing, which, I don't know, I might start wearing for formal occasions as well with my Adidas tuxedo, because that's how I roll.

All right. Anyway, I guess that's good for this one. Again, thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you over in the next video. All right.

It’s magic. Ta-da! Ta-da!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

My Upcoming Speaking Schedule

Busy Summer


The nice folks at Red Gate have decided to put me to work.

That means I’m going on tour, and maybe getting some socks and a Hawaiian shirt.

No word on a “Lego Erik” yet.

PASS On Tour Events:

PASS Data Community Summit:

Of course, Kendra Little and I are back in action to teach back-to-back T-SQL precons.

See you out there!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 20

SQL Server Performance Office Hours Episode 20



To ask your questions, head over here.

Hello Mr. Erik, I’ve attended your tuning course inside a theater in Seattle years ago! If I may ask a question about the future of SQL. I heard in the past that Joe Sack left MS to MongoDB and now he has returned. That genius guy still working with improvements of sql whatever “onprem” or azure ?
If you don’t mind, could you tell us more about the stories behind your tattoos? What do they represent, and how many do you have?
Without parameterized queries, how would you suggest to decide which queries to tune?
Hi Erik! Are your educator skills just natural talent or do you have any good sources for improving that?
Give me the case against partitioned views.

Video Summary

In this video, I dive into the world of Office Hours with Darling Data, where we tackle a variety of SQL Server questions from viewers. We start off by addressing the future of SQL and Joe Sack's recent career moves, which sparked some interesting discussion. Then, I share insights on deciding which queries to tune without parameterized queries, introducing a new feature in sp_QuickieStore that helps analyze query hash totals. The conversation continues with personal questions about my tattoos and teaching skills, offering unique perspectives on how these aspects of life have influenced my work. Finally, we wrap up by discussing partitioned views, covering both the benefits and challenges they present. It's always great to engage directly with our community and hear from you all!

Full Transcript

Erik Darling here with Darling Data. And we are once again greeted with the background. So we are once again doing Office Hours! Kaboom! If you would like to ask your own questions for Office Hours, this is the link to do it. It's down in the video description. Likewise, if you would like to support my channel and give me money to keep talking — if there were a way for you to pay me to stop talking, perhaps there would be more generosity from the greater public. I am open to that. I can be bought. I'm not beyond that. My morals and my ethics do not extend that far. So if you would like to pay me to keep talking, you can do that. If you would like to pay me to stop talking, shoot me an email. We can work something out. If you like this channel but not in a money way, you can like, you can comment, you can subscribe. And if you would like to hire a consultant to do SQL Server stuff, because you're having trouble with SQL Server stuff, guess what? This total package here does SQL Server stuff. Health checks, performance analysis, hands-on tuning, dealing with SQL Server performance emergencies, and of course, training your developers so that you have fewer SQL Server performance emergencies — right down to zero SQL Server performance emergencies. I can do all those things. Not we, I. There's only me here. You only get this face. There's no substitute face that shows up, doesn't know who you are or what you are.

If you would like to get my performance tuning training material, you can get all 24 hours of it for about 150 USD, for life. Again, link in the video description. My new T-SQL course, which I finally fixed this slide for: videos will start dropping in June. You, of course, get the pre-sale price until the advanced material shows up after the summer. This is companion material to the pre-cons that Kendra Little and I are doing in Seattle this November. So if you are attending those, you get access to this material at the price of the pre-cons. If you go to PASS and you don't come to the pre-cons, I don't care, right? You have to show up for me and Kendra for me to care. Again, to speak more about the live and in-person stuff: PASS on tour. Boy, this is going to be fun. New York, Dallas, and Amsterdam. August, September, October. I will be at all three of them. And of course, I will be at PASS Data Community Summit in Seattle this November, doing the aforementioned pre-cons. So we will have a grand time with that, won't we? But with that out of the way, let's do these office hours questions. Let's zoom, zoom, zoom, zoom, zoom, zoom, zoom, zoom. What do we have here?

Hello, Mr. Erik. Hello, you. I'm not sure how to address you. I've attended your tuning course inside a theater in Seattle years ago. A theater, you say. If I may ask a question about the future of SQL — as much as I am not a psychic, I'll do my best. I heard in the past that Joe Sack left MS for MongoDB and now he has returned. Is that genius guy still working on improvements to SQL Server, whether on-prem or Azure? So, he did come back.

Joe Sack did go to MongoDB for about a year or so. And then he was back at Microsoft. And he was back at Microsoft for about a year and a half. And, you know, it would be inappropriate of me to comment on Joe’s situation at Microsoft. But Joe was sort of unhappy with what his role had morphed into. And so Joe went to work at another database company called Elasticsearch.

So Joe is now some sort of head honcho, not sure what a head honcho is, over at Elasticsearch. Doing a great job there, kicking butt. They are very, very lucky to have him. You know, I miss him dearly working on SQL Server, but it was not meant to be.

All right. Ooh, a personal question. Look at us go here. If you don’t mind, could you tell us more about the stories behind your tattoos? What do they represent and how many do you have?

Well, the stock answer that I have when someone asks me how many tattoos I have is all of them. Because I’m pretty well covered. But the thing where I depart on having stories behind my tattoos is that I got nothing.

All right. Like really, most of them mean absolutely nothing to me. There’s no story. There’s no meaning. There’s no like heartfelt life event that led to me getting them. Like I got a couple like wife and kid name tattoos, but those are just sort of like if I didn’t get them, like they’d be mad.

Right. That's about it. It's just, you know — I figured out at a very young age that I was the type of gentleman who wanted attention from the type of lady who had a lot of tattoos. And I realized that the best way to get that attention was to get tattoos. And I was lucky enough to make friends with some tattoo artists — especially friends of mine, lifelong friends of mine, who were just starting out doing tattoos and who have gone on to be really good at tattoos and own tattoo shops.

But I have all their like starter work. So I have a lot of really old tattoos right now that they were just like, hey, I want to do this tattoo to practice. Can I do it on you? And I was like, dope. I’ll buy burritos.

So most of these tattoos have no real meaning, no real story, maybe just a thing that I kind of liked at the time where the tattoo artist was like, oh, I want to do this Japanese thing today. I’m like, I don’t have any Japanese stuff. Let’s do it. So like, you know, I just got covered with a lot of stuff that means nothing to me very quickly.

And guess what? It worked. May not be the most thoughtful thing in the world, but it was highly effective and painful, painful and effective. So anyway, well, we’re on the subject of pain.

Without parameterized queries, how would you suggest deciding which queries to tune? So this is actually a neat question, because I recently added — well, I don't know; because of how busy I've been recently, I have no concept of actual time. Months have gone by where I'm like, where are we?

But I sort of recently added a new parameter to sp_QuickieStore to help you decide if queries are worth tuning. And that parameter is called include query hash totals. There are underscores in there, because I like putting underscores in things

to make them readable. I don't like the uppercase-lowercase thing. It makes me feel cramped and crowded, and it gets me all claustrophobic feeling. So I added this parameter because of what I would find when I would run QuickieStore.

There would be this whole list of queries that looked similar, but would each have one or just a couple — maybe five, six — executions. And people would be like, this doesn't run that much. I don't want to spend time on it.

So I put this include_query_hash_totals parameter in. And what that does is look at the query hash. So, say you have unparameterized queries that are effectively running the same thing over and over and over again, right?

It's the same query. You just have different dates, or a different name, or a different number of IN clause items — all the stuff that gets you the same query hash but different query plans.

The text of the query ends up essentially the same. It counts all that up and gives you totals for CPU and duration and executions and all the other metrics that are in QuickieStore's output, for the query at the query hash level.

So you might find a very sneaky query that looks like it's only executed once, but the query hash tells you otherwise. If you look at the query hash, it's like, wow, this thing actually executed 7,000 times. We just see this one example of it in the output.

So when queries aren't parameterized, that's the option I use in my tooling when I want to figure out if something is worth going after. It's still worth using QuickieStore, and it's still worth figuring out if the query is meaningful to the workload. There are other ways you can do that with QuickieStore. There's a parameter in there called workdays.

And I like the workdays parameter because it'll just look at stuff that's run Monday through Friday, by default nine to five. There are two other parameters you can use to change the span of hours. But what's nice about that is that you automatically screen out all the overnight processes that you might not care about.

If you want to focus on the overnight processes instead, go ahead and set the start and end times — use workdays, but go in reverse with the timestamps. So I would probably use some combination of those things to do that.
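
Going by the description here, usage looks something like this — @include_query_hash_totals and @workdays come straight from the video, but the span-of-hours parameter names are my assumption, so check the procedure's documentation:

```sql
/*Roll up unparameterized near-duplicates by query hash.*/
EXECUTE dbo.sp_QuickieStore
    @database_name = N'StackOverflow',
    @include_query_hash_totals = 1;

/*Focus on the Monday-through-Friday, nine-to-five workload.
  @work_start and @work_end are assumed names for the hour-span parameters.*/
EXECUTE dbo.sp_QuickieStore
    @database_name = N'StackOverflow',
    @workdays = 1,
    @work_start = '09:00',
    @work_end = '17:00';
```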

Okay. Hey, look, another sort of personal question here. Look at that. "Hi, Erik."

Hi, you. Nice to meet you. Are your educator skills just natural talent? Oh boy, I don't know if I'd call it that. Or do you have any good sources for improving that? So, I have no good source for improving that. Any ability I have as an educator comes from being dumb.

Right. I am not a naturally smart person. Things don't come quickly to me. I'm like one of those old CD burners that goes at 1x — it burns slow, but it burns deep.

Right. So I don't learn things very quickly. When I need to learn something, I need to really break it down in my head a lot further than people who get things more intuitively — people who are much smarter or more clever, whatever it is.

They just look at something and go, oh yeah, I get it. Me? I'm like, no, no, no. Why does this thing go from here to there? I don't get things that quickly.

So I'm good at teaching people because I'm dumb. It takes a lot for me to learn something. And by the time I've learned something, I feel like I've learned it very well and in very small pieces, so that when I have to tell someone else about it, all those small pieces are just burned into my CD brain. Right.

My CD brain. So if there's anything natural about it, it's because I'm spinning slow up here. All right.

Give me the case against partitioned views. I don't really have one. I don't really have one. The only real wart that I've found with partitioned views is trying to get things right so that they're updatable. That's a damn nightmare.

That is not a good time. I don't suggest it. It doesn't fit a lot of situations. So if you want to use partitioned views, I say go for it. Just make sure that you get your constraints right, and make sure that whatever view needs to be refreshed to get new data in there is running at the right intervals and whatnot.

And you're pretty good. You know, I usually like partitioned views quite a bit better than capital-P Partitioning, just because I can index the tables differently, and if I add new columns or remove columns, it's easy enough to fix that in the view definition.

When you have proper constraints on the tables to tell SQL Server what data lives where, you can get pretty clean execution plans from it. So I really don't have much against partitioned views aside from trying to get them to be writable — which, again, kept me up for a while trying to get a good demo where they were writable. And I bailed on it.
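
A minimal sketch of what I mean — hypothetical tables, with the CHECK constraints doing the work of telling SQL Server what lives where:

```sql
CREATE TABLE dbo.Votes_2024
(
    Id bigint NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT ck_votes_2024
        CHECK (CreationDate >= '20240101' AND CreationDate < '20250101')
);

CREATE TABLE dbo.Votes_2025
(
    Id bigint NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT ck_votes_2025
        CHECK (CreationDate >= '20250101' AND CreationDate < '20260101')
);
GO

/*With trusted constraints in place, queries that filter on CreationDate
  only need to touch the tables that can contain matching rows.*/
CREATE VIEW dbo.AllVotes
AS
SELECT v.Id, v.CreationDate FROM dbo.Votes_2024 AS v
UNION ALL
SELECT v.Id, v.CreationDate FROM dbo.Votes_2025 AS v;
```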

It was just too much. Too many things went wrong and happened that I did not like nor love. All right. Anyway, I believe that is five questions.

One, two, three, four. Yeah, that's five. I can count to five. I credit most of my ability to count to five to barbell training, because doing sets of five really does get you good at counting to five. All right.

So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video, video, video. So thanks for submitting questions. Submit some more.

All right. That’s not a very good sales pitch, is it? All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Common Table Expression Mediocrity

Learn T-SQL With Erik: Common Table Expression Mediocrity


Video Summary

In this video, I delve into the world of Common Table Expressions (CTEs) in SQL Server, often highlighting their perceived benefits and the reality of their implementation. I start by addressing a common misconception—that CTEs make queries more readable. Through examples, I demonstrate how what might seem like a well-structured query can actually be a disaster, leading to errors when executed. The video then transitions into an in-depth look at SQL Server’s handling of CTEs, emphasizing the lack of materialization and the potential for performance issues due to repeated execution. I provide practical advice on when and how to use CTEs effectively, suggesting that in many cases, dumping their results into temporary tables can significantly improve query performance. Throughout the video, I share my frustration with SQL Server’s limitations and the workaround nature of using CTEs, ultimately aiming to help viewers navigate these challenges more confidently.

Full Transcript

Erik Darling here with Darling Data. In today's video we're going to get back to the T-SQL learning material. This is, of course, the beginner stuff, so if you fancy yourself far beyond the beginner realm of learning in this area, you can feel free to go twiddle your thumbs elsewhere. We're going to talk about CTE today. And this is one of my favorite things to talk about, because I like watching the balloons deflate when we start talking about just how mediocre SQL Server's implementation of CTE is. You know, instead of getting some basic useful database functionality, we get giant monolithic failures like Fabric foisted upon us, and, I don't know, I guess let the layoffs continue. All right, good job all around. So one of the first things that every LLM-generated idiot on the internet likes to say about CTE is how they make queries more readable. Oh, they're so readable. Look how readable the query is with the CTE. My goodness. Let's finger-point, rocket-ship, green-check, fire-emoji our way to fame and fortune. One wonders if these people have an assistant to remind them to breathe. But who knows. So here's a query that is a CTE. We can identify quite easily that it is a CTE because it starts with WITH. It does that. It doesn't need a semicolon, does it? But this query is completely illegible. There is nothing readable, understandable, or even tolerable about this query. It is a disaster. If someone sent this query to me, I would throw them out a window. If you don't have respect for me, that's one thing, but at least have some self-respect when you write these things. Even better: if we attempt to run this query, we will get an error.

And, you know, the error is very clear. But where we would go about remedying this error is not terribly clear based on this syntax, is it? So please format your queries, and they will be readable. If one does not format one's queries, they will not be readable no matter how many CTE one dispenses with. So with that out of the way, let's talk about SQL Server's utter mediocrity with CTE. Now, the big thing is that even though the word table is in the name, the result of your query is only tabular. It is not materialized to a table. There is nothing stable about the result of your query. There are all sorts of things that can come up, especially when one starts pondering isolation levels and the timing of operations in a query plan, that could set one's head on fire if one pondered too long.

So it's quite a dizzying array of issues that you could run into. But that's getting a little bit in front of things. The main thing is that the query is not materialized, even if you put a TOP in there or something, which does provide some logical fencing. If you watch the unnesting video that I recorded a few days ago, you'll see that I use TOP in there to prevent some unnesting. In a similar way, you could do that with a CTE.

However, that result is still not materialized. And what I mean by that is that every time you reference the CTE — generally in the outer scope of the query, and by outer scope, I mean the query that comes after the CTE definition — you have to run the query inside of the CTE. An easy way to see what I mean is by just getting an estimated plan for this one, where there's one reference to the CTE, and there's only one time in the query plan when we touch the users table, and where we generate some query plan operators to create our row number.
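
That single-reference version looks something like this — a sketch, assuming the StackOverflow Users table:

```sql
WITH c1 AS
(
    SELECT
        u.Id,
        u.DisplayName,
        n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
    FROM dbo.Users AS u
)
SELECT c1.*
FROM c1
WHERE c1.n BETWEEN 1 AND 100;
```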

We're calling the row number function, and there's only one filter to remove any rows where the row number is not between 1 and 100. We have to contrast that a little bit with a slightly different query, where our from clause now joins the CTE, c1, to itself. They, of course, have different aliases, because you can't alias the same thing the same way twice — you just get an error that says, hey, you already did that.

No need to do that again. So if we now get the estimated plan for this query, we will see that we now have two copies of the CTE being executed — or being referenced, where the query gets executed. We touch the users table twice. We do all the stuff that we need to generate the row number twice in each of these query plans, right?

We have a lot of things that repeat in here, and we have two filter expressions, one for each time that we filter on the row numbers down here in the where clause outside of the CTE. And this is a very general pattern that you will see over and over again if you are the type of person who uses a CTE and then re-references that CTE multiple times in the resulting query. So be very careful with this. If the thing in your CTE is small and compact and easy enough to run, you might never have a problem with it.
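
Here's a sketch of that double-reference pattern — again assuming the StackOverflow Users table, with the self-join condition as a stand-in for whatever you'd really do:

```sql
WITH c1 AS
(
    SELECT
        u.Id,
        u.DisplayName,
        n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
    FROM dbo.Users AS u
)
SELECT
    a.*,
    b.*
FROM c1 AS a
JOIN c1 AS b /*second reference: the CTE's query runs again*/
  ON b.n = a.n + 1
WHERE a.n BETWEEN 1 AND 100
AND   b.n BETWEEN 1 AND 100;
```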

But if you find that queries exhibiting this pattern slow down considerably, strongly consider dumping the result of your CTE into a pound-sign, hash-sign temp table, and then using that temp table in your outer query instead.
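
A minimal sketch of that rewrite, reusing the hypothetical query from above:

```sql
/*Materialize the CTE's result once.*/
SELECT
    u.Id,
    u.DisplayName,
    n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
INTO #c1
FROM dbo.Users AS u;

/*Both references now read the temp table instead of re-running the query.*/
SELECT
    a.*,
    b.*
FROM #c1 AS a
JOIN #c1 AS b
  ON b.n = a.n + 1
WHERE a.n BETWEEN 1 AND 100
AND   b.n BETWEEN 1 AND 100;
```

Now, it's not just the re-referencing and the lack of materialization that can cause issues.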

Quite often, even if you stack CTE together — a whole chain of them, with a different query in each one — and then, in the outer query, you only talk to each CTE once but join the results together in some way, either with a traditional join, or an IN subquery, or NOT IN, or EXISTS, or NOT EXISTS, or anything like that,

cardinality estimation gets very, very difficult when you start combining all those things. The reason for that is that cardinality estimation can be difficult enough on its own. If you think about a query plan, if you read it from left to right, you of course get the logical flow of the query and how things got to where they needed to get to.

But if you start at the outer edge of the plan — by that I mean the stuff that's kind of behind my big head, these things over here — cardinality estimation can be tough there if you have a rather complex set of predicates against the table, or just confusing, weird OR logic going on over and over again. Still, out at the outer edge of the query plan, cardinality estimation might be okay.

But as you start moving across the plan, when you start attempting to join complex expressions together — as you get further to the left in your query plans — that's where cardinality estimation generally tends to fall apart. And that's where materializing results into temp tables can be very valuable. Because even if SQL Server messes up cardinality estimation completely in the query that populates the temp table, once that result is materialized, SQL Server has every opportunity to do better cardinality estimation with that physically materialized result.

SQL Server's optimizer is cost-based, right? Along the way, it figures out all these different plan shapes and candidate plans, substitutes different operators to do different things, reorders joins, and does all sorts of crazy mathy stuff. As SQL Server tries out these candidate plans, you're going to see a lot of weird Frankenplan mix-and-match stuff where it's like, oh, this is cheap.

Oh, now put in this cheap part. Okay, now put in this other cheap part. So cardinality estimation can get really, really wonky in big, complicated plans, because all of a sudden they're these stitched-together cost-based choices, and things can really start misaligning. So it's not just the lack of materialization with a CTE that can be painful. Even if you don't re-reference a CTE, stringing together a whole bunch of complicated ones can also just make life weird. I tend to avoid that as much as possible. A rather uncomplicated example — and this is not a bad cardinality estimation example, this is just to show you that the lack of materialization doesn't always come back to hurt you with the query being rerun.

This is just a simple stacked CTE where we have c1 here, and we run the query in it. And then, down below it, we have c2. We definitely reference c1 there, and then select from c2 in the outer part. But what's nice is that c2 doesn't mess this up. Stacking CTE doesn't result in the users table being hit twice, or the query inside the initial CTE getting executed twice. We only have that once in the stacked CTE list here. And actually, something I think is kind of nice about this one: you'll notice that in here we generate our row number, right? And down here, we filter on that row number between 1 and 1000. But then in the outermost query, the outermost scope, we filter to where the row number is between 200 and 500, which is a narrower range of values than 1 and 1000.

SQL Server only chooses to filter once. We don't have an intermediate filter and then a secondary filter; SQL Server just does one filter, to where it's between 200 and 500. So the optimizer does some work and just kind of throws this portion out. It just says: we don't actually need you to do anything, because there's no benefit to this. If we did something where this was between 1 and 50 and this was between 10 and 30, then it would filter twice, or it would just filter out here.
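
Roughly, the stacked example looks like this — a sketch with assumed column names:

```sql
WITH
    c1 AS
(
    SELECT
        u.Id,
        n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
    FROM dbo.Users AS u
),
    c2 AS
(
    SELECT c1.*
    FROM c1
    WHERE c1.n BETWEEN 1 AND 1000
)
SELECT c2.*
FROM c2
WHERE c2.n BETWEEN 200 AND 500; /*the optimizer collapses this to one filter*/
```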

So where CTE generally become useful is when you do things that are disallowed by T-SQL, like, just, like, on their own in a single query. One of the, probably the most, like, useful common example would be, like, deleting a top number of rows in an ordered way, right? So if we wanted to delete from this table, based on this where clause, and we wanted to order it by something, notice that we have a little red squiggle here, right?

SQL Server, or rather IntelliSense, is already telling us, hmm, I don't know about this one. I don't think that's going to fly. And if we try to get an estimated plan for this, it would just say incorrect syntax near the keyword order.

Right? It doesn't tell you, hey, you can't do that. You're just going to sit there and stare at this query and think, there's nothing wrong here. If I run this part, I get a query plan.

But if I try to order by here, I don't get a query plan. Why? There's nothing wrong here. You'll start pasting this into Notepad++ and looking for strange empty-space characters and losing your mind. It's just a T-SQL limitation.

But you can do that with a CTE: if you put a select top 1000 query with your order by inside the CTE, you can delete from that and get a query plan just fine. Right? So this works, but doing it in one query doesn't.
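To make that concrete, here's a hedged sketch of both versions, again assuming Stack Overflow-ish columns; the where clause and ordering column are mine, not from the video.

/* This version fails with "Incorrect syntax near the keyword 'ORDER'":

DELETE TOP (1000)
FROM dbo.Posts
WHERE Score < 0
ORDER BY CreationDate;

   The workaround: put TOP and ORDER BY inside the CTE,
   then delete through it. */
WITH to_delete AS
(
    SELECT TOP (1000)
        p.*
    FROM dbo.Posts AS p
    WHERE p.Score < 0
    ORDER BY p.CreationDate
)
DELETE
FROM to_delete;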

So a lot of the utility of CTEs is not performance. It's not readability. And it's certainly not some sign that you know what you're doing with T-SQL if you use them.

It's how you use them. Right? Where they really come in handy is when you need to do things that you can't do in one simple query. Coming back to the row number stuff: you have to put that row number in some sort of derived table expression, whether it's a CTE or a derived table, anything like that, in order to filter on the row number.

Other databases have a qualify clause, which is sort of like a secondary where clause that allows you to filter on stuff that happens in the select list. Remember, we talked about logical query processing: select happens almost last when queries are logically processed.

So stuff that you define in the select list isn't visible to the where clause. If we had the qualify clause, it would be visible there. But we don't.
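For comparison, here's roughly what that looks like in a dialect that supports qualify versus the T-SQL workaround; the top-ten-by-reputation example is mine, not from the video.

/* In dialects with QUALIFY (Snowflake, Teradata, BigQuery, etc.),
   you can filter on a window function directly:

   SELECT Id, DisplayName
   FROM Users
   QUALIFY ROW_NUMBER() OVER (ORDER BY Reputation DESC) <= 10;

   In T-SQL, the row number has to live in a CTE or derived
   table before anything can filter on it. */
WITH ranked AS
(
    SELECT
        u.Id,
        u.DisplayName,
        n = ROW_NUMBER() OVER (ORDER BY u.Reputation DESC)
    FROM dbo.Users AS u
)
SELECT
    r.Id,
    r.DisplayName
FROM ranked AS r
WHERE r.n <= 10;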

Instead, we have Fabric, fall-down Fabric, which, you know, is just a complete waste of our lives. So most of the use of CTEs just comes down to working around T-SQL limitations. Right?

It's never like, hey, there's a straightforward way to do this. It's always like, there's this weird hack I read about. Right? It's almost never just, oh yeah, do this one simple thing. It's always, no, no, no, you have to do these four other things to get it to work, but it'll work.

So it's really just a T-SQL limitation where we have to generate the row number in here before we can filter on it anywhere else. Or rather, we can order by it up there, but we couldn't filter on it in there. Because order by happens after select, but where happens way before select.

So would it be nice if we had the qualify clause? Yes. Would it save us a lot of weird time and typing and all the other stuff? Yes.

But, hey, it's more important that everyone has a non-functional data lake or something. Right? Okay. Anyway, thanks for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video, where we will probably talk about something else T-SQL, because that seems like a reasonable thing to do. Of course, you can still buy this course at the pre-sale price, 250 bucks, down in the video description there. This is all companion material to the T-SQL seminars that Kendra Little and I will be teaching at PASS Data Community Summit in Seattle this November.

If you are attending those, you will get access to this companion material as part of your admission to the pre-cons. Otherwise, you will have to buy it from me. And if you wait too long, it won’t be the pre-sale price anymore.

It will be 500 bucks and you will say, can I still get the pre-sale price? And I will say, no. Why didn’t you buy it in the months that you had to buy it for the pre-sale price?

Ding dong. Anyway, I am going to go do something else now. CTEs have once again found a way to depress me.

Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.