A Little About: Old vs New Cardinality Estimators In SQL Server

A Little About: Old vs New Cardinality Estimators In SQL Server


Video Summary

In this video, I delve into the reasons why I often prefer the legacy cardinality estimator over the default one in SQL Server. Using a practical example involving an identity column backfill process, I demonstrate how the legacy CE can provide more efficient execution plans compared to the new CE. By running the same query with both estimators and comparing their performance and execution plans, I highlight key differences that might influence your choice of cardinality estimator during query tuning. Whether you’re looking for a straightforward explanation or want to see real-world implications, this video offers valuable insights into when and how to leverage the legacy CE for better query optimization.

Full Transcript

Erik Darling here with Darling Data and we are going to talk in this video because I actually answered an Office Hours question recently where someone asked why I generally prefer the old cardinality estimator, old or legacy, compared to the new cardinality estimator or default cardinality estimator as Microsoft and all its blue Azure-y hubris calls it. And I’m just going to show you an example today of a query. Now, I know this is an example of one query, but it’s just a good example of kind of generally why I tend to prefer the legacy cardinality estimator and why when I am tuning queries and I am using a version of SQL Server and a database whose compatibility level dictates that we are using using the default cardinality estimator using the default cardinality estimator and using the legacy one. This is not, of course, a thorough undoing of everything that the new cardinality estimation model attempts to, you know, do differently than the legacy model, but it’s just an example of why I tend to prefer it and why I will always try it out. There are, of course, worst times in my query tuning life when I’m maybe using the legacy one and I might say, hey, let’s give the newer one a shot. Let’s give it a try. Let’s see how it goes. The worst thing that happens is that either the query finishes in the same amount of time or takes longer and we can say that didn’t work. Let’s try something else. So here we go with that. But before we do, of course, I mentioned Office Hours. If you want to ask me questions privately that I answer publicly, that link right there is how you do it. It’s down in the video description. There’s also a link where if you think that this channel is worth is as little as $4 a month, leaving your bank account and going into my bank account, you can sign up for a channel membership to support all of this wonderful material that I produce. I am also an acting SQL Server consultant active, maybe play play one on TV, play one on YouTube. If you need help with your SQL Server, you can hire me. And as always, my rates are reasonable. Hooray for reasonable rates. Anyway, my performance tuning training, if you want it for about 150 bucks for the rest of your life, that’s the link, that’s the discount code. The forming pattern here is that it is also in the video description.

My T-SQL course, Learn T-SQL with Eric is also available currently at a pre-sale price of $250. I recently finished recording all of the beginner material and am hard at work working on the advanced material now. So that will get done after the summer. The price of the video course will go up to $500 when that is complete. So I suggest you do that sooner than later. I am speaking a lot all over the place this summer. New York City, Dallas and Utrecht, August 18th to 19th, 15th to 16th, and October 1st to 2nd. Of course, all that is right before I go to Seattle for Pass Data Community Summit. Kendra Little and I will be delivering two T-SQL pre-cons at Pass Summit together over two days. So I hope to see you at both of those. But with that out of the way, let’s talk about this turkey here.

Now, let’s say that we have a table and that table and it has an identity column. Let’s pretend it’s called the votes table and let’s pretend it’s in the database called Stack Overflow 2013. Think that’s reasonable, right? The reasonable set of things that we can pretend. And we realize one day that our ID column is not as contiguous as we would like. And maybe we would like to go and backfill it. So we start designing a process to find all of the missing rows in the votes table. In this case, our job is to find the first, the lowest value that is missing from the votes table and then assign that to something and then do an insert to start backfilling rows in there.

Maybe that sounds a little silly, but I’ve seen plenty of places start needing to backfill their identity columns. And, you know, depending on various local factors, this might be a reasonable way of doing it. So if we just select the top 10 from the votes table and we look at the ID column, when compared to the row numbering column, right, this is not a column from the table.

This is just the row numbering that comes back from SQL Server Management Studio. We’ll notice about right here that things go a little amok on us, right? We are clearly missing ID 8, right?

We are like this goes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. This goes 1, 2, 3, 4, 5, 6, 7, 9. Ah, boy, we’re missing ID 8. All right, so our goal is to write a query that will find us the earliest missing ID.

Sounds pretty easy. One way of doing that would be to write a query that looks about like this. You would say I want to select the top 1 missing ID, which in this case, because of what we’re trying to accomplish, we would need to add 1 to the ID column here.

And we’re going to say where not exists, select doesn’t matter from the votes table, where v2.id minus 1 equals v.id. And we will order our results by v.id to make sure that they stay perfectly deterministic.

Since ID is already the clustered primary key of the votes table, it is guaranteed to be unique. And since it is indexed, it is presented to us in order. Now, you might look at this and scream a lot about sargability and whatnot, and I hear you.

I hear you. There’s a lot to be said for sargability. But we’re going to run this query in two ways. Right now, my database is in compatibility level 160.

1.6.0 for SQL Server 2022. So I’m going to run this query once using compatibility level 160 and all its attendant properties. And then we’re going to run the exact same query the exact same way down here, except we’re going to add in this use hint to force legacy cardinality estimation.

All right? So that’s the only difference between these two things is one is using the default cardinality estimator, and the other one is using the legacy cardinality estimator.

You might notice that it’s been a little while since things started running. You would be a very observant person if you picked up on that. These both return ID8, right?

They both return the correct result, but the execution plans are quite different. So this top plan is using the default cardinality estimator. You’ll notice that it took 8.6 seconds right here.

And if we scroll over here a little bit, yeah, my head’s not in the way. We’re off to a great start, aren’t we? We spent 1.7 seconds fully scanning the votes table. We spent 2.2 seconds fully scanning the votes table here.

We’re going to ignore the timing on the repartition streams operators for now because the wall clock timing on parallel exchanges is a complete disaster. So we’re going to ignore that for the time being.

And we’re going to look at this. So this is just about this whole section in here is where 8.5 seconds winds up, right? It’s not like something weird happened over here.

Like we have this top end sort, but this top end sort wasn’t like spilling a bajillion, 52 million rows to disk. And we were like, ah, God, we can’t take it. It’s tempdb.

We broke tempdb. It’s not that. If we look down here at this query where we use the default legacy cardinality estimator, oh, dear old me, this query chose a completely different execution plan, right?

One, it’s single threaded, right? I mean, first off, you might want to know it’s right here. It takes one millisecond.

This is a single threaded execution plan. There is no parallelism at all in any of this whole entire thing. But there are some funny looking numbers.

Like, for example, 900 of 1, 4, 5, 4, 6, 0, 0, 0. So that’s eight digit number. That’s 14 million rows, 14 and a half million rows.

It only took 900 rows for us to find it. So the big difference here is if we look at the properties of the clustered index scan here, there will be this row, estimated rows without row goal.

So SQL Server estimated that it would have to read this many rows to get stuff out of there. But, you know, the number of rows that it took was 900, right? So the actual number, we only needed 900.

SQL Server was like, it might take a while, but we only actually ever needed 900. But the important thing here is that this exists here, right? So we have estimated rows without row goal here.

We have estimated rows without row goal for the second time we touch the votes table. But if we click on these up here, that estimated rows without row goal thing disappears, right? It’s not in here, right?

Even though we have a top in there, SQL Server used a top end sort up here. So this one just used a regular top. So some slight visual differences in the execution plans. But this is sort of in general why, again, when I’m tuning queries and I am using the default cardinality estimator and I get a rather suspicious looking plan, I say to myself, Eric, we should check in on that legacy cardinality estimate.

We should see how Legacy Cardinality Estimator is doing today. Let’s see. Maybe we can bring it some snacks or, you know, just go give it a call. Have a little chat with it.

See how it’s hanging in there. Because a lot of the times, even if you don’t see like a performance difference this drastic from like eight and a half seconds to one millisecond, you can at least, you know, get some feedback from it and see if there are any differences.

And, you know, sometimes you do see something this dramatic, just like I did. Again, you know, you can go on and on about sargability and subtracting one from something that you’re comparing here. But, you know, Legacy Cardinality Estimator just does a better job here.

So you might find this in your queries as well as you are going through and tuning things. And like I said earlier, you may also find the opposite is true sometimes. You may find that the Legacy Cardinality Estimator does a rather rotten job of things occasionally.

And you might find that testing out the new Cardinality Estimator will do a better job. If you want to test out the new one, what you can do is say force default Cardinality Estimation. And you can use that use hint to test your queries out using the default Cardinality Estimator.

But anyway, that’s enough for now. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you over in the next video where I’m going to talk about… So I did a couple of videos sort of recently about writing batch mode friendly queries. And I realized one thing that might be useful for people would be a small bit of education on how to recognize query patterns where batch mode may be useful.

So we will do that. And one of these queries might even make an appearance in there. You might even see this exact same starting point.

So that’ll be fun for us anyway, won’t it? All right. Thank you for watching. All right.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 22

SQL Server Performance Office Hours Episode 22



To ask your questions, head over here.

In retrospect, what was the best thing about SQL Server 2022? For me, it’s Query Store hints.
why does index cleanup replace unique constraints with unique indexes?
why table value parameters get a different estimate than table variavles?
why do you not care about logical reads?
I missed you at SQLBits this year. Will you be there next year?

Video Summary

In this video, I dive into answering five user-submitted questions during an office hours session, providing insights and solutions to common SQL Server challenges. We cover topics ranging from the best features of SQL Server 2022 to why unique constraints might be replaced with unique indexes in certain scenarios, as well as the differences between table value parameters and table variables when it comes to query estimates. Additionally, I explain my reasoning for not focusing on logical reads when identifying slow queries, emphasizing that duration and CPU usage are more telling indicators of performance issues. The session also includes a bit of personal reflection on upcoming SQL Server events and community summits, including Pass Data Community Summit in Seattle, where Kendra Little and I will be delivering T-SQL pre-cons. Whether you’re a seasoned DBA or just starting out, there’s something for everyone in this episode!

Full Transcript

Erik Darling here with Darling Data. Nice to see you too. Fancy meeting you here, all that good stuff. It is time for an office hours in which I answer five user submitted questions. I don’t know how many users actually submit these. It could all be one person. Or I don’t know. It could be five different people. Anyway, I hope that I answer your question this week so you don’t feel left out. But before we do, the usual the actual old song and dance, if you would like to support this channel, memberships are available. I have an unlimited supply of those. If you enjoy this content and you would just want to support my efforts to keep caring enough about doing it, you can sign up for a membership. Otherwise, you know, all the other stuff. If you want to ask me questions that I answer on these episodes, I have a slightly different URL up here now. This one, goes actually to my website rather than directly to the Google form because there is some additional information on the website about if you need to ask questions about code or execution plans and you need to share them. So I’ve changed the link there a bit and I suppose I’ll fix it in the YouTube videos as well. Or at least the ones that I, I don’t know. We’ll figure it out. Anyway, I’m available for consulting as well. I got an unlimited supply of that. Never seemed to run out of consulting, health checks, performance analysis, hands on tuning, dealing with performance emergencies, and of course, training your developers so that you don’t have any more performance emergencies. Good Lord, that’s quite a bit of service. And as always, my rates are reasonable. My performance tuning content, all of it 24 hours is available for 75% off, which means about 150 US dollars. You can you can of course go to that link and plug in that discount code to get the everything bundle over there. And if you want to pick up my new T-SQL course, which has all 23 ish or so hours of beginner content currently published, you can do that now for the pre-sale price of 250 bucks. That’ll be going up to 500 bucks after the summer once the advanced material lands.

And speaking of summer, boy, is it hot out. New York City, Dallas and Utrecht will all be graced with my presence over the summerish months with the Pass on Tour events. Redgate has decided that they’re going to smuggle me to various places to talk about SQL Server stuff. So that’ll be fun. Especially fun for you, I hope. And of course, Pass Data Community Summit will be in Seattle, November 17th to 21st, where Kendra Little and I are delivering not one, but two T-SQL pre-cons. So we’ll have a great lot of fun there. But with that out of the way, let’s do this whole office hours shindig. Let’s have some fun here. All right. Our first question. Let me, where is Zoomit? There you are. Where’s my little pink dot buddy? There we go. All right. In retrospect, what was the best thing about SQL Server 2022? For me, it’s query store hints.

I wish I had the same love and affection for query store hints and plan forcing. I suppose they’re great when they work, but it’s less fun when they suddenly stop working and you’re like, wait, what happened to the thing that I just told it to do? It worked for a while. Why is it not working now? And then you have to go do it again and kick plans out. It’s not fun. As far as SQL Server 2022 features, let’s see. I don’t know. That was 2019. No, well, that’s going to be in 2025. Gosh, you got me. SQL Server 2022. I suppose there were some decent linguistic improvements to window functions.

But like features, I don’t know. Let’s just let’s just throw it out there for for again. It’s cool when it works, but probably the parameter sensitive plan optimization is a nice, as they say, down payment on, you know, fixing quite a quite a pernicious issue in databases generally. So that’s that’s that’s about it there. 2022. Kind of a kind of a bummer. Kind of 2014 ish, kind of 2017 ish in that it’s it’s not very interesting generally 2025. I don’t know. All right. Here’s a good one. Why do index cleanup replace unique constraints with unique indexes?

Well, my friend, you’re you’re you’re referring to my store procedure SP underscore index cleanup. And the reason why it replaces unique constraints with unique indexes, which is only sometimes is if you have a unique constraint on, let’s say, column a to get, you know, real, real, real worldy there. And you have, let’s say, an either a unique or non unique nonclustered index on column a, maybe with other key columns or actually, no, not with other key columns. I lied. Other key columns would mess it all up on column a with like other included columns.

Then, then, then SP index cleanup will either make either either drop, give you a script to get rid of the unique constraint because unique constraints are backed by an index anyway. So if you already have a unique nonclustered index on that column with some includes, then like you don’t really need the unique constraint still. But if you have a non unique nonclustered index and what it’ll do is give you a script to make the non unique nonclustered index unique and also get rid of the unique constraint because it’s sort of a duplicative facility at that point.

So, um, like, you know, when, like, there is like, I guess a question that sometimes comes up, it’s like, oh, should I use unique constraints or unique indexes? And, um, you know, I, I do prefer the unique index because you have a bit more flexibility with the unique index than you do with just unique constraints as far as like included columns and like some other options go. So, so that’s, that’s, that’s about that there.

All right. Next up. Let’s see here. Um, got it. The same person, right? This one, uh, why table value parameters get a different estimate than table variables? Well, um, I’m going to guess you mean table variables.

Uh, so table value parameters are of course backed by, uh, table variables or rather presented to store procedures with table variables. Uh, but since they are presented to store procedures as parameters, uh, table valued parameters tend to get parameter sniffed the way that, uh, other parameters do. But, you know, it’s, it’s really only for like the table level cardinality.

So like you might find that you execute a store procedure with a table valued parameter. And when you pass it in with, let’s say like 10,000 rows, then you get a 10,000 row table cardinality, table level cardinality estimate from it. Um, and then it’ll keep that until you, you know, so recompilation occurs for whatever reason.

Uh, but then if you were to pass it in instead for first compilation with a table valued parameter that has a thousand rows in it, it would just use that thousand rows over and over again. So table valued parameters are a little bit different in that they tend to get sniffed like parameters rather than, um, you know, being treated like table variables, which, uh, the, the cardinality estimates that you get, get from those do depend a bit on, uh, version edition and, uh, database compatibility level among some other things. All right.

Hey, let’s look at this well, well, well-formed, well-structured question. Why do you not care about logical reads? Well, just because logical reads don’t tell me how long if a query was fast or slow. They don’t indicate that.

Uh, I want to find queries with performance problems. So I go looking for queries that, you know, uh, have a high duration and or a high CPU. Uh, so that’s, that’s it.

Logical reads don’t, don’t tell you if a query was slow or not. I want to find slow queries. So I find queries that use a lot of some mixture of wall clock or CPU time. I don’t know.

That, that, that seems, that seems fairly straightforward to me. Uh, anyway, let’s go on. Uh, oh, oh, how sweet are you? Hey, look at you. Someone, some lucky person.

Hey, wait a minute. Come on, zoom it. Uh, some lucky person out there got to go to SQLBits. I missed you at SQLBits this year. Will you be there next year? Another well-structured question. Good job out there.

Uh, I don’t know if I’ll be there next year. Uh, SQLBits has changed the way they do their pre-cons. And, uh, they, they are now curated. Uh, they curate the speakers.

I was not curated for this past SQLBits. So if you would like to see me curated, uh, for SQLBits, then you are, of course, you know, welcome to express that opinion to the SQLBits organizers.

I don’t know how much good it will do. I don’t know. I don’t know, uh, what their curation process is. But, um, I don’t know. Maybe, maybe there’s a cure for it.

Anyway, uh, that, that, that gets us through five questions here. Uh, they’re short ones this time around, I guess. It’s easy for me then. Uh, anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you… …soon. Somewhere. Somehow. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Carry Over Sort vs Batch Mode Window Functions

Carry Over Sort vs Batch Mode Window Functions


Video Summary

In this video, I delve into an interesting query optimization topic that has been relevant for years but is now less pertinent due to the advent of batch mode in SQL Server. I explore why certain complex query syntaxes were necessary before batch mode existed and how they can be replaced with simpler, more efficient methods today. Specifically, I demonstrate the carry-over sort method—a technique often used when window functions weren’t available—and contrast it with modern approaches using window functions. By running sample queries in SQL Server Management Studio, I illustrate why the old method is not only slower but also less accurate for certain scenarios. The video aims to provide insights into query optimization and help viewers understand how leveraging batch mode can significantly improve performance without resorting to outdated techniques.

Full Transcript

Over the years, there’s been a lot of very interesting and intricate query syntax ginned up to deal with row mode performance issues that don’t really exist anymore in a world where batch mode exists. One of them is called the carry over sort. I don’t know if you’ve maybe ever run into a problem like this or whether you’ve ever stumbled upon this problem. I admit that the name is a little outlandish, but it comes up a lot when you’re dealing with data analysis type work. And, you know, prior to there being batch mode, there was one specific way that you typically wanted to write queries if you wanted to sort of emulate what what a windowing function would give you for finding the max value for a row, not per column in a row, but just per row, that batch mode largely solves. So, I’m going to talk about that today. Once we get over to SQL Server Management Studio, I’m going to show you probably why the query you think that you want to write is not right. Just how slow, how the process of the process of doing this using row mode is and how batch mode improves that. But of course, we’re going to contrast that with the carry over sort method, the way that it has been presented. This is not my syntax. It’s something that I’ve run across and I just wanted to make sure that I was comparing it accurately to modern versions of SQL Server. Of course, you may still need the carry over sort method of doing this if you are using standard edition of SQL Server. The reason for that is because if you are using standard edition of SQL Server, at least as of the recording of this video, Microsoft still does not think that you have paid them enough to be your friend and they have deliberately hobbled anything involved in the way that you have paid them enough to be your friend.

there’s nothing to build. Yo, of course, otherwiseulanning you have final version of SQL flagship and will be rewrite your friend or of Spain. you have had made all the power졌uj assodo.com Be it would be an exact same way that you have posted positive feedback button to the world And that is now patreon.com buradazt neuron will понять how to run substitute feedback of how to run with optimal name in optimal name know if you want to contribute to this channel you can sign up for a membership link down in the video description for that it’s just a show of appreciation for all this cool content that i write and record for free you can also like comment subscribe and ask me questions for my office hours episodes if you need sql server can help consulting help boy howdy i do all this stuff and as always my rates are reasonable if you want to get all my performance tuning training i have 24 hours of it for about 150 us dollars uh you can in that in that lasts you for life you get it forever and always uh just go to that url plug in that discount code which is also a fully assembled link down yonder and you can you can you can start your learning today uh my new t-sql course uh the beginner content is all done and recorded it’s about 23 hours over 69 modules uh if you are going to attend kendra little and i’s uh t-sql past pre-cons you will get access to this material and uh right now the course is on pre-sale for 250 bucks it will go up to 500 bucks when the course is fully recorded after the summer i am hard at work on all the advanced material now so that isn’t that isn’t that spectacular for you how hard i work uh if you would like to see me live and in person i am going on tour with the red gate road show uh the pass on tour dates new york august 18th and 19th dallas september 15th and 16th and utrecht not just an art supply store it’s a hamlet in the netherlands october 1st and 2nd and then of course past data community summit in seattle november 17th to 21st so with that out of the way uh let’s let’s pratty or party whatever it is so i have uh pre-run a couple things here uh the first the first two things that i have pre-run are uh the the the the version of this query that a lot of people uh will write uh or maybe tried to write at some point in the in the far distant past and we’re unhappy with the performance of uh which is basically to uh select some stuff within a CTE and most importantly in that CTE generate a row number and the goal of the row number is for each post type id because that’s what we’re partitioning by we want to order by uh the the creation date uh converted to just a date it’s a date time we’re just converting it to a date then ordered by owner i user owner user id descending and then ordered by id descending with id descending acting as a bit of a tiebreaker because id is unique and these other things are not guaranteed to be unique either individually or in concert the second thing that i’ve i’ve run is a query that this is hinted to use uh optimizer compatibility level 140 because i do not want batch mode on rowstore to kick in and be enabled for this query the second thing i’ve included with the first uh first running of things is the query that a lot of people think could replace this but this is not the right query to write for this because what this is doing is it’s getting you the max value for each column grouped by post type id that is not correct correct because the the the real sort of algorithm if you want to call it that is that we are ordered first by creation date then ordered by owner user id then ordered by id this is getting the max for each of those individually the first thing that i want to show you is the results because showing you the results shows you where these two methods uh no longer agree as far as uh the included data goes so uh post type id one the owner user id is different between these two uh it’s also different it’s also a different owner user id for post type id two and post type id well three of course right there um the ids are different for these two as well uh so really the the max method just does not give you the correct results there’s just too much different in here focusing over on the execution plan uh we can ignore this one because this one does not give us the correct results so let’s just get that like way out of the way we’re not thinking about this one at all this is the row mode version of the query and you’ll see if you follow along the operator times here this thing runs for just about well a little over 14 seconds i like to say 15 seconds that feels good to me so let’s just say this thing ran for like 15 seconds right nice nice fizz buzzy number there uh and and like really you know there’s just not a whole lot to say about this generally uh in row mode queries like this are quite painful even if you have a reasonable index for sql server to use to make the the window function go faster um you you often it’s often just a terribly inefficient way of writing and running the query so what what i want to show you next is uh the carry over sort method so what the carry over sort method aims to do is get the max you see the max starts here and the max ends way down here but what the max is doing is basically assembling a string based on the getting the max of all three of the columns that we care about the normal carry over sort thing uh does not include as much complexity uh in uh for the second column as mine does the problem that i was running into is that there are negative owner user ids in the post table and sure i could have filtered them out but that’s that’s cheating a little bit uh we want to maintain uh all user ids we want to make sure that we even include the negative ones because what when the max is when the max is a negative number then we we need to consider that don’t we we can’t just not return a result for a row because we didn’t feel like dealing with some abnormal potential abnormalities in the data so i have a case expression here and the case expression just says when owner user id is less than zero then i add some x’s to the left uh which um is different from what we’re doing when owner user id is greater than zero or greater than or equal to zero uh because with that we are right padding the number so we’re like adding like zeros to the right of it the reason that i did the x’s is because if i zero padded it things would have gotten messed up down in the select query in the select query uh we are basically asking for a substring and converting the substring to the correct data type so if i were when you do that with the id column and you have a zero padded number and you convert a zero pad right zero padded number to an integer it’s no problem right because like we just you just remove the zeros from the front of it and you give give the rest of the number the problem with the negative number is that you have you essentially have a string and if i added zeros in then i wouldn’t know if there were naturally occurring zeros in the number uh so i used x’s and i replaced the x’s to avoid confusion there but the carryover sort method just does this we still group by post type id we’re just we’re using the max function across like we’re we have three different columns in the max function uh post type id sorry creation date owner user id and id and like we’re assembling like the max across those three right so max encapsulates all three of those columns and if i run this query and we got a little bit of highlighting to do here don’t we uh this will return correct results at least if we at least compared to the first query that we ran with the window function and uh so like all the results here match what we get from the window function version but we get it much faster what’s cool here is that none of this query even though we’re in compat level 160 uses batch mode the scan of the post table happens in row mode you can see that just sort of over right next to my big head there uh this compute scalar also happens in row mode the hash match aggregate which is totally eligible for batch mode still goes in row mode uh so basically and like and of course parallel exchanges like gather streams don’t support row mode so this whole thing finishes in about two seconds without using any batch mode whatsoever so back before batch mode was really a cool useful thing this was a good method to get the max value per row like i said earlier this can still be very useful if you’re on standard edition because microsoft you didn’t if you’re on standard edition microsoft doesn’t think that you paid them enough to have your queries perform well so you don’t get like even if you’ve got batch mode to like batch mode to happen here you would be limited to a dop of two for your batch mode queries so you would not be like like i have max dop set to eight for this this thing will have used like a dop of eight for uh the query right you can see degree of parallelism right there eight so this thing would have used uh eight cores up for for this whole thing and spreading the workload of 17 million rows out across eight across eight cores nice and efficient is pretty efficient for row mode uh and for batch mode but like 17 million rows across two threads even in batch mode you’re likely going to see some performance fall off there like whatever the like i don’t know maybe maybe you’ll get real lucky and the trade-off won’t be too terrible but um that the standard edition limitations there are really quite a pain the next thing that we’re going to that i want to show you is the uh the the window function version of this again but without that compat level uh 140 restriction on it down here to prevent batch mode on rowstore for kicking in so if we run this query this will take about two seconds now right we get back the correct results this finishes in 1.8 seconds just like the carryover sort method that i showed you above and we don’t have to write that crazy max syntax where we convert like dates and numbers to strings and pad things and all that other stuff with the the synthesis developer edition which is the enterprise edition equivalent skew i’m not using uh 2025 standard developer edition because why would i hurt myself that way um that we we are we are just using regular uh edition here regular developer edition here so this runs at a degree of parallelism of eight and this runs nice and quickly and efficiently so if you’re writing uh window fun so like you know like really the the idea of this window function right it’s like we just want to get like the top row for each post type id so i’m filtering to where row number equals one for all these if it’s if your goal is to find like the max value of something for a row based on whatever criteria i just kind of picked three columns at random from the post table that seemed to like make sense i guess i could have thrown score in there if we felt like it um but uh if you’re doing that like i still would have had to deal with potentially negative numbers because scores can be negative in the post table so that like that maybe that wouldn’t have saved me too much time or trouble with the carryover sort syntax but uh if your goal is to like sort of find data like this and you’re filtering to where row number equals one batch mode can make these queries crazy fast like you don’t even have to add indexes like just let batch mode on rowstore kick in read from your table and batch mode process all the data in batch mode like it’s a way better uh way of running like big data crunchy queries like this if but if you don’t if you’re not in a situation where batch mode on rowstore can kick in for you like if you’re not on sql server 2019 plus and you’re not on enterprise edition and your database is an incompatibility level 150 or better and there’s no batch mode on rowstore uh like naturally occurring and you know like depending on if you can change stuff like you know you can you can mess around with like you know columnstore indexes uh you can mess around with uh putting like a like just like having an empty table in your database with a clustered column store index or creating a temp table with a clustered columnstore index and like left joining to that thing uh you can do all sorts of stuff to get like partial batch mode on rowstore but it it does not go as deep into your query plans as batch mode on rowstore like the intelligent query processing optimizer feature does just kind of weird i think but it’s just something that you learn to live with when you are tuning queries across a variety of strange environments anyway uh i this is just something that caught my interest and i felt like talking about um uh i i hope you enjoyed yourselves i hope you’ve learned something i hope maybe there was some good educational point in this video that you are able to take away from it even if it’s not uh memorizing the the crazy syntax in here uh i feel like perhaps there were a good a few good educational moments aside from that but anyway thank you for watching and i will see you over in the next video adios you

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Window Function Tricks for SQL Server 2022+

Learn T-SQL With Erik: Window Function Tricks for SQL Server 2022+


Video Summary

In this video, I delve into the latest enhancements to T-SQL in SQL Server 2025 and reflect on what Microsoft has or hasn’t done to improve it. While there are some minor additions like regex support and aware clauses, much of my focus is on the improvements from SQL Server 2022 that make working with window functions more efficient. I explore new features such as `ignore nulls` and `respect nulls` in `lag` and `lead` functions, which significantly simplify finding last non-null values. Additionally, I discuss shared window clauses, demonstrating how they can be used to reduce redundancy and improve query readability by allowing common window specifications to be defined once and reused across multiple window functions. Despite these useful additions, the overall sentiment remains that Microsoft has largely neglected T-SQL enhancements in recent releases, leaving SQL Server users with a sense of being left behind compared to other database systems.

Full Transcript

All right, so we are going to continue with a little bit of teaser material from the Learn T-SQL with Erik course. Again, that is still on presale price until after the summer when the advanced material drops. There’s a link to purchase it down below. It’s 250 bucks right now. It’ll last you for the rest of your life. And of course, if you are attending past data community summit and you’re coming to Kendra Little and I’s T-SQL pre-cons, you will get access to this material for free because it is completely free. It’s a companion material to what we will be teaching. So, you know, SQL Server 2025. There are release notes all over the place for it. And there is not a single worthwhile enhancement to T-SQL to talk about. Sure, we got regex. Okay. You have any idea how many people that’s going to screw up? Regex and aware clause. I mean, cool. Like, as a consultant, like, yeah. But as far as, like, things I’m excited about, nothing. I think, you know, one way, one way you can sort of judge how much Microsoft cared about a specific SQL Server release is by how much T-SQL has, been sort of alleviated of the many things that it has been missing for many years that are in the SQL standard. And this one is rather laughable. You know, I guess Microsoft is busy trying to get Fabric to catch up with Databricks.

So they have ceased trying to get SQL Server to catch up with, like, every other database on the planet. So, cool. Anyway, the only T-SQL enhancements that I have thought were kind of neat were back in SQL Server 2022 when Windows Functions got a couple neat new things. Windows Functions got nothing in SQL Server 2025. We have once again been left in the dustbin. We are on the shelf. We are not having a good time.

So, like, if you ever spend time, like, just like, I don’t know, like, if you read T-SQL blogs for fun, you read SQL Server blogs for fun, you may have found a particular brand of problem across posts over the years called the last non-null value. This, of course, this, of course, this, of course, did get easier with window functions. Before window functions, it was, forget it, like, queries would never finish. But even with window functions, it takes, like, multi-step queries in order to get the last non-null value for something.

So, if we run a query like this and we say, like, you know, we get the last commenter is lag user ID one over order by creation date, you’ll notice that there’s a lot of, you know, nulls in here. So, if we wanted to find the last non-null value, we would have to, like, we would have to essentially, like, like, run this query and then run another query to sort of, like, to get those other values. It gets very complicated very quickly.

What SQL Server 2022 added is a couple things that you can stick into, like, the lag lead window functions to either ignore nulls or respect nulls. Now, you can think of it what you will, that SQL Server Management Studio 21’s parser has a bunch of red squiggles in this query because it does not recognize the syntax from SQL Server 2022. All right.

So, we have SQL Server Management Studio 2021, which became GA, like, I don’t know, a couple months ago at this point. And we have SQL Server 2022, which came out, like, three years ago at this point. And the parser is still like, I don’t know what that means.

So, you know, we got dark mode. Okay. But I promise you that this query will run successfully.

What I’ve added to this query are the lines ignore nulls for this one and respect nulls for this one. So, ignore nulls makes finding the last non-null thing a lot easier because this will give you the last non-null value in the column. Right.

So, this ignore nulls just gives us the value that we want over and over again. Granted, this isn’t a very interesting data set, but the respect nulls, we get all of this stuff back. Right.

Now, forever, we have had the ability to pass in a third input to, like, lag and lead and stuff. I’m just going to spread this syntax out a little bit so it’s a little bit more obvious what I’m doing in here and why there are some rows that have a very strange big number in them.

And that is because I am adding a third optional input to the lag and lead functions, which give you a default value for anything that would have been produced a null because of the function. So, the results in here, you’ll notice that both of these lines have the integer maximum for them.

That just, that’s because we filled in a blank with that optional third parameter. Now, there’s other neat stuff that came out in SQL Server 2022 for window functions as well. Like, you can now have shared window clauses.

So, like, if you were writing window functions with, like, similar, like, window specifications in them, you would have to, like, write that over and over again and your queries could get very, very big with window function specifications. But now what you can do is you can say something like this, right?

Notice we’re just saying over x here, right? And typically, over x would be like, huh, what is x? Well, x is what we have defined down here.

This window x, right? We have, it’s almost, it almost looks like a CTE for your window function. Isn’t that scary? Right? Window x as partitioned by owner user ID, order by rows between unbounded proceeding and current row.

So, both of these window functions, sum and average, can share a common window clause. Now, would that Microsoft were so kind as to give us more neat enhancements to T-SQL like this in SQL Server 2025, you and I could be talking about much newer cool stuff.

Here we are, though. But what’s even neater, I think, about the common window clause is that you can actually stack them so that they inherit window clause specifications from higher up ones.

So, it almost looks like stacked CTE when you read them. So, here, notice that we’re not using x anymore. We’re using T-S and A-V, right?

So, this is the window specification for sum, and this is the window specification for average. And if we look down here, this is where I’m doing the magic work for this one. We have window x as partitioned by owner user ID, right?

And then we’re saying, comma, T-S as order by score between rows unbounded proceeding and current row. And then, just for, you know, a little bit of texture in the demo, A-V is ordering by score descending between unbounded rows proceeding and unbounded rows following.

So, they’re both going to partition by owner user ID, but then they’re both going to do something slightly different with the order by. So, score for this one is ascending.

Score for this one is descending. This one is going from the beginning of the results to the current row. And this one is going for the entire result set. So, unbounded proceeding and unbounded following. So, it’s like the entire thing is what we’re getting the average as.

And now, we can allow our window functions to not only share a window clause, but to inherit and share window clauses. So, we can get back even, we can, I don’t know, make much more interesting queries without, I don’t know, I guess that’s actually still kind of a lot of typing now that I think about it.

But, it saves you some space up here in the select list. It makes that cleaner and tidier. I’ll give it that much. So, that’s just a couple cool things from 2022. Hey, we got Regex.

Ding. Like and subscribe. All right. Cool. Thanks for watching. I hope you enjoyed yourselves. I hope you learned something. And I’ll see you over in the next video where we’re going to talk about some stuff that batch mode makes a whole lot faster.

So, we’re going to take a little break from being depressed about Microsoft’s abandonment of SQL Server generally. And we’ll talk about some stuff from back when they cared.

That’ll be a good time. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Window Functions and Aggregates

Learn T-SQL With Erik: Window Functions and Aggregates


Video Summary

In this video, I delve into a fascinating aspect of window functions that isn’t always highlighted in discussions about these powerful SQL Server features. I demonstrate how to use window functions not just for simple column aggregations but also for more complex scenarios where you need to preserve and order aggregated data within a temporary table. By incorporating row numbers based on aggregate values, you can ensure that when querying the results later, they are ordered by the most impactful findings first—something many overlook in their SQL practices. I walk through an example using Stack Overflow data, showing how to calculate averages of counts over different time spans and order them effectively. This technique is particularly useful for troubleshooting and reporting purposes where you need to prioritize high-impact issues. Additionally, I share insights on the flexibility of window functions, illustrating that they can be used in a variety of creative ways beyond just row numbering, making your SQL queries more powerful and versatile.

Full Transcript

Erik Darling here with Darling Data. And to continue the Tickler material for Learn T-SQL with Erik, for which all of the beginner content is now published. There’s about 23 hours of it across an entire 69 modules. Do with that information what you will, but again, the presale price is still $250 until the advanced stuff drops after the summer. So, we are going to, I’m going to show you something that I think is very neat about window functions that not a lot of people pick up on about window functions. So, you may have noticed that I have a couple of store procedures that try to help people troubleshoot various aspects of their SQL Server.

Some of those store procedures have sort of like a roll-up of findings, an aggregated roll-up of all the findings in there. And what I found was that, of course, you know, it’s very easy to, you know, produce the sort of summarized output. What was not easy to do was to order that summarized output later.

So, what I ended up learning to do to make that easier is when I insert the data into the table, I have not only the aggregations of the things, but also a row number that gets produced based on the aggregations of the things. So, for example, like if we were just going to query the stack overflow database, the aggregation plus the ordering would look something like this. We’re going to get a count of all the posts, right?

So, we’re getting the post type ID up here and we’re producing some text. And so, like, because we’re producing some text, you couldn’t order by the text in the output. So, it’s just going to be like texty ordering.

We’re not ordering by like what had the most of something, which is what I would want to do. I’m going to like prioritize the high impact stuff first. But what I learned to do was like put that count into the order by clause of the row number function, which looks like this, so that I can preserve the numbered output for when I want to order it coming out of the findings temp table.

Like to just do this in one query would be trivial to do this when you’re like putting data into a temp table and then you want to select it out later with a specific order. This, this, this becomes a little bit trickier, but the results look like this. And of course, I’m not connected to SQL Server, which would help.

But now, when I run this query, I’m not ordered by the text output. I’m ordered by which posts had the most rows. Now, I know that for this standalone query, I could have just said order by count big descending.

That’s, that’s no problem. The point is that now I have this row number column. So when I want to, when I want to select data out of like my findings table later, I’m going to go to the next one. So I can say like order by finding ID and then this row number column.

So I have the top stuff at the top. So like, you know, post type ID to having 11 million posts and post type ID one having 6 million posts. Those are up at the top of like whatever thing I want to show you.

But, you know, this is something that not a lot of people understand about window functions is that you can partition by, you can order by and you can pass aggregates into them. Just like you do. Like, you know, like another aggregate or something like you could do like sum divided by count or something like that.

Like you could, you can, you can have these things intermix and sort of live together. So just like an easy example of that would be like doing some average, like averages of counts. Like you don’t like a lot of, like I’ve seen a lot of queries written like this where like someone goes and like gets the count and like one CTE or something.

And then does the averages after they’ve like done the counts, which is kind of silly because you can just do it all in one place. The only thing you have to be a little bit aware of, of course, is how you choose to set up your, your, your window range and row specification to deal with that. So what I’m doing in this query is I am hitting the post table again.

I’m doing a little bit of fanciness in here in order to, in order to like give, give myself like consistent, like accessible aliases for these two expressions across my whole query. And what I’m doing is I’m asking for an average of the count over different spans of time. Now, since I want three, six and 12 month averages for the three month average, I have to say between two proceeding and current row, which feels a little funny.

Cause you’re like, I want a three month average. I should put three proceeding in current row, but that would give you four rows, right? You want two proceeding and the current row.

That’s three total. And it’s the same thing for the six month average and the 12 month average for the six month average. You want five rows. And for the 12 month average, you want 11 rows and proceeding. Now, granted for the, the 12 month average, it wouldn’t matter as much because it may be like the last row anyway.

So you’re like, like you could just say between unbounded proceeding and current row if you wanted. But if there’s more than that, if there’s more to that, then you would need to, you know, obviously make sure that you are very specific about it. But now when I run this query, I’m able to just all in one sort of fill swoop of things, get the average number of posts across three, six and 12 and all 12 months in the year 2013.

If we were doing multiple years, that’s where you have to be really careful with the 12 month one. Um, but what you’ll see here is that when we look specifically at these three columns that produce the averages, uh, you’ll see the, the three month average here across all three of these, uh, that, that is the same. But then for the six month average, this number changes from this number.

So this is just another three month average here, whereas this is really producing the six month average across these, right? So like this 12 month average here too, we’ll have the same thing across the first six months, but then, uh, like where this thing for every three months will reset and give us a new set of averages. This gives us two, two window frames of add of averages, the six months and six months.

And then this one, like where this agrees up to the six month part, this is, this really, this really like departs here. And then we have like the full 12 month, uh, um, you know, sort of like, uh, average across all 12. So the main message here is that, um, window functions, uh, are not just sort of limited to, you know, like a column in your tape, in your table.

You can do all, you can, you can assign all sorts of stuff to them. Uh, you saw there where, uh, the first query where I was ordering by account big, uh, and this one, I’m getting the average of count bigs over different spans of time. So there’s a lot of neat stuff you can do with window functions that, uh, are like, may not be just like immediate, obvious and apparent to you based on the way you see a lot of window functions out in the world written.

But there are some really cool things you can do with them that often get overlooked. Like you’ll hear people talk about like, you know, how cool and powerful they are, but they kind of just give you the same examples of like, you know, the kind of like, I don’t know, like, here’s a row number over and over again. Like here’s this different things you can do with row number and you’re like, okay, row number. Great.

But like, like think of all the other neat stuff that you can do for your sort of like neat analytical queries. Anyway, uh, that’s enough here. Again, this is all tickler material from learn T SQL with Eric, uh, still on sale for 250 bucks. There’s a link down in the video description. If you want to see much, much more material like this, uh, I don’t know.

Otherwise, I don’t know. Go live in the T SQL dark for the rest of your life. See if I can. All right. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Common Table Expression Fork Bombs

Common Table Expression Fork Bombs


Video Summary

In this video, I dive into a fascinating concept known as a CTE fork bomb, exploring how recursive Common Table Expressions (CTEs) can lead to exponential growth in query execution. I start by setting the stage with simple sample data and gradually build up complexity, illustrating how nested loops joins within CTEs can multiply the number of rows, leading to significant performance impacts. By breaking down each step of the execution plan, I highlight the importance of understanding these patterns for optimizing queries and avoiding potential performance pitfalls. The video is packed with detailed explanations and visual aids, making it a must-watch for anyone interested in deepening their knowledge of SQL Server query plans and optimization techniques.

Full Transcript

Erik Darling here with Darling Data. Feeling very high energy today, very… very just pumped up. Let’s go. Let’s go get them. Today I want to talk about a CTE fork bomb. If you’re not familiar with what a fork bomb is, you can sort of consider it to be like viral replication where one cell becomes two cells and two cells become four and four. It just keeps, like, getting bigger, right? And that’s what we’re gonna do today. If you would like to support this channel, you can do so. There’s a link to become a member down in the video description below. If you want to ask me questions during my Office Hours episodes, you can do that. Otherwise, the usual like, comment, subscribe stuff is all available to you should you feel so encouraged to do something. So if you need SQL Server consulting help, well, that’s me. Health checks, performance analysis, hands-on tuning, dealing with performance emergencies and training your developers to not write fork bombs on your servers. All good. All good and worthwhile things there. You can get all of my performance tuning content, about 24 hours of it for 75% off. Again, link down in the video description. That brings it down to about 150 bucks and you get that for the rest of your life.

The T-SQL course is now half done. All of the beginner content is online and published. There is about 23 hours of it across, last count, 69 modules. So that’s fun there. Of course, past pre-con attendees will get free access to all of this companion material. It is on sale right now for the pre-sale price of $250. That price will go up in the fall to $500 as soon as everything is said and done. I am doing a lot of outside the house stuff this summer. I will be in New York City, shockingly, August 18th and 19th. I will be in Dallas, Texas, September 15th and 16th.

And I will be in Utrecht, that old Netherlands thing, October 1st and 2nd. And of course, I will be at Pass Data Community Summit from November 17th to 21st in Seattle, Washington, assuming that Seattle is still a city at that time. But with that out of the way, let us talk about this CTE fork bomb thing. Because, you know, making fun of CTE never gets old. At least not for me anyway.

So we’re going to create some sample data here, some simple sample data, because we don’t want to create overly complex sample data that will confound and confuse the masses out there, do we? We want very simple, straightforward demonstrations so that everyone can understand everything. So before I get to the actual fork bomb, there are some agreements that you and I must come to.

We must agree on these concepts so that by the time we get to the fork bomb, you understand fundamentally what is happening. So if we run this query that joins together the two tables that I just created and populated with data, we will get back 255 rows. And if we look at the execution plan, there will be one scan of the table T1, one scan of the table T0, and one merge join in order to produce those 255 rows.

If we look at the details here, there is one number of executions for that scan. And there is one number of executions for this scan. Good stuff.

We can agree on these things. If this query were to use a nested loops join, say like this, where I’m going to force some things to happen, the query plan would change. We would no longer have a merge join.

We would have a nested loops join. And this table would 255 rows would be read from T0. 32,767 rows would be read from T1.

What changes aside from the nested loops join is we had a merge join before, if you remember back that long ago, you little goldfish. This scan still has one execution. But now we have something different on the inner side of the nested loops join.

Now we have an index seek, and we have 255 executions of the index seek. All right, 255 right there. So when you have a nested loops join, the thing on the nested loops join will execute once for every row on the outer side of the join.

That’s this part to go get rows. Since I don’t know how the cops are going to sound on the new microphone. So it’s my first one.

So lucky us get to experiment together. If every time a row comes out of here, since the way that the data is designed, every row will match. Right.

So all 32,000 rows in this table have a match here, which means all 255 rows in this table have a match here. So we do this seek 255 times, we find 32,767 matches, and then we aggregate those down to 255 based on the number of unique IDs that came out of t0. Now, if we were to put that query into a CTE, and we were to join that CTE to itself, the query plan would change yet again.

Now the query plan. Well, I mean, A, we have a hash join up here, but now we have two nested loops joins. We actually have two copies of that query that runs because of course, if you are a frequent watcher of my videos, you will know that Microsoft SQL Server at this point in time does not offer a mechanism to materialize the results of a CTE.

That lack of materialization means that every time you reference a CTE in an outer scope, you will have to rerun the query in the CTE. So we actually have the same plan run twice, right? There’s the first copy of the plan of the execution of the query, if that’s easier for you to deal with.

And down here is the second one. Now, what this means is that we have two scans of the table t0, right? And see a number of executions.

Oh, you know what? That is hiding behind my head. Let’s try that again. So we have two scans of the table here, right? One or rather one scan of the table here. We have another scan count of one for this table here for a grand total of two.

Now, this index seek into t1 has 255 executions and so does this one. So we have two total scans of the table t0 and we have 255 scans a piece, which is, I’m going to guess around 510 seeks into t1 total. Now, since there’s a hash join up here, right?

We have a hash join that brings these two results together. Remember when I said from the CTE, join the CTE to itself on the ID column. So this is how SQL Server chose to join those two CTE queries together with a hash join.

To simplify things quite a bit, let’s just say, you know, for the sake of making sure that we stay in agreement, that the first query plan up there, the one, the uppermost plan above my head, the outer side of the join ran, did all its work, went to the hash join, built a hash table, and then the inner side of the query ran and the hash join did its thing to, you know, compare rows in the hash table and all that other stuff.

So the, let’s just say the outer side of the query ran, got to the hash join, then the inner side of the query ran and like got, then got to the hash join and comparisons were made and we decided which rows matched and which didn’t at the join. So these re you really do have two executions of this. Another sort of easy way to see that is that you like, when you look at the operator times, you know, you have like this part of the plan executes and it takes like four milliseconds of accumulated time across all the operators here.

Then you have this part of the plan and there’s four milliseconds of time across all of the accumulated operators. And then you have the hash join up here, which is happening in row mode. So this is the four milliseconds here plus the four milliseconds here plus one millisecond of time spent in the hash join.

All right. So that’s how that looks. Ergo, which I’m told is the, which I’m told is a word, which is, which is just great, I guess.

If we combine the CTE join situation with a nested loops join, the query inside the CTE will be executed, not just once, but once per row that goes into the loop join. To see what I mean, what we’re going to do is instead of just doing a join, we’re going to do a cross supply with a top one. The cross supply, I’m not saying cross supply is bad.

It’s just there for a little bit of convenience because cross supply does often get optimized to a nested loops join. So we’re going to use this for convenience. So I can show you this execution plan.

Notice the top part of the plan, right? Everything looks the same up there. We still have basically the same two copies of the plan that run, right?

There’s like that looks very similar to the original one. And then down here, we have that second copy of the query that ran just preceded by a top operator here because we had that top one. Right.

So now we’re going to see the one scan here and we’re going to see the 255 seeks here. What’s going to change is that instead of having one scan here. Now, this is an index seek now, and we did 255 seeks into this.

Well, the estimated CPU cost is 255, too. That’s that’s amazing. I was like, huh?

Okay. Number of executions, 255. Okay, cool. So every for every row that came up out of here, right? We got we aggregated everything down to 255 rows here.

All 255 rows went into the nested loops join, let’s say one at a time. And every time this nested loops join got a row, it went down here and said, hey, seek into here. And so we did that 255 times.

From here, we went into a nested loops join and we hit this thing 255 times. Okay, like this is this is the same, right? We still have 255 here and we still have 255 here.

This number didn’t really multiply here because we still we have a nested loops join here that’s sort of protecting us. So let’s have a little fun with this. Let’s add some more work.

Let’s further amuse ourselves. Because we are we are we are nothing if we cannot amuse at least ourselves. If we can’t amuse ourselves, what have we got? So we’re going to add some more work to the initial CTE.

We’re going to add some window functions in. We’re going to add average and row number and count big. Right.

So we’re going to we’re going to make SQL Server do some more work in the initial part of the query. And this is where things I think get kind of interesting. So if we run this whole thing now, we’re going to have to do some multiplication math. Right.

So let’s let’s zoom out a little bit here. I’m actually going to. Oh, let’s see. Can I get that? Let’s see my head. My head is going to be between two operators, but that’s okay. So if we if we look at this top part of the plan, it will be a mirror image of the bottom part of the plan up to a degree.

Up to a point rather. Let’s move this way over here. That’s probably about good.

Oh, tooltip. You just had you had to do it to me, didn’t you? So if we look at this part of the plan, right, if we start up here, you’ll see, you know, like the index scan on T0 255, the index scan on T1 32767. And then going over, there’s a merge join and a segment and a sequence project like that’s doing some like the row numbery stuff.

And if we go over here a little bit further, and if you just kind of keep your eye on the bottom part of the plan, you’ll notice that they are essentially mirror images of each other aside from a couple extra operators here. But they both do the exact same thing. Where things get interesting here, I think, is that we still have the same pattern where every row, the 255 rows that go into this loop join that bring this reference to the CTE and join it to this reference to the CTE.

We still have 255 rows that go in there, but way down over here, things really start to multiply out that didn’t work out. So if we look at this part of the plan specifically, notice that this isn’t 255 anymore. This is 65,025, which if you don’t have a calculator handy or just a lot of fingers, that is 255 times 255.

If you look at this number, this is 8.3 million and change. That’s 255 times 32,767, which is that number up there. So now we have sort of fork bombed our CTE by using, but with the nested loops join, because every time this nested loops join runs, we end up multiplying the number of rows in the table by the number of rows that come out of the loop join.

So if you sort of compare the numbers going across, like this merge join up here, this has 255 rows go in. This merge join down here has 65,025 rows go in. Or rather, like come out, sorry, because you have the 255 rows go into each.

Right. And that’s that that happened because we get we aggregate this down over here. Right.

This gets squished and this gets squished to 65,000. So the 8.3 million gets aggregated to 65,025. Here, the 32,000 gets aggregated down to 255. So now, like like we instead of having 255 rows go out here, we have 65,000 rows go across.

And you can see that the number of rows because we have fork bombed ourselves with the nested loops join, the number of rows that go across here are going to be much larger for the bottom part of the query. And you can actually see far more time end up across all those operators. If you look up here before we go into this nested loops join, we have only used five milliseconds of wall clock time.

If you look down here, look at how the time builds up. I don’t know what my head’s going to be sort of in the way. It’s 518 milliseconds in here, nine milliseconds in here.

We get up over 800 milliseconds by the time we get to here. So all of these accumulated operator times get to 800 right there. And then as we go across SQL Server dealing with the number of rows on a single thread, like we just add more and more time to this.

So there’s a there’s a window spool here. We get up to 1.1 seconds. And then after this segment, we have a we have a table spool and we get to 1.2 seconds.

And then we do this all this stuff and we get up to 1.242 seconds. So depending on like usually when, you know, like we talk like I talk about CTE, I’ll say something like, you know, your CTE will run once for every reference you make to it. But depending on the query plan that SQL Server chooses, your CTE might run way more times than that.

Right. So if you’re if you’re CTE joins to itself or joins to itself or, you know, like just joins repeatedly to like say you, I don’t know, throw a third table in the mix. And let’s say you have to join your CTE to one table and then you have to join your CTE, maybe to another column in that table or to a different table.

Depending on the join choice, your CTE might end up executing way, way more times than just once per reference. If you choose a nested loops join, it’ll execute once for every row that goes into the nested loops join and then has to run your your your reference to that CTE. So isn’t that fun? Isn’t there just so much fun in query plans?

Isn’t there just so much interesting, exciting stuff that just makes your day? Mine too. All right. Cool.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will undoubtedly talk about more fun and exciting execution plan stuff. All right.

Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 21

SQL Server Performance Office Hours Episode 21



To ask your questions, head over here.

What’s the biggest T-SQL re-write that you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains for minimum code change (that isn’t option recompile or index tuning).
I have a non tech question, kinda. You clearly have a talent for taking a complex topic and breaking it down to its simplest form to show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks!
Thank you for your handy tool, sp_quickiestore. I was recently trying to get more information about a stored proc using the stored proc name but it returned there was no information, but when i used a text, which is part of the stored proc, it returns the information I needed. The stored proc has multiple (3) plans because of PSO, could that be the reason it failed when I searched based on the stored proc name?
Hi Erik! When we update our PMS app to the latest version, the database update involves re-running ALL procedures and triggers (modified or not) with DROP and CREATE (not my decision – I have to stick with that though). What is the downside of the above operation? Does it make any sense to run the sp_updatestats after that? Thanx!
why haven’t you been talking about sql server 2025?

Video Summary

In this video, I dive into some of your most pressing questions about SQL Server and T-SQL tuning. We tackle topics ranging from the biggest T-SQL rewrites I’ve undertaken to the most effective query tuning tricks that yield maximum gains with minimal effort. You’ll also get tips on simplifying complex concepts for better understanding, whether you’re writing blog posts or just explaining things in general. Additionally, we discuss the nuances of using stored procedure names versus text when querying the Query Store and explore the downsides of dropping and recreating procedures during database updates. Lastly, I share my thoughts on SQL Server 2025, highlighting both its potential and the areas where it falls short. Whether you’re a seasoned DBA or just starting out, there’s something here for everyone to learn from.

Full Transcript

All right, you heathens. It’s time to give in to our darkest desires, and we do office hours. This is where I answer five user-submitted questions at a time and try to give the best semblance of an answer that I can provide. The usual stuff here, if you like this channel and you want to, and you feel that it’s worth your wallet, you can become a member of the channel to support my efforts to bring you the highest quality SQL Server content known to humankind. That’s down in the video description. If you want to ask questions that appear on Office Hours, this link is also down in the video description. It’s very easy and anonymous. You don’t have to hardly do any work. Other things that are useful to me, liking, commenting, and subscribing, because that’s, you know, I guess cool too. If you need if you need a consultant for SQL Server, you would like to hire me to come work personally with your deepest, darkest data, you can hire me for all of these things. Health checks, performance analysis, hands-on tuning of the worst of your worst, dealing with your performance emergencies, and training your developers so you do not have performance emergencies anymore. I do all that stuff and more. And as always, my rates are reasonable.

All right, come on. Next slide. There we go. I clicked. If you would like to buy my performance tuning content, you can get all 24 hours of that for about $150 US buckaroos. No tariffs added with that discount code, and that will last you for life. If you would like to get in on the pre-sale prices for my T-SQL course, Learn T-SQL with Eric. That’s me. You can get almost all of the beginner material is now out and publicly available. Many hours of content. The price will be going up to $500 once the advanced material is done after the summer. And if you are attending past data community summit in Seattle, and you’re coming to the T-SQL pre-cons that Kendra Little and I are teaching, you will, of course, get complimentary access to the course because this is companion material to the course.

That means it is not the same material, but it is a good companion to the material there. So if you’re attending the pre-cons, you’ll get this stuff. Ain’t that your lucky day? And, of course, this speaking schedule is going to be grand. The Red Gate Roadshow is taking me on tour. I will be in New York City. Surprise!

August 18th and 19th. Dallas, September 15th and 16th in Utrecht. Rolls right off the tongue October 1st and 2nd. And, of course, past data community summit taking place in Seattle, November 17th to 21st. So I will be live and in person in all of these places, I don’t know, to answer your questions, give you hugs and high fives, tell you you’re awesome at your job, whatever you need me to do.

Anything for a buck. But with that sort of the way, let’s party. Let’s do these office hours-y questions. And let’s zoom in here and make sure that we are nicely framed up, make sure everything is legible above my gigantic head.

And we’ll start with this handsome devil up here. What’s the biggest T-SQL rewrite you’ve done for a customer? Conversely, what weird query tuning trick have you done which returned maximum gains from minimum code change that isn’t option recompile or index tuning?

So I have written, I have rewritten entire applications for people. Like, I mean, like maybe not every single store procedure because like, you know, they’ll be like, hey, we don’t use this store procedure anymore. Which, you know, that’s, that’s cool with me.

But, you know, pretty much like, like every store procedure that was, you know, currently in use, I’ve done, I’ve done rewrites for. Or, you know, like, like depending on the development team a little bit, like there are some people, like there are some times when like I can rewrite, you know, like a handful of store procedures and just be like, like, you know, like, like follow this pattern generally and, you know, other stuff. And if you get stuck on anything, let me know, we can work on it together.

But, you know, I like really like, like hundreds of store procedures and functions. There was, there was one client. I, I, like, I think if I remember the final count in the rewrite folder, it was like, it was something like 56 scalar UDFs that I, I, I hand rewrote.

And that was, that was just the UDFs. It wasn’t even like the store procedures and other stuff. So that’s that, that’s that answer.

Sure. The, but of course, like the, the biggest, the biggest query tuning trick is, you know, it’s probably just like getting batch mode involved when it’s appropriate. Not only because, you know, like not even adding a, like columnstore index, but just, you know, playing some trick on SQL Server so that batch mode gets involved somewhere opens up a lot of, opens up a lot of doors.

You know, obviously it’s better with columnstore as a data source for a lot of these things, but sort of generally like getting batch mode involved solves a lot of problems really quickly. That would otherwise take a lot of index tuning and consolidation and, you know, query rewrites and trying 50 million things to get nudge the optimizer towards some specific pattern or path that I care about. But, you know, batch mode is probably the easiest one to do there.

If I had to pick a second place, you know, as far as just like, you know, like, like bang for the buck, it’s, you know, like breaking up queries that are, you know, miles of CTE. And, you know, getting, like using temp tables at certain like logical breaking points to materialize results. Like everyone thinks that the CTE materializes a result, but it doesn’t.

And so using temp tables in place of that is often very valuable as well. All right. Next up, what do we have here?

Oh boy. I have a non-tech question, kind of. All right. Shortly. You clearly have a talent for taking a complex topic and breaking it down to its simplest form. Oh, thank you.

That’s what I’m known for. Being simple. To show the underlying theory. Any tips for this? It’s the number one thing I struggle with when writing blog posts. Thanks.

So the way that I teach is the way that I learn. If I don’t break things down, like for myself step by step, I get lost and don’t learn things. So I need to break things down into very simple terms that fit into my brick head and make sense to me there.

If I had to give you advice about how to do that, it would be like when you’re writing a blog post, you know, in your head, it’s really easy to like logically jump from one thing to another to get the words out. But like when you speak it out loud and you like see the stuff on the screen and you like something catches your eye and you’re like, wait. Like go to talk about it and try to explain something.

And if you get stuck on something, that’s something else that you need to put in the post. That’s another thing that you need to add in to further break this thing down to make it explainable. Like a lot of blog posts, like even mine, like, you know, I’m not going to pretend I’m not guilty of like, like gloss over some stuff and like leave some details out either because, you know, it’s like a whole other blog post to explain it or it’s like too much of a detour.

But like if like if you want to be able to do that, like like like don’t just write your post, like read it out loud or like rehearse the material out loud. So you have a better idea of like not only what you want to say about it, but like like you really find yourself getting deeper into the nooks and crannies when you have to talk about stuff like out loud. Like you don’t even have to. I mean, you can record it if you want.

I don’t know. But like having that sort of extra added thing where you’re like, like, oh, but is this the well, no, let’s not. OK, never mind. So that would be that would I don’t know. That would be my advice is speak your content out loud because that will force you to to think more about everything that you are looking at and everything that you are saying.

And if you hit one of those unexplainables, that is that is often a good sign that that you need to break things down a little bit further. All right. We’ve got got quite a thing here. Oh, dear. That didn’t work out well. Let’s try that. Let’s try that little rectangle again.

Thank you for your handy tool, SB Quickie Store. I was recently trying to get more information about a stored procedure using the stored procedure name, but it returned. There was no information. But when I used a text, which is part of the stored procedure, it returns the information I needed. The stored procedure has multiple plans because of PSO.

Could that be the reason it failed when I searched based on the stored procedure name? So, yes. So like what what you what you guessed is most likely correct, because in the in the parameter sensitive plan optimization, like like there is an object ID in the XML, but the like the way that the plan is expressed is a lot like dynamic SQL, where it’s like almost completely detached from the object ID of the thing that called it.

So like in query store, there’s an object ID for the procedure. But if like your procedure doesn’t do anything really like meaningful or anything that query store captures, depends on your capture settings that like puts that into query store, then it’s going to say no other stuff that like, you know, has messed me up trying to find procedures and query store was like, like non defaults, like like non DBO schema.

Like there is a there is a like a procedure schema and a procedure name parameter for it. So if it’s not in DBO, that’d be another thing to try. I don’t think that quickie store handles square brackets gracefully.

I did some work to try to make it so that the procedure name parameter was sort of overloaded. So if you put in like, you know, like like square bracket DBO dot procedure name, it would like it would use parse name to break that stuff out.

But I forget how far I took it. So like like making sure that you don’t put the procedure schema and name in the square bracket to be another thing to try. But I think for you specifically, you are generally correct that that would be why it didn’t show up.

It is kind of a pain in the butt. But at the same time, like like I, I really want to avoid XML parsing in quickie store because querying the query store underlying views is kind of painful enough. You know, like Blitz cache working on that was a lot different because getting data out of like the plan cache, like aside from the XML parsing bit was generally pretty quick.

You know, of course, like depending on the size of stuff and a million other details. But like, you know, the XML parsing was what really took up time in there. Like querying the query store stuff is such a misery that I got.

Like I don’t want to add XML parsing in there to like go look for like that hidden object ID in the plan XML for the to detect the parameter sensitive plan optimization stuff. There might be some shortcuts around that. I just haven’t looked much at it yet. You know, like that that’s sort of why the query text search thing is in there.

And I understand that the query text search part of it is not as fast because we’re wildcard searching a bunch of query store data for some some query text. But yeah, I don’t know. It’s something I’ll think about, but I don’t know that I’d really get to it very quickly, at least at this point. All right. One one more question here. Hi, Eric. Hey, how’s it going? That’s me.

When we update our PMS app to the latest version, the database update involves rerunning all procedures and triggers modified or not with drop and create. Not my decision. I have to stick with that, though. What is the downside of the above option? Does it make any sense to run the update stats after that?

So I don’t think it does answer your questions kind of backwards. I don’t think it makes sense sense to run update stats after that. I think the one thing that annoys me about drop and create is that will create new object IDs for everything that gets dropped and created.

And for me and my analysis procedures. So SP quickie store being one of them. Human events block viewer.

Like there’s a lot of different ones where there are like there’s like not always like the procedure name or the. Like an object name there in there. Sometimes it’s just based on an like an object ID and I have to decode objects in the database by like object ID database ID.

And if you’ve gotten created like dropped and created your objects and they get new object IDs, I can’t I can’t resolve those names. And so that messes me up. So that’s the real downside there is you hurt you hurt me wound me terribly.

These these practices aside from that, I don’t I mean, I don’t really I can’t really think of anything that would be all that annoying with it. You know, you’re going to lose query plans. You’re going to.

You know, the plan cache sucks anyway. You know, you can have a bunch of recompiling stuff when you start creating query plans. But again, the plan cache sucks anyway. So I don’t know.

Not not a whole lot. Not a whole lot to go on there. All right. Question number five. Why haven’t you been talking about SQL Server 2025? So I’ll be very honest with you.

I don’t find anything all that compelling with it. All the stuff is just dumb to me. You know, like vector. OK, great.

Vector search. Cool. OK, fine. It’s there. You know, I care about T-SQL enhancements. I care about performance enhancements. And, you know, like like there is some neat.

There are a few neat things in 2025 that I do want to talk about. The optional parameter thing. The the optimized Halloween protection using using accelerated database recovery. Like, you know, the optimized locking stuff.

Even though the optimized locking stuff has kind of been around for a little bit in Azure. Like there are a few things in there that I think are cool and that I want to talk about. But here’s the thing. Microsoft has been so heavily invested in screwing up fabric that they didn’t take a lot of money.

There’s a lot of time out to screw up stuff in SQL Server 2025. So a lot of it is just kind of like like there’s just not a lot in there aside from like the dumb AI stuff. And like it was like, oh, it’s ground to cloud to fabric.

Who? Come on. It’s like ground to cloud to nowhere. Right.

Who cares? Anyway, those are those are my five answers to these five questions. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video sometime soon where I will most likely be continuing to try to pedal my course. Learn T-SQL with Eric. Because it’s a good one.

Paul White Tech reviewed it. So at the very worst, at the very worst, it’s it’s it is entirely technically accurate. So I’ve got that going for me.

All right. Thank you for watching. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Grouping Sets vs Union All

Learn T-SQL With Erik: Grouping Sets vs Union All


Video Summary

In this video, I delve into the lesser-known world of `GROUPING SETS` in SQL Server and explore its alternatives, including `CUBE` and `ROLLUP`. While these features are powerful for complex aggregations, they often face performance challenges due to how the SQL Server optimizer handles them. I demonstrate a real-world example using Stack Overflow data, showing that even with a columnstore index, `GROUPING SETS` queries struggle to utilize batch mode efficiently, leading to slower execution times compared to equivalent queries without these features. By comparing the query plans and performance metrics, you’ll see firsthand why sometimes sticking to simpler methods can yield better results in practical scenarios.

Full Transcript

All right, we’re gonna talk about, boy, something I don’t see very often, be honest with you. Maybe I don’t see it very often because a lot of people don’t really understand how this stuff works and so they get afraid of it. But we’re gonna talk about grouping sets and alternatives here. And the alternatives are important because there’s also the distinct possibility that someone wrote a grouping sets query once upon a time and, gosh, it just didn’t perform terribly well, which is, uh, eh, that’s something that can happen. Boy, howdy. We’re gonna look at that. Um, all of this content is preview material for my course Learn T-SQL with Erik. Uh, it is on the pre-sale price right now of 250 US dollars. That’ll last you for the rest of your year. Uh, it will go up to 500 US dollars once the advanced material is published after the summer. Uh, the, the beginner material is on the very cusp of being completely published. So, uh, you’ll have many hours, uh, of, of content to get through, uh, anyway. Uh, this is all companion content, meaning not the exact content, but it is companion content to, uh, the material that Kendra Little and I will be presenting at, uh, past data community summit in Seattle. So if you attend our pre cons there, uh, you will, you will get access to this material, uh, with your price of admission. Um, I think that’s about it. So we’ll just get at a lot of this thing. And, uh, well, I mean, there’s a whole slide here. Now you can see all my secrets, right? You can see all this stuff that I do when you’re not watching. So, uh, we’re gonna get into the old stack overflow database here.

And, uh, I’m gonna show you what a grouping sets query looks like. There are of course other grouping set, uh, let’s just call them clauses that you can use, uh, like cube and rollup. Uh, cube will, uh, create distinct grouping sets out of all possible column combinations and rollup will, uh, create, um, well, rollup is a little bit more, uh, uh, complicated in what it does at least, at least it’s more complicated to explain where you define your rollup clause. And then SQL Server will take whatever column you put first in the rollup clause and match that to other columns. So it doesn’t do all unique sets. It just does like a subset of unique sets based on whatever you put in first. Uh, I cover it in way more detail in the actual material.

So if you’re really fascinated by what cube and rollup do, uh, you can, you can buy the material and watch it or just watch it if you’ve already bought it. But, uh, with grouping sets, you can define whatever sets of columns you want to group things by. Uh, I have chosen to do, uh, a few levels of grouping here, one by, uh, all, by all three columns that I’m selecting.

And then one grouping set for, uh, each of the columns individually. I could of course add stuff in here. Like I could add another line that says vote type ID and post ID and another one that says vote type ID and vote year and post ID and vote year. But just to keep, just to keep things relatively simple, I’m going to, um, I’m going to just do the three together and then the three individually.

There’s also this empty parentheses at the very end, which is going to be the global aggregate for this. So I’m going to start this thing running and you’ll, you’ll notice that I have a columnstore index up here. I have not created that yet. Uh, we’re going to save that for later.

What I want to show you here is just that, um, SQL Server, uh, SQL servers optimizer fights real good and hard against, uh, using any sort of batch mode stuff, uh, with these grouping sets queries. Uh, even like we’ll show you in a second with the columnstore index, but right, right about 30, 30 or so seconds for this thing. And you know, grouping sets queries, they tend to need to process a lot of data.

Like you’re talking about like, you know, doing like big aggregations of stuff. It would seem a fairly natural fit for the optimizer to want to use is like batch mode for these things, but we just don’t get any. You can kind of figure that out from the query plan by looking at the choice of operators, not always, but in some of these things.

So like, you know, first we have, uh, parallel exchanges, parallel exchanges, like there’s one here, repartition streams, repartition streams, repartition streams, distribute streams. Uh, that, well, that’s a filter. That’s not quite it.

Uh, gather streams over here. None of the parallel exchanges support batch mode. There are operators in the plan that support batch mode, but they all, I, I, I, at least the last time I ran this, they all ran in row mode. So like there’s a sort at the end.

Uh, there’s a filter that runs in row mode, filter support batch mode. Uh, stream aggregates here. One, two, three, uh, well, one, one up there, one down there. Those don’t support batch mode.

The hashes do support batch mode, of course, but they all run in row mode. So this whole thing is kind of cooked. This whole, this whole thing takes 30. Well, actually we should go look.

Cause like that looks pretty funny. 43 seconds. And then I got 20 seconds, 43 seconds, 32 seconds, SQL Server. What happened? Let’s, let’s go, let’s go look at the actual tail of the tape over here in, uh, the query time stats. Hopefully my big head won’t be in the way.

So, uh, that, that one parallel operator lied to us. So we used 166 seconds of CPU time to get 32 seconds of wall clock time. So what we appreciate, we appreciate SQL servers willingness to use a parallel execution plan here.

We do not appreciate the lack of batch mode here for them. Look at that number of rows that we are selecting and aggregating. Gosh darn it, SQL Server.

Why won’t you use batch mode? So what I’m going to do is I’m going to kick off, uh, creating this columnstore index. And just so I don’t get yelled at later, I’m going to make sure that that semi colons in there. We don’t want to improv.

We don’t have a lack of termination here. Do we? I’m going to get that started and I’m going to show you what an equivalent query without the grouping sets stuff looks like. Uh, so we’re going to have one query at the beginning where we select and group by all three columns.

That’s the vote type ID, post ID and vote year. Uh, I’m using a little trick that I showed off in one of my earlier videos where I’m doing this little cross apply values trick. To have, uh, one source for this date, part year creation date thing.

Uh, so that I can reference that one. Uh, well, I can reference that a few times throughout the query without having to keep rewriting date, part year creation date. Now, of course, like you have to do more typing, right?

So we have one query that does all three columns and groups by all three. We have another query that groups by just vote type ID. We have another query that groups by just post ID.

We have another query that just groups by vote year. And then we have one query that, that essentially just like doesn’t aggregate on all of them. Now, if I run all, what I want to do is run all these together. Now, keep, remember that I just created a columnstore index here.

We wouldn’t need the columnstore index for this thing to run in batch mode. Uh, SQL Server would shoot, could shoot, can choose batch mode very naturally for this. Um, I’ll show you that at the end, but the important thing is that when you look at this plan, there’s two things.

There’s a couple few things about this that are much, much different from the plan that we just saw using the grouping sets method. One, there are no parallel exchange and exchanges in this until the very end, right? There’s one necessary one at the end where we do gather streams.

Uh, but there’s also no stream aggregates. Like we don’t have all the repartition stream stuff in here. And of course, if we look at, we look at the execution modes for all of these operators, these are all going to say batch mode, batch mode, batch mode. So this whole thing runs a lot faster.

If you look at the query time stats for this one, just cause it’s not, maybe not terribly obvious from the, from the query plan as a whole. Uh, we spend, uh, 1.13 seconds of CPU time for 1.8 seconds of elapsed time, right? So that’s a huge improvement, uh, using batch mode over, uh, like row mode execution for these queries.

Let’s go back up to the, the, uh, grouping sets function query. And let’s run this with the columnstore index in place. If we let this go for a little bit, right?

It does, it does go for a little bit still. Uh, it’s already slower than the, the union all method of doing this. Um, this thing will take about 12 or 13 seconds in total. Um, and the, the query plan is different, right?

Like with, with the, um, with the columnstore index in place, you don’t see all, like, you don’t see a lot of the repartition stream stuff, but there is still some real junk in the query plan. Like this whole section here, where, uh, like very early in the plan, we gather streams. That means we end a parallel zone right here and then we restart a parallel zone right here.

But in, inside of that serial zone, right? Where we, it’s like no more parallel and then back to parallel in here, we, we do a stream aggregate. Because this is just something that grouping sets has to do for the global aggregate.

Like I’ve, I tried all sorts of tricks to get rid of this. I tried an option hash group hint. Uh, I tried a query rule off hint to prevent the stream aggregate, but still it’s stuck in there. I don’t want to start getting into all that stuff in this video, but man, this thing was just very, very persistent.

And if we look at the query time stats for the grouping sets one, like we go into properties and we look at query time stats here. Like we were at 1.7 something seconds for the one that, uh, the union all, uh, format and like 13 seconds of CPU time here, we’re up to 30 seconds of CPU time and 10 seconds of elapsed time. So this isn’t like, you know, like when I say like, I don’t see this stuff very often.

I kind of get why, like, it’s not just that the syntax syntax is kind of like weird and difficult to remember, or even like occur to you. If you need to do this sort of thing. Uh, it’s that like, even if you, you, you, you’re like a steadfast remember of syntax, which I am generally not.

I need to refer back to, you know, stuff that I wrote before to remember like half the things that I want to do. Uh, like, even if you are really good at remembering this stuff, like you might try this and just be like, holy cow, this is slow. So let’s, um, uh, let’s, uh, get rid of the, the, uh, the columnstore index that I created.

And I just want to show you that kind of naturally, uh, the union all version of this does a lot better on its own. Just sort of naturally using batch mode. SQL Server is like, I’m not afraid of batch mode here, right?

Like, like, it’s not going to be as good as when we had the columnstore index to read from. So like, if you’re doing this kind of like big analytical aggregation stuff, there’s a, you know, very, very good chance that you’re just going to want to use a columnstore is like the source of your data anyway. But even just like getting some batch mode stuff, like this, this runs pretty well or runs better even without the columnstore index, like then the grouping sets version of the grouping sets version was like 30 seconds or something.

This is, this is like twice as fast without even having columnstore is like the, the source to read from the main thing that slows down is we have to read from the clustered index over and over again. Right? Like we do a lot of stuff in there, but even that’s happening with batch mode on road store. We just don’t have those.

We just don’t have that really nicely compressed columnstore index to really speed things up as we’re reading data. So we still kind of have like this IO bound portion of the query that, um, like, like it doesn’t have the good column source source to read from, but like, it still happens in batch mode. So it’s still an improvement. And for all of these branches, notice that we don’t have like, we don’t have that break where we have like the, the, like, like the serial zone with the stream aggregate.

We don’t have all the repartition streams in here. This all pretty cleanly just runs in batch mode and does everything in batch mode and is a lot faster just naturally. Like I, of course, like I, like I said, I would absolutely prefer to have the columnstore index as a data source for this type of thing. But even without that, even just getting batch mode on rowstore, uh, this ends up a lot faster than the, uh, the grouping sets alternative.

So like when you’re, when you’re writing these types of queries, you know, of course, like what you’re reading from is very important. Um, you know, rowstore indexes, just not, not quite the same jam as columnstore indexes for these big analytical agro ag, aggregative, uh, types of queries. But, uh, like, you know, depending, like often just how you write the query and I’m going to, I’m going to say something unfortunate generally.

In general, the more you type, the better off you are, right? Like, like taking, you know, shortcuts like this, this might, you might like learn this in some like shifty Microsoft DP exam. And, you know, think that you’re, you know, like you’re, you’re the hottest cake on the block, uh, knowing, knowing how to use grouping sets, but, you know, get, get out there in the real world with, you have to do anything practical on like a meaningful set of data, not like worldwide importer works or whatever.

You’re, you’re, you’re going to want to abandon that ship pretty quickly. So anyway, uh, thank you for watching. I hope you learned something.

I hope you enjoyed yourselves. Uh, and I will see you in the next video, uh, which, um, I think this is, this is going to be a Friday video. And then, um, the next one will be an office hours episode. And boy, just, we just have so many exciting things.

So many exciting things coming up, don’t we? All right. Anyway, thank you for watching. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Erik Tests a New Microphone

Erik Tests a New Microphone


Video Summary

In this video, I’m Erik Darling from Darling Data, and I wanted to share a bit of behind-the-scenes content as I test out my new microphone. The old one met its untimely end after countless hours of discussing databases, but fear not—my audio quality remains top-notch! This video serves multiple purposes: it allows me to ensure the sound is clear across various devices so you can enjoy my content without any hiccups, and it’s a direct way for you to support my endeavors. If you’d like to contribute towards better equipment, consider signing up for a channel membership starting at just $4 a month. Your support not only helps me improve but also gives you access to exclusive perks and opportunities to ask questions during office hours episodes.

Additionally, I’m taking this opportunity to promote some of the services and courses I offer. Whether you’re looking to enhance your SQL Server performance with my consulting services or want to dive into T-SQL training, there are plenty of ways to get involved. The pre-sale price for my 24-hour T-SQL course is currently at $250, but don’t miss out on the limited-time offer! And if you’re attending Pass Data Community Summit in Seattle, you’ll receive free access to this companion content with your admission. Mark your calendars for my upcoming live appearances in New York City and Dallas, as well as Utrecht, where I’ll be sharing more insights and expertise.

Full Transcript

Erik Darling here with Darling Data. And if the first thing you’re noticing and admiring in this video is my new clip on mic, my eyes are up here, buddy, then you are a wise and you are a dedicated follower because I was talking the other day about how my old microphone, I was wearing a big headset at the time, my old microphone snapped in half from listening to me talk about databases. Hopefully you will not. snap in half from listening to me talk about databases. But we can consider the entire point of this video is Erik tests a new microphone. And the reason I’m doing this this way is because it’s very easy for me to upload this to YouTube and then listen to it across a variety of devices so I can ensure that I sound okay on a variety of devices. So let’s, let’s, let’s spiel ourselves here. If you would like me to buy a better microphone, if you have complaints about audio quality, well, you can do something about that. You can sign up for a channel membership and for as few as $4 a month, you can, you can support my endeavors to purchase nice microphones. There’s a, there’s a link down in the video description that allows you to do that. If you don’t care about my microphone, well, I don’t know what to tell you. You can like, you can comment, you could subscribe. You can also ask me questions for free privately that I will answer publicly during my office hours episodes.

Isn’t that nice? Isn’t that nice? These nice things I do for you. If you would really like me to get a new microphone, you can hire me as a consultant. It’s a great way for me to buy new microphones. And I am available to perform all sorts of miracles upon your SQL servers, health checks, performance analysis, hands-on tuning, dealing with performance emergencies and training your developers. So whenceforth you have no more performance emergencies. I do all of these things. And as always, my rates are reasonable. Anyway, uh, if you would like to buy some performance tuning training from me, I’ve got 24 whole entire hours of it. Uh, if you, uh, there’s a link down here that assembles all this stuff for you. But of course, uh, you, you, you can get the everything bundle there, uh, for a hundred, about 150 bucks with a 75% off code. And that will last you for life. No subscription required. Um, if you want to get in on my new T-SQL course. While it is at the pre-sale price of 250 US dollars, uh, you can do that now, uh, videos are dropping and being recorded. I’m going to go do some after I do this. Uh, and, uh, the, the price, once the advanced material is fully published after the summer, will go up to $500.

So please do save yourself 250 bucks unless you’re really itching to donate to the mic fund. Uh, uh, this is all companion content to the pre-cons that Kendra Little and I are teaching at Pass Data Community Summit in Seattle, November 17th to 21st. So if you are attending Pass and hopefully chosen wisely and you’re attending our pre-cons, well, guess what?

Uh, you, you will get free access to that, this companion content with, with your, with your admission there. Uh, if you would like to see me live and in person, Red Gate, Red Gate’s little road show. I’m being taken on tour sort of around the world, limited world tour, I guess.

Uh, some small clubs and venues, uh, New York City, August 18th and 19th, Dallas, September 15th and 16th. That’s the one in Texas and, uh, Utrecht, the one near Amsterdam and the, and the Netherlands. Uh, it’s a Hamlet and it’s beautiful.

Uh, October 1st and 2nd, but let’s, uh, let’s do a, a, a short video here for me to test my microphone with. All right. So, uh, a lot of the times when I’m teaching about dynamic SQL, I like to say things like, if you want to properly parameterize your dynamic SQL, you have to use SP execute SQL and you have to feed it some parameters and you have to feed it some, some values to, to, to substitute those parameters with.

And that, that’s a great way to avoid SQL injection because everyone should be trying to avoid SQL injection. It’s unpleasant, gets, gets you fired. It might get your company to go out of business.

There are all sorts of terrible things that can happen when you are SQL injected. So, uh, I, I do, I do recommend avoiding that. Askew SQL injection at all costs.

But there is kind of one funny circumstance where you can not use SP execute SQL and you can still somewhat parameterize your dynamic SQL. So we, what I’m going to do is, well, what I’ve done already is, uh, this, this only works with linked servers, at least that I’ve ever seen. Um, there might be some really insane work around where it works otherwise, but Hey, what do I know?

Uh, I know T SQL with Eric. It’s apparently what I know, but I’ve already added a loopback linked server to my very own server. Uh, I am, uh, for the, for the most part, SQL Server monogamous.

So, uh, I, I only work with one version in addition of SQL Server at a time though. I guess, I guess, I guess these days I am philandering a bit with SQL Server 2025, but, uh, we don’t, we don’t need to talk about that publicly. We can save myself a little shame and embarrassment here, but this linked server will allow me to do something kind of funny.

This linked server will allow me to, uh, declare some SQL and I’ve, I’ve got, uh, I’ve got my, my string that I want to execute, uh, being set and assigned right here between these two things. And I’ve got this, uh, ID local variable set to eight. Now, uh, in my where clause in my dynamic SQL, I’ve got a question mark, right?

And it’s sort of like that crappy store procedure, SPMS for each DB where the question mark is the database name. And you have to say like use bracket question mark thing to, to get into different databases. It’s kind of like that.

But what I can do is I can use, uh, I can use the, the less safe version of executing dynamic SQL. And I can say execute, uh, at SQL, and I can pass in the ID as a second thing. And that second thing is going to act as a parameter replacement for that question mark.

But I have to use exec at, I don’t know why Zoomit has forsaken me like that, but I have to, I have to execute this at the linked server. And when I do that and, and I, you know, go through great pains to use the right database and everything. But when I, when I run all this, uh, this server talks to itself and it returns the thing that I wanted, which is post type ID eight from the post types table.

So anyway, thank you for joining me for Eric tests, a new microphone. I hope you enjoyed yourselves. I hope you learned something.

I hope you like my new microphone because, uh, I don’t know what to get if this, if you don’t like it, perhaps, perhaps you can recommend one to me. Um, it just, it has to plug into one of these things and it has to have a very special connector like this. So if you’ve got recommendations along those lines, um, well, I’m, I’m two ears and one mouth, but that, that, that expression doesn’t really, doesn’t really resonate much.

Does it anyway? Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Spurious Left Join Logic

Spurious Left Join Logic


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.