Stubborn NOT EXISTS Ordering


Video Summary

In this video, I delve into the fascinating world of stubborn not exists ordering in SQL Server queries. You might be wondering why I’m wearing a headset that’s making me look like an air traffic controller—well, it’s because my fancy wireless microphone setup broke, and I’m waiting for replacements to arrive. This quirky situation has led to some humorous moments, but it hasn’t stopped us from diving deep into SQL Server optimization techniques. I use dynamic SQL and XQuery to generate random not exists predicates and analyze their execution order in the query plan. By running this process multiple times, we can observe how the optimizer handles these clauses and whether reordering them might improve performance. So, grab your headphones (if you have any) and join me as we explore this intriguing aspect of SQL Server’s query optimization!

Full Transcript

Erik Darling here with Darling Data, and you might be wondering why I’m wearing a headset. Well, it’s a funny story. It turns out when you spend like 800 bucks on a fancy wireless microphone setup, this thing is fine, but the microphone that plugs in here, they don’t send you a backup of one of those. And it turns out that the little wire thing that you usually saw on my shirt about right here is very fragile. And if you try to adjust it, it’ll just snap. And so I’m waiting for my replacement ones to show up. Unfortunately, my scheduled YouTube videos are going to run out before they arrive tomorrow. So we’re going to do a few of these with the old headset, and I’m going to look like a goofy air traffic controller while I talk to you about SQL Server. So I hope you can survive these trying times, my friends, where Erik Darling is wearing a headset, because I’m having a tough time with it. Honestly, I feel like I look stupid in this thing. Anyway, let’s talk in this video about stubborn not exists ordering. And by stubborn not exists ordering, well, of course, what I mean is this: SQL Server’s optimizer is famous for being able to take a query and do all sorts of stuff with it, play all sorts of tricks, like reorder joins.

But one thing that it doesn’t do, at least that I can’t ever get it to do, is reorder not exists predicates, or not exists subqueries, whatever you want to call them, in a query. And I think it’s kind of amusing the way that it doesn’t do this. If there’s a message here, it’s that if you’re going to write a query where there are multiple not exists checks, you may want to spend a little time figuring out if the order of them improves query performance at all.

I’m gonna give you a little spoiler here: for our query, it doesn’t make much of a difference. It’s gonna run in two to three seconds anyway, because I did many things right along the way. But for you out there in real user space land, it might actually make a difference. So before we do that, haha, we need to shill ourselves, shill all over ourselves. If you appreciate this SQL Server content, even when I’m wearing a headset, which I know is hard, you can sign up to become a member of the channel.

And you can, for as few as $4 a month, support a starving SQL Server consultant. If you are also starving for some reason, maybe AI took your job already, I don’t know, you can do all sorts of fun stuff to help this channel thrive and survive. It won’t put food in my stomach, but it’ll put a smile on my face. You can like, you can comment, you can subscribe. And if you want to ask questions privately that I will answer publicly during my office hours episodes, you can do that at this link. All of the things that I talked about are very conveniently linked for you down in the video description. You don’t even have to think much about it. You can just click randomly on things until something works. Just like with SQL Server. It’s a good time.

If you need consulting from a guy in a headset, I am available as a SQL Server consultant. Not just a pretty face on YouTube, I could be a pretty face on a Zoom call for you, too. Health checks, performance analysis, hands-on tuning of whatever you need tuned, fixing your SQL Server performance emergencies, and of course training your developers so that you don’t have those emergencies anymore.

It’ll be a nice good time for you. I promise everything will go your way. You will be endowed with the luck of the luckiest civilization out there. If you would like to get some training from me, I have 24 hours of performance tuning training.

You can get it all for about 150 US dollars. No tariffs on that. I promise we’re staying tariff free here in the Darling Data world. You go to that link, you put in that discount code, and you’ll get the everything bundle for $150.

It’s nice. I also have a new T-SQL course that is currently publishing as we speak. I have finished the read query portion and the modification query portion. Next up will be isolation levels and then programmability.

So it is half done. At least the beginner portion of it is; the advanced portion will come out after the summer. The pre-sale price until all the advanced material is out is $250. It will go up to $500 once the course content is fully released.

Just so you know, if you are attending PASS Data Community Summit in Seattle this November, and you are attending the pre-cons that Kendra Little and I are doing on T-SQL, this is companion material to that content.

So if you attend the pre-cons, you will get access to this content as part of your admission to those. Speaking of speaking. Ho-wee.

Boy, are my arms tired. PASS is going on tour. It’s the Red Gate Roadshow and they have cordially invited me to go to all of these things. New York City, August 18th and 19th.

Dallas, September 15th and 16th. And Utrecht, which is a hamlet near Amsterdam, October 1st and 2nd. And of course, this is all leading up to PASS Data Community Summit taking place in lovely Seattle, Washington, November 17th to 21st, where Microsoft has indefinitely canceled their build conferences and has relocated them to sunny Las Vegas.

So, I don’t know, maybe PASS will move to Vegas too. Who knows? Anyway, with that out of the way, what I want to show you is, A, I’m going to prove my point.

But B, I’m going to show you how I prove points like this to myself. It’s often quite a process. It’s quite an endeavor, quite a chore to do these things.

But they help me. They help me understand things and they help me see things how they really are. So, what I did in order to sort of prove this out was I used my best friend, Dynamic SQL, and my other best friend, XQuery.

And what I did was I built up Dynamic SQL in a way that I could grab the query plan for the query that ran, have that query generate not exists clauses in random orders, and then show me the query that ran and the order that the clustered index scans happened in on which tables, so that I could match up the order of the not exists subquery predicates with the order that things happened in the query plan.

It was all very tedious. But when I got it right, it was very exciting. So, I’m going to walk you through the Dynamic SQL portion, and then I’m going to run this a few times, and then I’m going to show you the results.

All right? Cool. I hope you enjoy it. So, we have some variables up here, some local variables. These are not formal variables that we’re going to use to hold various things. This will hold the Dynamic SQL we’re going to execute.

This is going to hold the parameters for the Dynamic SQL. This is going to hold an output parameter for the Dynamic SQL. Why Zoomit just dissed me like that live on YouTube? I don’t know.

These are the parameters that we are going to pass into the Dynamic SQL. This is going to, again, this is going to be the output thing for this, and this is going to be the output thing for this down here.

Maybe I could have organized those slightly better. And we’re going to use this replacement thing here. This replacement thing is going to come in handy in a moment, which we’ll get to. Just remember, there is a local variable called replacement.

This is going to be our general Dynamic SQL setup. We are going to set this every time, and we have this sort of, we have this thing over here that says replacement.

This is going to act as our token. This is the thing in this batch that we are going to replace with our varying not exists predicates. We also have this lovely piece of SQL in here, and this lovely piece of SQL is going to get the execution plan for the session that’s running here, which works.

I assure you, this is correct. As crazy as it looks and sounds, this is correct. So what we do is we set that, remember that replacement variable I told you that was very, very important.

We use this replacement variable and we set this, and we’re going to use this brand new, brand spanking new, fresh off the factory lot STRING_AGG function that came out in SQL Server 2017. And we’re going to use this instead of more XML.

There’s XML down below, so don’t worry, all of your fetishes will be satisfied. And we are going to aggregate this column with an empty space. And we are going to say within group order by V.O.

You’re probably asking yourselves, what are V.C and V.O? Those are fantastic thoughts to have. And they are wonderful questions to have answered.

So this values clause generates two columns. One of them is this NCHAR(10) with a not exists. And you’ll notice that each one of these not exists has a different table in it.

We have badges, comments, posts, and votes. And then we have this other thing that it generates for NEWID. NEWID is how we get the random ordering.

So this values clause, which is aliased as V, has the C column, which is where we’re holding the NCHAR(10) and not exists. And then the O is where we’re holding the NEWID.

So when we run this all together, we’re going to concatenate all of these not exists clauses ordered by the NEWIDs that get generated in here. And we are going to get randomly ordered not exists subquery predicates.

Okay, cool. After that, we are going to replace, in the dynamic SQL portion, the replacement token with that string that we just generated and assigned to the replacement variable.
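The token replacement described above can be sketched roughly like this. It is not the code from the video: the `{replacement}` token text, the variable names, and the Stack Overflow style table and column names (`dbo.Users`, `UserId`, `OwnerUserId`) are all assumptions made for illustration.

```sql
/* Sketch only: schema and token name are assumed, not taken from the video. */
DECLARE
    @sql nvarchar(max) = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE 1 = 1 {replacement};',
    @replacement nvarchar(max) = N'';

/* v.c holds a line feed plus one predicate; v.o holds a random sort key. */
SELECT
    @replacement =
        STRING_AGG(v.c, N' ')
            WITHIN GROUP (ORDER BY v.o)
FROM
(
    VALUES
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1 FROM dbo.Badges AS b WHERE b.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1 FROM dbo.Comments AS c WHERE c.UserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id)', NEWID()),
        (NCHAR(10) + N'AND NOT EXISTS (SELECT 1 FROM dbo.Votes AS vo WHERE vo.UserId = u.Id)', NEWID())
) AS v (c, o);

/* Swap the token for the randomly ordered predicates and show the result. */
SET @sql = REPLACE(@sql, N'{replacement}', @replacement);

PRINT @sql;
```

Running the batch several times should print the four predicates in a different order each time, because each row gets its own NEWID value to sort by.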

We’re going to execute our query and we are going to output the C and the query plan. All right. The C up here is just, of course, the result of the COUNT_BIG.

We don’t actually do anything with this. We just throw it away. But the query plan, we do something very exciting with. You’re ready for some XML.

You look like you’re ready for some XML. So I’m going to give you some XML here, some XQuery for you. We’re going to print the query that runs. All right.

And then we’re going to delete some XML. Have you ever seen this before? Have you ever seen a delete from XML? We’re going to use this thing.

So, back story on why this is necessary. When the query up here runs, there are two query plans that get generated. There’s a query plan for this query, which has the not exists stuff in it. And then there’s a query plan for getting the query plan.

Couldn’t make this stuff up. Right? Could not make these things up. So what we need to do is we need to preserve the first query plan and delete the last query plan. And apparently this works to do that.
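For readers who have not seen XML DML before, here is a minimal sketch of deleting the last statement from plan-shaped XML with the `modify()` method. The XML below is a hand-made, heavily simplified stand-in for a real showplan, and the exact path used in the video is not shown in the transcript, so treat the XQuery as one plausible way to do it rather than the author’s expression.

```sql
/* Simplified stand-in for a showplan with two statements in one batch. */
DECLARE @query_plan xml = N'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementId="1" />
        <StmtSimple StatementId="2" />
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>';

/* Keep the first statement, delete the last one. */
SET @query_plan.modify('
    declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    delete /p:ShowPlanXML/p:BatchSequence/p:Batch/p:Statements/p:StmtSimple[last()]');

SELECT
    query_plan = @query_plan;
```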

Isn’t that insane? The second thing we need to do with XML is shred our query plan. Remember that output thing called query plan that we assigned the query plan to? Well, we’ve got something interesting to do here.

We have to select, from that variable, the XML nodes for all of the clustered index scans. What we’re going to pull out of that is the node ID, which is going to tell us the order that things happened in the query. So we have the node ID, which is the order that things happened in.

We have the table name, which is the thing that got clustered index scanned. And then I’m also going to select the query plan here so that we can validate our results. Okay.
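A hedged sketch of that shredding step follows. The XML is a flattened toy plan (real plans nest their `RelOp` elements), the variable and column names are assumptions, and the node IDs simply reuse the 6 and 14 mentioned later in the transcript.

```sql
/* Toy plan: two clustered index scans, flattened for readability. */
DECLARE @query_plan xml = N'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements><StmtSimple><QueryPlan>
    <RelOp NodeId="6" PhysicalOp="Clustered Index Scan">
      <IndexScan><Object Table="[Users]" /></IndexScan>
    </RelOp>
    <RelOp NodeId="14" PhysicalOp="Clustered Index Scan">
      <IndexScan><Object Table="[Posts]" /></IndexScan>
    </RelOp>
  </QueryPlan></StmtSimple></Statements></Batch></BatchSequence>
</ShowPlanXML>';

/* One row per clustered index scan: its node ID and the table it touched. */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    node_id = r.c.value('@NodeId', 'int'),
    table_name = r.c.value('(IndexScan/Object/@Table)[1]', 'nvarchar(128)'),
    query_plan = @query_plan
FROM @query_plan.nodes('//RelOp[@PhysicalOp = "Clustered Index Scan"]') AS r (c)
ORDER BY
    node_id;
```

Sorting by the node ID is what lets the result set be lined up against the order of the predicates in the printed query text.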

All good. This is great. This is wonderful. I am very happy with all of this. We’re not going to do anything with this. This was me just making sure that I was right about things. I ran this quite a bit to see if I could find any real big outliers that would be like, hey, look, this is a big performance thing, but there really weren’t any with these.

Anyway, what we need to do now is come back up to the top and we’re going to run SET NOCOUNT ON once so that we connect, and look how nice this is. Look how nice this new SQL Server Management Studio 21 connect dialog is. We have this futuristic thing.

We can even tell it what database we want to connect to. No one look at my password. Of course, this is highly, highly confidential information. Don’t look at the password, but we can tell it exactly which database to go to, so I don’t accidentally connect to master and hit an error the first time.

Fantastic. Thanks, Aaron. Anyway, we are all connected there. And now let’s run this once and let’s just see what happens. Uh, my, my, my VM may have restarted last night.

So this might, this first run might take a couple of seconds because we might have to read some stuff, uh, from, from disk into memory. Remember kids reading stuff from disk into memory is terribly slow. Keep your data in memory and your queries will always be faster.

So, now that we are finished, now that we’re down here, right? We don’t want to hang around up there. We want to hang out down here because this is the query that ran and gave us some stuff.

So we have the node ID, which, which is going to, and I’m going to prove all this out to you. We have the node ID. This is the order that tables were accessed in.

We have the table name and here’s the query plan. Over in the messages tab, we have the query that ran. So all of these queries start with the users table, right? So in all of these users ends up being first.

Then we have our not exists subquery predicates. And we see the order that these happened in here. So we have badges, comments, votes, posts.

All right. B, C, V, P badges, comments, votes, posts. If we come back over to our results, we have users first, because that, that was, that was the from clause. And then we have badges, comments, votes, posts.

Okay. Well, do we know if these, how do we know these node IDs aren’t lying to us? Well, let’s look at the query plan. We have users.

Ta-da. We have badges. Ta-da. We have comments. Ta-da. We have votes. Ta-da. And we have posts.

Again, ta-da. If we hover over any of these, we’ll see the node ID down here in this little tooltip. So this is node ID 14. This is node ID number 12.

This is node ID number 10. And this is node ID number eight. Ah, we got it.

And this is node ID number six. All right. So I think, like, now that we’ve kind of proven that the setup, this is a valid test. Well, here’s our six, eight, 10, 12, 14, right?

Just like we saw up there. Let’s run this a few times and see stuff in some different orders. So now we got a completely different order on this one. Now we have posts, badges, votes, and comments.

And now these are no longer even numbers. Now these are odd numbers. We have seven, nine, 11, and 13. So remember, users, posts, badges, votes, comments. If you look at over here, here’s, oh, go away thing.

Here we have users, posts, badges, votes, comments. So the order matches again. If we keep running this, and I’ve done this hundreds and hundreds of times, like the number of loops I’ve written around this to do this over and over again is absurd.

And every time we validate this output, it matches. Here’s users, votes, posts, badges, comments. So VPBC over here. What do we have in the messages tab?

V, P, B, C. Every single time. So coming back to the main point, when you are writing queries with not exists, SQL Server’s optimizer will do, well, I won’t say nothing.

Cause who knows, maybe it almost does something, but then changes its mind and is just like, nah, I don’t think so. Hmm.

It’s not for me. SQL Server’s optimizer does not really make any attempt to reorder not exists predicates. So when you are writing queries that have not exists in them, some tables might be cheaper to access and do a not exists from, and you might be able to narrow rows down faster with them than with other tables.

So always be very, very careful and cautious about the order that you write your not exists checks in, because you might be able to get your queries to run much faster just by reordering them, since SQL Server won’t do it for you.
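To make that advice concrete, here is a small sketch of the kind of comparison you might run yourself. The schema is assumed (Stack Overflow style tables and columns), and which ordering is faster, if either, depends entirely on your data and indexes.

```sql
/* Assumed schema: dbo.Users, dbo.Badges (UserId), dbo.Posts (OwnerUserId). */
SET STATISTICS TIME ON;

/* Ordering 1: Badges is checked first, then Posts. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE NOT EXISTS (SELECT 1 FROM dbo.Badges AS b WHERE b.UserId = u.Id)
AND   NOT EXISTS (SELECT 1 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id);

/* Ordering 2: the same predicates, written in the opposite order. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE NOT EXISTS (SELECT 1 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id)
AND   NOT EXISTS (SELECT 1 FROM dbo.Badges AS b WHERE b.UserId = u.Id);

SET STATISTICS TIME OFF;
```

Both queries return the same count; comparing the elapsed times and the two execution plans shows whether the written order matters for your tables.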

Anyway, that’s all I have for this one. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where I will still be wearing this atrocious headset until my little clippy thing shows up.

It’s not an actual clip, like a tie clip or like a clippy, like the paper clip guy. It’s just a little like, like clippy microphone thing, which I don’t know. I might start wearing for formal occasions as well with my Adidas tuxedo, because that’s how I roll.

All right. Anyway, I guess that’s good for this one. Again, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video. All right.

It’s magic. Ta-da! Ta-da!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

My Upcoming Speaking Schedule

Busy Summer


The nice folks at Red Gate have decided to put me to work.

That means I’m going on tour, and maybe getting some socks and a Hawaiian shirt.

No word on a “Lego Erik” yet.

PASS On Tour Events:

PASS Data Community Summit:

Of course, Kendra Little and I are back in action to teach back-to-back T-SQL precons.

See you out there!


SQL Server Performance Office Hours Episode 20


To ask your questions, head over here.

Hello Mr. Erik, I’ve attended your tuning course inside a theater in Seattle years ago! If I may ask a question about the future of SQL. I heard in the past that Joe Sack left MS to MongoDB and now he has returned. That genius guy still working with improvements of sql whatever “onprem” or azure ?
If you don’t mind, could you tell us more about the stories behind your tattoos? What do they represent, and how many do you have?
Without parameterized queries, how would you suggest to decide which queries to tune?
Hi Erik! Are your educator skills just natural talent or do you have any good sources for improving that?
Give me the case against partitioned views.

Video Summary

In this video, I dive into the world of Office Hours with Darling Data, where we tackle a variety of SQL Server questions from viewers. We start off by addressing the future of SQL and Joe Sack’s recent career moves, which sparked some interesting discussions. Then, I share insights on deciding which queries to tune without parameterized queries, introducing a new feature in sp_QuickieStore that helps analyze query hash totals. The conversation continues with personal questions about my tattoos and teaching skills, offering unique perspectives on how these aspects of life have influenced my work. Finally, we wrap up by discussing partitioned views, sharing both the benefits and challenges they present. It’s always great to engage directly with our community and hear from you all!

Full Transcript

Erik Darling here with Darling Data. And we are once again greeted with the background. So we are once again doing Office Hours! Kaboom! If you would like to ask your own questions for Office Hours, this is the link to do it. It’s down in the video description. Likewise, if you would like to support my channel and give me money to keep talking, you can do that. If there were a way for you to pay me to stop talking, perhaps there would be more generosity from the greater public. I am open to that. I can be bought. I’m not beyond that. My morals and my ethics do not extend that far. So if you would like to pay me to stop talking, shoot me an email. We can work something out. If you like this channel but not in a money way, you can like, you can comment, you can subscribe. And if you would like to hire a consultant to do SQL Server stuff because you’re having trouble with SQL Server stuff, guess what? This total package here does SQL Server stuff. Health checks, performance analysis, hands-on tuning, dealing with SQL Server performance emergencies, and of course, training your developers so that you have fewer SQL Server performance emergencies. Right down to zero SQL Server performance emergencies. I can do all those things. Not we, I. There’s only me here. You only get this face. There’s no substitute face that shows up and doesn’t know who you are or what you are.

If you would like to get my performance tuning training material, you can get all 24 hours of it for about 150 USD for life. Again, the link is in the video description. My new T-SQL course, which I finally fixed this slide for: videos will start dropping in June. You, of course, get the pre-sale price until the advanced material shows up after the summer. This is companion material to the pre-cons that Kendra Little and I are doing in Seattle this November. So if you are attending those, you get access to this material at the price of the pre-cons. If you go to PASS and you don’t come to the pre-cons, I don’t care. Right? Like, you have to show up for me and Kendra for me to care. To, again, speak more about the live and in-person stuff: PASS On Tour. Boy, this is going to be fun. New York, Dallas, and Amsterdam. August, September, October. I will be at all three of them. And of course, I will be at PASS Data Community Summit in Seattle this November, doing the aforementioned pre-cons. So we will have a grand time with that, won’t we? But with that out of the way, let’s do these office hours questions. Let’s zoom, zoom, zoom, zoom, zoom, zoom, zoom, zoom. What do we have here?

Hello, Mr. Erik. Hello, you. I’m not sure how to address you. I’ve attended your tuning course inside a theater in Seattle years ago. A theater, you say. If I may ask a question about the future of SQL, as much as I am not a psychic, I’ll do my best. I heard in the past that Joe Sack left MS to MongoDB and now he has returned. That genius guy is still working with improvements of SQL Server, whatever, on-prem or Azure. So he did come back.

Joe Sack did go to MongoDB for about a year or so. And then he was back at Microsoft. And he was back at Microsoft for about a year and a half. And, you know, it would be inappropriate of me to comment on Joe’s situation at Microsoft. But Joe was sort of unhappy with what his role had morphed into. And so Joe went to work at another database company called Elasticsearch.

So Joe is now some sort of head honcho, not sure what a head honcho is, over at Elasticsearch. Doing a great job there, kicking butt. They are very, very lucky to have him. You know, I miss him dearly working on SQL Server, but it was not meant to be.

All right. Ooh, a personal question. Look at us go here. If you don’t mind, could you tell us more about the stories behind your tattoos? What do they represent and how many do you have?

Well, the stock answer that I have when someone asks me how many tattoos I have is all of them. Because I’m pretty well covered. But the thing where I depart on having stories behind my tattoos is that I got nothing.

All right. Like really, most of them mean absolutely nothing to me. There’s no story. There’s no meaning. There’s no like heartfelt life event that led to me getting them. Like I got a couple like wife and kid name tattoos, but those are just sort of like if I didn’t get them, like they’d be mad.

Right. That’s about it. It’s just, you know, I figured out at a very young age that I was the type of gentleman who wanted attention from the type of lady who had a lot of tattoos. And I realized that the best way to get that attention was to get tattoos. And I was lucky enough to make friends with some tattoo artists, lifelong friends of mine who were just starting out tattooing, who have gone on to be really good at tattoos and own tattoo shops.

But I have all their like starter work. So I have a lot of really old tattoos right now that they were just like, hey, I want to do this tattoo to practice. Can I do it on you? And I was like, dope. I’ll buy burritos.

So most of these tattoos have no real meaning, no real story, maybe just a thing that I kind of liked at the time where the tattoo artist was like, oh, I want to do this Japanese thing today. I’m like, I don’t have any Japanese stuff. Let’s do it. So like, you know, I just got covered with a lot of stuff that means nothing to me very quickly.

And guess what? It worked. May not be the most thoughtful thing in the world, but it was highly effective and painful, painful and effective. So anyway, well, we’re on the subject of pain.

Without parameterized queries, how would you suggest to decide which queries to tune? So this is actually a neat question because I recently added, well, I don’t know, because of how busy I’ve been recently, I have no concept of actual time. Months have gone by where I’m like, where are we?

But I sort of recently added a new parameter to sp_QuickieStore to help you decide if queries are worth tuning. And that parameter is called include_query_hash_totals. There are underscores in there because I like putting underscores in things to make them readable.

I don’t like the uppercase, lowercase thing. It makes me feel cramped and crowded and it gets me all claustrophobic feeling. So I added this parameter because what I would find is I would run sp_QuickieStore.

And there would be this whole list of queries that looked similar, but would have one or just a couple, maybe five or six, executions. And people would be like, this doesn’t run that much. I don’t want to spend time on it.

So I put this include_query_hash_totals parameter in. And what that does is it looks at the query hash. So if you have unparameterized queries that are effectively running the same thing over and over and over again. Right.

It’s the same query. You just have different dates or a different name or a different number of IN clause things. All the stuff that gets you different query plans, but the same query hash.

The text of the query ends up all the same, aside from the literal values. It counts all that up and it gives you totals for CPU and duration and executions and all that stuff, all the other metrics that are in sp_QuickieStore’s output, at the query hash level.

So you might find a very sneaky query that looks like it’s only executed once, but the query hash tells you otherwise. If you look at the query hash, it’s like, wow, this thing actually executed like 7,000 times. We just see this one example of it in the output.

So when queries aren’t parameterized, that’s the option that I use in my tooling when I want to figure out if something is worth going after. It’s still worth using sp_QuickieStore to figure out if the query is meaningful to the workload, and there are other ways you can do that. There’s a parameter in there called workdays.

And I like the workdays parameter because it’ll just look at stuff that’s run Monday through Friday, and by default nine to five. But there are two other parameters you can use to change the span of hours. What’s nice about that is that you automatically screen out all the overnight processes that you might not care about.

If you want to focus on the overnight processes, go ahead and set the start and end date or whatever, or use workdays, but go in reverse with the times. So I would probably use some combination of those things to do that.
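As a rough illustration of those options, the calls might look like the sketch below. The parameter names `@include_query_hash_totals`, `@workdays`, `@work_start`, and `@work_end` are written from memory of the procedure rather than from this transcript, and the database name is a placeholder, so check the procedure’s own help output before relying on them.

```sql
/* Assumed parameter names; verify against the procedure's help output. */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourDatabase', /* placeholder database name */
    @include_query_hash_totals = 1;   /* roll metrics up by query hash */

/* Only look at weekday activity inside a chosen window of hours. */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @workdays = 1,
    @work_start = '09:00',
    @work_end = '17:00';
```

The first call is the one aimed at unparameterized workloads, where a query that looks like a single execution may share a query hash with thousands of near-identical siblings.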

Okay. Hey, look, another sort of personal question here. Look at that. Hi, Erik.

Hi, you. Nice to meet you. Are your educator skills just natural talent? Oh boy. I don’t know if I’d call it that. Or do you have any good sources for improving that? So I have no good source for improving that. Any ability I have as an educator comes from being dumb.

Right. I’m not, I am not a naturally smart person. Things don’t come quickly to me. I’m like one of those like old CD burners that goes at like one X, like it, like it, it burns slow, but it burns deep.

Right. So I don’t learn things very quickly. And it like, so like when I need to learn something, like I need to like really break it down in my head a lot further than people who get things a lot more intuitively. Like people who are much smarter or more clever, whatever it is.

Like, like they, they like, it’s like, oh, like they just look at something like, oh yeah, I get it. Me. I’m like, no, no, no. I need like, no. Why does this thing go from here to there? Like I don’t get things that quickly.

So I’m good at teaching people because I’m dumb. Right. Like it takes a lot for me to learn something. And by the time I’ve learned, I’ve learned something, I feel like I’ve learned it very well and in very small pieces so that like when I have to tell someone else about it, like all those small pieces are just like burned into my, my CD brain. Right.

My CD brain. So I like, if it’s, if it’s any, if there’s anything natural, it’s cause I’m spinning slow up here. All right.

Give me the case against partitioned views. I don’t really have one. The only real rub that I’ve found with partitioned views is trying to get things right so that they’re updatable. That’s a damn nightmare.

That is not a good time. I don’t suggest that. It doesn’t fit a lot of situations. So if you want to use partitioned views, I say, go for it. Just, you know, make sure that you get your constraints right and make sure that whatever view needs to be refreshed to get new data in there is running at the right intervals and stuff.

And you’re pretty good. You know, I like partitioned views usually quite a bit better than, like, capital P partitioning, just because I can index stuff differently. If there’s like, you know, if I add new columns or remove columns, it’s easy enough to fix that in the view definition.

When you have proper constraints on the tables to tell SQL Server what data lives where, you know, you can get pretty clean execution plans from it. So I really don’t have much against partitioned views aside from trying to get them to be writable, which again, that kept me up for a while trying to get a good demo where they were writable. And I bailed on it.
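As a rough sketch of the "constraints tell SQL Server what data lives where" point: the table names, columns, and date ranges below are hypothetical, made up for illustration, and are not from the demo in the video.

```sql
/* Hypothetical yearly tables. Each CHECK constraint describes
   which OrderDate values are allowed to live in that table. */
CREATE TABLE dbo.Orders_2023
(
    OrderId integer NOT NULL,
    OrderDate date NOT NULL
        CONSTRAINT ck_Orders_2023_OrderDate
        CHECK (OrderDate >= '20230101' AND OrderDate < '20240101'),
    CONSTRAINT pk_Orders_2023 PRIMARY KEY (OrderId, OrderDate)
);

CREATE TABLE dbo.Orders_2024
(
    OrderId integer NOT NULL,
    OrderDate date NOT NULL
        CONSTRAINT ck_Orders_2024_OrderDate
        CHECK (OrderDate >= '20240101' AND OrderDate < '20250101'),
    CONSTRAINT pk_Orders_2024 PRIMARY KEY (OrderId, OrderDate)
);
GO

/* The partitioned view is just a UNION ALL over the member tables. */
CREATE VIEW dbo.Orders
AS
SELECT o.OrderId, o.OrderDate FROM dbo.Orders_2023 AS o
UNION ALL
SELECT o.OrderId, o.OrderDate FROM dbo.Orders_2024 AS o;
GO

/* With trusted constraints in place, the plan for this query
   should only need to read dbo.Orders_2024. */
SELECT
    o.OrderId,
    o.OrderDate
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20240301'
AND   o.OrderDate <  '20240401';
```

Each member table can be indexed on its own, which is the flexibility mentioned above. Check the actual plan on your own system before relying on the elimination.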

It was just, it was too much. It was too many things, too many things went wrong and happened that I, I did not like nor love. All right. Anyway, I believe that is, that is five questions.

One, two, three, four. Yeah, that’s five. I can, I can count to five. Most, I, I, I, I, I credit most of my ability to count to five from, from barbell training because doing sets of five really does get you good at counting to five. All right.

So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video, video, video. So thanks for submitting questions. Submit some more.

All right. That’s not a very good sales pitch, is it? All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Common Table Expression Mediocrity


Video Summary

In this video, I delve into the world of Common Table Expressions (CTEs) in SQL Server, often highlighting their perceived benefits and the reality of their implementation. I start by addressing a common misconception—that CTEs make queries more readable. Through examples, I demonstrate how what might seem like a well-structured query can actually be a disaster, leading to errors when executed. The video then transitions into an in-depth look at SQL Server’s handling of CTEs, emphasizing the lack of materialization and the potential for performance issues due to repeated execution. I provide practical advice on when and how to use CTEs effectively, suggesting that in many cases, dumping their results into temporary tables can significantly improve query performance. Throughout the video, I share my frustration with SQL Server’s limitations and the workaround nature of using CTEs, ultimately aiming to help viewers navigate these challenges more confidently.

Full Transcript

Erik Darling here with Darling Data. In today’s video we’re going to get back to the T-SQL learning material. This is of course the beginner stuff, so if you fancy yourself far beyond the beginner realm of learning in this area, you can feel free to go twiddle your thumbs elsewhere. But we’re going to talk about CTE today. And this is one of my favorite things to talk about, because I like watching the balloons deflate when we start talking about just how mediocre SQL Server’s implementation of CTE is. You know, instead of getting some basic useful database functionality, we get giant monolithic failures like Fabric foisted upon us and I don’t know, I guess let the layoffs continue. All right, good job all around. So one of the first things that every LLM generated idiot on the internet likes to say about CTE is how they make queries more readable. Oh, they’re so readable. Look how readable the query is with the CTE. My goodness. Let’s finger point, rocket ship, green check, fire emoji our way to fame and fortune. One wonders if these people have an assistant to remind them to breathe. But who knows. So here’s a query that is a CTE. We can identify quite easily that it is a CTE because it starts with WITH. It does that. It doesn’t need a semicolon, does it? But this query is completely illegible. There is nothing readable, understandable, or even tolerable about this query. It is a disaster. If someone sent this query to me, I would throw them out a window. If you don’t have respect for me, that’s one thing, but at least have some self-respect when you write these things. Even better is if we attempt to run this query, we will get an error.

And, you know, the error is very clear. But where we would go about remedying this error is not terribly clear based on this syntax, is it? So please format your queries and they will be readable. If one does not format their queries, they will not be readable no matter how many CTE one dispenses with. So with that out of the way, let’s talk about SQL Server’s utter mediocrity with CTE. Now, the big thing is that even though the word table is in the name, the result of your query is only tabular. It is not materialized to a table. There is nothing stable about the result of your query. There are all sorts of things that can come up, especially when one starts pondering isolation levels and the timing of operations in a query plan, that could set one’s head on fire if one spent too long.

So it’s quite a dizzying array of issues that you could run into. But that’s getting a little bit in front of things. The main thing is that the query is not materialized. Even if you put a TOP in there or something, which does provide some logical fencing of stuff. If you watch the unnesting video that I recorded a few days ago, you’ll see that I use TOP in there to prevent some unnesting. In a similar way, you could do that with a CTE.

However, that result is still not materialized. And what I mean by that is that every time you reference the CTE, generally in the outer scope of the query, and by outer scope, I mean after the CTE has been run, every time you do that you have to run the query inside of the CTE. An easy way to see what I mean by that is by just getting an estimated plan for this one, where there’s one reference to the CTE, and there’s only one time in the query plan when we touch the users table, and where we generate some query plan operators to create our row number.

We’re calling the row number function here. And there’s only one filter to remove any rows where row number is not between 1 and 100. We have to contrast that a little bit with a slightly different query, where our from clause now joins the CTE, c1 to itself. They, of course, have different aliases because, you know, you can’t alias the same thing the same way twice. You just get an error and say, hey, you already did that.

No need to do that again. So if we now get the estimated plan for this query, we will see that we now have two copies of the CTE being referenced, right, where the query gets executed. We touch the Users table twice. We do all the stuff that we need to generate the row number twice in this query plan, right?

We have a lot of things that repeat in here, and we have two filter expressions, one for each time that we filter on the row numbers down here in the where clause outside of the CTE. And this is a very general pattern that you will see over and over again if you are the type of person who uses CTE and then re-references that CTE multiple times in the resulting query. So be very careful with this. If you’re going to do this sort of thing, I mean, you know, if the thing in your CTE is small and compact and easy enough to run, you might never have a problem with it.
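The two shapes being compared look roughly like this. This is a sketch, not the exact demo query: it assumes a Stack Overflow style dbo.Users table with Id and Reputation columns, and the join condition is invented for illustration.

```sql
/* One reference to c1: the plan touches dbo.Users once
   and has one filter on the row number. */
WITH c1 AS
(
    SELECT
        u.Id,
        u.Reputation,
        n = ROW_NUMBER() OVER (ORDER BY u.Id)
    FROM dbo.Users AS u
)
SELECT
    c1.Id,
    c1.Reputation,
    c1.n
FROM c1
WHERE c1.n BETWEEN 1 AND 100;

/* Two references to c1: the query inside the CTE is expanded twice,
   so expect two reads of dbo.Users, two row number calculations,
   and two filters in the estimated plan. */
WITH c1 AS
(
    SELECT
        u.Id,
        u.Reputation,
        n = ROW_NUMBER() OVER (ORDER BY u.Id)
    FROM dbo.Users AS u
)
SELECT
    first_id = a.Id,
    next_id = b.Id
FROM c1 AS a
JOIN c1 AS b
  ON b.n = a.n + 1 /* hypothetical join condition */
WHERE a.n BETWEEN 1 AND 100
AND   b.n BETWEEN 1 AND 100;
```

Getting the estimated plan for each statement is enough to see the difference; neither needs to be executed.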

But if you find your queries that exhibit this pattern slowing down considerably, strongly consider dumping the result of your CTE into a pound sign or hash sign temp table, and then using that temp table in your outer query instead. Now, it’s not just re-referencing a CTE where the lack of materialization can cause an issue.

Quite often, if you stack CTE together, you have like a whole chain of them, and you have a different query in each one. And then in the outer query, you only talk to each CTE once, but you join the results together in some way, either with a traditional join or like an IN subquery or NOT IN or EXISTS or NOT EXISTS or anything like that.
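A minimal sketch of the temp table approach, under the same assumptions as before (a dbo.Users table with Id and Reputation columns, and a made-up join condition):

```sql
/* Run the CTE query once and materialize the result. */
WITH c1 AS
(
    SELECT
        u.Id,
        u.Reputation,
        n = ROW_NUMBER() OVER (ORDER BY u.Id)
    FROM dbo.Users AS u
)
SELECT
    c1.Id,
    c1.Reputation,
    c1.n
INTO #c1
FROM c1
WHERE c1.n BETWEEN 1 AND 100;

/* Reference the temp table as many times as needed.
   dbo.Users is not touched again. */
SELECT
    first_id = a.Id,
    next_id = b.Id
FROM #c1 AS a
JOIN #c1 AS b
  ON b.n = a.n + 1; /* hypothetical join condition */

DROP TABLE #c1;
```

The temp table can also get its own statistics and indexes, which is part of why later estimates tend to improve.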

Cardinality estimation gets very, very difficult when you start combining all those things together. The reason for that is, you know, cardinality estimation can be difficult enough. If you think about a query plan, if you read it from left to right, you of course sort of get the logical flow of the query and like, you know, how things like got to where they needed to get to.

But if you start at the outer edge of the query, by that I mean like the stuff that’s kind of behind my big head, like these things over here. This is like the outer edge of the query plan. And, you know, cardinality estimation can be tough there if you have a rather complex set of predicates against the table or just confusing, weird OR stuff going on. But cardinality estimation here might be okay.

But as you start moving across the plan, when you start attempting to join complex expressions together, as you get further to the left in your query plans, that’s where cardinality estimation generally tends to fall apart. And that’s where materializing results into temp tables can be very valuable. Because now SQL Server, even if it messes up cardinality estimation completely in the query that populates the temp table, once that result is materialized, SQL Server has every opportunity to do better cardinality estimation with that physically materialized result.

So when you see query plans, SQL Server’s optimizer is cost-based, right? Along the way, it figures out all these different plan shapes and candidate plans and, you know, substitutes different operators to do different things and reorders joins and all sorts of crazy mathy stuff. But as SQL Server is trying out these candidate plans, you’re going to see a lot of weird Franken plan mix and match stuff where it’s like, oh, this is cheap.

Oh, now put in this cheap part. Okay, now put in this other cheap part. So cardinality estimation can get really, really wonky in big, complicated plans, because all of a sudden, they’re sort of like these stitched together cost based choices. And things can really start misaligning. So it’s not just the lack of materialization with the CTE that can be painful. Even if you don’t re-reference a CTE, stringing together a whole bunch of complicated ones can also just make life weird. So I tend to avoid that as much as possible. A rather uncomplicated example, and this is not a bad cardinality estimation example, this is just to show you that the lack of materialization doesn’t always come back to hurt you with the query being rerun.

This is just a simple stacked CTE where, you know, we have C1 here, and we run the query in here. And then down below it, we have C2. And we reference C1 here, and then select from C2 in the outer part. But what’s nice is that C2 doesn’t mess this up. Stacking CTE doesn’t result in the Users table being hit twice, or the query inside the initial CTE getting executed twice. We only have that once in the stacked CTE list here. And actually, something that I think is kind of nice about this one is you’ll notice that in here, we generate our row number, right? And down here, we filter on that row number between 1 and 1000. But then in the outermost query, the outermost scope, we filter to where the row number is between 200 and 500, right? Which is a narrower range of values than 1 and 1000.

SQL Server only chooses to filter once, like, we don’t have an intermediate filter, and then a secondary filter, SQL Server just does one filter to where it’s between 200 and 500. So the optimizer does some work and just kind of like throws this portion out, it just says, we don’t actually need you to do anything, because like, you’re not really, like, there’s like no benefit to this. If we did something where, like, this was between 1 and 50, and this was between, like, 10 and 30, then it would filter twice, or like, then it would just filter out here.
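The stacked pattern described here looks something like the following sketch. It assumes the same dbo.Users table; the column list is illustrative rather than copied from the demo.

```sql
WITH c1 AS
(
    /* Generate the row number. */
    SELECT
        u.Id,
        u.Reputation,
        n = ROW_NUMBER() OVER (ORDER BY u.Id)
    FROM dbo.Users AS u
),
c2 AS
(
    /* Wider filter, referencing c1 exactly once. */
    SELECT
        c1.Id,
        c1.Reputation,
        c1.n
    FROM c1
    WHERE c1.n BETWEEN 1 AND 1000
)
/* Narrower filter in the outermost scope. Since 200 to 500 sits
   entirely inside 1 to 1000, a single filter can satisfy both. */
SELECT
    c2.Id,
    c2.Reputation,
    c2.n
FROM c2
WHERE c2.n BETWEEN 200 AND 500;
```

Because each CTE is referenced only once, the estimated plan should show a single read of dbo.Users.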

So where CTE generally become useful is when you do things that are disallowed by T-SQL, like, just, like, on their own in a single query. One of the, probably the most, like, useful common example would be, like, deleting a top number of rows in an ordered way, right? So if we wanted to delete from this table, based on this where clause, and we wanted to order it by something, notice that we have a little red squiggle here, right?

SQL Server is like, like, IntelliSense is already telling us, hmm, I don’t know about this one. I don’t think that’s going to fly. And if we try to get an estimated plan for this, it would just say incorrect syntax near the keyword order.

Right? It doesn’t tell you, like, hey, you can’t do that. Like, you’re just going to sit there and stare at this query and be like, how, where, there’s no, there’s nothing wrong. If I run this part, I get a query plan.

But if I try to order by here, I don’t get a query plan. Why? There’s no, there’s nothing wrong here. Is it like you’ll start, like, putting this in, like, notepad++ and looking for, like, strange empty space characters and losing your mind. It’s just a T-SQL limitation.

But you can do that with a CTE where if you put a select top 1000 query in the CTE with your order by, you can delete from that and get a query plan just fine. Right? So this works, but just doing it in one query doesn’t.
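Here is a hedged sketch of that workaround. The dbo.Votes table, its columns, and the filter value are assumptions for illustration, and the whole thing is wrapped in a transaction that rolls back so nothing is actually removed.

```sql
/* Not allowed: DELETE has no ORDER BY, so this fails with
   "Incorrect syntax near the keyword 'ORDER'".

DELETE TOP (1000)
FROM dbo.Votes
WHERE VoteTypeId = 2
ORDER BY CreationDate;
*/

BEGIN TRANSACTION;

/* Allowed: put the ordered TOP in a CTE, then delete from the CTE. */
WITH d AS
(
    SELECT TOP (1000)
        v.Id
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = 2 /* hypothetical filter */
    ORDER BY v.CreationDate
)
DELETE
FROM d;

/* Roll back so this sketch doesn't change any data. */
ROLLBACK TRANSACTION;
```

The delete goes through the CTE to the underlying table, so only the first 1000 rows in CreationDate order are targeted.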

So a lot of the utility and use of CTE is not performance. It’s not readability. And it’s certainly not, like, some sign that you know what you’re doing with T-SQL if you use them.

It’s how you use them. Right? Like, really, where they come in handy is where you do things that you can’t just do in one simple query. It’s like, you know, coming back to the row number stuff, you have to put that row number in some sort of derived table expression, whether it’s a CTE, whether it’s a derived table, you know, like, anything like that, in order to filter on the row number.

Other databases have a QUALIFY clause. It’s sort of like a secondary WHERE clause that allows you to filter on stuff that happens in the select list. Remember, we talked about logical query processing. Select happens almost last when queries are logically processed.

So stuff that you talk about in the select list isn’t visible to the where clause. If we had the qualify clause, it would be visible there. But we don’t.

Instead, we have Fabric, fall down Fabric, which, you know, just a complete waste of our lives. So, like, most of the use of CTE just comes down to working around T-SQL limitations. Right?

Like, it’s never like, hey, there’s a straightforward way to do this. It’s always like, there’s this weird hack I read about. Right? It’s like, it’s never, almost never just like, oh, yeah, just do this one simple thing. It’s always like, no, no, no. You have to do this, like, four other things to get it to work, but it’ll work.

So, like, it’s really just a T-SQL limitation where we have to generate the row number in here before we can filter on it anywhere else. Or before we, like, you know, we can order by it up there, but we couldn’t filter on it in there. Because order by happens after select, but where happens way before select.
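To make the logical processing order concrete, here is a small sketch, again assuming a dbo.Users table with an Id column:

```sql
/* Not allowed: WHERE is processed before SELECT, so the alias n
   doesn't exist yet, and window functions can't be used in WHERE.

SELECT
    u.Id,
    n = ROW_NUMBER() OVER (ORDER BY u.Id)
FROM dbo.Users AS u
WHERE n BETWEEN 1 AND 100;
*/

/* Allowed: ORDER BY is processed after SELECT, so it can see n. */
SELECT
    u.Id,
    n = ROW_NUMBER() OVER (ORDER BY u.Id)
FROM dbo.Users AS u
ORDER BY n;

/* Allowed: generate n in a derived table (or a CTE),
   then filter on it one level out. */
SELECT
    x.Id,
    x.n
FROM
(
    SELECT
        u.Id,
        n = ROW_NUMBER() OVER (ORDER BY u.Id)
    FROM dbo.Users AS u
) AS x
WHERE x.n BETWEEN 1 AND 100
ORDER BY x.n;
```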

So, like, would it be nice if we had the qualify clause? Yes. Would it save us a lot of weird time and, like, typing and all the other stuff? Yes.

But, hey, it’s more important that everyone has a non-functional data lake or something. Right? Okay. Anyway, thanks for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in another video where we will probably talk about something else T-SQL because that seems like a reasonable thing to do. Of course, you can buy this course at the pre-sale price, 250 bucks, down in the video description there. This is all companion material to the T-SQL seminars that Kendra Little and I will be teaching at PASS Data Community Summit in Seattle this November.

If you are attending those, you will get access to this companion material as part of your admission to the pre-cons. Otherwise, you will have to buy it from me. And if you wait too long, it won’t be the pre-sale price anymore.

It will be 500 bucks and you will say, can I still get the pre-sale price? And I will say, no. Why didn’t you buy it in the months that you had to buy it for the pre-sale price?

Ding dong. Anyway, I am going to go do something else now. CTE have once again found a way to depress me.

Anyway, thank you for watching.


SQL Server Performance Office Hours Episode 19



To ask your questions, head over here.

Tell me about the perils of using SNAPSHOT for writes. Is learning how to deal with conflict detection really worse than the pains of pessimism?
Why SqlServer doesn’t have parallel rollback ?
How can you be for SORT_IN_TEMPDB and for Accelerated Database Recovery? If tempdb is good for performance, then isn’t ADR bad for performance?
Your experience says to prefer the legacy cardinality estimator. What is fundamentally wrong with the new estimator?
You previously discussed why you use SORT_IN_TEMPDB. Would you default to that in Azure SQL Database given the lack of control over tempdb (including sizing) in that product?

Video Summary

In this video, I dive into answering five questions submitted by my YouTube community during an Office Hours episode. We cover a range of topics, from the practical aspects of using Snapshot isolation in transactions to the more technical question about why SQL Server doesn’t support parallel rollbacks. Additionally, we explore the differences between the legacy and new cardinality estimators, discussing their performance and personal preferences based on testing. The session also delves into the use of Sort operations and TempDB in Azure SQL Database, weighing the pros and cons despite potential limitations. Whether you’re a seasoned SQL Server professional or just starting out, there’s plenty to learn from these real-world questions and my insights.

Full Transcript

Erik Darling here with Darling Data and, well, we are background folk again, aren’t we? We’ve got a dumbbell, barbell behind us. It’s not a dumbbell. Dumbbells don’t bend like that. Barbells bend like that. And we’ve got a Darling Data logo, so that must mean we are doing another Office Hours episode where... a helicopter of some sort. Hopefully it’s not a drone strike. Microsoft’s finally after me. Anyway, we’re going to do an Office Hours episode where I answer five entire questions submitted by you, my adorable users. If you want to submit your own question, you can go to this link, which is down in the video description. If you would like to support my channel, if you’re like, wow, this man deserves to get paid for all the work he does, well, you can do that also down in the video description. If you don’t feel like I deserve to get paid, maybe I am well deserving of a like or a comment or a subscription to the channel. I don’t know. I think we’re up to still around 60 paid members and a little over 7,000 unpaid subscribers.

So I think a few of you might like the channel. All right. If you need help with SQL Server, health checks, performance analysis, hands-on tuning, dealing with performance emergencies and whipping your developers into shape so you have fewer emergencies. Well, I happen to be pretty good with a whip. Just saying. Doesn’t have to end there. If you would like to buy my performance tuning content, you can get all 24 hours of it for 75% off. That is about 150 USD and that is for life or 8 life. You can do that with the link down in the video description. And of course, I have a new T-SQL course with me, Eric. Ignore that. I need to fix that at some point. I’ll remember to do that someday. Funny story with Podia. But anyway, it’s on pre-sale price now. 250 bucks. It’s going to go up to 500 bucks when the advanced material drops after the summer. If you’re attending Kendra and I’s PASS pre-cons, you get access to all of the content here with the price of admission. So that’s a nice deal for you.

If you want to catch me live and in person, I will be at all three of the PASS On Tour events. That’s New York City, Dallas, and Amsterdam taking place August, September, and October of this year. And then, of course, at PASS Data Community Summit taking place in Seattle in November of this year. So you could see me four times this year if you were really ambitious.

I don’t know. That might be too much for both of us. I know how social you people are. Let’s not push it. Anyway, let’s go answer some of these questions. Has anything ever been less useful than the Dropbox badge that shows up here? Like, there’s nothing useful about that. Nothing useful about that has ever happened.

All right. Anyway, let’s start here. That’s right at the very top. Let’s see. One, two, three, four, five. All right. Five questions. Tell me about the perils of using Snapshot for writes. Is learning how to deal with conflict detection really worse than the pains of pessimism?

Well, you know, it does depend a little bit on your, I guess, skill and comfort as a developer. Like, the main peril of using Snapshot for writes is you get errors if two queries try to update the same thing in a Snapshot transaction. If you are cool enough with dealing with those errors, then it’s not a big deal.

You know, of course, the pains of, you know, non-snapshot writes can be, under most isolation levels, lost updates, right? Like, one query could do something and another query could immediately overwrite it and that’s not a good time. So, really, it’s just kind of like picking your poison.

Like, what’s a bigger problem? If you’re cool with dealing with the errors that come along with the write conflicts, then cool, go with it. I’m not going to try to talk you out of it.

Me, personally, you know, for me it’s more of an application expectation issue. Like, what would make the most sense to the end user? Like, what is the most sensible end result of two queries trying to update the same thing?

Is it one query failing or is it one query overwriting what the other query just did? Like, really just comes down to that for me. Let’s see here.
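For the "dealing with those errors" side of it, one common shape is a small retry loop around the transaction. This is only a sketch: it assumes ALLOW_SNAPSHOT_ISOLATION is already ON for the database, the table and Id value are hypothetical, and the retry count is arbitrary. Error 3960 is the update conflict error raised under snapshot isolation.

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

DECLARE
    @done bit = 0,
    @attempts_left integer = 3; /* arbitrary retry budget */

WHILE @done = 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE u
            SET u.Reputation = u.Reputation + 10
        FROM dbo.Users AS u
        WHERE u.Id = 1; /* hypothetical row */

        COMMIT TRANSACTION;

        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
        BEGIN
            ROLLBACK TRANSACTION;
        END;

        SET @attempts_left -= 1;

        /* Retry only on a snapshot update conflict, and only
           while there are attempts left. Otherwise, re-raise. */
        IF ERROR_NUMBER() <> 3960
        OR @attempts_left <= 0
        BEGIN
            THROW;
        END;
    END CATCH;
END;
```

Whether retrying is the right answer, or whether the user should see the failure, is the application expectation question raised above.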

Why SQL Server doesn’t have parallel rollback? I don’t know. That’s Microsoft. Do I look like Microsoft? I can’t tell you these things. They didn’t, they didn’t implement it.

It’s doable. I don’t know why. Maybe it’s hard or something. I don’t know. Why don’t you go work for Microsoft and put it, write it into the product if it means that much to you. All right.

How can you be for SORT_IN_TEMPDB and for accelerated database recovery? If tempdb is good for performance, then isn’t ADR bad? Are you drunk?

These are… What? Huh? This doesn’t even make sense. I can’t answer this.

This is… It’s mind-blowing. Anyway. Ah. I’m just going to forget that. I’m going to start drinking after that one. Maybe I am too sober to answer that question.

I should get drunk and try to reread that one. Your experience says to prefer the legacy cardinality estimator. What is fundamentally wrong with the new estimator? Well, fundamentally, it doesn’t estimate things as well most of the time.

You know, like when I’m writing demos for, you know, my classes and, you know, for my videos and all that other stuff, you know, I always give both estimators a chance to see which one does a job that, you know, I am happier with. And just probably like 75, 80% of the time, it is the legacy cardinality estimator that does the better job. The default cardinality estimator, or as Microsoft calls it, I don’t call it that.

I just call it the new one because that’s all it is. It’s new. Most of the time, the new one, meh, just doesn’t do it for me.

It’s either like a guess that’s close enough to legacy or it’s a guess that’s way wronger, way more wronger-ish than legacy. So, you know, I don’t have any specific things to, like, show you these differences. It’s just, you know, just a general testing that I’ve found.
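If you want to run the same kind of side by side test, one way is to run the query once under each estimator with a USE HINT query hint and compare estimated against actual row counts in the plans. The query itself is a made-up example against an assumed Stack Overflow style dbo.Posts table.

```sql
/* Legacy cardinality estimator. */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.Score > 10
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

/* Default (new) cardinality estimator for the
   database's compatibility level. */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.Score > 10
OPTION (USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));
```

With actual execution plans turned on, the estimated and actual row counts on each operator show which estimator guessed closer for that query.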

It’s just not quite as good. You previously discussed why you use SORT_IN_TEMPDB. Would you default to that in Azure SQL Database given the lack of control over tempdb, including sizing, in that product?

Yeah, I think I still would, or rather I still do. I don’t really see a need not to. Honestly, I can’t think of a good reason why either of the things you mentioned would prevent me from doing that.

You know, like, the stuff, the limit, the TempDB limitations that I really care about have nothing to do with that. Like, I think both Managed Instance and Azure SQL DB, like, neither one of those still allow for the in-memory TempDB metadata, and sorting in TempDB would have no effect on that.

If TempDB performance, like, if you test it and you find TempDB performance is worse for creating indexes or whatever you’re doing with indexes when you sort in TempDB, then certainly stop. But, like, for me, from just, like, a general, like, I’m going to create this index perspective, I would still prefer to sort in TempDB, regardless of the locality of my database, unless testing proved otherwise.
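For reference, the option being discussed is set per index operation. The index and column names here are hypothetical:

```sql
/* Build the index with the intermediate sort runs stored in tempdb
   instead of in the user database. */
CREATE INDEX ix_Posts_OwnerUserId
ON dbo.Posts (OwnerUserId)
WITH (SORT_IN_TEMPDB = ON);
```

Timing the same build with SORT_IN_TEMPDB = OFF is the simplest way to do the testing mentioned above.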

There may even be times on-prem when a sort in TempDB would be like, hey, why is this slow? I don’t know.

TempDB is created on, like, an old pile of boar’s head Swiss cheese. It’s on some rye bread and salami in there. It’s like, I don’t know.

TempDB sucks, don’t go there. If it’s okay, go there. You know? It’s like, again, it comes back to, like, the public restroom metaphor for TempDB. If you open the door and you don’t like what you see, close the door.

All right? Walk away. Go pee behind a tree or something. Anyway, thank you for watching. Thank you for sending in questions, by the way.

I hope you enjoyed yourselves. I hope you learned something. For the person who asked the question about accelerated database recovery, I hope you have sobered up by now. Perhaps you could restate that question in a way that a sober person could understand.

Not that I’m sober. I’ll, like, permanently. Just when I do these, I tend to be. So perhaps you’re just on a different wavelength there. Anyway, thanks for watching.

I will see you in another video. Doing another thing, I suppose. Makes sense then. All right. Cool. Goodbye.


A Little About TOP WITH TIES In SQL Server


Video Summary

In this video, I dive into the concept of `TOP (1) WITH TIES` in SQL Server and address some common confusion around why it might return more rows than expected. Erik Darling from Darling Data explains how the lack of an appropriate tiebreaker can lead to returning all matching rows instead of just one. I walk through a practical example using a hypothetical post table, demonstrating that without an order-by clause or a proper tie-breaking column, `TOP (1) WITH TIES` will return all rows that match the criteria, not just one. The video also covers how to use window functions like `DENSE_RANK()` to better understand and control which rows are returned when using this query construct.

Full Transcript

Erik Darling here with Darling Data and in today’s video I’m going to attempt to answer a question because I posted a video where I described using top one with ties a little bit some time ago and some people still didn’t get it. Did not understand why top one with ties would return a lot of rows sometimes. So we’re gonna talk a bit about that. I don’t know. Honestly, it’s Saturday here and I’m not feeling terribly creative and I just need something easy to do right now. So screw it. If you like this channel and you would like to support my endeavors to bring you usually very thoughtful, energetic, SQL Server content with the occasional screw it, you can sign up for a membership. There’s a link down below. It says join or become a member or something. I forget what. I don’t watch these things. You crazy? Listening to my own voice, seeing my own face. Woof! Why would I put myself through that? If you would like to support this channel in some other way, perhaps $4 a month is just too rich for your blood, you can like, you can comment, you can subscribe. And of course you can ask me questions for free privately that I will answer publicly during my office hours episodes where I answer five user submitted questions at a time. You can ask whatever you want. I don’t care. If you need consulting help with SQL Server, still powerhouse number one. SQL Server consultant outside of New Zealand. Beer Gut Magazine says so, so it must be so. Whether you need health checks, performance analysis, hands on tuning, dealing with SQL Server performance emergencies, or teaching your developers to not be such dimwits so you have fewer performance emergencies, all of these things become possible through yours truly at a very reasonable rate. So, get at me with that. Anyway, if you would like to get some performance tuning training content from me, you can get all 24 hours of my currently available stuff at that URL.

Well, with that discount code, it comes down to about 150 USD and that lasts you for the rest of your life. There is no subscription necessary. If you would like to get in on the presale price for my upcoming T-SQL course, you can get it now for 250 bucks. That will not last forever. And if you wait and it goes up to 500 bucks and you’re like, hey, can I get a discount? The answer is no. You missed out. You will have had months to do this. I urge you to do it now rather than later. When it will cost you twice as much. This is, of course, companion material to the pre-cons that Kendra Little and I will be teaching in Seattle this November about T-SQL. So, if you’re going to attend those pre-cons, you get all this stuff for free. Well, not for free. You get it with the price of admission, which if your company is paying for it, that basically makes it for free.

But this is work-related stuff, so I would hope that your company would pay for or at least reimburse you for buying this. All right. It would be kind of crazy to not. Anyway, speaking of leaving the house, I will be on tour with Redgate all summer long. I feel just like Lars Ulrich. I think that’s how you say his name.

New York City, August 18th to 20th. Dallas, September 15th to 17th. Then Amsterdam, October 1st to 3rd. And that all leading up to the main event at the PASS Data Community Summit, Seattle, November 17th to 21st. Come hang out. Watch me be a SQL Server monkey. Live and in person.

With that out of the way, though, let’s talk about these top one with ties. And don’t worry, we’re not going to discuss anything lascivious in this video having to do with ties. We’re not going to say any dirty stuff like four-in-hand knot.

Because we don’t tie ties like we’re going to our 8th grade dance. We’re adults. We use half Windsors because we are grown people with necks. We’re also not in a Talking Heads video.

So in the post table, for post type ID 3, there are, I think, 167 rows. So if we run this query, we will get back all 167 rows of post type ID 3. All right. 167 right there.

And then the way that top one with ties works, and actually I should show you this first. So the first thing I’m going to do is run this query without the order by. And honestly, this is perhaps something that should happen with any query where top is involved.

But we actually get an error when we do this. The TOP N WITH TIES clause is not allowed without a corresponding ORDER BY clause. Perhaps this should happen for any top that we use.

I don’t know. But the point is that if we order by post type ID, and we’re filtering to post type ID 3, we’re only going to have post type ID 3.

So there is no tie breaker when we do this. Is there no semicolon there or there or there? I don’t know. I feel very foolish now. But when we do this, we still get back all 167 rows.

If we make that a little bit wider and we scroll on down, without some sort of tie breaker in place, there is no tie to break. This is all post type ID 3 going out through all the results.
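The queries on screen are not captured in the transcript, so here is a minimal sketch of what they probably look like. It assumes the Stack Overflow sample database, where `dbo.Posts` has a `PostTypeId` column; the `SELECT p.*` column list is a guess.

```sql
/* Assumption: Stack Overflow sample database, dbo.Posts table. */

/* Baseline: every row for post type 3. */
SELECT
    p.*
FROM dbo.Posts AS p
WHERE p.PostTypeId = 3;

/* Ordering only by the filtered column: every qualifying row has
   the same PostTypeId, so every row ties with the first one and
   all of them come back. Removing the ORDER BY raises the
   "TOP N WITH TIES clause is not allowed..." error instead. */
SELECT TOP (1) WITH TIES
    p.*
FROM dbo.Posts AS p
WHERE p.PostTypeId = 3
ORDER BY
    p.PostTypeId;
```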

So there is nothing to break our tie. We could break the tie very early if we added in a unique column to the order by. So the ID column is the clustered primary key in this table, which means that it is all unique and every row is unique.

And so we don’t get past the first row when we do this. This just says one row. Cool.

We broke the tie early. There were no ties after that. There were no duplicates in ID. So there are no ties after the ID broke the ties very, very early. We could also add in sort of a late tie breaker, right?
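A sketch of the early tiebreaker, under the same assumptions as before (Stack Overflow sample database, with `Id` as the clustered primary key of `dbo.Posts`). Whether the video orders by `Id` alone or by `PostTypeId, Id` is not stated; either way only one row can come back.

```sql
/* Assumption: dbo.Posts.Id is unique (clustered primary key). */
SELECT TOP (1) WITH TIES
    p.*
FROM dbo.Posts AS p
WHERE p.PostTypeId = 3
ORDER BY
    p.PostTypeId,
    p.Id; /* unique, so no other row can tie with the first row */
```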

So if we run this and we say top, like select the top one with ties, and we order by owner user ID, we don’t get back 167 rows anymore. We get back eventually after waiting some indeterminate amount of time, we get back 164 rows.

So if we come down here, we will see that we only got 164 instead of 167. There are three missing rows here. Now the owner user ID for all of these is negative one, which if you’re keeping track at home, I think that’s the ID for the community bot for Stack Overflow.

So this is all negative one. The only thing that we can infer from this is that something different than negative one eventually happened, and then we broke the tie there, and then we got nothing further back, right?
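The late tiebreaker might be written like this. It is a sketch that assumes `dbo.Posts.OwnerUserId` from the Stack Overflow sample database; the row counts you get depend on which copy of the database you have.

```sql
/* Rows come back for as long as OwnerUserId matches the value on
   the first row in sort order. The first different value ends it. */
SELECT TOP (1) WITH TIES
    p.*
FROM dbo.Posts AS p
WHERE p.PostTypeId = 3
ORDER BY
    p.OwnerUserId;
```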

So we were able to not return three rows that happened after the tie was broken. If you want to see what breaks the tie, you might want to consider using the dense rank windowing function. What dense rank does is rank everything together where the ordering value is the same within a partition, and then as soon as the ordering value changes, we get a new one, right?

So rather than row number, which will give us a contiguous number going up, and rank, which will give us a weird, broken set of numbers with gaps whenever a new ranking starts after any ties, dense rank gives you contiguous numbers, right?

So if we run this, and we say I want the dense rank of all this stuff partitioned by post type ID, ordered by owner user ID, and then we order by owner user ID on the way out, this first column is our dense rank.

So like I said, unlike row number, which would give us like contiguous numbers counting one through whatever until we got to a new partition thing, this just gives us all one, right? And this kind of makes sense for how the tie is broken, because eventually after all these ones, we get down to a new owner user ID, and then we get 234.

Now if we used rank, this would give us non-contiguous results after the tie. I think there’s a start of like three or something. Actually, let’s just find out real quick. All right, we’ll just take the dense out of that, and I’ll show you.

All right, that makes the most sense to do, right? So if we run this now, oh no, it goes 165. So we had 164 rows of one, and then we have 165, 166, and 167.

Boy, was I silly. So, Learn T-SQL with Erik. So we have these numbers in here.

So if we put the dense rank back, which is what we wanted to do here anyway, then we will get back all the ones, right? And then 234.

And of course the 234, if we scroll across a little bit to owner user ID, we’ll see that’s where we started getting new values back. So this is what top one with ties does. As long as your ordering elements continue to supply the same value over and over again, you will continue to get rows back until you reach the end of the result set.
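To compare the three ranking functions side by side, something like the following could be used. It is a sketch assuming the same `dbo.Posts` table; the column aliases are made up for illustration.

```sql
/* All three functions over the same partition and ordering. */
SELECT
    dr = DENSE_RANK() OVER (PARTITION BY p.PostTypeId ORDER BY p.OwnerUserId),
    rk = RANK()       OVER (PARTITION BY p.PostTypeId ORDER BY p.OwnerUserId),
    rn = ROW_NUMBER() OVER (PARTITION BY p.PostTypeId ORDER BY p.OwnerUserId),
    p.Id,
    p.OwnerUserId
FROM dbo.Posts AS p
WHERE p.PostTypeId = 3
ORDER BY
    p.OwnerUserId;

/* The rows with a dense rank of 1 are the same rows that
   TOP (1) WITH TIES ... ORDER BY p.OwnerUserId returns. */
SELECT
    x.*
FROM
(
    SELECT
        p.*,
        dr = DENSE_RANK() OVER (ORDER BY p.OwnerUserId)
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 3
) AS x
WHERE x.dr = 1;
```

With 164 tied rows, `DENSE_RANK` goes 1, 1, ..., 2, 3, 4 while `RANK` goes 1, 1, ..., 165, 166, 167, which matches what the transcript describes.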

Or, I mean, presuming in this case that you never return a tiebreaker like we did with just ordering by post type ID, you’ll just keep getting rows back until you reach the end of the result. Only when you add in a column that eventually breaks the tie do you stop returning ties because the tie has officially been broken and you have found all of the ties available.

So anyway, I hope you enjoyed yourselves. I hope you learned something. I mean, I just spaced on the rank function. That’s my bad.

Oh, man. I’m still going to publish this too. That’s where I’m at. So this is how top one with ties sort of works. And this is how you can get lots and lots of results back if you do not have an adequate tiebreaker at some point in your query.

What that adequate tiebreaker is is between you and your database. I can’t tell you what it should be. I can’t tell you what you should use there.

All I can say is use your best judgment. All right. Cool. Thank you for watching. Goodbye.

Bye. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Subquery Unnesting In SQL Server


A Little About Forwarded Fetches In SQL Server


Video Summary

In this video, I delve into the fascinating world of SQL Server heaps and forwarded fetches. Starting off with a brief detour through my upcoming speaking engagements and consulting work, I transition to explaining why understanding these concepts is crucial for database administrators and developers alike. We explore how rows in a heap can move around due to data growth, leaving behind “forwarded record pointers” that point to their new locations, a phenomenon that doesn’t occur with clustered tables. By creating a sample table and inserting a specific number of rows, I demonstrate the process of updating these rows and observe the resulting forwarded fetches through DMVs (Dynamic Management Views). This hands-on approach helps clarify how nonclustered indexes inherit certain information from the base heap or clustered table, adding depth to our understanding of SQL Server’s internal workings.

Full Transcript

Your friend, Erik Darling here with Darling Data. And we’re going to take a small break from the T-SQL preview material to talk about… Well, I mean, it’s all SQL Server here all the time, isn’t it? We are not traitors to our beloved SQL Server. I don’t know, maybe someday I’ll get bored enough to do some videos about like DuckDB or something. I don’t know. Well, my luck, you know, I do a video about DuckDB and then like Databricks would buy them for $10 billion. Uh, I don’t know. It seems, it seems like the fastest path to becoming a billionaire these days is just to fork Postgres and do something.

So, uh, perhaps, perhaps my business plan for 2026 will be fork Postgres, do something, become billionaire. Seems pretty logical to me. Anyway, uh, today we’re going to talk about something special that happens with, uh, heap tables or, uh, tables that do not have clustered indexes on them in SQL Server called forwarded fetches. Uh, and we’re going to try to answer the question of why SQL Server uses forwarded fetches rather than, you know, well, something else.

Uh, so, I don’t know. Buckle up for that, I guess. We’re here for it, aren’t we? It’s the whole reason we woke up this morning, just to do this. Uh, if you like this channel and you would like to sign up for a membership, you can do so in the links down in the video description.

So, uh, you should use it. Probably. Uh, if you would like to hire me to do some consulting work for SQL Server, I do that. And of course, according to the Beer Gut Magazine magic quadrant, I am still the number one SQL Server consultant in the world outside of New Zealand.

So, uh, we’re, we’re, we’re still, we’re still very proud of this, this magic metric. Um, if you need a health check, performance analysis, hands-on tuning, dealing with SQL Server emergencies, and of course, uh, training your developers, whipping them into the fine shape so that you, you don’t have SQL Server performance emergencies anymore, and you can, you can finally see your family on the weekend.

Well, I’m, I can, I can do all that stuff. And as always, my rates are reasonable. Uh, if you would like to buy some, uh, performance tuning content from me, I have about 24 hours of it. Uh, it’s available for 150 US dollars of just about, or thereabouts, when you use the 75% off coupon code right there.

This is also fully assembled for you down in the video description. Uh, if you would like to get in on the presale for my new T-SQL course, Learn T-SQL with Erik.

It’s a me. Uh, you can buy that at the presale price of $250. That will go up, uh, after the summer when the advanced material, uh, becomes available. Um, and then it’ll be $500.

This is, of course, companion content to the pre-cons that Kendra Little and I will be teaching about T-SQL at PASS Data Community Summit in November. And speaking of speaking, I am doing a whole bunch of speaking.

Uh, Redgate is kind enough to, uh, hire me as a full-time roadie on their PASS On Tour event. So I’m going to be helping them load in and load out. And, uh, they’ve apparently given me an empty room to just talk to myself in.

So if you want to come hear me talk to myself about SQL Server, you can do so in New York City, August 18th to 20th, Dallas, September 15th to 17th, and Amsterdam, October 1st to 3rd. And then, of course, for the Grand event, uh, the Grand Bon event, not the Grand Mal event.

That’s not, those aren’t good. Uh, PASS Data Community Summit taking place in Seattle, November 17th to 21st. Uh, and with that out of the way, let’s talk about these heaps and these forwarded fetches and, I don’t know, whatever else comes up along the way.

So we’re going to do a little bit of talking first, and then I’m going to, I’m going to demonstrate some things for you. Because what’s, what’s life without demonstrations? One, one wonders.

One wonders aloud about these things. Uh, so with heaps, all right, that is a table that does not have a clustered index on it. And it may have nonclustered indexes on it. And it may even have a non-clustered primary key.

It may have unique constraints. It might have foreign keys, but it is still a heap. It is still an unorganized clump of pages, uh, with no clustered index. SQL Server still requires a way to uniquely identify each row in the heap.

Uh, this is an important thing in databases. This is an especially important thing for SQL Server to do things like, uh, lookups, right? You can still create a nonclustered index on a heap, on a few columns, and you might have a query that queries those few columns, plus a bunch of other columns.

And then SQL Server still has to go get those columns from somewhere. And, and SQL Server needs a way to identify which rows to go look up and, and provide those columns for. Um, and, and to do so, it uses something called a RID, a row identifier.

Wow, that is an apt name for something. Remember when Microsoft used to be good at naming stuff? Not so much anymore.

Uh, but that RID is based on three components, uh, three metadata components about, uh, the row itself. That is the file ID of the file that the row lives in. Uh, the page ID of the page that the row lives on.

And the slot ID of the slot on that page that the row lives in, more specifically. Um, so, uh, we have all that going for us, right? Uh, Microsoft has called this storage mechanism very efficient, or, uh, incredibly efficient, or some modifier plus efficient.
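If you want to see those three components for yourself, one option is the `%%physloc%%` virtual column together with two helper functions. This is a sketch, not something shown in the video: `%%physloc%%`, `sys.fn_PhysLocFormatter`, and `sys.fn_PhysLocCracker` are all undocumented and unsupported, and `dbo.some_heap` is a placeholder name for any table you have handy.

```sql
/* Placeholder table name; undocumented features, for exploration only. */
SELECT TOP (10)
    rid = sys.fn_PhysLocFormatter(%%physloc%%), /* formatted as (file:page:slot) */
    plc.file_id,
    plc.page_id,
    plc.slot_id
FROM dbo.some_heap AS sh
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS plc;
```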

And I forget the exact wording they use. Uh, but when you have a unique clustered index or a clustered primary key, the key of that index is used instead. Right?

In place of a row identifier, SQL Server uses the key of the clustered index. Uh, when you have a non-unique clustered index, then it uses the key plus a unique-ifier. Unique-ifier is the single hardest word in the English language to spell. Uh, the day that they throw that out at one of those national spelling bees, I’ll start believing that they actually mean something.

Um, a unique-ifier is a four-byte integer. Uh, Microsoft was a real cheapskate on that one.

Uh, because you can run out of them. But the unique-ifier is a four-byte integer that gets added to rows when a non-unique value is inserted into your clustered index. Right?

So, uh, if you create your clustered index without saying it’s unique, but you never add a duplicate value, then you don’t have to worry about the unique-ifier. Right?

You maybe should have called it unique, but you didn’t. You know, you forgot the, you forgot the word. Uh-oh. Uh, and so you have this, uh, this potentiality for a unique-ifier getting added. And like I said, the unique-ifier is an integer.

Four bytes. And you can run out of them. There’s even an especially devilish message, uh, that tells us about this. Uh, and, uh, that message is, of course, message ID 666, the triple six.

If Zoomit will wake the hell up. Uh, and that message says that the maximum system-generated unique value for a duplicate group was exceeded for index with partition ID. Uh, that is a big int, isn’t it?

I-64. Uh, and, uh, dropping and recreating the index may resolve this. Otherwise, use another clustering key. Just upend your whole life, ruin your whole weekend, because Microsoft couldn’t be bothered to use a big int. Thanks, Microsoft.
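The difference comes down to one keyword in the index definition. A minimal sketch with hypothetical table and index names, purely for illustration:

```sql
/* Hypothetical tables, not from the video. */
CREATE TABLE dbo.not_declared_unique
(
    id integer NOT NULL,
    filler char(10) NOT NULL
);

/* No UNIQUE keyword: duplicate ids are allowed, and a row that
   duplicates an existing key gets a hidden four-byte value added
   to tell it apart. */
CREATE CLUSTERED INDEX c
    ON dbo.not_declared_unique (id);

INSERT dbo.not_declared_unique (id, filler)
VALUES (1, 'first'), (1, 'second'); /* succeeds */

CREATE TABLE dbo.declared_unique
(
    id integer NOT NULL,
    filler char(10) NOT NULL
);

/* UNIQUE keyword: duplicates are rejected, so no hidden value is
   ever needed. */
CREATE UNIQUE CLUSTERED INDEX c
    ON dbo.declared_unique (id);
```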

They’re really great at ruining weekends, aren’t they? Those of you with availability groups know this especially well, don’t you? You’re particularly keen on, um, just how many nights and weekends have been ruined by your high availability solution. Uh, but anyway, the point here is to sort of answer the question of why, the, why SQL Server uses forwarded record pointers when rows move around within a heap.

Right? And these of course occur, these forwarded record pointers occur, when the values on a row grow to the point where that row no longer fits on the page it was initially assigned to.

SQL Server will move it, not necessarily to a new data page, just to a different data page that has enough space on it. It might be another existing data page that has adequate space, or it might be a brand new page.

Who knows, right? Crazy things happen. But when that happens, when a row moves, it leaves behind this little pointer that says, I have moved here.

I live down here now. I’ve changed, I’ve left my apartment. I’ve moved out. This is my new address. It forwards that down there. So that’s different than what happens when you have a clustered table.

Now, the term clustered index is good because, you know, it is an index. But it does lead people down this strange path of thinking that the clustered index is a separate copy of the table.

That it’s just another index on the heap, when it’s not. I find the terminology clustered table prepares people mentally more correctly for exactly what a clustered index is. So when you have a clustered index on a table and a row no longer fits on a page, SQL Server will take roughly 50% of the rows on that page and put them on a new data page.

All right, that is called a page split. Now, one of the key things about, like, why SQL Server has these RIDs and does these things is that nonclustered indexes on your tables inherit certain information from the base table.

The base table could be a heap. The base table could be a clustered table, right? In other words, a table with a clustered index on it. When you have a heap, the RID is added to your nonclustered index. When you have a clustered table, for non-unique nonclustered indexes, the clustered index key columns get added as key columns.

Like, you know, if you have just a single key column index, your clustered index key will be the index key after that. If you have a multi-key index, then your clustered index key will be added to the key after whatever columns you’ve explicitly listed out.

If you have a unique nonclustered index, then the clustered key gets added as include columns to the index. All right. With all that stuff out of the way, let’s move on and let’s create a table and then let’s, we’re going to drop it if it exists, I suppose, and then we’re going to create it.

And then we are going to insert a very special number of rows, it’s not really special at all, into the table. 1,048,576. All right.

Okay. That many rows went into the table. There’s nothing special about those rows. And now we are going to add a nonclustered index to that table, on that date column right there. Okay.

And so we’re going to add this and our table is still a heap. We just have a nonclustered index on it now. All right. Nothing, nothing terribly special about that. Now let’s look at the table itself. I’m going to run this query.
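The demo script itself is not in the transcript, so the following is only a sketch of a setup that fits the description: a heap with a date column, an `nvarchar(200)` string column that is not filled to capacity, 1,048,576 rows, and a nonclustered index on the date. The table name `dbo.el_heapo` comes from the video; the column names, index name, and the way rows are generated are assumptions.

```sql
/* DROP TABLE IF EXISTS needs SQL Server 2016 or later. */
DROP TABLE IF EXISTS dbo.el_heapo;

/* No clustered index or clustered primary key, so this is a heap. */
CREATE TABLE dbo.el_heapo
(
    id integer NOT NULL,
    a_date datetime NOT NULL,
    a_string nvarchar(200) NOT NULL
);

/* Generate 1,048,576 rows with short strings (1 to 50 characters),
   leaving room for later updates to make them grow. */
INSERT dbo.el_heapo WITH (TABLOCK)
    (id, a_date, a_string)
SELECT
    x.n,
    DATEADD(MINUTE, x.n, CONVERT(datetime, '20200101')),
    REPLICATE(N'i', x.n % 50 + 1)
FROM
(
    SELECT TOP (1048576)
        n = CONVERT(integer, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
    FROM sys.all_columns AS ac1
    CROSS JOIN sys.all_columns AS ac2
) AS x;

/* The table is still a heap; this only adds a nonclustered index. */
CREATE INDEX ix_a_date
    ON dbo.el_heapo (a_date);
```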

And this table, this query is going to tell us some stuff about our heap. All right. Namely, it’s going to tell us that we are, our table is named El Heapo. It lives in the DBO schema.

Right now we have zero forwarded fetches. We have had 1,048,576 records inserted into the table. And we have had no updates or deletes against the table. I’m not entirely sure why this number does not quite match this number, but, you know, what’s a few rows difference amongst friends, I guess.
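A query along these lines returns the numbers being read out. It is a sketch using `sys.dm_db_index_operational_stats`; the exact query and column aliases in the video may differ.

```sql
/* index_id 0 is the heap itself. These counters are kept in memory
   and reset when the object's metadata leaves cache or the instance
   restarts. */
SELECT
    table_schema = OBJECT_SCHEMA_NAME(ios.object_id),
    table_name = OBJECT_NAME(ios.object_id),
    ios.forwarded_fetch_count,
    ios.leaf_insert_count,
    ios.leaf_update_count,
    ios.leaf_delete_count
FROM sys.dm_db_index_operational_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.el_heapo'),
    0,
    NULL
) AS ios;
```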

So to start things off, because our table is just freshly inserted into, there has been no opportunity for rows to move around, for a pointer to occur. All right.

So if we run this query and we look at the sys.dm_os_performance_counters DMV for forwarded records, this number will be the same before and after this query runs. All right.
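One way to do that before-and-after check is sketched below. The counter lives under the Access Methods object in `sys.dm_os_performance_counters`; the `a_date` column and the date range in the middle query are assumptions about the demo table.

```sql
/* Despite the "/sec" name, cntr_value is a running total since
   startup, so compare the value before and after. */
SELECT
    pc.counter_name,
    pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Forwarded Records/sec%';

/* A query that reads the heap. */
SELECT
    h.*
FROM dbo.el_heapo AS h
WHERE h.a_date >= '20200101'
AND   h.a_date <  '20200102';

/* Same counter again. */
SELECT
    pc.counter_name,
    pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Forwarded Records/sec%';
```

The counter is instance-wide, so other activity on a busy server can move it too.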

So we have, oh boy, that’s tough to frame, isn’t it? We have a counter value. Let’s move that up a tad. Let’s rearrange things a little bit. There we go. That should make life easier. We have 460251 up here and 460251 down here.

So this counter has not budged. Now, one thing I do want to point out is that the a string column is an nvarchar(200), right? So it’s 200 potential characters.

But when we did our insert, we didn’t fill up every single row with the full size, right? We did some, we did a little bit of funny math on here. And I don’t quite understand why IntelliSense is giving me red squiggles when everything worked correctly.

Perhaps SSMS 21 will eventually fix this issue. I’m not sure, but anyway, we didn’t fill up every row to its potential, which means that we can update rows to their full potential.

And what does that mean? We can have some forwarded records. So what we’re going to do is come over here.

Now, what I want to, what I’m doing in this window is pointing out two things. And this is going to look different when we add the clustered index and come back to this. I’m going to point out two things here.

One is that page splits do not occur with a heap. Heaps are just big flat structures, right? There’s no order to them. They’re not B trees the way indexes are, like both clustered and non-clustered. So we don’t have page splits when we have heaps.

So let’s run this. And what we’re going to do is update a bunch of rows in this table. And we’re going to see after the update runs that there are no page splits that happen, right?

But even though we don’t have page splits, remember this: there were zero forwarded fetches here when we started. Now, when I rerun that query, we have had a bunch of updates. And those updates have created forwarded fetches, which means when we come back over here, right, and we run this query, which hits our heap table, we are going to have our forwarded fetch counter.

I should remember to move that, shouldn’t I? Our forwarded fetch counter is going to increment before and after, or after this query runs. We start with 546105, and now we have 55184. Now you may notice a slight difference here too, in that all of these columns now have a whole bunch of stuff in them that was not in there before.

The eyes look a little bit shorter, but that’s really only because they’re much thinner. These big fat letters down here take up way more visual space on the screen, but I promise you there are 200 lines going across there.
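The update step might look something like this. It is a sketch: the `a_string` and `id` column names, the replacement value, and the filter are assumptions, and the page split counter is instance-wide, so it is only meaningful on a quiet test server.

```sql
/* Page split counter before the update. */
SELECT
    pc.counter_name,
    pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Page Splits/sec%';

/* Grow the strings to the full 200 characters. Rows that no longer
   fit on their page move elsewhere in the heap and leave a
   forwarded record pointer behind. */
UPDATE h
    SET h.a_string = REPLICATE(N'I', 200)
FROM dbo.el_heapo AS h
WHERE h.id % 2 = 0;

/* Page split counter after the update. For a heap, the update
   itself should not add to it. */
SELECT
    pc.counter_name,
    pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Page Splits/sec%';
```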

So anyway, if we look at the query plan for that select, we will see some of the stuff that I had pointed out earlier, where we have a seek into the nonclustered index. Next, then we have a nested loops join, and that nested loops join joins us back to the heap in order to fetch the columns that are not in the nonclustered index that we created.

Remember, the nonclustered index that we created was just on that date column, but we are selecting everything from the heap table. And when we look down here, we will see the seek predicate of the…

This is the bookmark, right? The BMK. And this is the RID that I was talking about earlier, right? So the RID lives in there now.

Now, you’re not going to see anything about the forwarded fetches in the execution plan. But now is as good a time as any to tell you about why SQL Server does forwarded fetches rather than just move the row entirely.

The answer is because of what I was talking about with that inheritance, right? So think about how SQL Server stores a RID, right? It’s the file ID, page ID, slot ID.

If every time we updated a row and it moved, and if we didn’t use a forwarded record pointer, we would have to not only update the heap, we would have to update the clustered index. Sorry, we would have to update all of the nonclustered indexes to, like, rearrange things so that, like, the row identifier was now rewritten to page ID, file ID, slot ID, right?

So it saves a lot of work with nonclustered indexes to do that. Now, I guess, you know, if you have a heap with no nonclustered indexes, it might make sense to just, you know, like, rewrite the thing since it’s just the heap anyway, but we are not that lucky.

But, oh, stop zooming, fool. But if we look over here, right, and if we look at the execution plan for the update that we did, right, like, we certainly scanned the heap and we certainly update the heap, but we do not also update the nonclustered index because we updated that string column, but we did not update any columns that are part of the nonclustered index.

So we only had to update the heap for this. So that is why SQL Server uses forwarded fetches. It is an optimization to help us not need to do unnecessary write work to nonclustered indexes, rewriting all of those so that they identify rows with the new file ID, page ID, slot ID of where the row moved to.

We can just use a forwarded record pointer to go find that row when we scan through the table and hit it. Now, what we can do to get rid of those forwarded record pointers is rebuild the table, right? Oops, we are not in the right database, are we?

Shame on me. But when we rebuild the table, notice that we rebuild both the table and any nonclustered indexes on the table. I don’t know why that just showed up there.

I’m not sure what key combination I hit. So this is rebuilding the, the heap, right? We rebuild the whole heap here and we rebuild the nonclustered index on the heap here. Doing this will get rid of the forwarded fetches.

This will rewrite both of the indexes so that the new file ID, page ID, slot ID for all of these things just kind of, you know, is now the new row location, right? We rebuild the heap to get rid of the forwarded record pointers, rewrite the rows with the new file ID, page ID, slot ID, and then rebuild the nonclustered index so that it has that new link to it. So let’s come back a little bit.
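A sketch of the rebuild, with a check on how many forwarded records exist before and after. It assumes the `dbo.el_heapo` heap from the video; note that `forwarded_record_count` is only populated in `SAMPLED` or `DETAILED` mode, not the default `LIMITED`.

```sql
/* How many forwarded records does the heap currently hold? */
SELECT
    ips.index_type_desc,
    ips.page_count,
    ips.record_count,
    ips.forwarded_record_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.el_heapo'),
    0,          /* index_id 0 = the heap */
    NULL,
    'DETAILED'
) AS ips;

/* Rebuilding the heap rewrites every row to a new location, which
   is why every nonclustered index on it gets rebuilt as well. */
ALTER TABLE dbo.el_heapo REBUILD;
```

Running the first query again after the rebuild should show `forwarded_record_count` back at zero.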

And now let’s, let’s, what I’m going to do is run this whole portion to drop the table. And now this is going to be a clustered table. And because this table, because this table has a clustered index on it, this query is no longer going to return anything.

But what, what, and you know, like if I run this query now, rather than having a bookmark lookup, we have a key lookup. This is looking up based on the key of the clustered index rather than a bookmark or a RID lookup for the heap. But what I do want to do real quick before we close this thing out is show you what an update does when there’s a clustered index.

So before when we ran this, we had the results, there were no page splits before and after the update query ran, or rather there were no, there were no new page splits after the update query ran. But now when we run this with the clustered index, you will see that there were quite a few page splits that occurred. Now, are page splits good, bad, ugly?
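For comparison, here is a sketch of the clustered version of the same test. Everything except the table name is an assumption (column names, constraint and index names, row generation, and the update filter), and the page split counter is instance-wide, so run it on a quiet test server.

```sql
DROP TABLE IF EXISTS dbo.el_heapo;

/* Same shape as before, but the clustered primary key makes this
   a clustered table instead of a heap. */
CREATE TABLE dbo.el_heapo
(
    id integer NOT NULL,
    a_date datetime NOT NULL,
    a_string nvarchar(200) NOT NULL,
    CONSTRAINT pk_el_heapo PRIMARY KEY CLUSTERED (id)
);

INSERT dbo.el_heapo WITH (TABLOCK)
    (id, a_date, a_string)
SELECT
    x.n,
    DATEADD(MINUTE, x.n, CONVERT(datetime, '20200101')),
    REPLICATE(N'i', x.n % 50 + 1)
FROM
(
    SELECT TOP (1048576)
        n = CONVERT(integer, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
    FROM sys.all_columns AS ac1
    CROSS JOIN sys.all_columns AS ac2
) AS x;

CREATE INDEX ix_a_date
    ON dbo.el_heapo (a_date);

/* Page split counter before. */
SELECT pc.counter_name, pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Page Splits/sec%';

/* Rows that outgrow their page now cause page splits rather than
   forwarded records. */
UPDATE h
    SET h.a_string = REPLICATE(N'I', 200)
FROM dbo.el_heapo AS h
WHERE h.id % 2 = 0;

/* Page split counter after. */
SELECT pc.counter_name, pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Access Methods%'
AND   pc.counter_name LIKE N'Page Splits/sec%';
```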

Well, I guess, you know, if you, if you, if you have enough of anything happening in a database, it can get bad. Right. You can certainly produce a lot of extra work if you have a lot of page splits happening.

I guess transaction logging and, you know, IO and stuff, but whether it’s, whether it’s, whether it’s a significant enough problem for you to attempt to address in some way is, is left as an exercise to the reader. So anyway, that is about as much as I can fit into this video about why SQL Server uses forwarded fetches in heaps rather than like doing anything else. And sort of some differences between clustered tables and heap tables.

I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we’re going to talk about a little bit about sub query unnesting and why the optimizer is kind of a goofball about that unless you, unless you do some extra stuff. So anyway, I’m going to just throw this out there and we’re going to go, go do this.

Honestly, we’re just going to wait for this to upload and then record another video. So whatever. I’m going to catch my breath first.

Anyway, goodbye.


Learn T-SQL With Erik: Derived Tables


Video Summary

In this video, I dive into the world of T-SQL and specifically focus on derived tables, explaining why they are a preferred tool over common alternatives like Common Table Expressions (CTEs). Derived tables not only simplify complex queries but also provide a cleaner way to reference expressions across different parts of your SQL statements. I discuss how logical query processing works in SQL Server and highlight the limitations of using aliases in certain clauses, such as `GROUP BY`, which can make writing clean T-SQL code challenging. By leveraging derived tables, you can avoid redundancy and improve readability without affecting the underlying query plan or performance. I also touch on some personal frustrations with modern tools like SQL Prompt that try to analyze my code, emphasizing that sometimes old-school techniques are still the best approach.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we are going to talk a little bit more about the old T-SQL and why you should probably learn it from me and not some other useless hump out there trying to pretend that they know about T-SQL, who really just stinks at T-SQL and probably just had an LLM write everything anyway. And in today’s video, we’re going to talk about derived tables. Derived tables are, of course, my generally preferred mechanism over CTE for a variety of reasons. Mostly, number one reason is irking people who white knight for CTE. That’s top of the charts for me. But they can simplify a lot of things in queries that would otherwise require quite a bit of syntax. Now, part of the reason why derived tables are necessary is sort of because of, again, the way that logical query processing does stuff. Like, we talked about how it starts at from rather than select and, you know, makes its way down to joins and where and group by and having, and then select and then order by. And because of that, there are certain things that you write in the select list that aren’t available in the group by, unless you nest them, right? Unless you nest the query and you give it another from to start the logical query processing thing over with again.

Which, you know, like, I make fun of people who, like, are like, we’re rewriting SQL and we’re gonna have it so you write from first and, like, you can pipeline syntax. Because, like, every time someone critiques SQL, the only two things they ever come up with, despite, like, decades of, like, time that they could have, like, thought of something different. It’s always the same two things. It’s always, oh, from first, pipeline syntax.

Now, the pipeline syntax thing I sort of get, but it’s the way that they, the way that they name it is stupid. You know, it just looks like crappy PowerShell and I hate it. DuckDB has a much smarter name for it.

They, DuckDB calls it, like, like, function chaining or expression chaining where you can, like, like, take, like, in one select list, you can take, like, like, the result of one function, like, alias to something and then use it in another, like, function called within the exact same select list. Oracle also has the ability to reference aliases in the group by clause.

So you don’t have to rewrite expressions in the group by clause, which is amazing because, like, you can write some pretty, like, gnarly expressions and all of a sudden you’re like, wait a minute, I have to group by that now. Jeez, wait. And so, like, you just, like, you have to, like, copy and paste it and remember to take the aliases out because you can’t have, like, you can’t, like, group by some expression, like, or rather, like, something equals some expression, and SQL Server freaks out.

It can’t handle it. So derived tables can, like, at least for the way that T-SQL is engineered today, be useful for simplifying that stuff. What they don’t do, though, is change the way that your query plan physically looks, right?

Because SQL Server still has to, like, process the query the same way, just with an outer sort of layer of things. So this is the query that we’re going to start with, right? And we’re going to select post ID and we have this upvotes thing, which is a sum, and then this downvotes thing, which is a sum.

And then, like, we’re not going to bother with the group by thing in here because we’re not going to group by those, right? Those aren’t worth, those aren’t things that we need to group, need or want to group by for this query. But if we wanted to write a having clause that did some math on those, like, we would have to write a, oh, get out of here.

We would have to write a lot of, like, extra code to do all this stuff, right? Like, this is a, just honestly, this is a nightmare, and it makes me completely understand why people get mad at SQL. What makes things worse is that, like, when you talk logical query processing, like, you can absolutely reference aliases in the order by most of the time.

But if we were to uncomment this, we would immediately be greeted with red squiggles. And if we tried to run the query with, like, the alias downvotes minus upvotes descending, this would, like, this, like, we would get errors there. So we have to, like, actually redo the whole expression in the order by in order to do that.
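The demo query itself isn't in the transcript, so here is a rough sketch of the shape being described. It assumes a Stack Overflow-style `dbo.Votes` table where `VoteTypeId` 2 is an upvote and 3 is a downvote; those names and values are assumptions, not taken from the video.

```sql
/* Sketch only: dbo.Votes, VoteTypeId 2 = upvote, 3 = downvote are assumed. */
SELECT
    v.PostId,
    UpVotes = SUM(CASE v.VoteTypeId WHEN 2 THEN 1 ELSE 0 END),
    DownVotes = SUM(CASE v.VoteTypeId WHEN 3 THEN 1 ELSE 0 END)
FROM dbo.Votes AS v
GROUP BY
    v.PostId
/* The select list aliases aren't visible here, so every expression is repeated. */
HAVING
    SUM(CASE v.VoteTypeId WHEN 2 THEN 1 ELSE 0 END) > 0
AND SUM(CASE v.VoteTypeId WHEN 3 THEN 1 ELSE 0 END) >
    SUM(CASE v.VoteTypeId WHEN 2 THEN 1 ELSE 0 END) * 10
ORDER BY
    SUM(CASE v.VoteTypeId WHEN 3 THEN 1 ELSE 0 END) -
    SUM(CASE v.VoteTypeId WHEN 2 THEN 1 ELSE 0 END) DESC;

/* This shorter form fails, because an alias can't be used inside an ORDER BY expression:
   ORDER BY DownVotes - UpVotes DESC;
*/
```

A bare `ORDER BY DownVotes DESC` would work. It's the math on the aliases that forces the repetition.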

Where derived tables come in and can be useful is to sort of simplify that thing. For some reason, oh, man, so, like, I just switched to, like, the SSMS 21 general availability, and I just upgraded SQL Prompt to a new major version. But it’s revolting on me, and it’s trying to analyze my code, and I don’t need your code analysis.

SQL Prompt, I know what I’m doing. I teach the T-SQL here. You don’t teach the T-SQL to me, SQL Prompt.

But the way we can simplify things a bit with a derived table is we can just write our inner query like we would normally here. But then outside of the query, we can reference these aliases in, like, what I think is a lot more clean way, right? Just where upvotes is greater than zero and downvotes is greater than upvotes times 10.

And then we order by downvotes minus upvotes the way we did in the thing. But it was all, like, expressions, and it was a big mess. It was very, like, it gets very unclear and very tangled up very quickly.
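A sketch of the derived table form, under the same kind of assumptions (a Stack Overflow-style `dbo.Votes` table, `VoteTypeId` 2 for upvotes and 3 for downvotes, and `x` as an arbitrary alias), might look something like this:

```sql
/* Sketch only: table, column values, and the alias x are assumed names. */
SELECT
    x.PostId,
    x.UpVotes,
    x.DownVotes
FROM
(
    /* Each expression is written once and given an alias. */
    SELECT
        v.PostId,
        UpVotes = SUM(CASE v.VoteTypeId WHEN 2 THEN 1 ELSE 0 END),
        DownVotes = SUM(CASE v.VoteTypeId WHEN 3 THEN 1 ELSE 0 END)
    FROM dbo.Votes AS v
    GROUP BY
        v.PostId
) AS x
/* The outer query starts logical processing over, so the aliases are now plain columns. */
WHERE x.UpVotes > 0
AND   x.DownVotes > x.UpVotes * 10
ORDER BY
    x.DownVotes - x.UpVotes DESC;
```

The outer `WHERE` does the job the `HAVING` clause did in the single-level query. It filters on the aggregated values after the grouping has happened.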

But what I want you to notice about the query plans for both of these, if the good Lord will allow me to live long enough to highlight them both, is that they’re the same, right? Like, this doesn’t change the query plan. This doesn’t make the query faster, right?

This doesn’t help SQL Server, like, do anything. Just like CTE, the results of derived tables aren’t materialized in any way. It doesn’t matter as much for derived tables because you can’t re-reference derived tables in a way that you have to re-execute the query.

But, like, the point is that, like, you still have to apply filters the same way to this. It doesn’t help you, like, make, like, push filters down, like, any step further, right? So, like, if we look at what this filter does, it’s just, like, where expression is greater than expression times 10 and expression is greater than 0.

SQL Server has to do the same thing here, right? Like, it’s the exact same filter that we have to apply to both of these, right? And, like, we still have to sort by up here, right?

When we look at what we’re, like, this is our order by clause. It’s Expr1003 descending. And this one, it is still Expr1003 descending. So no matter which way, like, you write the, like, this is more like a query cleanliness.

This is, like, a hygiene thing that this makes, like, when we talk about things that makes queries, like, easier to read and easier to understand, like, this is what does it for me. It’s, like, having these expressions just written once in the main part of the query and then, like, just being able to reference those aliases because we have an outer, we have, like, we have the nesting. We have the derived table and then we can talk to those aliases outside of the derived table, right?

So that’s where they really come in handy. That’s, like, where they can really make a big difference. Now, there’s all sorts of reasons why you would put a derived table in a query, of course. There’s many, many uses for it.

This is just one kind of, like, good sort of, like, code hygiene cleanliness one. So important things about derived tables, they’re much cooler than CTE. Good job on those.

And, of course, they can make your queries a lot, like, more compact, cleaner, easier to read and understand. And they can, like, you don’t, like, it’s not going to change performance. It is going to change your performance.

It’s going to change how fast you can figure out what the hell that query is doing. So thank you for watching. Hope you enjoyed yourselves. I hope you learned something. All of this content is still available at the presale price using the link down in the video description below.

It is still companion content to the pre-cons that Kendra Little and I will be teaching in Seattle. And if you are attending those, of course, you will get free access to this content with pre-con admission. So I think that’s everything, right?

Do I have anything else to say? I don’t think so. I don’t know. Check out this neat SSMS 21. Yeah.

Yeah. Look at that. Real, real nice looking. Dark mode. Got Copilot up there that sucks. God, what a piece of crap that is.

Anyway, before I get too far off track, I’m going to go now. All right. Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 18




To ask your questions, head over here.

What are the pros and cons of using clustered indexes on #temp tables (as opposed to non-clustered)
I am a junior DBA, I have been reading about performance tuning online. Can you pls let me know how do i start looking for the queries that are worst performing(timing out) and how do I see the actual execution plan of them?
What’s the best way to deal with ascending key issues? Would using OPTIMIZE FOR UNKNOWN or local variables be a good approach?
Hi Erik! Azure SQL Database vs SQL Server on Azure VMs. What do you think are the pros and cons of Azure SQL Database?
I’ve fallen in love with adaptive joins. They’ve solved problems with parameter sensitivity for me. I’ve never seen you mention them in that context. Any reason why?

Video Summary

In this video, I delve into the world of temporary tables and indexes, specifically addressing the pros and cons of using clustered versus non-clustered indexes on temp tables. I also share tips for identifying poorly performing queries through Query Store and offer advice on dealing with ascending key issues in SQL Server. Additionally, I provide a candid opinion on Azure SQL Database, expressing my strong preference for Azure on a VM over Azure SQL Database due to its numerous shortcomings. The video wraps up with a Q&A session where I answer five questions submitted by viewers, covering topics from performance tuning and query optimization to the nuances of adaptive joins and parameter sniffing. Whether you’re a seasoned DBA or just starting out, there’s plenty of valuable information here to help improve your SQL Server skills.

Full Transcript

Erik Darling here with Darling Data. And you might be able to, I don’t know, there’s a background today. So, we’re doing Office Hours. This is where I answer five entire whole questions that you submit. Where do you submit them? That is by itself a great Office Hours question. If you go to this link, which is down in the video description, you can submit questions that I will answer here. Okay? So that’s exactly how it works. No doubt. It’s no different any other time. Usual stuff. If you’d like to support my channel, you can sign up for membership. Also down in the video description. If you like the content, but not like, you’re not like, wow, I would totally pay this guy to keep talking. Proposition there. You can like, you can comment, you could subscribe and all that other good stuff. If you need help with your SQL Server beyond, far beyond what a YouTube can do, then I am available for consulting help, health checks, performance analysis, hands-on tuning of your SQL Server malfeasance, dealing with SQL Server performance emergencies, and of course, training your malfeasant developers so that you have fewer SQL Server emergencies. If you would like to buy my performance tuning training content, you can do that for 75% off. That’s that combination of link and code there, which is hopefully assembled for you down in the video description.

And if you would like to get in on my new T-SQL course, you can buy that at the pre-sale price. Videos for that will start dropping. Actually, by the time this publishes, they will already have started trickling out into the internet. If you’re going to attend, well, I mean, that information is obviously wrong. Things got held up a little bit with tech review, and there’s a funny little quirk with Podia, where when you put a release date on something, you can’t release before the release date. So ignore that, because that’s wrong. Obviously, obviously. If you’re attending Kendra and I’s PASS pre-cons in Seattle, you will get access to all this material. This material is considered companion material to that. So, you know, get in on all that stuff while supplies last. If you want to see me live and in person, I will be at all three PASS On Tour events. That’s New York City, Dallas, and Amsterdam. You can see the dates right there. If you can’t, for some reason, send me an email, and I’ll call you and read them to you.

And, of course, PASS Data Community Summit in Seattle, November 17th to 21st. That one’s important enough to say the dates for, I guess. But with that out of the way, let’s answer some questions from you, my lovely users. So here’s the first one.

What are the pros and cons of using clustered indexes on #temp tables as opposed to non-clustered? Well, if I’m going… So if I have decided that I am going to test indexing a temp table, perhaps because that temp table gets loaded with a significant enough amount of data, or the way that the temp table is queried tells me that an index would be appropriate here, I am only going to start with a clustered index.

Why? Well, great question. I’m glad you asked. When you create a nonclustered index on a temp table that does not have a clustered index, you have a heap, and you have a nonclustered index. Typically, your nonclustered index is going to be on a rather narrow array of columns.

If that is completely satisfactory to your query, well, maybe that’s good enough for you. But if it’s not, if your nonclustered index is not what we call a covering index that covers all the columns that your query requires, then SQL Server is completely free, at the optimizer’s own discretion, to ignore your nonclustered index and scan the heap anyway.

So if I was going to offer any advice here, it would be rather than starting with a nonclustered index, start with a clustered index. And if you’re going to start with a clustered index, my preference is to create the clustered index after populating the temp table because it’s generally easier to get a fully parallel insert into the temp table, which can be meaningful if you are loading a lot of data into it.

That’s a lot easier when you just have the heap. And when you create the clustered index afterwards, there’s an additional benefit that you create the index with full scan statistics of the temp table rather than just getting sampled statistics for the temp table when a query finally hits it in a way that SQL Server needs to do cardinality estimation.
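As a rough sketch of that load-then-index order: the table, columns, and filter below are placeholders, and the `TABLOCK` hint is one common way to make the insert eligible for parallelism rather than something stated in the video.

```sql
/* Sketch only: #p, dbo.Posts, and the column names are hypothetical. */
CREATE TABLE #p
(
    Id integer NOT NULL,
    OwnerUserId integer NULL,
    Score integer NOT NULL
);

/* Step 1: load the heap. With no index to maintain, a parallel insert is easier to get. */
INSERT #p WITH (TABLOCK)
(
    Id,
    OwnerUserId,
    Score
)
SELECT
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.Score > 10;

/* Step 2: build the clustered index afterwards.
   The index build reads every row, so its statistics come from a full scan. */
CREATE CLUSTERED INDEX c
ON #p (OwnerUserId);
```

Whether the insert actually goes parallel depends on the query and the server settings, so it's worth checking the plan rather than assuming.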

So there we go. Next question. I am a junior DBA.

I’ve been reading about performance tuning online. Well, that’s your first mistake. There’s nothing but bad advice online. Can you please let me know how do I start looking for the queries that are worst performing? And how do I see the actual execution plan of them?

Well, two separate questions there. My big preference for finding poorly performing queries is to use Query Store. I’ve got my free stored procedure sp_QuickieStore available at code.erikdarling.com.

That is great for mining through Query Store data. By default, it gives you the top 10 queries that have used the most average CPU over the past seven days. There are, of course, all sorts of parameters that you can use with my stored procedure to search through Query Store data that are not available in the Query Store GUI because Microsoft hates you and I love you.
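A minimal sketch of a call might look like the following. The database name is a placeholder, and the parameter names are written from memory, so treat them as assumptions and check them against the procedure's own help output.

```sql
/* Sketch only: N'YourDatabase' is a placeholder. Parameter names are assumed;
   the procedure's help output is the thing to trust. */
EXECUTE dbo.sp_QuickieStore
    @help = 1;

/* Spelling out what the transcript describes as the default behavior:
   top 10 queries by average CPU. */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourDatabase',
    @sort_order = 'cpu',
    @top = 10;
```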

So, that’s the first part. I prefer to look in Query Store for that. And then how do I see the actual execution plan for them?

Well, Query Store is a bit on par with the plan cache for, like, which query plan it shows you, which is, like, the cached plan or the, like, estimated pre-execution plan, which doesn’t have all the actual runtime metrics in there. So, if you want that, you still have to run the query and get the actual execution plan. Now, if the query is so poorly performing that, like, you can’t get it to finish, you can still cheat a little bit here.

So, what you’d want to do is start running the query, but turn on actual execution plans for the query. Don’t turn on live query plans. Live query plans are a buggy pain in the butt.

Turn on actual execution plans and start running the query. Then, in another window, use sp_WhoIsActive with the @get_plans parameter set to 1. And then what that’ll do is look at, like, the in-flight query plan.

And sometimes, like, you can catch enough of the bad stuff that’s happening, like, while the query is running. Like, it’ll start updating with metrics on, like, wait stats and, like, operator times and stuff like that. So, you can get, like, a partial view of the actual execution plan while the query is running.
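For the second window, a sketch of that call follows, assuming sp_WhoIsActive is installed somewhere you can reach it (the `dbo` schema here is an assumption about your install):

```sql
/* Run this in a separate window while the slow query is still executing.
   Sketch only: assumes sp_WhoIsActive is installed and callable as dbo.sp_WhoIsActive. */
EXECUTE dbo.sp_WhoIsActive
    @get_plans = 1;
```

The result set includes a query plan column you can click to open. Running it again a little later shows how the in-flight numbers have moved.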

You, of course, could do other stuff if you know what queries you want to catch. It sounds like you don’t, but if you knew what queries you wanted to capture, then I would recommend sort of targeted extended events to capture specific, like, post-execution plans for specific, like, stored procedures or queries or something like that. But it sounds like if you’re just on the hunt for ones that are bad, then I would use Query Store to find the worst-performing ones.

All right. Another one here. What’s the best way to deal with ascending key issues? Would optimize for unknown or local variables be a good approach?

I mean, like, you hate to say no because, like, clearly if you’re dealing… Well, actually, this is an interesting one. So, like, the local variable or optimize for unknown thing would not help specifically if you are using the new cardinality estimator or, as Microsoft so presumptuously calls it, the default cardinality estimator.

Because for ascending key issues where… I assume you’re talking about, like, index key values that are off the histogram, right? So, like, let’s say your histogram goes, like, it’s 200 steps, so let’s just pretend it goes from 1 to 200.

You’re worried about numbers that are over 200 that are not in the histogram, right? So if you’re using the new cardinality estimator, you are already getting the density vector guess, which is the sort of, like, nerdy name for optimize for unknown and local variables. So you’re already getting that.

If you’re using the legacy cardinality estimator, you usually get a guess of one row for those off-histogram values. So, like, with optimize for unknown and local variables, you would get the density vector guess regardless of what the off-histogram value is. So, but I don’t know if that’s going to be a good guess for you either, right?
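For reference, here is a sketch of the two techniques being asked about. The table, column, and literal values are hypothetical. Both forms hide the value from the optimizer at compile time, so an equality predicate is estimated from the density vector instead of the histogram.

```sql
/* Sketch only: dbo.Votes, PostId, and the value 138 are hypothetical. */

/* Local variable: the value isn't known when the plan is compiled. */
DECLARE
    @PostId integer = 138;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.PostId = @PostId;

/* OPTIMIZE FOR UNKNOWN: the parameter is real, but the optimizer is told to ignore its value. */
EXECUTE sys.sp_executesql
    N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.PostId = @PostId
OPTION (OPTIMIZE FOR UNKNOWN);',
    N'@PostId integer',
    @PostId = 138;
```

Either way, every value gets the same estimate, which is the trade-off described above: possibly better than a guess of one row for off-histogram values, possibly worse for everything the histogram already describes well.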

So, like, I don’t, like, I hear that and I’m immediately like, well, I’ve just seen optimize for unknown and local variables cause too many bad cardinality estimates. They still might be better than the guess of one that the legacy cardinality estimator gives you for those things. But, like, I still don’t think they’re going to be, like, great all around for all the other queries that are actually hitting the histogram, right?

Like, those you’re like, well, use the histogram, buddy. So, like, my first approach would be probably more frequent stats updates or, like, using, like, there’s, like, some database scope configuration stuff and some trace flag stuff you can do to help SQL Server with the ascending key stuff. But, like, I would, like, I have clients where I update, like, I’ve set up stats updates jobs, like, specifically on some statistics for, like, like, every 20, 30 minutes because data is in flux so much and, like, bad query plans happen frequently for those.

Where, like, just updating stats to keep, like, keep up to date with a few specific ones was the best option. So, like, don’t overlook that as a possibility. All right.
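A targeted update like that can be as small as one statement per statistic, run from whatever scheduler you use. The table and statistic names below are placeholders, and `FULLSCAN` is just one sampling choice, not something specified in the video.

```sql
/* Sketch only: dbo.Orders and IX_Orders_OrderDate are hypothetical names. */
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate
WITH FULLSCAN;
```

Naming the specific statistic keeps the job cheap enough to run every 20 or 30 minutes, since it skips everything else on the table.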

Next question. Let’s see what we got here. Hi, Erik. Azure SQL Database versus SQL Server on Azure VMs. What do you think are the pros and cons of Azure SQL Database? What’s that old saying?

I’d rather have a sister in a whorehouse than a brother in the Navy? Well, that’s how I feel about Azure SQL Database. I would rather not have a brother or a sister using Azure SQL Database.

I hate that thing. I mean, it’s hard to tell if I hate it more or less than managed instance. Maybe less only because I had such high hopes for managed instance and Microsoft screwed that one so badly.

Like, oh, God. Like, whoever is in charge of that. I don’t know. It’s, I don’t know.

Bad stuff. Anyway, no. I would much prefer SQL Server on an Azure VM to Azure SQL Database. Anytime someone tells me they’re using Azure SQL Database. Like, aside from the fact that, like, Microsoft was like, hey, the default isolation level here is read committed snapshot isolation because we don’t want to deal with your stupid blocking problems.

Like, that’s nice, but, oh, the rest of it. Oh, the rest of it. Mm.

Mm. Have you ever seen a company, like, screw up an offering for their own product as badly as Microsoft has screwed up their platform as a service for SQL Server? Like, what happened?

Like, how is AWS so much better at this than you are? It’s embarrassing. Like, your pants left you. Like, how?

How did that happen? Oh, it’s a joke. Anyway, I’ve fallen in love with adaptive joins. Well, there’s someone out there for everyone. In the case of adaptive joins, there’s two someones apparently out there for you. There’s hash joins and nested loops joins.

They’ve solved problems with parameter sensitivity for me. Well, that’s nice. I’ve never seen you mention them in that context. Any reason why? Well, there are a few reasons why.

And to start with the reasons why, you kind of have to start with where adaptive joins really kick in, which is, of course, like, you know, either like something where, like, columnstore is involved because, like, you know, you do need something batch mode-y for adaptive joins to happen. And then where batch mode on rowstore kicks in, which is compat level 150 plus enterprise edition only. And only when SQL Server’s internal heuristics say, hey, I think some batch mode on rowstore might be good here.

And, hey, we could use an adaptive join here because we’re using batch mode on rowstore. So, like, there’s times when it might kick in and be very helpful. Even in some of my demos about parameter sensitivity.

I, like, you don’t have to use the old compat level to have them still work. It’s just that they work differently, right? So, the thing with the heuristic thing, like, if you create, like, some sort of columnstore index or object so that you sort of force the optimizer down the batch mode path, that’s one thing.

If you don’t and you’re relying purely on batch mode on rowstore, my experience with it is that when you use the little, when you compile for the little plan first, you don’t get batch mode on rowstore very often, right? Because SQL Server’s like, this is a tiny amount of rows. We don’t need batch mode anything.

We’re just going to do some nested loops joins, some key lookups, and move on with our life. So, like, my experience with it is that, like, if it happens to kick in and solve a problem for you, that’s great. But you have to consider situations where the query might run and batch mode on rowstore wouldn’t kick in and you wouldn’t get an adaptive join where all of a sudden you’d be using, like, the old, like, just row mode, no adaptivity type execution plan, and it wouldn’t work for you.

So, like, if you’re going to consider that, like, a solution for parameter sensitivity, you better think real hard about, like, what luck you ran into or what you did to get SQL Server to consistently use a query plan where adaptive joins happen. So, like, when they kick in, they can absolutely be useful because SQL Server’s like, well, I’m going to start with this hash join, and if I don’t get enough rows, I’m going to switch to nested loops, which is great because you have that choice in there, right? Like, SQL Server’s like, oh, do this.

But if that doesn’t kick in, like, reliably for you, which I find is the case for a lot of the stuff I do, that, like, the heuristic-based stuff doesn’t kick in reliably enough for me. And then when I suggest a non-clustered columnstore index, people are like, oh, but I’ve read 17 million blog posts about why I shouldn’t use non-clustered columnstore indexes because I do an update. Oh, dear God.

I updated a data, and now I can’t use a columnstore. Like, make things much harder on yourselves than they need to be most of the time. All right.
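One widely used way to make batch mode available without relying on the heuristics is an empty filtered nonclustered columnstore index. This is a sketch of that general trick, not necessarily what the demos in the video use, and the table, column, and index names are hypothetical.

```sql
/* Sketch only: dbo.Posts, Id, and the index name are hypothetical.
   The filter contradicts itself, so the index never holds any rows,
   but its presence lets the optimizer consider batch mode (and adaptive joins). */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode
ON dbo.Posts (Id)
WHERE Id = -1
AND   Id = -2;
```

Because the index stays empty, modifications to the table have almost nothing to maintain in it, which takes some of the sting out of the usual update worries.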

Anyway, that’s five questions, right? One, two, three, four, five. Cool. I feel like I’ve done my good deed for the day. I’m going to call this one here. Thank you for watching.

Thank you for sending in questions. I hope you enjoyed yourselves. I hope you learned something. And I will see you in another Office Hours video probably, probably, like, next week. All right.

Cool. Goodbye.
