How To Write SQL Server Queries Correctly: Apply




Thanks for watching!

Video Summary

In this video, I dive into the world of SQL Server’s `APPLY` operator, specifically focusing on `CROSS APPLY` and `OUTER APPLY`. I explain how these operators can transform your queries to be faster and more efficient, especially when dealing with complex operations like windowing functions or derived joins. Whether you’re looking for top N rows, implementing row numbering, or performing pivot-like transformations, this video covers it all. I also highlight the importance of having appropriate indexes in place to leverage `APPLY` effectively, as well as discuss scenarios where using `APPLY` can lead to better performance than traditional derived joins. By the end, you’ll have a solid understanding of when and how to use `APPLY` to optimize your queries.

Full Transcript

Erik Darling here with Darling Data. Boy, am I hungry. It’s been a long day. In today’s video, we’re going to continue with my series, How to Write Queries Correctly. In this one, we’re going to cover the usage of apply — that’s cross apply and outer apply — and different ways you can use cross apply and outer apply to make your queries faster and better and, I don’t know, do things like that. I like talking about apply. Hopefully, you’ll like me talking about apply. Hopefully, it just works out for everyone, right? Anyway, before we do that, we of course have some monetary concerns. We have to talk about fiscal policy here. If you would like to become a member of this fine channel, and maybe say thank you for the billions of hours of content that I produce, you can become a member. Join, like, 30-something other people who have become members by clicking the first link in the video description down over there. It says become a member. If you are, I don’t know, hawkish on your financial policy and you say, four bucks a month, Erik Darling, jeez, I don’t know — you can like, you can comment, you can subscribe. You can make my numbers go up in other places. Hopefully, it’s not going to be like blood pressure and cholesterol, because when those go up, bad things tend to happen.

Anyway, let’s not get grim. If you need help with your SQL Server, I am a SQL Server performance tuning consultant. Some might even say — they work at BeerGut Magazine — the best SQL Server consultant in the world. I am available for all of these things and more. And as always, my rates are reasonable. If you would like to get access to my paid training, my paid SQL Server performance tuning training, you can get all 24-plus hours of it for about 150 USD.

And that lasts for life. It’s not a subscription you have to renew every year. You buy it once and you have it. And then you can watch it and get better. And then you can be good at SQL, like me. Link, coupon code, done. No events. 2024 is done for events. 2025, I’m watching you. If there’s an event near you and you think Erik Darling would be good at that event, well, you have to tell me what that event is. I am many things, but I’m not psychic. I cannot magically guess which event you think I should go to. Let me know what it is.

And maybe I’ll be able to go. Who knows? I don’t know yet. Because you haven’t told me. All right. Now, let’s talk about apply. Now, I end up converting a lot of derived joins, particularly ones that have windowing functions in them, to use apply instead, for very specific reasons. Sometimes you need to create an index to support that thing. But mostly you want to avoid the eager index spool.

You need to at least be able to seek into an index on the inner side of a nested loops join to have that make sense. But particularly ones where row number is involved, it makes a lot of sense for reasons that I’m going to explain. All right. You will get a full explanation, but I just want to let you know that that’s usually where, for me, the apply stuff shines.

There are many other great reasons and places and things you can do with it. But that’s the one that I end up fixing the most. Now, when apply is most useful is if you have a small outer table and a large inner table.

Right. Because you want to have a small number of rows on the outer side of a nested loops join. And you can use that small number of rows to get to the inner side of the nested loops. Right. You don’t want a big table on the outer side of nested loops.

And you don’t want two big tables involved with nested loops, because you’re in for a bad time if you do. If the query that goes into the apply is rather complex, or does something computationally complex — windowing functions being one of those things — that’s another very good reason to use apply instead.

Sometimes I have very specific query goals that make apply pretty much the smartest way of doing things. Sometimes it’s saying top three, or offset zero rows fetch next three rows only. Other times it’s using a windowing function and filtering to where the windowing function is less than or equal to three.

Really, that depends a lot on data distribution, density, stuff like that. That would be another good reason to use apply. If I am really trying my hardest to get a parallel nested loops plan, apply is usually a good way to do that.

If I need to replace a scalar UDF in the select list with an inline UDF, that might be another good place to use apply. And if I need to use the values construct to do some surgery on one or more columns, that would be another good reason. We’ll talk through most of this stuff.
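Since this rewrite isn’t shown on screen, here’s a rough sketch of the scalar-UDF-to-inline-UDF swap. The function names here are made up for illustration; the shape of the rewrite is the point.

```sql
/* Before: a scalar UDF in the select list (hypothetical function name),
   which runs row by row and can inhibit parallelism: */
SELECT
    u.DisplayName,
    TotalScore = dbo.TotalScore(u.Id) /* hypothetical scalar UDF */
FROM dbo.Users AS u;

/* After: the same logic rewritten as an inline table valued function,
   invoked with CROSS APPLY (also a hypothetical name): */
SELECT
    u.DisplayName,
    ts.TotalScore
FROM dbo.Users AS u
CROSS APPLY dbo.TotalScoreInline(u.Id) AS ts;
```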

A lot of it is situational, and it does require some practice to get familiar with it and know when the appropriate time to use apply is. Both cross and outer apply can be used in very similar ways to subqueries in the select list, with the added bonus that — like we talked about in the last video about subqueries — with subqueries you can really only return one row and one column.

With cross apply and outer apply, you don’t have those limitations. You can return multiple rows and multiple columns. That’s why the top N per group thing is really popular for apply.

What you really want to think of when you’re choosing which apply to use is cross apply really should be called inner apply because it’s like an inner join. And outer apply is actually appropriately named because it’s sort of like an outer join, right? Outer apply does not restrict rows.

Cross apply does. So here’s sort of a simple example. Now, you could use top three, or you could use offset fetch in here. But let’s say that I just wanted to get the top three posts for a user.

I can do that with this query pretty easily. Now, these are the results.

Some of them will have three. Some of them won’t. There might not be three for everyone. But for the people who do have three, you will get them. So that’s nice.
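The on-screen query isn’t reproduced in the transcript, but the top-three-posts-per-user pattern being described looks roughly like this, assuming the Stack Overflow sample database’s Users and Posts tables:

```sql
SELECT
    u.DisplayName,
    ca.Title,
    ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    /* runs once per user; TOP (3) limits each user to three posts */
    SELECT TOP (3)
        p.Title,
        p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS ca;
```

Because it’s cross apply, users with no posts at all drop out of the results entirely.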

Right. There’s not really a great way to, like, insert a dummy row if you just want a third thing to show up for everybody. But, you know, whatever. Neither here nor there.

You can also use row number to do something similar if you don’t have good indexes in place, or if your data distribution just makes row number the more sensible choice. There are some pretty good reasons to use row number if you’re in a higher SQL Server compat level where batch mode on rowstore is available, or if you can get batch mode involved using a trick with, like, a temporary table or an empty filtered nonclustered columnstore index.
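A sketch of the row number flavor of the same idea, against the same assumed Users and Posts tables:

```sql
SELECT
    u.DisplayName,
    ca.Title,
    ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT
        p.Title,
        p.Score,
        n = ROW_NUMBER() OVER (ORDER BY p.Score DESC)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS ca
WHERE ca.n <= 3; /* filter the numbered rows instead of using TOP */
```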

Because you can see the window aggregate from batch mode show up. And that’s a lot faster than the typical arrangement with row mode windowing functions, where you’ll sometimes have a sort to put data in order for the partition by and order by, and then the Segment and the Sequence Project operators.

The batch mode window aggregate is usually a lot faster than the row mode equivalent of those plans. So there are lots of good reasons to use apply. And we’re using apply in both of these — it’s just that this one’s with row number, and in this one, again, you can use top or offset fetch. What really drives the decision here is: if this is fast enough, then cool, use this.

If that’s not fast enough, then you might want to think about using row number instead of the top or offset fetch. Really, like I said, it’s going to depend on data distribution, density, things like that. And top and offset fetch don’t really get the batch mode benefit that windowing functions do.

So if batch mode is a goal, then the windowing function will probably be faster there as well. For those queries, we’re just getting everyone from the users table who posted a question in the final days of 2013, ordered by when it was created, reputation, and some other stuff. But I guess the point is that this produces essentially a tabular result, right?

This produces, like, a second table that you’re joining to. And for every row that we find that meets this criteria, we apply this logic. Right. So we can get multiple columns back.

We can get multiple rows back. We can get lots of stuff back. Now, one thing that I think is probably worth pointing out: when we were talking about exists and not exists, one thing that I said is that it doesn’t matter what you put in the select list of exists and not exists, because SQL Server just throws it away. Okay? Sort of in the same vein, it’s okay if you use star inside the applied part of a cross apply or outer apply, because whatever columns you actually pull out of it in the outer select are the only ones that count — SQL Server is smart enough to realize you’re not selecting every column out of the Posts table.

SQL Server does some figuring when you first send it a query, and it says: oh, even though there’s a select star in here, I know that in the outer select I’m only getting title, score, creation date, and last activity date. So I don’t actually need to treat this like a select star query. Using select star inside of this is not a big deal, because I’m not using select star in the final outer select slash projection for the results.

So that’s nice there. It’s sort of like how I have a select star here and a select star here, but SQL Server is smart enough to realize that only these columns are involved, aside from the stuff that I’m using inside of the query. So, because the users table is correlated from ID to owner user ID in the posts table, we do need to make sure that we at least have a good index that leads on owner user ID. That way, for every time we apply that query to what’s in the users table, we have an efficient way to seek into that index and find the rows that we want.
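A sketch of the kind of supporting index being described — leading on the correlation column, with the selected columns included so the seek covers the query. The index name and the exact included columns are illustrative:

```sql
CREATE INDEX posts_owner_user_id
ON dbo.Posts (OwnerUserId)
INCLUDE (Title, Score, CreationDate, LastActivityDate);
```

If the applied query also orders by something like Score, putting that column second in the key can help avoid a sort on every execution — which is the consideration that comes up next.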

You’re going to have some additional considerations with the windowing function thing, or if you have a top or offset fetch with an order by in there, because you’re going to want to figure out how to not sort data every time, probably. Right. So just a couple notes on that. Another neat thing you can do with apply is a sort of mock pivot and unpivot. Itzik Ben-Gan has a lot of great videos on this. If you’ve never seen him present on apply, I would highly suggest looking for either his blog posts about apply or his videos about apply.

Pretty much anything where he talks about T-SQL is magical. But one thing that you can do is use cross apply with a values clause to combine the creation date and last activity date columns like this, and you can use that to sort of pivot on them — you’re turning each of these columns into a single column, right? So creation date and last activity date are two separate columns, but using values, we can pass them in as a single column, and we can mimic the greatest and least functions that SQL Server 2022 added.

So if you’re on SQL Server 2022, we could just use greatest and least to figure out which value is higher or lower — the greater or the lesser. But with older versions of SQL Server, where the greatest and least functions aren’t available — which is weird, because they’ve been around in other databases, like, forever — you can do something like this, and it ends up with a pretty neat and nifty query plan.

We get all the stuff that we care about out of here. You’ll just have this constant scan, which you’ll see has just about twice as many rows as this because we need to basically make one long list from creation date and last activity date. And then we just aggregate to figure out the min and the max from those.
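A sketch of the values trick being described, mimicking greatest and least over two Posts date columns:

```sql
SELECT
    p.Id,
    Lesser  = MIN(v.d), /* stand-in for LEAST */
    Greater = MAX(v.d)  /* stand-in for GREATEST */
FROM dbo.Posts AS p
CROSS APPLY (VALUES (p.CreationDate), (p.LastActivityDate)) AS v(d)
GROUP BY p.Id;

/* On SQL Server 2022 and up, the built-ins do the same thing per row:
   SELECT LEAST(p.CreationDate, p.LastActivityDate), ...           */
```

The constant scan doubling the row count, and the aggregate collapsing it back down, are the two plan features called out above.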

So you can use apply with the values clause for a lot of really powerful stuff. The choice to use apply really does depend on the goal of the query and the goals of the query tuner. It’s not always a magic performance tuning bullet, but under the right circumstances, it can really make things a lot faster than doing something like a derived join.

The choice of cross apply or outer apply, of course, comes down to query semantics. If you want the apply to restrict rows or filter rows, you want cross apply. If you want to do the equivalent of like an outer join, then you want to use outer apply.
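To make the semantics concrete, here’s a sketch contrasting the two, with the same assumed schema:

```sql
/* OUTER APPLY: like a left join -- users with no posts still come back,
   with NULL in oa.Title */
SELECT
    u.DisplayName,
    oa.Title
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT TOP (1)
        p.Title
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS oa;

/* Swap OUTER APPLY for CROSS APPLY and post-less users disappear,
   the same way an inner join would drop them */
```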

One important difference in how the joins are implemented is in the optimizer’s choice between normal nested loops, where the join is done at the nested loops operator, and apply nested loops, which is when the join keys are pushed to the index seek on the inner side of the join. When you get apply nested loops, you can tell, because when you hover over the nested loops join, you get the little tooltip that pops up.

Down at the bottom, you’ll see something that says outer references. And those outer references are the seek predicates being pushed into inside of the nested loops join rather than having them apply at the nested loops join. There’s a great post by Paul White about apply nested loops.

Again, if you’re feeling googly, definitely search for Paul White apply nested loops, because you’ll learn a lot about that there. Now, the optimizer is capable of transforming an apply to a join and vice versa. It will generally try to rewrite an apply to a join during initial compilation, because there’s more searchable plan space for that type of join.

If it transforms an apply to a join early on, it may also consider a transformation back to an apply shape later, just to figure out what would be cheaper. But just writing a query using apply does not guarantee that you get apply nested loops instead of regular vanilla nested loops. Having good indexes in place is generally what tips the optimizer towards using that.

Now, there are a couple of things — well, because we were talking about Itzik Ben-Gan earlier — there are a couple of cool things that you can do with apply that make life a lot easier. One of them is kind of what I showed you with greatest and least, except you can expand that to do lots of fun things, right?

Like finding the min and max per user. This isn’t a terribly fast query, but that’s okay for this one. We start off by doing sort of what we did in the first query, where we get an initial min and max from things.

And then we use a slightly — I mean, it’s not even convoluted. We just have to do some additional aggregations out here, with that group by on ID and display name. Otherwise, we would have multiple rows, right?

We would get multiple rows back. Because, again, it’s not like exists, where you just get one thing. Basically, for every post a user has, this is going to generate a row. So users who have multiple posts will generate multiple rows, and we want to collapse that down just to get the min and max.
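A sketch of that min/max per user idea — apply the posts, splay the two date columns out with a values clause, then aggregate back down to one row per user:

```sql
SELECT
    u.Id,
    u.DisplayName,
    MinDate = MIN(v.d),
    MaxDate = MAX(v.d)
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT
        p.CreationDate,
        p.LastActivityDate
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS ca
CROSS APPLY (VALUES (ca.CreationDate), (ca.LastActivityDate)) AS v(d)
GROUP BY
    u.Id,
    u.DisplayName; /* collapses the one-row-per-post expansion */
```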

Another really cool thing that you can do with cross apply, continuing to use the values clause — again, this is something Itzik talks about, and I think he’s right that this is a very neat trick — is you can do stuff like get the year that each of these things happened in.

All right. We have like the creation year and the last access date year. And then we could just assemble like the beginning and end span from that.

So we can use date from parts to take a year and just say 0101. Oops, that didn’t go well. Just say 0101 here and say 1231 here.
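The date-span trick being demonstrated looks roughly like this: pull the years out with a values clause, then rebuild January 1 and December 31 dates with date from parts.

```sql
SELECT
    u.DisplayName,
    SpanStart = DATEFROMPARTS(y.CreationYear, 1, 1),    /* 0101 */
    SpanEnd   = DATEFROMPARTS(y.LastAccessYear, 12, 31) /* 1231 */
FROM dbo.Users AS u
CROSS APPLY
(
    VALUES (YEAR(u.CreationDate), YEAR(u.LastAccessDate))
) AS y (CreationYear, LastAccessYear);
```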

And we can get sort of like a span of time. So, I mean, some of these are more interesting than others, right? Like, well, like a lot of these are 2008 through 2018.

So it doesn’t really show off how cool this is. But this one, you know, we get the correct 2008 to 2017. Where are some good ones in here?

I don’t know. Some of these query results are just boring. But these are very neat things that you can do with apply and with values. And I’m going to talk about this more when we get into CTEs: with CTEs, you typically have to keep stacking them.

And the more complex the things you have to do, the worse that stacking gets — passing results and aliases down from one to another. But with apply, you can generally do stuff a lot more cleanly, without the fear of CTEs executing the query inside them more than once and causing a giant, cascading, awful query plan. So, just a little bit about cross apply and outer apply there.

Very, very useful query techniques for all sorts of things. The most common use is the top N per group thing. But there are lots of other cool uses for them — really, if I had all day to talk about apply, I could show you a lot of things.

But trying to keep this sort of short and basic so folks understand kind of what they are and how to use them. I don’t want to overcomplicate that and get into the really crazy stuff because I would probably lose a lot of people. So I don’t want to lose anyone.

I’ve lost enough in my life. I don’t know. We don’t need to talk about that. But you know who you are up there. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video, which is going to be about what’s number five here? In and not in.

So we’re going to have some fun things to say about in and not in in that video. So do try to contain yourselves. But I understand why some of you out there just might be orgasmic about what we’re going to say in the next one. So I will see you in that video. Goodbye.

Not forever. Just until next time. Goodbye for now.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Write SQL Server Queries Correctly: EXISTS and NOT EXISTS




Thanks for watching!

Video Summary

In this video, I dive into the world of SQL Server queries, focusing on the often-overlooked `EXISTS` and `NOT EXISTS` clauses. These powerful tools can significantly improve your query performance by reducing duplicate results and making your code more efficient. I share practical examples and explain why using these directives correctly is crucial for writing effective SQL queries. Whether you’re a seasoned SQL professional or just starting out, understanding how to leverage `EXISTS` and `NOT EXISTS` can greatly enhance your ability to write optimized and maintainable queries.

I also take some time to address common misconceptions about these clauses, particularly the idea that what you select in an `EXISTS` clause matters for performance. I demonstrate with examples that selecting anything—like one divided by zero or even the entire King James Bible—has no impact on the query’s execution plan. The key is understanding how SQL Server processes these clauses to find matches or non-matches, and why using them properly can lead to more efficient queries.

Full Transcript

Erik Darling here with Darling Data, and we are getting into video weird green screen. I don’t know why that’s happening behind me like that. I think I need to adjust the light over there a little bit. But in this video, we’re going to continue the series on how to write SQL Server queries correctly. And in this video, we are going to talk about the wonderful, the fabulous, the underused, the malnourished, the oft overlooked exists and not exists. The number of times I have tuned queries just by changing some paradigm — joins, in, not in, things like that — to use exists or not exists is amazing, quite frankly. So I think you’re going to enjoy this one. But before we do that, we need to talk about how much child support you owe me. Just kidding. If you like this channel and you want to become a member, there’s a link right in the video description. It says something like, become a member. You can choose to pay me $4 a month in child support if you like the video babies that I’m cranking out. If you cannot afford the $4 a month, of course, you can like, you can comment, you can subscribe, you can, you know, show your affection in other ways. I guess it’s sort of like a rich dad, poor dad thing or something, but you know, whatever. If you need help with SQL Server, for really anything, you have the best SQL Server consulting in the world available to you, and I can do any of these things at a very reasonable rate. I can also do more stuff, depending on what you’re into and what you need. But these are the things that I usually end up doing with people, so I’m pretty good at them at this point. If you need some very high quality, very low cost SQL Server performance tuning training, you can get all of mine — again, via a link in the video description that says, like, buy training or something — and you can get 75% off. That brings it down to about 150 US dollars. And that is for life.
That is not an expiring thing. The only thing expiring is you. Video training will be there until, I don’t know, hard to say. Forever, maybe. I mean, as long as it’s useful. Yes. No upcoming events, end of the year, 2025.

We’ll figure it out. With that out of the way, let’s talk about exists and not exists. Now, I think what’s great about SQL is that the structured query language is designed in a way where, with certain directives, it is very, very obvious what you are getting when you use them. Exists and not exists. You can say, I want to find things that exist, or, I want to find things that don’t exist. I don’t know. I think these are wonderful things. What if you were able to find aliens like that, or, like, a multiverse or something?

All sorts of interesting things could happen. The sort of awful thing about SQL is that it has a lot of rules. And they are selectively applied, just sort of like the English language itself. I have a young daughter who is learning how to read and explaining to her various rules for spelling and pronunciation and grammatical correctness is… It’s a fun challenge. It’s a fun, fun challenge. I, of course, have, you know, my gripes and grievances with SQL.

If you want, like, the pettiest example, I think that instead of select, we should just write get. Not only is it half as long, but it’s far more obvious what you’re doing. When I go to the store, I do not select milk, eggs, steak, butter, salt, pepper, scotch, anything like that. I usually just get them. I get the things that I want. But, you know, that’s, you know, that’s my breakfast of champions. I don’t know what yours is.

But two of the most overlooked things in SQL are exists and not exists. Perhaps they would get more traction if they were called there or not there. But I think if you had to deal with a where clause and a there clause, things would get real weird real quick. But it might be kind of fun to say, like, select star from table where there is not, or where there is, or, you know, something like that.

Where something is there or not there. I don’t know. It would be fun for me, anyway. But whenever I bring these up, people get kind of weird about them, because they’ve usually read some really incorrect blog post at some point in their life that says subqueries are bad. And, of course, exists and not exists work off something that looks pretty much like a subquery.

And they’re like, no, can’t do that. Bad, bad, bad, bad. But those people are fools and they read things by fools. And then now we’ve just multiplied the number of fools in the world.

This is the problem with people writing foolish things: foolish people read them. So you have foolish people writing foolish things that foolish people read and become doubly foolish. The foolishness is just exponential.

What is it? How would you phrase it? Exponentially caustic foolishness in the world. Now, I think the fun thing about… that shouldn’t be there.

You get out, you idiot. Foolish thing. The nice thing about exists and not exists — and this is something that comes up quite frequently when we’re talking about using these directives in SQL — is that people think whatever you put in the select list of the exists makes some difference to performance.

It does not. You can put select star. You can put one divided by zero, which is my favorite thing to do to prove that it doesn’t mean anything. You could put the entire contents of the King James Bible in there.

And guess what? Wouldn’t make a difference. SQL Server throws it away, forgets it ever existed. Likewise, if you add distinct or top or anything like that to an exists clause, it doesn’t matter.

Offset fetch would be another row filtering thing. It doesn’t matter. Group by doesn’t matter, right?

Because SQL Server does not care about that. It only goes in to find a match or not a match. And there is a one-row goal on that anyway.
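A sketch of the throwaway select list, using the one-divided-by-zero trick from the video:

```sql
SELECT
    u.DisplayName
FROM dbo.Users AS u
WHERE EXISTS
(
    SELECT 1/0 /* never evaluated, never errors -- SQL Server discards it */
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
);
```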

So it doesn’t care about that. One thing that we talked about in the joins video is that if you have a one-to-many or a many-to-many relationship, SQL Server will show you duplicate results when you use a join. With exists and not exists, we only care that a row is there or a row is not there.

Right? Don’t need to find… If we find, like, let’s say we have user ID 1 here and we have 10 rows that match user ID 1 here, we say where exists this.

SQL Server does not say, oh, I found one, now I’m going to go find nine more ones. It just says, I found a one. We’re good. If we say where not exists, SQL Server is like, oh, wait, but I found a one. It’s going to say that one is there.

It’s not going to go find that one nine more times to make sure that it still doesn’t exist. Right? So we find one or we don’t find one, and we bail out. That goes for both exists and not exists — we already talked about that.

Now, let’s say you are a brand new query writer, doing your thing in the world, and you have been tasked. Your boss says: hey, pretty please, give me a list of IDs and display names from the users table for people who have made a post and who have a reputation of one.

Right? What stinks is that if you were to write this query and, you know, the most straightforward way possible, you would get a whole bunch of duplicates back. Why?

Because there are a whole bunch of people who might have made multiple posts, who all have a reputation of one. There are a lot of people in here who have that problem. Right?

I mean, just think about how many rows matched community up here. Lots of them. So we’re going to see lots of duplicates in here. Right? Here’s May Taha with like five rows of duplicates. You look at that and you say, ah, boy, I don’t feel like sifting through all that.

Ah, group by, I’d have to write two column names in there. I don’t feel like writing all that. Typing?

For idiots. Typing. Fingers get tired. I’m old. I’m just going to throw distinct up at the top. Okay. Well, you can do that.

And, I mean, the query itself doesn’t actually run that much faster, but we do get the results back faster, because we send fewer rows to SSMS to process and put into grid form. So the query plans and the performance don’t matter much here. But you could do this, and you could very easily get back the results that you want to see.

Which is fine. But that only works up to a certain point, performance-wise. After a certain point, throwing distinct on — especially on a very long column list — can become pretty painful.

I would strongly advise against using distinct for long column lists. I would strongly advise in favor of getting that distinctness some other way. Like, you might know that there are two or three columns in the results that make up a distinct tuple.

And you could use, like, a row number or something over just those columns to figure out dupes, and then filter to where row number equals one. You could also write the query slightly differently, so that duplicate results are discarded the very moment you start joining rows together. Right?

So let’s say that we wanted to write this query, and we wanted to just get the distinct results from users that have a matching row in posts. Well, that is exactly what exists exists to do. All right.
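A sketch of the two approaches side by side for the reputation-of-one task, with the same assumed Users and Posts tables:

```sql
/* Join plus DISTINCT: duplicates are produced, then squashed */
SELECT DISTINCT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1;

/* EXISTS: duplicates never happen in the first place */
SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS
(
    SELECT 1/0
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
);
```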

So if we run this query and we get this stuff the same way that we did before, we’re going to get an execution plan that has — well, this one’s a little misleading. Usually, when you write a query that does this kind of thing, you will see a semi join for exists or an anti semi join for not exists. This one just works a little bit differently.

This one just kind of groups by the posts table. It aggregates all those rows, so there’s only one of each. And then we join a distinct result set of owner user IDs from posts.

All right — ooh, there we go. The merge join to the users table. Now, just so I can show you the semi join version of this, let’s say option force order, and let’s get an estimated plan for this.

Oh no. It does the same thing. It just reverses it. Nevermind. Okay. Forget that happened. No query plan for you.

So, once exists locates a match, it’s like: cool, we got this row. I’m going to return that row, right? It’s basically inner joining the tables together.

I see a lot of people attempt to write exists queries the same way that they attempt to write in or not in queries. And again, this comes down to the fact that the column list that comes out of exists does not matter.

Right? Because notice what we’re missing here. We had it up here. This was correct.

Right? We selected nothing — well, we selected one divided by zero from the posts table — but we had this correlating where clause. Right? Where the owner user ID here matches here. I see a lot of people try to do this, and it just doesn’t go well, because there’s no correlation in here. So for every row in posts, SQL Server is like: yeah, yeah, something’s there.

Okay, we get a thing, and then SQL Server just basically gives us a left semi join with no join predicate, which is not what you want. This query is incorrect.

If you write your exists or not exists queries like this, you will be very sad about the results, because they won’t be right. They will be completely wrong.

So make sure that when you write your exists and not exists queries, they are properly correlated inside of the exists, and they do not just look like this, because this is bad and wrong. We do not want this happening. Now, one thing that grinds my gears, you know, gets me fired up, angry at the world, makes me want to drink a 24 ounce glass of scotch and go out there with a baseball bat, is whenever I see a SQL tutorial give advice about finding rows in one table that do not exist in another table. They all seem to favor using a left join to do that. So, the basic query pattern. And I’m not saying that you should never do this.

There are times when this will be the better choice, when this will perform better. It comes down to a lot of very localized things, like indexing, and to the optimizer making good join choices based on good cardinality estimates. Things like indexes, and the cardinality estimator you’re using (legacy or default, old or new, let’s just say), can change how SQL Server chooses to do these sorts of joins, which physical join operator it picks. So there are all sorts of things that can make one choice or the other good or bad.

But the basic thing that you do is you select from the users table, and then you left join to whatever other table, and then you find rows where, typically, the primary key is null. ID is the clustered primary key in the post table. This is the most common thing that you’ll do.
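A sketch of that pattern, assuming the Stack Overflow schema:

```sql
/* The common tutorial pattern: users with no posts,
   via a left join and an IS NULL check on the clustered primary key. */
SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE p.Id IS NULL;
```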

Technically you could use any non-nullable column that you want, but the clustered primary key is a pretty good choice for that. The problem that you run into when you use this pattern is that SQL Server takes both tables: we have users here and we have posts here.

SQL Server does not choose to do any early aggregation for this query. That makes sense for the users table, because ID is the clustered primary key. It probably makes less sense for the owner user ID column.

Remember when we did the exists query, and I showed you it did that aggregate on the post table, so only one row of each would come out of it? Here, SQL Server is just like: okay, no early aggregation for you.

You fully join both tables together, and only after those two tables get fully joined together do you filter out rows. This is where we start reducing the result set in our where clause. And if you hover over the filter, you’re going to see that the predicate for that filter is where the ID column from the post table, again the clustered primary key, is null.

And this is usually a pretty bad query pattern to see, because you want SQL Server to filter out rows as early as possible, not join every single possible row and then filter out rows that don’t match. So a better way of writing that is, of course, to use not exists, because not exists will tell you which rows aren’t there.

And if we run this, remember, the left join query runs for about 1.6 seconds. This one finishes up in about half the time: 800 and some odd milliseconds.
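The not exists rewrite of that anti join pattern might look like this (same assumed schema; the 1/0 again demonstrates that the inner select list is never evaluated):

```sql
/* Same question, asked with NOT EXISTS: rows can be eliminated
   at the join (a left anti semi join) instead of filtered afterwards. */
SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE NOT EXISTS
(
    SELECT
        1/0
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
);
```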

Notice we have a fairly close pattern here. Not exactly perfectly aligned, but SQL Server does opt for the early aggregation on the post table: it does the aggregation on the owner user ID column.

And now we have a different type of join. We don’t have an outer join with a filter afterwards; we just have the aggregate that does the count afterwards. What we actually have up here is a left anti semi join.

So this means that the rows get eliminated at the join, rather than fully joining the tables and filtering them out later. A lot of times when I’m tuning queries and I see that pattern, the left join where some column is null, one of my first instincts is to replace it with not exists to see if that gets us any sort of performance improvement.

Most of the time it does. Not every single time, but most of the time it will. Your developer life will be a whole lot less confusing and tiresome if you make sure that you fully take advantage of all of the things that SQL Server has available to it. The real tough thing, something that bothers me quite a bit about certain frameworks, ORMs being one of them, is that it’s not always obvious, or rather not always done correctly, whether a join or exists is used where it should be.

They are capable of doing it, but it doesn’t always happen. And there are different ways to write those types of queries. If you’re doing joins just to find the existence of something, joining for the purpose of filtering, not for the purpose of displaying data, if you just need to find rows that match from one table to another, or rows that don’t match, exists and not exists are usually the more efficient choices.

That changes, of course, if you need to bring data back from the table as well. That’s when you would want to use a join, because you can’t project data out of exists or not exists. That’s why the select list up here doesn’t matter.

So just keep that in mind when you’re writing queries: you can’t project anything out of exists or not exists. If we needed columns from the post table, we wouldn’t want to use that. That’s where we’d want to use a join instead.

But if we’re just looking for rows that are there, or rows that are not there, exists and not exists will usually get you there faster. Now, the next thing we’re going to talk about in this series is subqueries of the correlated variety. I suppose non-correlated subqueries are a bit more exotic and a bit less useful, but that’s what we’re going to do.

So, I hope that excites and titillates you, and that you’ll stick around to watch it. I’m going to do that one next. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And, uh, what else is in there? I hope that you can’t hear the sirens currently going by. That would be nice.

I’m actually going to listen to the very end of this video to see if the sirens show up on the recording. Because, you know, life in the big city. All right.

Anyway, let’s get going here. I’m just going on at this point. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Happy New Year, From Darling Data!

It’s Been A Wonderful Year



Rest up, 2025 is gonna be a fun one.

Video Summary

In this video, I dive into the world of SQL Server maintenance and optimization, focusing on a common scenario that many DBAs face: dealing with server restarts and their impact on performance. We start by discussing how server reboots can affect indexes and statistics, leading to potential performance issues when queries are executed after the reboot. Then, I share practical solutions and best practices for minimizing these disruptions, ensuring your database operations run smoothly throughout the year.

Full Transcript

Hey, hey, it’s New Year’s. Happy New Year. Go back to bed. It’s time.


How To Write SQL Server Queries Correctly: Subqueries

How To Write SQL Server Queries Correctly: Subqueries



Thanks for watching!

Video Summary

In this video, I delve into subqueries in SQL Server, addressing common misconceptions and providing practical insights. I start by challenging the notion that subqueries always run once per row, explaining how their performance can vary based on the join type chosen by the optimizer. Throughout the video, I share examples of when to use subqueries effectively, emphasizing the importance of good supporting indexes for optimal performance. By walking through a detailed query example, I demonstrate how eager index spools in the estimated plan can indicate potential performance issues and show how forcing SQL Server to use specific indexes can significantly improve execution times. The video also explores scenarios where collapsing repetitive subqueries into a single apply operation can enhance efficiency, although this isn’t always necessary or beneficial. Overall, it’s a deep dive into understanding and leveraging subqueries for better query optimization in your SQL Server environment.

Full Transcript

Erik Darling here with Darling Data. And it’s just, you know, me and Bats here, kicking it. Feeling real twinsy here. Mwah! Love you, Bats. In today’s video, we’re going to talk about Intel updates getting the hell off my screen. Intel. What can Intel get right these days? That’s the big question. Anyway, in today’s video, we’re going to talk about subqueries. Oh dear. Now, in the last video, I did foreshadow this a little bit, because we were talking about exists and not exists, and how a lot of people who I end up working with think that exists and not exists are bad because they’re subqueries. They just have this thing against subqueries: oh, they always execute once per row; oh, they’re slow; they’re not as good as joins. They have these preconceived notions that certain things are just always true across every query ever, and that they know best and better, and have some kind of authority in the matter. As if they’ve tested every conceivable query that you could ever write, with every index and every database and every cardinality estimation model, and they just know everything is always going to be the way they say.

They’re often wrong. And that’s what we’re going to talk about today. So before we do that, let’s talk about money. Everyone’s favorite subject, unless you ask them how much of it they have, because that’s when people get uncomfortable. If you would like to become a channel member, you can do so for as little as four bucks a month. It’s just a nice way to say thank you for all of the content that I produce. You can do that by clicking the link in the video description that says become a member. Cool. If you don’t feel like giving me four bucks a month, you can like, you can comment, you can subscribe, you can make different numbers go up, which also always fills my heart with joy and, you know, makes me feel less alone in the universe. If the things I’m talking about during these videos make you think, wow, that Erik Darling sure is good at SQL Server, you’d be right.

And I’m also available for consulting on SQL Server matters like these. And as always, my rates are reasonable. If you would like some very high quality, very low cost SQL Server content that you can buy once, that will continue to be yours, and that you’ll have access to for the rest of your life, for about 150 US dollars, you can use the link in the video description. Or you can go to that URL right there, put in that discount code, and get all 24 hours of my performance tuning training for the one-time low cost of 150 bucks. No upcoming events. It is the end of the year, so I have no interest. In 2025 I will have lots of interest. Anyway, let’s talk about subqueries, because they’re a lot of fun.

So we’re going to talk about sub queries in the select list. I like them because you can use them to skip a lot of additional join logic. When you write a join in a query, the optimizer will do all sorts of funny things, wondering about where that join would best be placed in your query plan. It doesn’t really do the same thing with sub queries. Now, when people start saying that sub queries run once per row, they are not entirely wrong. But the thing is, if you write a join, that join might also run once per row too. If you write a sub query and SQL Server uses a nested loops join, and let’s just say that your query returns 1000 rows.

Yes, that sub query will run 1000 times to produce a result in a nested loops join. But if you write a join that does whatever your sub query does, and SQL Server chooses a nested loops join, your join might actually run a lot more times, because that join might happen way earlier in the query plan. Rather than just running 1000 times at the end, what if it gets joined to a big table, and has to run lots of times? Hmm. Gosh. This sure do get confusing.
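To make the discussion concrete, a select list sub query might look something like this sketch (Stack Overflow schema assumed; the Reputation filter is just illustrative, not from the video):

```sql
/* With a nested loops join, the inner COUNT_BIG runs once
   per user that survives the WHERE clause. */
SELECT
    u.DisplayName,
    PostCount =
    (
        SELECT
            COUNT_BIG(*)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = u.Id
    )
FROM dbo.Users AS u
WHERE u.Reputation > 100000;
```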

So when people say things like, sub queries run once per row, well, lots of things can run once per row. It all depends on the type of physical join that the optimizer chooses to implement that semantically correct query logic physically in the query plan. Hash joins, merge joins, they typically do one big seek or scan of an inner table, bring a bunch of data back.

Nested loops joins, you know the algorithm: take a row from the outer input, send it to the nested loop, go do something in the inner input, grab a row here, bup, bup, bup, grab another row, bup, bup, bup. That is also once per row. No! Ah. Ha ha ha. My rates are reasonable.

Anyway, sub queries do have some limitations. There are times when you can’t use them, which is okay. Everything’s got to have limitations. I have limitations.

I don’t know anything about Oracle. I don’t know anything about cheap wine, except not to buy it. I don’t know anything about… I don’t know, what is it? That’s something else. Let’s go on. Anyway, if you use sub queries in the right way, they can be an excellent method to retrieve some calculation result without worrying about what kind of join you’re doing, and how the optimizer might try to throw that join into the mix of, let’s face it, I know your queries.

There are 42 left joins and 11 inner joins and 13 cross joins already. Throw another join into the mix. Why not?

Well, it could get worse, you buffoon. Since sub queries are in the select list, it is sort of like doing an outer join because the sub query is not actually allowed to eliminate any results. It takes the whatever, however many rows are going to get projected from everything from like the from down, your from join where, group by, stuff like that.

And it takes that result and says: sure, for every row that comes out of this whole jumble of things that user X just did, I’m going to go run this query to get a result for the row. That could be a nested loops join, or it could not be. But either way, it’s going to be an outer join, because it’s not going to filter anything from the results.

So the optimizer doesn’t have to care as much about where that join gets placed. It’s usually going to be way to the left in the query plan, because that’s where a left outer join that isn’t going to eliminate anything goes: we need to get all the rows that we’re actually going to do stuff for first. So that’s good.

Those are all good, wonderful things about sub queries. And the optimizer is generally smart enough to retrieve data for select list sub queries after all the other joining and filtering is done, so they can be evaluated for as few rows as possible.

There are, of course, bad ways to write queries that might end up with a query plan that contradicts that statement. But I’m not in the business of writing bad queries; I’m in the business of writing good queries that run fast. So the most important thing that you can do as a developer, if you’re going to write sub queries, or really almost any kind of query in general, is to make sure you have good supporting indexes for them.

So this is where we’re going to talk a little bit about performance before we talk about other stuff. Now, I’ve already created good indexes to support my sub queries. I already have them.
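The index definitions aren’t shown on screen here, but a supporting index for sub queries correlated on OwnerUserId and filtered on PostTypeId might look something like this. The index name, key order, and the Score include column are my assumptions, not the video’s actual script:

```sql
/* Keyed to match the correlation (OwnerUserId) and the filter (PostTypeId),
   with Score included so an aggregate on it doesn't need a lookup. */
CREATE INDEX p
ON dbo.Posts (OwnerUserId, PostTypeId)
INCLUDE (Score)
WITH (SORT_IN_TEMPDB = ON);
```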

But what I want to show you first is what a query plan will look like when sub queries in the select list are going to be slow. So, for the moment, I’m forcing SQL Server to use the clustered index for each of these three sub queries.

So let’s just get an estimated plan for this. The important thing that I want to show you is over here. We’re talking primarily about sub queries in the select list, where you can see this happen. But if you have any query, ever, and you look at the estimated query plan, or at something from the plan cache or in Query Store, and you see eager index spools like these, this and this, being built off a large table, like, say, the Posts table in the Stack Overflow database, that means the query is probably going to be a lot slower than you would hope. This query is not in for a good time.

SQL Server is using a single thread to scan the Posts table, once there and once where my head is. So it’s scanning it twice, and it can only use a single thread, because it’s building an eager index spool here and here. So this and this aren’t actually even going parallel. Even though they have parallelism operators on them, you can only build an eager index spool with a single thread. This is Microsoft, once again, kicking Standard Edition users when they’re down, because they never want you to be able to build an index in parallel.

No, nothing for you. So that happens. This is what a query plan will look like when sub queries are going to be bad.

Always keep an eye out for this. No matter what kind of query you’re writing, if you see eager index spools being built off large tables like the Posts table, you’re either missing an index that would really help SQL Server, or the optimizer is making a buffoonish choice and you need to add a FORCESEEK hint.
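A FORCESEEK hint goes on the table reference inside the sub query, something like this sketch (the surrounding query and column names are illustrative, not the video’s exact demo):

```sql
SELECT
    u.DisplayName,
    TopQuestionScore =
    (
        SELECT
            MAX(p.Score)
        FROM dbo.Posts AS p WITH (FORCESEEK)  /* stop the optimizer spooling a perfectly seekable index */
        WHERE p.OwnerUserId = u.Id
        AND   p.PostTypeId = 1
    )
FROM dbo.Users AS u;
```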

Because the optimizer will sometimes build an index spool off a perfectly good index that it could have seeked into. I brought this up to Microsoft, and Microsoft does what it usually does: shrugs and says, oh, sorry, we spent $70 billion on AI.

We can’t fix basic stuff in SQL Server. So that’s cool. Anyway, let’s move on. And let me show you: I’ve commented out the index hint here, so now SQL Server is free to use the indexes that I created, which are good for our sub queries.

And what you’ll see is that SQL Server does indeed choose nested loops. So these sub queries do run once per row. That does happen.

It do be like that. But you could write any kind of join. You could write apply. You could write cross apply. You could write outer apply. You could write a regular join. You could write a derived join. And SQL Server might choose nested loops for it.

In which case, it would run once per row anyway. Egads, I’ve been had. Anyway, if we run this query… oh, you know, I should turn on query plans first. That would help, right? If we run this query, I want you to note that it runs very quickly. We don’t spend a long time doing anything in this query. In fact, if we go look at the execution plan, it finishes in 23 milliseconds.

This doesn’t feel like sub queries being slow to me. This also doesn’t feel like there’s a very big penalty for three sub queries running once per row.

We don’t really do much of anything in here. It’s just this final sub query that does a little bit of extra work. These seeks are very fast.

We get zeros in here. It’s just this final count that does any sort of work: 22 milliseconds across those two operators. So that’s really not all that awful.

All right, this is a fairly quick query. Now, there are times when you have sub queries with very, very common sub expressions.

All of these queries are doing something pretty similar. All three of them are correlating on the post type ID column, but this one is looking for post type ID one, and this one is looking for post type ID two. All three of them are correlating on owner user ID equals user ID.

And all three of them are going to the post table. So there are times when very repetitive sub queries like this can be collapsed into a single apply, and that can be faster.

That absolutely can happen; I’m not going to BS you on that. But typically, that’s when the sub query is rather complex.

What I see a lot in some client query plans, when there’s a lot of complexity like this, is: let’s say the query runs for one second.

And let’s say there are 100 operators in the query plan, because it’s a big, complex query. I run it to get the actual plan, and I start looking at operator times.

And there’s not really a single part of the plan that contributes most of that one second. There are lots and lots of little parts of the plan that all contribute to it taking one second.

When I see that, and I need to make it faster than one second, my goal is to reduce the complexity of the plan. Part of that is reducing repetitive sub queries in the plan, or, let’s go a little further than that: collapsing repetitive expressions in the plan. That can be a really useful trick. There are times when that makes sense to do.

This just doesn’t happen to be one of them. But let’s say we look at this query and think all three of those sub queries could be collapsed. We’re at 23 milliseconds, and we need it to be faster. And let’s say we have a mental problem with making three round trips to the post table to do the stuff that we were doing up there.

We have two ways that we could rewrite that query. We could use a derived left join, because sub queries in the select list will always be outer joins: they’re not filtering out any rows that qualify to be projected from the query; we just want whatever calculation is in the sub query.

So we could rewrite this as a derived left join: the max for post type ID one, the max for post type ID two, and the total count, looking for, of course, where post type ID is in one or two.
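A sketch of that derived left join. The transcript doesn’t say which column is being MAX’d, so Score is my assumption, as are the output column names:

```sql
SELECT
    u.DisplayName,
    p.TopQuestionScore,
    p.TopAnswerScore,
    p.TotalPosts
FROM dbo.Users AS u
LEFT JOIN
(
    SELECT
        p.OwnerUserId,
        TopQuestionScore = MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score END),
        TopAnswerScore   = MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score END),
        TotalPosts       = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.PostTypeId IN (1, 2)
    GROUP BY p.OwnerUserId
) AS p
    ON p.OwnerUserId = u.Id;
```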

We could do this, and it would still be reasonably fast, but it is not 23 milliseconds fast. This is 59 milliseconds.

We spend 45 milliseconds seeking into the post table, and then another 14 milliseconds aggregating that data, because 59 minus 45 is 14.

And this query ends up being just a little more than twice as slow as the three separate sub queries. We could even rewrite this as an outer apply.

With the exact same logic. Remember: no cross apply, because we’re not restricting rows. We’re just applying a calculation to the result.
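A sketch of the outer apply version (again, MAX(Score) and the column names are my guesses at the demo’s metric; the structure is what matters):

```sql
SELECT
    u.DisplayName,
    pa.TopQuestionScore,
    pa.TopAnswerScore,
    pa.TotalPosts
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT
        TopQuestionScore = MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score END),
        TopAnswerScore   = MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score END),
        TotalPosts       = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    AND   p.PostTypeId IN (1, 2)
) AS pa;
```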

We can run this, and it gets an identical plan to the derived left join, ending up at 59 milliseconds.

We get the exact same query plan and timing from both of those. So in this case, making the three separate round trips is about twice as efficient as making a single trip to the table. Now, query rewrites to use specific syntax arrangements are generally not available in ORMs.

Many times when we’re working with clients, we’ll stumble across really, really awful application-generated queries.

We’ll look at them, and I’ll say: here’s a useful rewrite. Look how much faster this goes. We are twice as fast. We are three times as fast. We are 10, 100, 1,000 times as fast.

Look at how much better things could be if you weren’t using that ORM. Maybe we could take the problem query the ORM generates, put my rewrite of it in a stored procedure, and make your life easier and better.

And they’re like: no, it’s ORMs only. And I say: okay, well, how would you use your ORM to build the query this way? And they’re like: I don’t know, we can’t do that.

Okay, so your only option is to stick with an ORM that builds an inefficient query, and you’re just going to live with that, because you’re afraid of stored procedures.

This happens quite a bit, and I’m like, hmm. So what should we do here? What would you like me to tell you? How can I get this across to you in a different way?

Your ORM is good up to a certain point. These queries have reached the point where it is no longer good. You need a different API into the database, and a stored procedure is just an API.

All it is is an access layer into the database. You can make this better. You can fix things. We can improve things. All we have to do is get away from these ORMs. Not everything that code produces is good.

Right? You built a query with code. The query sucks. Maybe the code sucks too. I don’t know. But for the most part, people just don’t know how to improve that.

There’s really not a lot of fine-grained control there. It’s up to you to rewrite queries with a better arrangement so that they can be faster. Now, in this case, both of the attempted rewrites resulted in an identical query plan.

The optimizer did a fine job here, but both of the single-trip queries were a little over twice as slow as the original. In this case, the difference is absolutely microscopic, right?

It’s the difference between about 60 milliseconds and about 20 milliseconds. Not anything that we’re going to get crazy about, but I just want you to see that making those three round trips was more efficient than making the single round trip.

For me, the real advantage of writing out the three separate sub queries is to better understand which of those sub queries does the work. You know, it’s almost the same thing as a pattern we’re going to talk about more with CTEs: when I see lots of CTEs being used in a single query, one of the first things I do is start individualizing those common table expressions as SELECT INTO temp tables. That helps me figure out exactly which point in the chain of CTEs is really slow, or causing bad estimations that trickle down to the rest of the CTEs, and where I can start fixing things.
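That debugging technique, sketched with a hypothetical CTE step (the table, filter, and temp table name are all invented for illustration):

```sql
/* Instead of: WITH cte_one AS (...), cte_two AS (...) SELECT ... ,
   materialize each step so you can time and inspect it in isolation. */
SELECT
    p.OwnerUserId,
    TotalScore = SUM(p.Score)
INTO #cte_step_one          /* hypothetical name for the first CTE's logic */
FROM dbo.Posts AS p
WHERE p.PostTypeId = 2
GROUP BY p.OwnerUserId;

/* Then point the next step at #cte_step_one instead of the CTE,
   and repeat until you find the slow or misestimated piece. */
```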

Sometimes the end result is that I need to use the temp tables, and sometimes the end result is that I can make meaningful changes within the CTEs to make them faster, and we can stick with the original.

Now, like I was saying earlier, if these sub queries were a lot more complex, and they had a lot more going on in them, like we had to find the top one, but then join to something else, and have an exists clause in there, and we were just doing a whole lot more work in each of these, then I would probably be a bit more keen on collapsing this stuff into one derived join, or an apply (outer apply) sub query.

Or maybe, up here a little bit, I would just dump this into a temp table with an exists clause. Remember the exists from the last video? With an exists on users, I could filter down the size of this and only bring in the rows I need, and then just join the temp table to the users table and bring that out.
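That pre-filtering idea might be sketched like this (temp table name, columns, and the Reputation filter are hypothetical, standing in for whatever the real filter on users would be):

```sql
/* Pre-aggregate Posts into a temp table, filtered with EXISTS
   so only rows for users we care about get materialized. */
SELECT
    p.OwnerUserId,
    TotalPosts = COUNT_BIG(*)
INTO #posts
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Users AS u
    WHERE u.Id = p.OwnerUserId
    AND   u.Reputation > 100000   /* stand-in for the real filter on Users */
)
GROUP BY p.OwnerUserId;

/* Then join the much smaller temp table back to Users. */
SELECT
    u.DisplayName,
    p.TotalPosts
FROM dbo.Users AS u
JOIN #posts AS p
    ON p.OwnerUserId = u.Id;
```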

Sometimes that can be a good approach too, but it all depends on a lot of local factors. Again, a lot of this stuff does depend on local factors. Anyway: sub queries in the select list are not always the worst choice.

Whenever someone says row by row, well, it depends on what kind of physical join SQL Server implements to logically implement your sub query. If it’s nested loops, yes, it’s row by row. But you can get row by row from nested loops if you write a join, or an apply, or anything else; all of those things can use nested loops and also be row by row.

So don’t let people drag you down with that, because they’re idiots and they don’t know what they’re talking about. Be smart. Say to them: I know what I’m doing.

Write the query. Show them it’s faster, or show them it’s not slow, or show them it’s no different. Whatever you do, don’t let people get away with nonsense.

Don’t let people get away with their ChatGPT responses about SQL Server stuff, because it’s not always smart. It’s not always right.

I spend a lot of time with the AIs trying to get something good out of them. And it’s really hard. They’re abysmal places. Abysmal.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will continue to write correlated subqueries. Because, gosh darn it, they can be pretty useful.

Alright. Cool. Well, I’m out of here. Goodbye.


How To Write SQL Server Queries Correctly: Joins

How To Write SQL Server Queries Correctly: Joins



Thanks for watching!

Video Summary

In this video, I dive into the world of joins in SQL Server, explaining how joins union data horizontally and how unions join data vertically. This is a fundamental concept that every database developer must understand, especially when dealing with complex relationships between tables. I also discuss different types of joins, such as inner joins and left joins, and provide practical examples to illustrate their usage. Whether you’re looking for high-quality SQL Server performance tuning or simply want to improve your query writing skills, this video is packed with valuable insights that will help you write more efficient and logical queries. If you find the content helpful and would like to support my channel, consider subscribing or making a small donation. Your support helps me continue creating these tutorials and offering resources for SQL Server enthusiasts and professionals alike.

Full Transcript

Erik Darling here with Darling Data, and we’re kicking off a, I want to say at this point, this is going to be a 15 video series on how to write queries correctly. This was a bunch of blog posts that I had put together and written, typed and typed and typed, and they actually got a fair amount of comments, which I was happy about because normally when I write things, it’s like crickets. Not a lot of commentary on those things. I don’t know why. It’s okay. Sometimes reading is hard. Perhaps my written self is not quite as persuasive or not quite as engaging as my video self. It’s entirely possible that I just have an enormous physicality, a presence, an unstoppable presence that just does not come across in written form, and that’s okay. Video just might be the better vehicle for me. And that’s all right. You live and you learn. But we’re going to start this off by talking about joins. There are all sorts of joins. There are all sorts of important things to think about and know about when it comes to writing queries that are joining data together. And that’s what we’re going to talk about today. But before we do, I need to convince you to give me some money. If you like this channel and the content provided therein, you can subscribe.

You can join 30 other people. I think over 30 other people. The last time I hit refresh on YouTube, which I do every 5 or 10 minutes. There are over 30 people who have decided that I am worth giving at least $4 a month to. So that’s nice. I appreciate you 30 some odd people who do that, who made that choice, who made that sacrifice upon the altar of darling data. But if you just don’t have $4 a month, you can like, you can comment, you can subscribe, and you can make a totally different set of numbers go up.

Which I would be, of course, very happy if you did. If you need help with SQL Server, if the kind of things, the performance tuning stuff, the topics that I cover in this channel are of interest to you in a way that you think I could come and look at your SQL Servers and help you with these things. Well, I am the absolute best SQL Server consultant in the world.

And, of course, my rates are reasonable. If you would like some very high quality, very low cost training, you can get over 24 hours of my performance tuning stuff, my thoughts, my knowledge, my beautiful voice, for about $150 USD. There are links for everything that I talk about in the video descriptions, the membership, the training.

But if you feel like just typing on your own, you can go to that URL and put in that discount code and you can get the whole kit and caboodle. Again, it’s the end of the year. I don’t have any events upcoming. In the new year, we can discuss these sorts of plans.

Right now, I am far too focused on festivities to travel. Well, sorry, to travel for work. I will do all sorts of other travel.

Trust me on that. But with that out of the way, let us begin our journey into joins. Let’s get over to SQL Server Management Studio. I believe I’ve already run this thing, so we’re good there.

Now, if we were to define joins in SQL Server, we could say that joins union data horizontally. That’s sideways, right? You have a table and you put it next to another table and you match the rows or whatever you’re doing.

So that’s one way of thinking about it. The inverse of that is union queries where unions join data vertically. So joins union data horizontally, unions join data vertically.

Kind of a funny way to think about it. But there’s probably one of the most basic things you can do with a database query, aside from selecting data from one table.

If you have the kind of database where you only have one gigantic, very wide, very long table full of data, I will reiterate for all of you, my rates are reasonable. If you’re having problems with that sort of thing.

The important thing to remember about joins is that they will respect one-to-many or many-to-many relationships, meaning that they can multiply data out, right? They don’t exactly duplicate data.

But if you have one row in one table that matches three rows in another table, or four rows that all have the value two in the join key and they match eight rows in the other table, all those things will join together and make a set, right?

So joins do respect that. We’re going to talk about some alternative syntax. I don’t mean using distinct, of course, that will join tables together without needing to deal with duplicates.

But that’s for another video. That’s, oh, that’s, I don’t know, that’s video like two or three or something. So you always have something to look forward to.

But one thing to understand is that if you don’t need to show data from another table, a lot of people just use joins to filter data from one table, like from a result set. Say something like, well, if the rows are here but not here, we don’t want to show the rows that aren’t here, right?

It’s almost like having a where clause on your query. But here’s an example of when a join is necessary. Let’s say that we want to get all the users in our users table with a reputation over 500,000, and we want to sum up all of their scores from the post table and figure out what kind of post those points were awarded to.

We could write a query that looks like this, where we’re selecting data from the users table. We’re joining that to the post table. We’re asking only for users with a reputation over 500,000.

And of course, because we have an aggregate in here, we need to group by stuff down here. Now, I do want to talk for a quick moment about what we’re doing with the score column, right? Because in Stack Overflow, scores can be negative numbers and many scores might be zero.
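The on-screen query isn’t reproduced in the transcript, but based on the description, against the Stack Overflow database it would look roughly like this (column names assume the standard Stack Overflow schema):

```sql
SELECT
    u.DisplayName,
    p.PostTypeId,
    total_score = SUM(p.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id   /* the join: which posts belong to which user */
WHERE u.Reputation > 500000   /* only the very high reputation folks */
GROUP BY
    u.DisplayName,
    p.PostTypeId;             /* required because of the SUM aggregate */
```

This is a sketch of the query being narrated, not necessarily the exact one shown in the video.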

For some queries that you write, it would make a lot of sense to get rid of things that you maybe don’t want to include in there. Like negative numbers, you actually do probably want to include those because they do take away from a user’s total score. But for some types of reports, you may not want to do that.

For some types of reports, you may only want to include positive numbers. You may not want to take into account anything that might detract from the total score, which would be like either one could be sensible depending on what you want to show.

The other thing that is worth bringing up is that because anything with the score of like absolutely zero will not add anything to the totals, it could make sense to filter out rows where score is, I mean, this says greater than zero, but we’d probably want to say not equal to zero, right?

Because for the sake of this query, we want to include negative numbers. But for some reports, we might not want to. So actually, let’s just say we don’t want to.

It might make sense to do this for some cases because guess what? If we start screening out rows where the score is not equal to zero, or greater than zero, we will have fewer rows in total to join together.

Remember, the where clause generally will like just filter out rows before the joins happen. So you could end up joining fewer rows and that could make your joins more efficient. We’re not doing that here.

I’m not going to go through all that because, well, I am primarily a SQL Server performance tuning consultant. There are going to be some tidbits of performance tuning wit and wisdom throughout these videos, but most of this is just making sure that you can write your queries logically correctly.

So that’s what we’re doing here. That is the purpose of our presence here today. So this will give us some results that look like this.

And we will see the majority of the post types are answers. And we’ll see the total score for all these answers in here. And then we’ll get down a little bit further and we’ll finally have some, you know, results for questions.

And I think there may be some other. Yeah, there we go. There are some other things. And here’s where things kind of get interesting.

There’s a total score of zero for a lot of these. Right? So we might want to start filtering out other things. Like, if we were to filter out anything with a score of zero, we would not have to deal with a lot of the other stuff where we just had a bunch of zeros in there. Right?

Because we don’t, do we really care about that? Not particularly, to be honest with you. So that’s inner joins. Right? That is the, probably the simplest possible type of join you can write. Now, let’s say that we wanted to generate a report of people whose reputation is at, for sitting at one, which is the site minimum.

Right? Stack Overflow protects you from having a reputation in the negatives. If they did allow that, there would probably be, you know, quite a few sock puppet accounts out there that were really, really trying hard to have the lowest negative score.

Like, how can I be the upside-down Jon Skeet? Actually, I’m pretty sure someone would take advantage of the Unicode usernames on Stack Overflow, write Jon Skeet in upside-down font, and try to have negative a million whatever reputation.

Pretty sure someone would do that. We could write a query like this. Now, this is where we have to use a left join.

Right? Where we’ve changed from an inner join to a left join, because there are going to be a bunch of people with a reputation of one who have not made any posts. There are people who probably joined the site, thought about asking a question, or answering a question, but never actually did it.

So we can find all those people. But we can also find other types of people. Now, there are a bunch of reasons why an account with high scores in the Posts table might, in the Users table, have a reputation of one.

Could be, you know, their account could be suspended or something. You know, they could have been downvoted into oblivion despite making a bunch of posts. So what we can do is look at which users are in those situations by using a left join.
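Again, the query itself isn’t in the transcript; a reconstruction against the Stack Overflow schema would look roughly like this:

```sql
SELECT
    u.DisplayName,
    total_score = SUM(p.Score)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p        /* left join: keep users with no posts at all */
    ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1          /* the site minimum */
GROUP BY
    u.DisplayName
ORDER BY
    total_score DESC;           /* outliers with big scores float to the top */
```

Users who never posted come back with a null total_score, which sorts to the bottom in descending order.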

Now, to show you sort of the outlier stuff from that, I’ve ordered by total score descending. So here’s a bunch of people, including this lovely person, your common sense, who has a reputation of one, but a total score of 9,500. And, you know, some other stuff in there.

What was this? This one’s supposed to be total. Oh, yeah. So another thing that’s kind of interesting when you start joining tables together is that you might want to count things in the other table. When you’re doing left joins, sometimes you do need to think about whether you want to get a count or a count distinct.

Right. So if we were counting, like this one getting the total count is just getting owner user ID. Right.

So we don’t need to worry about this being distinct because we’re using a non-nullable column in the table. Most often for something like this, you would want to use the primary key of the table that you’re left joining to. In that case, that would be the p.Id column.

It’s not shown here because it doesn’t help us in any way. But you can totally use another non-nullable column to do that here. You don’t want to count the nulls.

You want to count the actual values. So that’s where this sort of thing, that’s where this sort of logical stuff starts to get important. But you look over in the results. Like when we count up like the total post type IDs, you get a much different number from the distinct post type IDs.

Most people are only going to do something with a question or something with an answer. There are only two post type IDs for that. But just notice that when you count up, when you tally up the number of non-distinct values, you get the same answer here.

But for distinct values, you only get that there. So just something to consider when you’re writing these queries is when you’re getting things like counts and sometimes, well, I mean, sometimes other things like averages or sums or whatever, you do need to consider whether you want to get a distinct count or just a total count in there.

And when you do want to get a total count from a table you’re left joining to, most of the time you’re going to want, as long as that table has a primary key, you’re probably going to just want to get a count of the primary key. If not, you can use whatever non-nullable column from the left join to table you want.

Like, you generally don’t want to include nulls, and you don’t want to include fake nulls, where a column might have a million real nulls in it, but then you end up with a million more sort-of fake nulls from the left join.
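To sketch the counting choices being described (again, assuming the Stack Overflow schema, not the exact on-screen query):

```sql
SELECT
    u.DisplayName,
    total_posts    = COUNT(p.Id),                  /* primary key of the left-joined table:
                                                      counts matched rows, ignores the nulls
                                                      from non-matching users */
    total_types    = COUNT(p.PostTypeId),          /* any non-nullable column gives the
                                                      same answer */
    distinct_types = COUNT(DISTINCT p.PostTypeId)  /* usually a much smaller number: most
                                                      people only post questions and answers */
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1
GROUP BY
    u.DisplayName;
```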

So just some stuff to think about with that. Now, before we start talking about the overall logic, actually, we’ve already covered that. So the results that we get back find all sorts of interesting things that like we kind of talked about users who are very active, but then had their accounts suspended or deactivated or something.

Users who have posted but were heavily downvoted and users who haven’t posted at all. So like if we scroll down far enough, I think we have some negative numbers like way down at the bottom here. If the scroll bar will work.

For some reason, scroll bars have like stopped working for me. Where, yeah, there are some negatives in there. So there are some negative numbers where people have actually made posts but then got downvoted to nothing, right? So that’s there.

Now, we do need to make a small digression to talk about logical query processing. There are far smarter people who cover this stuff. Itzik is one of them.

But this is like my, like the reason why I want to talk about this is because a lot of people, when they write left joins, they do their filtering in a dumb way or they do their filtering in an incorrect way. It’s sort of hard to figure out all of this stuff when you’re just learning.

But here’s the deal. When you write queries, this is the order that you write things in. Select from, joins, where, group by, having, order by, like all the stuff that you can put like, you know, in here.

Oops. Any expressions like column one plus column two, aggregates like sum and count, aliases, like, you know, let’s say you wanted to rename a column. Distinct and top.

Of course, when you’re writing your from clause, you also put your joins in here and all your on conditions. As for your where clause: I’m not a fan of the old ANSI-89 style joins, where it’s like from table one, comma, table two, and then you basically put your on clause in the where clause. I don’t like that.

It just rattles my brain too much. Then you use your where clause to restrict rows, right? And this is kind of where we can stop here. Now, when any database runs queries, not just SQL Server, the order that queries get processed in is a bit different, right? You have your from clause and then your where clause and the, I don’t know, Zoom, it just exploded on me.

And then after everything you have select and order by. So basically, the order queries get processed in is different from the order you write them in. But the important part here is for you to understand, when you’re writing queries with left joins, where to do restrictions.

Now, if we write the query this way and we say where score is not equal to zero, that where clause actually restricts things from the entire result set. And one of the things we’re doing is getting rid of nulls, right? So what SQL Server actually does internally, when you have a left join and then something in the where clause for that left-joined table that gets rid of values, including nulls:

SQL Server turns that into an inner join logically, right? Because that left join is no longer useful. SQL Server says that’s not a real left join, because we would be eliminating nulls from the results.

So SQL Server actually just internally changes that to an inner join. So if you wanted to filter out rows where score equals zero from the table that you’re left joining to, you would actually want to put that condition in the join clause, right? Not in the where clause, because the where clause will restrict it from the entire result set.

This will restrict it just from the table that you’re joining to. So for this one, SQL Server would still respect the semantics of a left join and give you the results that we had before, where, way down at the bottom, we show people who are users but haven’t done anything.
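Side by side, the two placements being described look roughly like this (a sketch against the Stack Overflow schema):

```sql
/*
  Version one: the filter in the WHERE clause applies to the whole result
  set, nulls included, so SQL Server quietly treats the left join as an
  inner join. Users with no posts disappear.
*/
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE p.Score <> 0;

/*
  Version two: the filter in the ON clause only restricts rows from the
  table being joined to. Users with no qualifying posts survive, with
  nulls in the Posts columns.
*/
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON  p.OwnerUserId = u.Id
    AND p.Score <> 0;
```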

And then over in the query plan, you can see that this remains an outer join. But of course, SQL Server rewrote our query and said, or rewrote our join order and then did a right outer join instead. So SQL Server flipped it on us.

Nutty SQL Server. So if you want to preserve results from the outer table, like, and keep like any non-matching null rows, but still get rid of some things from that, put your, put the things you want to get rid of in the join clause, not the where clause, right? There are some restrictions around, right, right joins.

I realize that not many people will actually ever intentionally write a right join. But one thing to be aware of is that some right join plans cannot happen. Specifically, nested loops with specific join orders.

So if we look at the query plan for this, SQL Server is perfectly capable of doing a left join and putting the tables in this order. But if we were to try to do this and say right join instead of left join, SQL Server can’t even come up with a query plan for this one. All right.

So we get an error. SQL Server says, nope, not with those hints. You have messed us up too badly. We cannot do anything. If, of course, we said hash join, SQL Server can make a hash join plan out of that, right? It’s not a nested loops plan.
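The hinted queries aren’t shown in the transcript; a sketch of the pattern being described, using join hints in the FROM clause (which also force the written join order):

```sql
/* Fine: nested loops can preserve the left side of a left join. */
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
LEFT LOOP JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

/* Should fail with Msg 8622: the query processor could not produce
   a query plan because of the hints defined in this query. */
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
RIGHT LOOP JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

/* But a hash hint on the right join works; it's just not nested loops. */
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
RIGHT HASH JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```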

So just something to be aware of. You know, probably not the most important tidbit of information that you’ll come across in this video, but something that you should be aware of when you’re writing queries. Now, that brings us, so we’ve covered inner joins.

We’ve covered left joins. We covered the little thing that you need to know about right joins. It is worth talking about full joins just briefly. I’ve never found too many great uses for full joins except for doing things like writing exception reports.

So those queries tend to look something like this, where we’re taking everything from users and everything from posts, and we’re full joining on this condition. The only thing that we care about is the nulls: any time we have a null in the users table, we’re going to say that’s a post that doesn’t have a user attached to it. And again, these are the primary keys, the clustered primary keys, for both of these tables.

So any time these are null, we’re going to say either that’s a post without a user, when u.id is null, or a user without a post, when p.id is null. And if we run this, we will get a sum of posts without a user and users without a post. Again, I’ve never had a tremendous amount of use for full joins, but they are kind of good for some things.
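The exception report described above would look roughly like this (a reconstruction, not the exact on-screen query):

```sql
SELECT
    /* u.Id is null: a post whose OwnerUserId matched no user */
    posts_without_a_user = SUM(CASE WHEN u.Id IS NULL THEN 1 ELSE 0 END),
    /* p.Id is null: a user who owns no posts */
    users_without_a_post = SUM(CASE WHEN p.Id IS NULL THEN 1 ELSE 0 END)
FROM dbo.Users AS u
FULL JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```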

Cross joins are another thing that I don’t have a tremendous amount of use for. What I really dislike about cross joins, especially when very large tables get involved, is that the only physical type of join that you can do a cross join with is nested loops.

And you can end up with real bad performance problems if you start cross joining big tables, right? It doesn’t have to be two big tables. You can do, you can cross join a small table to a big table and still run into nasty perf problems because of the nested loop nature of cross joins, right?

The only way they can be physically implemented is with nested loops. But where I see cross joins get used a lot is with stuff like permissions, where you need to take a table of users and cross join that to a table of permissions, and then figure out which users have which permissions, right?

You need to blow the whole thing up and assign every user every permission, and then figure out which ones they actually have. Let’s say we wanted to do something a little bit more fun: say that we have a table of scotches and a table of glass sizes, and you want to show someone every possible combination of scotch and glass size, with prices.

So you would have, you know, your Lagavulin, Laphroaig, Ardbeg, Bunnahabhain, Caol Ila, all the good Islay scotches. And you would want to have, like, a one ounce, two ounce, four ounce, six ounce, eight ounce, 24 ounce glass of scotch.

And then you would do some math with the price per ounce and say, you know, however many ounces of Bunnahabhain 25 times 24 would be like a $30,000 glass of scotch or something, right?
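As a sketch, with hypothetical table and column names (nothing here exists outside this example):

```sql
/* Hypothetical tables: dbo.Scotches (scotch_name, price_per_ounce)
   and dbo.GlassSizes (ounces). */
SELECT
    s.scotch_name,
    g.ounces,
    glass_price = s.price_per_ounce * g.ounces  /* every scotch, every size */
FROM dbo.Scotches AS s
CROSS JOIN dbo.GlassSizes AS g                  /* full Cartesian product */
ORDER BY
    s.scotch_name,
    g.ounces;
```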

So you would do something like that. Where I end up using cross joins most frequently is with DMV queries. Specifically, I use this in sp_PressureDetector to do some math on the timestamp column that you get from sys.dm_os_ring_buffers.

That timestamp needs to be compared against the ms_ticks column from sys.dm_os_sys_info. You need to do some math to figure out when things happened in there. So we have a table. I’m just going to show you the top 10 rows from the ring buffer table.

But then we have literally one row in here. Now, for convenience, because I need to convert this to XML and then work with the XML, I’m actually cross joining to the larger table.

It would probably be a lot more common of a solution to cross join to the single-row table. Just logically, a lot of people would say cross join this single-row thing. I don’t know.

This is just me writing the query the way I have it. Because I have to do the convert to XML in here and then do some XML parsing out here, I’ve just chosen to use this as the inner thing where I can have the convert in here. But what you end up with is a convenient way to do this date math for every row and come up with a sample time for that.
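Stripped of the XML handling, the date math being described is roughly this (a sketch, not the sp_PressureDetector query itself):

```sql
/*
  sys.dm_os_sys_info has exactly one row, so cross joining it in just adds
  its ms_ticks column to every ring buffer row with no multiplication.
  The timestamp column is milliseconds of server uptime at event time, so
  (timestamp - ms_ticks) is a negative offset back from right now.
*/
SELECT TOP (10)
    rb.ring_buffer_type,
    sample_time =
        DATEADD
        (
            SECOND,
            (rb.timestamp - osi.ms_ticks) / 1000, /* seconds before now */
            SYSDATETIME()
        )
FROM sys.dm_os_ring_buffers AS rb
CROSS JOIN sys.dm_os_sys_info AS osi;
```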

So just a little bit of information there about inner joins, left and right outer joins, full joins, and cross joins. Again, full joins and cross joins, I don’t come up with too many uses for. But inner joins, left joins, all the time have to do stuff with those.

Be very careful with how you set up filtering for left joins. Because putting stuff in the where clause is a lot different from putting stuff in the on clause. And remember, let’s see, other stuff that we covered in here.

Joins will respect one-to-many and many-to-many relationships and show you data for all of the things that meet your join criteria. So what we’re going to talk about, I think, next, I think next, I think number two is exists and not exists. We’re going to talk about that next because those are somewhat different from joins.

And there are some important things that you need to know about them and when to use them and why to use them. And so that will be our next video. This ended up being a little bit longer than I intended.

I’m not sure exactly where I got carried away, but oh well. You got 25 free minutes out of me, I guess. Not exactly 900 number prices, but, you know, whatever.

Anyway, thank you for watching. Hope you enjoyed yourselves. I hope you learned something. And I hope that you will continue to watch these videos to learn more about how to write queries correctly. Thank you.


Two Of My Favorite NOLOCK Demos from Paul White and Aaron Bertrand




This week, I’ve got a coupon that offers an even steeper discount on my training than the one at the bottom of the post.

It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even. Good luck finding that price on those other Black Friday sales.

It’s good from December 23rd-27th, or until I remember to turn it off. Hurry along now.

Thanks for watching!

Video Summary

In this video, I delve into the world of isolation levels in SQL Server, particularly focusing on the pitfalls of using `READ UNCOMMITTED` and `NOLOCK` hints. Wearing a new Adidas trefoil shirt, I wanted to take a moment to thank two influential figures in the SQL Server community—Paul White and Aaron Bertrand—for their invaluable contributions that have shaped my learning journey over the years. To keep things fun and engaging, I also share how you can support me by becoming a member of this channel for just $4 a month, contributing directly to my Adidas t-shirt fund. If you’re not into t-shirts, feel free to subscribe or leave comments—every little bit helps, even if it’s just the satisfaction of seeing numbers on a screen increase! Whether you need SQL Server consulting or high-quality training at an unbeatable price, I’m here to help. So, let’s dive into some practical demos that illustrate the challenges and risks associated with `NOLOCK` hints, showcasing how they can lead to data inconsistencies and errors.

Full Transcript

Erik Darling here with Darling Data. Wearing a slightly different Adidas shirt than my normal three-stripe, I got the trefoil, I believe it’s called, design going today. Feeling pretty spiffy, to be honest with you, very spiffy. In today’s video, I want to give kudos to a couple of people in the SQL Server community who have really helped me a lot, directly and indirectly, in my SQL Server learning over the years. Of course, that’s Paul White and Aaron Bertrand, you know, prolific answerers on the Stack sites, prolific bloggers, at one point, prolific presenters. The way that I want to do it is with demos, because I’ve been talking a lot about isolation levels. Even in one of my isolation levels videos, I talk about, you know, the problems that you run into with read uncommitted or nolock hints, which are synonyms for each other. One of the things that I talked about in there, which I didn’t have a demo for at the time, was when you have a query that hits an error under read uncommitted or nolock because of data movement. So I have a demo from Paul White that shows that, and then I have a very cool demo from Aaron that shows skipped and double counted rows because of nolock hints on tables. But before we get into that stuff, we of course need to talk about how you can buy me more new Adidas t-shirts. For four bucks a month, you can sign up and become a member of this channel and contribute to my Adidas t-shirt fund. I can’t promise that they’re all going to have different logo designs on them. But they will all be black and white. So at least we’ll have some consistency.

That way, me and my friend Bats will continue to match. I wonder, I think if I just put this close enough to the camera and sort of superimpose his face on mine, it’s probably a close enough match, at least as far as I can tell. If you don’t want to contribute to my Adidas t-shirt fund, if you’re just feeling especially scroogey and stingy at this time of the holiday season, you can like, you can comment, you can subscribe. And well, I won’t get any new t-shirts from that, but I will live with the immense gratification that comes from numbers on a screen going up when I hit refresh. So there’s that. If you need help with SQL Server, perhaps you’re saving that $4 a month because you’re like, I’m going to hire Erik Darling as a consultant someday. I’m pretty good at all these things. In fact, Beer Gut magazine says that I am the best in the world at all these things. So if you need help with a SQL Server that is giving you problems, you should stop talking to other people, because they’re a waste of your time. If you need some very high quality, very low cost training, you can get 24 hours of my SQL Server performance tuning training for about $150 USD.

You can go to the link up there and you can put in the discount code over there or you can click on the link in the video description and find the same thing all in one place. It’s amazing the way technology works, isn’t it? Again, upcoming events, nothing for the rest of this year, maybe next year. If there’s an event that you care about and you want me to come to, let me know what it is. I’m not psychic, and I’m not constantly just out in the world looking for events to go to. I am very selective in what I apply to.

So if there’s something good that I don’t know about, well, gosh darn it, let me know. And with all that out of the way, let’s talk about some of these nolock problems. So the first demo that I’m going to show you is Paul White’s demo.

We’re going to create a simple table. It’s got no indexes. It just has one column in it. It doesn’t have to look like this in order for you to hit this problem. This is just a very easy and convenient way to do it.

In the first session, we are going to have this query running. And gosh, this is a very, very clever query. Because not only are we setting the transaction isolation level to read uncommitted up here, but in spite of that, we are starting a transaction with a name, a named transaction.

It’s amazing. And then, well, this, I don’t know, perhaps slightly less clever one equals one loop runs. Over in this window, we are going to do this.

And we are going to say, while a transaction with this name exists, do this and insert some data and delete some data. All right? So pretty simple stuff there.

Okay? So we’re going to come back over here and we’re going to start this loop off. And this thing is just going to run and do a thing and assign a thing to a variable. And then as that’s running, we’re going to run this portion of the code.

And I want you to keep an eye on the window on the tab right next to this one that says executing. Because it’s going to stop executing real quick. As soon as I do that.

Right? That was a very, very quick cutoff. And, of course, over here we have this lovely message: Could not continue scan with NOLOCK due to data movement. Again, very simple but effective demo.
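Reconstructed from the description above (this is a sketch, not Paul’s exact code), the two sessions look roughly like this:

```sql
/* The table: one nullable column, no indexes. */
CREATE TABLE dbo.t (c1 integer NULL);

/* Session one: read uncommitted, a named transaction, an endless scan. */
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION NameTran;
    DECLARE @i integer;
    WHILE 1 = 1
    BEGIN
        SELECT @i = t.c1 FROM dbo.t;
    END;

/* Session two: while that named transaction exists, churn the data. */
WHILE EXISTS
(
    SELECT 1
    FROM sys.dm_tran_active_transactions AS tat
    WHERE tat.name = N'NameTran'
)
BEGIN
    INSERT dbo.t (c1) VALUES (1);
    DELETE dbo.t;
END;

/* Session one dies quickly with Msg 601:
   Could not continue scan with NOLOCK due to data movement. */
```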

This isn’t exactly what has to happen in order for you to have problems with no lock queries hitting these errors. It’s just a good example of how you can actually hit it. It’s bad.

It’s not good for you. It’s painful. Good luck figuring this out. Just looking at the error log. All right. Demo number two. Aaron Bertrand. Great fella.

Rumored to be Canadian. But we’ll let that slide. And this comes from a blog post that Aaron wrote. So I should have the link for it somewhere. If not, just look for Aaron Bertrand.

No lock. Not only will you find this post, you’ll find lots of other great posts about no lock. So in this query, Aaron also has a loop that runs and does stuff. Right?

It’s this thing. And he also has an update that runs. And it just says go 100 times. So I’m going to kick this off because this runs for a bit longer than the other thing does. And if we run this, we’ll see double counted rows: eight, skipped rows: seven.

Right? Already we have hit problems. And we can keep running this until that go sort of exhausts itself. It does go 100 times.

This time we have one double counted row and seven skipped rows. Then we can essentially just keep doing this. And now we have, oh, two double counted rows and seven skipped rows. So this just gets more and more interesting the more you do it.
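The shape of that demo, roughly. This is adapted from memory, not Aaron’s exact code, and it assumes a table `dbo.Numbers` seeded with one row per key: count every key under NOLOCK while an update shuffles rows around, and anything read twice or not at all shows up in the tallies.

```sql
/* Keys are 1..@known, one row each. The competing updates only move rows
   (e.g. by changing an indexed column); they never insert or delete,
   so any deviation from @known is a NOLOCK misread. */
DECLARE @known bigint = 1000000;

WITH dirty AS
(
    SELECT n.n, reads = COUNT_BIG(*)
    FROM dbo.Numbers AS n WITH (NOLOCK)
    GROUP BY n.n
)
SELECT
    double_counted_rows = SUM(CASE WHEN d.reads > 1 THEN 1 ELSE 0 END),
    skipped_rows        = @known - COUNT_BIG(*) /* keys we never saw */
FROM dirty AS d;
```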

I guess seven and eight is a pretty common scenario for this thing having problems. Seven and eight just doing its thing. Oh boy.

This one looks a little bit different too. So this one, we skipped a bunch of rows, and then, wow, we skipped a lot of rows and then ended up double counting some rows. And that’s aside from the demo I showed you in my isolation level series, where there was an update, one of the tables referenced in the update had a NOLOCK hint, and we ended up putting bad data into a table.

NOLOCK hints can, of course, cause all sorts of problems for your regular select queries too, whether it’s errors because the scan could not continue with NOLOCK due to data movement, never a good time, or you just start returning incorrect results.

Because you are NOLOCKing and seeing all sorts of things happening, like flying monkeys, all around your query. It’s not a good time.

All right. So as always, stay safe out there. Don’t use NOLOCK hints. Please use optimistic isolation levels. Do not labor under the misconception, spread by others, that optimistic isolation levels cause dirty reads and other unfortunate things, because that’s just not true.

They prevent these problems. They prevent lots of other problems. They prevent locking and blocking and deadlocking problems. They’re really wonderful. You should try them out sometime.
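If you want to try them, the switches look like this (swap in your own database name):

```sql
/* Read Committed Snapshot: plain old read committed queries start reading
   row versions instead of blocking on shared locks. */
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

/* Snapshot Isolation: allowed at the database level,
   then opted into per session. */
ALTER DATABASE YourDatabase
SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```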

Anyway, just a quick video today. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And, well, I don’t know. Depending on when this video publishes, I hope you have a great day and or weekend and or life.

May our roads someday cross again. All right. Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Merry Christmas, From Darling Data

Ho Ho Ho


A different kind of video today. It’s my favorite Christmas song.

I hope you enjoy it, too.

This week, I’ve got a coupon that offers an even steeper discount on my training than the one at the bottom of the post.

It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even. Good luck finding that price on those other Black Friday sales.

It’s good from December 23rd-27th, or until I remember to turn it off. Hurry along now.

🎅🎅🎅

Video Summary

In this video, I wanted to take a moment to wish everyone a very Merry Christmas! While the holiday season is all about spending time with loved ones and enjoying festive cheer, sometimes we find ourselves stuck in front of the screen. So, if you’re looking for a quick break from your usual content, here’s a little something special. In this segment, I share some tips on managing your SQL Server databases during the holidays—keeping an eye on performance, ensuring backups are up to date, and making sure your servers aren’t overloaded with unnecessary tasks. After all, it’s important to keep things running smoothly even when everyone is away from their desks!

Full Transcript

Hey, it’s Christmas. Merry Christmas. Get off YouTube.


A Little About LOCK_TIMEOUT In SQL Server



Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked `LOCK_TIMEOUT` command in SQL Server and dispel some common misconceptions about how it works. Erik Darling from Darling Data shares practical insights by demonstrating a query that simulates a long-running update operation on the `high question scores` table, highlighting how lock timeouts are triggered only after significant work has been done, not before the query starts executing. I also touch on alternative methods to bypass potential lock issues and discuss why optimistic isolation levels might be preferable in certain scenarios. If you’re interested in more detailed guidance or consulting services related to SQL Server performance tuning, feel free to support my channel with a membership or explore the training content available through the links provided in the video description.

Full Transcript

Erik Darling here with Darling Data. I wasn’t sure that I had my microphone on. Whoops. Anyway, in this video we’re going to talk a little bit about the LOCK_TIMEOUT command. Though it’s not often used, the people who I do see using it often have a lot of strange misconceptions about it.

Like, when it checks for a lock timeout, or what locks might cause a timeout. So, we’re going to do that today. I guess you could call this a little about lock timeout. If you are so moved and motivated by the content that I produce here on YouTube for free, you can join the 30 other people who support my channel with a low-cost $4 a month membership.

It’s a nice way for me to, I don’t know, buy an espresso. Not a Nespresso machine, but like, an espresso. I live in New York after all. Espresso ain’t cheap.

If you don’t want to buy me an espresso, you can like, you can comment, you can subscribe. And while I won’t be able to buy an espresso with any of that, I will feel a little bit less lonely. So, you know, we all win, don’t we?

If you are in need of SQL Server consulting for any of these things, or anything else beyond this, golly and gosh, I’m probably the best person you could hire to do any of it. So, stop screwing up and hiring other people.

You’ll all end up with me anyway. And as always, my rates are reasonable. So reasonable, in fact, that I, year-round, blow the doors off everyone’s Black Friday sales. You can get 24 hours of SQL Server performance tuning training content for about $150 USD if you go to that website and use that discount code, both of which are also available.

Wouldn’t you know? There’s a link in the video description. You can become a member. You can get the training. You can get me.

Ah, it’s all there for you. All you have to do is click. Upcoming events, end of year, nah. 2025, hit me. Hit me with them. I’ll be there.

And anyway, let’s talk a little bit about Lock Timeout and SQL Server. So, what I’m going to do is I’m going to run this query. And I want you to understand that I have some hints on this query because I’m working on some other demos, so I have an index hint.

So I use the clustered index. Don’t worry about that so much. And then I also have a FORCE ORDER and a MAXDOP 1 hint on here. But I have those there for good reasons.

And this is mostly just to simulate a long-running query that does an update. Okay? I am updating a table called high question scores. And I am updating that with the score from the post table joined to the users table where post type ID equals one to get the right data.
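A sketch of what that update looks like. The column names here are assumptions based on the narration, not the exact demo script:

```sql
UPDATE hqs
SET hqs.Score = x.Score
FROM dbo.HighQuestionScores AS hqs
JOIN
(
    SELECT
        u.Id,
        Score = MAX(p.Score)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p WITH (INDEX = 1) /* the clustered index hint from the demo */
      ON p.OwnerUserId = u.Id
    WHERE p.PostTypeId = 1 /* questions */
    GROUP BY u.Id
) AS x
  ON x.Id = hqs.UserId
OPTION (FORCE ORDER, MAXDOP 1);
```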

So I’m going to run this query without any competing locks going on. And we’re going to look at the query plan. We’re going to talk a little bit about the query plan.

And then I’m going to show you how LOCK_TIMEOUT can actually get tripped by locks that may not be on the thing we’re trying to modify.

The thing that almost everyone thinks about LOCK_TIMEOUT is that it checks for locks first, and then if there are no locks, it runs the query. But that’s not true.

LOCK_TIMEOUT kicks in for your query while it runs. If your query encounters any lock waits that surpass the lock timeout, it will kill your query. All right?
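In other words, LOCK_TIMEOUT is a per-session cap on how long any single lock wait is allowed to last, checked while the query runs:

```sql
SET LOCK_TIMEOUT 100; /* milliseconds; the default is -1, wait forever */

/* Any query after this point runs normally until the first time it waits
   more than 100ms on a lock, and then it fails with:
   Msg 1222, Level 16: Lock request time out period exceeded. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p; /* stand-in query */

SET LOCK_TIMEOUT -1; /* back to the default */
```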

So this thing runs for about 12 seconds. And this is the shape of the query plan. All right? This stuff over here. We scan the clustered index on the users table. And then we scan the clustered index of the post table.

We do a hash join to bring those tables together. We do an aggregate to find our max. And then up here, we join the result of that to the high question score table. And then we update the high question score table here.

Now, one might think that this query would check for locks on the high question score table. And if there are any locks that would cause this query to be blocked or potentially to deadlock, it would just say, no, the lock timed out. But that’s not really the way it works.

So I’m actually going to use an update that I used in my everything you know about isolation levels is wrong series. And this update does something rather cute and clever. Let’s just make sure this is all rolled back.

I see an error message in the corner there, so I assume it is. What this update does is it adds one trivial millisecond to the very last row in the post table. Okay.

So, for the very last row in the post table, we’re going to begin a transaction and we’re not going to roll it back; the rollback is commented out. But notice we’re updating the post table here, right? So I’m going to run this, and that’s going to finish very quickly.
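That blocker, roughly. It assumes a date column like LastActivityDate; the point is only that one row at the very end of the clustered index stays exclusively locked:

```sql
BEGIN TRANSACTION;

/* Nudge the very last row in Posts by one trivial millisecond,
   and hold the lock by not committing. */
UPDATE p
SET p.LastActivityDate = DATEADD(MILLISECOND, 1, p.LastActivityDate)
FROM dbo.Posts AS p
WHERE p.Id = (SELECT MAX(p2.Id) FROM dbo.Posts AS p2);

/* ROLLBACK TRANSACTION; -- commented out on purpose; run it later to release the lock */
```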

We’re going to roll that back afterwards. But now, watch what happens when I run this. What a lot of people expect is that this query would check for a lock for a hundred milliseconds, and if there’s a lock held for more than a hundred milliseconds, the query doesn’t run.

That’s not at all what happens. If I run this query, it’s going to run for almost the full duration of what it would have run for before, to do the joins, right? Users to posts.

And then the result of users to posts to the high question score table, and then update the high question score table. This executed. I have to move over a little bit so you can see it. We don’t have a query plan for this because, of course, the query failed, so we don’t get a query plan.

But this ran for nine seconds. Nine seconds is just about the amount of time we spent in the join between users and posts and other stuff, right? Like, that was about nine seconds in there.

So this query actually didn’t check for locks first. It started running, encountered locks, and then the lock timeout hit. And it’s not happening up here.

The only other thing going on on this server is this, where we were updating the very last row of the post table. So SQL Server scanned all of the users table, did whatever it had to do to prepare the hash join, started scanning the post table, got to the very last row in the post table, and then said, uh oh, I’m locked. So we had that whole thing happen before the lock timeout kicked in.

And this lock is not on the table that we’re trying to update. This lock is on the post table, right? So you can kind of figure out now why I had the query plan look the way it does, right?

Because we want to make sure that we do a bunch of work to show you that this query actually runs for a good bit of time before the lock timeout kicks in and kills the query. So, if we roll this back, then everything will be fine and this query will be free to run and do its thing. This is going to run for another 13 seconds, but I do think it’s important for people who are going to use things like this to understand where in the query the lock timeout intervenes.

It is not before the query executes. The query does not check for locks first; it starts running and processes as much as it can.

And then, as it’s running, if it encounters any locks, that’s when lock timeout kicks in. So lock timeout kicked in way down in the execution plan here. Right?

So we did a lot of other work before SQL Server was like, oh, I can’t do anything else. So be real careful when you’re using LOCK_TIMEOUT. Make sure that you understand which locks will cause your query to hit a lock timeout. It might not be locks on the table that you’re modifying. I mean, I guess it would be if you were just doing a select, right?

I think this is a particularly good illustration of a query that has to take these kinds of locks up here, but that doesn’t have to take anything except normal read committed locks in here while doing its thing. Now, of course, you could probably get past this if you are the type of potato-brained, foul-mouthed dim bulb that uses NOLOCK hints.

You could even get past this if you were using READPAST hints, right? READPAST would be one way to bypass any locked data. Though, with that, you might end up missing some valuable data that is locked. So READPAST, not always a hot choice; NOLOCK, never a hot choice. And of course, our lovely optimistic isolation levels, like read committed snapshot and snapshot isolation, would avoid this scenario entirely.

Of course, if you were the type of big-brained, constant-lightbulb-overhead, real bright bulb person who used optimistic isolation levels, you probably wouldn’t be the type of person who’s worried about setting lock timeouts anyway. Would you? No. It wouldn’t happen.

Nope. You would just be free to do your thing. Be free to be you and me. Anyway, that’s a little about lock timeout. I hope you enjoyed yourselves.

I hope you learned something. Thank you for watching. And I will see you in the next video, which, I don’t know, I don’t know what it’s going to be about, because I haven’t gotten there yet. But I think I have some written blog posts that I would like to have in video form, because videos get a lot of views, and blog posts, I don’t think anyone can read anymore.

So I’m going to spoon-feed you video content.


A Little About Parameter Sensitive Plan Optimizations In SQL Server 2022



Thanks for watching!

Video Summary

In this video, I dive into the world of parameter-sensitive plan optimizations in SQL Server, specifically focusing on how Microsoft has been addressing the challenges posed by parameter sniffing through intelligent query processing features. I discuss the evolution of these features from earlier versions to the introduction of parameter sensitive plan optimization in SQL Server 2022, highlighting its purpose and functionality. I also touch on the upcoming vNext version, speculating about potential new features like vector search and regex support for T-SQL, which are sure to bring both excitement and challenges for database administrators and developers alike. Throughout the video, I share insights into how indexing can significantly impact query performance and plan choice, emphasizing that while a perfect index might not solve every parameter sensitivity issue, it’s crucial in many cases.

Full Transcript

Erik Darling here with Darling Data. And today, me and Bats are going to talk to you about, I guess this is going to be a little-bit-about video, a little bit about SQL Server’s parameter sensitive plan optimization. Now, I realize SQL Server 2022 is now, I don’t know, two, almost three years old, I guess. Getting close to time. So, I guess, vNext in 2025? We’re getting close to that, I think, that vNext feeling. And the reason I say that is because the last few SQL Server 2022 cumulative updates haven’t really had a lot of meat to them. They’re starting to ease off the big changes in there. So, I get this feeling, deep in my bones, that we’re going to start seeing some vNext action soon.

Part of that is also because, what do you call it, the vector search stuff in Azure SQL DB recently went from private preview to public preview. Ooh la la. They still don’t have, like, ANN-type indexes available for that yet. But the fact that it went to public preview also kind of indicates to me that there will be blood soon. Some vNext blood will be shed. And I’m excited about that. I mean, the vector stuff, I’m sure a lot of people are going to want to do it.

I don’t know how good it’s going to be yet, but we’ll see. I do have some confidence in the PMs involved there. But really what I’m excited about is that Regex is finally coming to T-SQL. Because that was also an Azure preview thing that I think recently became public. And what’s going to be fun about that is all the people who start putting that in where clauses. I can’t wait. I am so excited for regex where clauses where I get to tell people you’re going to have a bad time.

So, without further ado, let’s talk a little bit about me and you. If you feel, if you’re feeling like the pre-Christmas spirit, if you’re just like, it’s getting close to Black Friday and I just want to spend money, the very first link in the video description next to the words, become a member, will take you to a page where you can donate $4 a month to the Erik Darling brunch fund. It’s not going to go very far. Eggs in New York are for some reason $25. But you can do that if you’re feeling real spendy. If you’re not feeling real spendy, there are other numbers that make me happy to keep producing this content, like the likes and the comments and the subscribes. So, those things, wonderful. Beautiful. If you are feeling extra spendy and you’re like, wow, that Erik Darling sure does know a lot about SQL and, or boy, he sure does know a lot more than me about SQL.

And I’ve got this SQL Server causing all sorts of problems. Boy, I bet Erik Darling would be good at some of this stuff. You can also hire me to do some consulting and I can show up just wearing this same shirt, just sitting down at my desk over there instead. And we can talk about your SQL Server specifically and we can make that thing go faster. And as always, my rates are reasonable.

Ding. Ding. Ding.

Ding. Ding. Alright, cool. If you would like some high quality, low cost training content, at a much better discount than those other Black Friday sales, you can get all of mine for about $150 USD, and that is for life. That is not just for a year. So, you could enjoy that. And, you know, like I always say, the longer you live, the less it costs, because the cost is amortized, or whatever the word is for that, where the longer you have it, the more that initial investment spreads out over time.

So, like $150 for the first year is $150 and then for the second year, it’s $75 a year. And then for the third year, I don’t know anymore. Okay, give up. So, upcoming events, let’s be honest, I’m not doing anything until the new year anyway. If you’ve got any events coming up in, like, spring of 2025 or summer of 2025, let me know. It was announced at Pass that Redgate, a cool bunch of cats they are, are going to do a few mini-Passes this year. They’re going to be doing Dallas, New York, and Amsterdam.

I hope that I can make it to the New York one since that’s my home turf, but we’ll see if I can get DAL and AMS on the list too. That’d be nice. I do like to travel, so that’d be cool. So, you know, hopefully show up at those, but if you’ve got another local event that you’re thinking, wow, I sure would like Erik Darling to show up and say, All Day I Dream About Sets. Wait, All Day I Dream About Sets? That was a backwards, backwards grope at my local conference.

Boy, how do I get that? Well, just let me know and I’ll be there wearing my Adidas. With that out of the way, though, let’s talk about these parameter-sensitive plan optimizations, because that’s fun. Good stuff to do. So, as far as I’m concerned, this sort of thing has been a long time coming.

When Microsoft first started adding these intelligent query processing features, I think a lot of them were sort of geared towards dealing with parameter sensitivity, like side effects, like adaptive joins, where you were just like, whoa, crap, I chose a nested loops join, and I should have used a hash join, or vice versa. The merge join isn’t in there. Good job. Good job not including that turd.

And then stuff like the memory grant feedback, where it’s just like, hey, this memory grant, that was a good guess. And then they’re like, wait a minute, that memory grant was a really bad guess. Now, granted, the memory grant feedback isn’t a runtime adaptive thing like adaptive joins are.

That’s like after the query executes. So, you still have like the query that runs, and you’re like, wow, that was way slow. That spilled everything to disk.

And then, like the next time it runs, as long as it was like a big plan again, that bigger memory grant would make sense, because the memory grant would get adjusted up. But if it was a small plan again, TQL Server would be like, oh, wait a minute.

It was just that grant back down. And then a big one in the run, and you’d be like, oh, that sucked. And then back it up and down. So, you know, that one is not as great, but whatever. Let’s not get carried away with ourselves here.

But, yeah, like a lot of, like, I think a fair amount of the intelligent query processing stuff was to deal with sort of like the side effects of parameter sniffing. Then in 2022, we got this feature, the parameter sensitive plan optimization, which was designed to more directly deal with parameter sniffing by assigning different query plans, three different, up to three different query plans.

Well, you probably get three, right? You get three. Like, like, SQL Server decides what they are when they, like, get compiled.

But you’ll get, it has like three plans kind of in mind, I guess, based on a single equality predicate parameter. So no inequality predicates, no, like, double parameters. It doesn’t matter if they’re on different tables.

One parameter, one equality, three plans. So that’s what you get. But what’s funny is, well, actually, what I think is good is that the feature is called the parameter sensitive plan optimization. Because when you have parameter, oh, ZoomIt is just being an absolute turd goblin on this one.

There we go. So when you have parameter sniffing, this is when the optimizer creates and caches a plan based on a set of parameters that the query is initially compiled with. And that’s just parameter sniffing.

And that’s something that you generally want. You don’t want SQL Server coming up with a new plan every single time. I’m not allowed. I don’t know if I’m allowed to say the person who I would normally attribute this to.

I don’t know if I’m allowed to attach their name to this. But I’ll say someone I know who does a lot of work with really highly concurrent workloads was complaining about Postgres migration because Postgres does not offer this. Like you can sort of do like a plan per connection.

But you don’t get, like, a global plan cache like SQL Server has, where you cache a plan and any query can reuse it. And if you have a very highly concurrent workload where queries are always coming in, blah, blah, blah, you go from, like, zero milliseconds, because you were reusing a cached plan, to, like, 10, 20, maybe more milliseconds, because you’re coming up with a plan every time.

That introduces a level of latency to your workload that is unkind. So that’s not fun there. And then parameter sensitivity is a different issue because parameter sniffing is a generally good thing that you want happening.

And then parameter sensitivity and wow, ZoomIt is just having all sorts of bad problems today. Thanks. Thanks, sysinternals.

Parameter sensitivity is when parameter sniffing happens, but the parameter that you pass in does not create a good plan. Or rather, it does not cache a plan that makes a generally good set of choices for other sets of parameters. So the easiest way to think about it is like, you know, you cache a plan for Rhode Island, but then the next query that runs is Texas or California, or something like that, where you have a much, much bigger population of data than Rhode Island or Connecticut or, what’s another small state in there?

Maryland or something? I don’t actually know what Maryland looks like. Is it Baltimore? Is it Delaware? I don’t know. I couldn’t draw Delaware. All right.

The ramparts we watched. But the first one is usually a good thing. The second one can pose problems. You know, but I think that the parameter sensitive plan optimization was a pretty good fit for the intelligent query processing set of features. Because, you know, like a lot of what I had to do as a consultant to deal with parameter sniffing was dynamic SQL.

I know it sounds counterintuitive. I’m not going to lie. There are a lot of recompile hints involved in those fixes, too.

No BS there. Recompile saves the day most of the time, in a way that most people are okay with dealing with. So does writing a lot of dynamic SQL, especially, like, branchy dynamic SQL.

I actually have a whole video called Defeating Parameter Sniffing With Dynamic SQL on this very channel that you can watch. That goes into some of the things that I did with clients to help them with their problems. Now, a lot of the reason why this happens, why you have parameter sensitivity issues, will come down to indexing.

That’s not the only thing, but indexing is going to be a big part of it. Just to give you an idea: think of the number of times where the plan chosen for a small amount of data was a little index seek, a little key lookup, bam, everything was done really quickly. Then, for a large amount of data, it turned into a much larger index seek and a much greater number of key lookups, and the parameter sniffing problem was huge, right?

There were all sorts of other things, of course, like serial plan versus parallel plan, stuff like that. But in general, it was almost always a costing choice between a little seek-plus-lookup and a clustered index scan. So there were a lot of times when indexing would be a way to give SQL Server one much better plan choice, rather than one good plan choice and several icky plan choices for other parameters passed in as predicates.

Now, even a perfect index isn’t going to solve every single problem. That is my version of the classic parameter sensitivity issue.

But there are all sorts of other things that can happen in a query plan based on parameters and cardinality estimation and all that other stuff. The types of joins you use throughout the query, the order of those joins, memory grants, whether the query goes parallel or not, types of aggregates, the placement of aggregates. Sometimes SQL Server is really good about like pre aggregating data, like for a join column before a join.

Other times it’s just like, oh, what’s the point? There’s only like a thousand rows. And then it’s like, wait a minute. There were like two million rows this time. Oh, we should have crunched those down.

And of course, another big one is especially for newer versions of SQL Server is whether or not batch mode gets engaged. There are really a lot of possibilities. There are so many things that can happen while SQL Server is coming up with a plan because of costing cardinality estimation, all that stuff that you really like would have a hard time accounting for all of them in a simple video like this.

And this doesn’t even get into all of the crappy things that you do to SQL Server when you write your queries, table variables, local variables, optimize for unknown, writing non-targable predicates, using the wrong cardinality estimation model, row goals, out of date statistics. And like, I don’t know, like CTE that you join to 70 times and like joins with or clauses. And this list goes on.

The mind continues to boggle. So let’s talk a little bit about what exactly you see when a plan gets the parameter sensitive plan optimization. If you look at the query text, well, I formatted this a bit, because I really dislike that it’s just one big long string at the end of the query.

But this is what you’ll see.

This is not an option hint that you can specify with your query. At least not yet. Maybe someday Microsoft will be like, oh, yeah, you can do that because we trust you. Just kidding.

I don’t think that’ll happen. But you’ll see this option plan per value. You’ll see query variant ID with the number one through three in there. This just happens to be three, but you’ll see one, two, or three.

And then you’ll see this thing says predicate range. And this will give you the column and the parameter that was chosen for the parameter sensitive plan optimization. This is the cardinality that will guide which query variant you get.
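Roughly, that dispatcher text looks like this. The names and boundary values here are illustrative, not copied from the demo:

```sql
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.ParentId = @ParentId
OPTION
(
    PLAN PER VALUE
    (
        QueryVariantID = 3, /* 1 = low, 2 = middle, 3 = high cardinality bucket */
        predicate_range
        (
            [StackOverflow2013].[dbo].[Posts].[ParentId] = @ParentId,
            100.0,   /* below this estimated cardinality: variant 1 */
            100000.0 /* above this: variant 3 */
        )
    )
);
```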

Now, there are all sorts of reasons why the parameter sensitive plan optimization won’t actually kick in. At least as of SQL Server 2022. I’m on CU16 right now.

And there are 40 reasons for that. Some of them more obvious than others. Things like no parameters, with recompile, or unrecompilable. I don’t even know what unrecompilable means.

You turn off the database scoped configuration. You have a query hint that would make different plan choices impossible, like OPTION (LOOP JOIN) or OPTION (HASH JOIN). There’s other stuff in here.

Query text too large. All right. Okay. I don’t know. Non-inlined UDF. Skewness threshold not met. Compat level below 160. Other threshold not met.

I don’t know — who knows what that means. System DB, unsupported objects... there’s all sorts of stuff in here. But you can use extended events to figure out why your query isn’t getting this optimization, if you’re particularly keen on it. So what’s interesting to me is how SQL Server chooses which parameter will be the focus of its parameter sensitive plan optimization.
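If you want to chase those skip reasons down yourself, SQL Server 2022 ships an extended event for it. A quick sketch of a session — the session name and ring buffer target are just my choices:

```sql
CREATE EVENT SESSION psp_skip_reasons
ON SERVER
ADD EVENT sqlserver.parameter_sensitive_plan_optimization_skipped_reason
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = OFF);

ALTER EVENT SESSION psp_skip_reasons
ON SERVER
STATE = START;

/* run the query that isn't qualifying, then read the ring buffer
   (or watch live data in SSMS) to see the skipped_reason values */
```

Stop and drop the session when you’re done poking; the event fires often on a busy box.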

So, I have two columns in the Posts table that are fairly highly skewed. This is the parent ID column, and I’m only returning the top 10 from it, because you can see that parent ID 0 has 6 million rows, the second highest one has 518, and everything below that is a dwindling number of parent IDs.

Right, we don’t have that many of them. Parent ID 0 is, of course, for questions, because they don’t have a parent. So there are 6 million questions, and then there are, you know, 11 million answers to those questions, and the most-answered post — at least in the StackOverflow2013 database — has 518 answers.

Everything below that is a much lower number. Then there’s the post type ID column, which has a pretty high spread across its eight values. There again, post type ID 2 is answers — there are 11 million of those.

There’s post type ID 1, and if we zoom out and frame this up a little better, you’ll see there are a few other types of posts in the Posts table that also have a parent ID of 0.

Straight-up questions on the site — there are 6,220,223 of them — but there are a bunch of other types of posts that add another 500,000 or 50,000 to that. That’s most likely these two added in there. So these two columns are both fairly highly skewed.
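The queries behind those numbers are simple aggregates — something along these lines against the Stack Overflow schema (column and table names assumed from that database):

```sql
/* skew in ParentId: one giant value (0 = questions), then a long tail */
SELECT TOP (10)
    p.ParentId,
    posts = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY p.ParentId
ORDER BY posts DESC;

/* skew in PostTypeId: a handful of values, wildly uneven row counts */
SELECT
    p.PostTypeId,
    posts = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY p.PostTypeId
ORDER BY posts DESC;
```

Running both makes the skew in each column easy to eyeball side by side.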

But for this particular query — that’s post type ID 1 for a question and parent ID 0 — SQL Server will choose the parent ID column as the one it thinks is easiest to parameter-sensitivize.

Right, so this one chooses parent ID over here. For some reason, I can never get it to choose post type ID.

Even when I tried recompiling and said, okay, we’re going to look for post type ID 2 with its 184,618 — even then, like I said, SQL Server will consistently choose parent ID. But for this query it doesn’t matter.

We’re just getting the top 5,000, and we have a good index in place for this one, so it’s not really that big a deal. What was kind of fun: during my talk, the SQL Server Performance Tuning Tasting Menu, this is one of my favorite demos, and the way I leave this demo off is that it will either be a really great parameter sniffing demo or a really great demo that shows off the parameter sensitive plan optimization.

Up until this most recent cumulative update, this thing would never get the parameter sensitive plan optimization. So in CU16, someone tweaked something or torqued something in there, and it started getting it. This query only has one parameter, vote type ID, and it used to never get the parameter sensitive plan optimization.

I used to have to show off using a temp table for this one. But this is where that parameter gets used, and I just have the hint in here to make sure that no matter what compat level my database is in, this query is eligible for the parameter sensitive plan optimization. Now, the vote type ID column is skewed just like the post type ID column is; there are just a few more vote type IDs. You’ll see vote type ID 2 — that’s an upvote — with 37 million rows, then a few million of some others, and then everything from the high-to-low hundred thousands down to nothing for all the other ones.
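The hint in question is, most likely, the compatibility-level use hint, which pins the optimizer to level 160 behavior for just that one query regardless of the database setting. The query shape here is a stand-in:

```sql
SELECT v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));
```

Handy for demos where the database bounces between compat levels and you still want 2022-era features in play.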

So what I used to show people was: look at this, this is so stupid. When I run this for vote type ID 4, I get this query plan. And then when I run it for vote type ID 2, I get the same query plan and it takes like five hours to run. But the good news is that doesn’t happen anymore, and we get the parameter sensitive plan optimization.

So if I run these two back to back — and I realize I probably should have done that from the get-go, but it only takes about four seconds, and you’ve already been here for 21 minutes, so what’s another few seconds — we get two very different query plans here.

And if we look at the query text, we’ll have this OPTION (PLAN PER VALUE) thing like I showed you before, terribly unformatted — this is why I formatted it earlier. But here’s query variant ID 2, and down here, if we scroll, there’s query variant ID 3. All right, cool.

The thing that kind of stinks — and this is where we go back to the good and bad of these things, because there’s always good and bad — is this. What I used to complain about was that 2 would run and we’d use the plan for 2, then 4 would run and reuse it; or we’d run it for vote type ID 1 and it would use the plan for 4. And that was no good.

But what I was hoping for was that, if I ever got the parameter sensitive plan optimization for this, SQL Server might be like: hey, this number of rows down here is not really similar to this number of rows; this number of rows is way more similar to that one, even though they’re far apart. I was hoping SQL Server would be a little bit smarter about what it did for stuff in this region.

But now, when I run this for vote type ID 1, that doesn’t quite happen. The parameter sensitive plan optimization did kick in, and it did do something, but the plan that we get is the same plan we got for vote type ID... what was that, four?

All right. We get the same plan.

If we go and look at the query text, it’ll be query variant whatever it was. But yeah, this thing actually reuses the plan for vote type ID 1, even though vote type ID 1 has 3.7 million rows up here and vote type ID 4 has 733 rows down here. So, like, progress.

It did kick in. It did start working for this query. So that’s exciting. But why wouldn’t you use that middle-ground plan for all this stuff that’s in here?

Anyway, cool. Well, that’s it for me. This video dragged on a bit longer than I thought it would — you know, I’m a rambling man.

But anyway, thank you for watching. I hope you enjoyed yourselves, I hope you learned something, and I will see you in a video with, I don’t know, hopefully a happier outcome. Here at Darling Data, we love a happy ending.

Appreciate it.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

IF Branching In SQL Server Stored Procedures: Dynamic SQL!




Thanks for watching!

Video Summary

In this video, I delve into the world of parameter sensitivity issues within SQL Server stored procedures, particularly focusing on how if branches can lead to these problems. I explore three effective methods for mitigating these issues using dynamic SQL, which allows us to tailor query plans based on different parameters without relying on Enterprise Edition or SQL Server 2022’s new features. By demonstrating practical examples and providing the code necessary to implement these solutions, I aim to equip you with the tools needed to optimize your stored procedures for better performance. Whether you’re looking to improve query execution by dynamically adjusting plans based on parameter values or simply want to avoid sharing plans across different data volumes, this video offers valuable insights and actionable steps to achieve those goals.

Full Transcript

Erik Darling here with Darling Data, doing everything I can to save the free world. This is the third and final part of my videos on if branches. In the first one, we sort of defined the problem. In the second one, we talked about other ways people use if branches stupidly and incorrectly in stored procedures — the differences between WITH RECOMPILE and OPTION (RECOMPILE), and using sub-stored-procedures along with a wrapper stored procedure to fix some of the problems you can run into with if branches. The problem you can still run into when you use stored procedures, or anything with parameters and if branches, is that you can end up with a bad parameter sensitivity problem. SQL Server does have a newish feature as of SQL Server 2022 called the parameter sensitive plan optimization. The problem with that is it’s quite limited: it’s only for equality predicates, you only get three plan variations, and sometimes SQL Server will pick the wrong parameter to give sensitivity training to. So there’s a lot of stuff that can get weird with that. In this video, we’re going to talk primarily about how to use dynamic SQL to fix parameter sensitivity problems. I’m going to give you three ways that I use pretty frequently to do that. Maybe you can do something with that. Maybe you can’t. I don’t know.

If you can’t — well, maybe I can. If you like this channel — if you’re like, wow, that Erik Darling, I hit mute and just watch him talk. He’s handsome. He’s in great shape. He’s young. Good-looking SQL Server consultant. Voted by BeerGut Magazine the youngest, handsomest SQL Server consultant in the world. I just want to pay four bucks a month to see him wobble around on my screen and smile and make faces and hand gestures. Well, you can do that for four bucks a month. It’s a pretty good deal, at least as far as I’m concerned. Granted, if you buy scratch tickets, you maybe stand a better chance of becoming fabulously rich and famous than if you watch my YouTube videos. So it’s up to you what you spend that four bucks a month on. But if you give it to me, I promise I’ll spend four bucks a month on scratch tickets. How about that? If you don’t want to give me four bucks a month for scratch tickets, you can like my videos, you can comment on my videos, and you can subscribe to my channel.

Right now, we’ve got about 30 people who do this, and about 5,000 people who do that. So, a bit lopsided, but we’re working on it. YouTube doesn’t let you do this until you have a lot of that. So I’ve got to play catch-up, I guess. If you are in need of SQL Server consulting, I do all of these things, and I do them very well. Again, BeerGut Magazine says I am the youngest, handsomest, not only best looking, but best all-around SQL Server consultant. So if you need help with any of these things, hit me up and we can talk.

If you would like some very high quality, very low cost SQL Server training, and you’re like, wow, I’m looking around at all these Black Friday sales, and gosh, it’s still a lot of money — it’s still like two grand for any of this stuff — well, you can get all of mine for 150 bucks for the rest of your life. Just head to that site, use that discount code, and boy howdy, you’ll be swimming in SQL Server knowledge that you never dreamed of. If you ever dreamed of SQL Server knowledge, I feel bad for you. That’s depressing.

Anyway, I got no upcoming events. I keep saying this. If you have anything for 2025 that you want a young and handsome SQL Server Consultant to show up to and wax philosophical about performance tuning, well, gosh darn it, I might be the fellow for you. With all that done and out of the way, let us continue our magical journey into if branching. Now, there’s a lot of green text here, and this green text is mostly an ode to my discovery of and my ever-increasing love affair with Dynamic SQL.

If you ever look at the analysis procedures I write — like Pressure Detector, Quickie Store, Log Hunter, Health Parser; there’s probably another one out there, I forget — they’re all good, you should use them too. There’s a lot of dynamic SQL in those, and I love dynamic SQL for a lot of reasons.

And as soon as I started to write not crappy Dynamic SQL, I found even more good reasons to use it. So, you know, that was great, too. But the problem that we’re going to talk about today is using Dynamic SQL to fix parameter sensitivity problems because you can absolutely do that.

You know, SQL Server does give you ways, via dynamic SQL, to not get the same plan for a query every single time and have problems because of that. It’s a wonderful thing. What can’t it do?

It’s like coffee. Every time you read a new study about coffee, it’s like, it protects you for life. It makes you a better person. You’ll be happier and more charitable and loving and caring.

Like, wow, coffee does it all. So does Dynamic SQL. So let’s look at how we can use Dynamic SQL to solve parameter sensitivity issues. I’m going to give you three options for it.

Two of them I’m going to explain and show you the code for; one of them I’m going to show you query plans for, too. But they all do roughly the same thing, just in slightly different ways. So if we look at the Votes table and the distribution of vote type IDs, you’ll see that some of these vote type IDs have way more rows than others.

If you wanted to separate this into buckets, like sort of the way the parameter sensitive plan optimization does, we have this bucket up here for 37 million. We have these buckets down in here for like, let’s include that one, too, for like about a million to about 4 million. And then we have these buckets down here for low numbers up into like the low hundred thousands.

And what I’ve done in this stored procedure — I’m going to minimize that for a moment so you can bask in the glory of my fantastic dynamic SQL — is, of course, use the unicodes, because with dynamic SQL we must. So I have initialized three unicode (nvarchar) strings in here.

Now, you’ll notice that I do have a thing for post parameters, but just for the sake of brevity, I’m cutting out the Posts portion of this so we can focus on the Votes table vote type ID portion. So don’t worry about that, right? It’s okay if you declare a variable and then don’t use it.

It’s not terribly wasteful, because we’re just assigning a string to it — we’re not doing some big dastardly subquery to assign values to it. So stop it.

Stop yelling at me. God, I can hear you from here. And then down here, this is where we begin our dynamic SQL. This is just a remnant from when this was split into running two different queries. But one thing I like to do with my dynamic SQL is format it nicely and make sure everything lines up.

So I’ve got this part indented over to here, and a little leftover space down here where I can plop this thing onto. And one thing that I think is tremendously important to do with dynamic SQL is to put a comment in it with the name of the module that owns it. That way, if someone finds it looking through Query Store or the plan cache, they can say: oh, this belongs to this stored procedure.

This is not just some random piece of ad hoc SQL that I might or might not care about. Okay. So that’s one good thing.

Now, I’m going to scroll down a little bit and bring our results back up here. So let’s say that if our vote type ID is one of these, we want to change our query slightly and add this string to it. And then if our vote type ID down here is in one of these — that’s 16 through 9 — we’re going to tack this onto the end of our query.

And if our vote type ID is down here in one of these, we are going to tack this thing onto our query. Right. So adding any one of these three dummy additional ands to our query actually does generate different query plans because SQL Server is like, oh, it’s a different query.

Look, this one wants to know if one equals select one. This one wants to know if two equals select two. And this one wants to know if three equals select three.

It’s a wonderful thing. So it’s a fantastic thing that you can do to almost do the same thing as SQL Server’s parameter sensitive plan optimization without having to be on Enterprise Edition or SQL Server 2022 or Compat Level 160. You can hand roll this thing.
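The whole pattern condenses into a short procedure. This is a stripped-down sketch — the procedure name, the query, and the bucket boundaries are my own stand-ins, not the demo code from the video:

```sql
CREATE OR ALTER PROCEDURE dbo.VoteCount
(
    @VoteTypeId integer
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(MAX) = N'
    /* dbo.VoteCount */
    SELECT c = COUNT_BIG(*)
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId';

    /* one dummy AND per cardinality bucket = one cached plan per bucket */
    IF @VoteTypeId IN (2)               /* ~37 million rows */
        SET @sql += N'
    AND 1 = (SELECT 1);';
    ELSE IF @VoteTypeId IN (1, 3, 5, 6, 10)  /* ~1 to 4 million rows */
        SET @sql += N'
    AND 2 = (SELECT 2);';
    ELSE                                /* everything smaller */
        SET @sql += N'
    AND 3 = (SELECT 3);';

    EXECUTE sys.sp_executesql
        @sql,
        N'@VoteTypeId integer',
        @VoteTypeId;
END;
```

Because the three query texts differ, each bucket compiles and caches its own plan, while still reusing plans within a bucket.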

It’s wonderful. All you have to do is be as smart as me or maybe hire me to do this sort of thing for you. Either option. Pretty good.

My rates are reasonable. It’s Christmas. Darling children need presents for Christmas. So we’ve got this stored procedure here. And just so you can see the different results, I’m going to print out the SQL there.

And then I’m going to execute our dynamic SQL. This is probably the third, and one of the most important, parts of writing good dynamic SQL: parameterize it. There should be no plus signs concatenating strings into your dynamic SQL.

You should use sp_executesql with passed-in parameter values. If you don’t do that, you are a foolish idiot and you deserve all the SQL injection that you get. And I don’t feel bad saying that.
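The difference is easy to see side by side. Table and column names here are just illustrative:

```sql
/* Don't: user input concatenated straight into the statement text --
   the value can break out of the string and become executable SQL */
SET @sql = N'SELECT u.* FROM dbo.Users AS u WHERE u.DisplayName = N'''
         + @DisplayName + N''';';

/* Do: the statement text is fixed; values travel separately as parameters */
SET @sql = N'SELECT u.* FROM dbo.Users AS u WHERE u.DisplayName = @DisplayName;';

EXECUTE sys.sp_executesql
    @sql,
    N'@DisplayName nvarchar(40)',
    @DisplayName;
```

The parameterized version also gets you plan reuse, since the statement text stays identical across every value.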

So with this done, what we’re going to do is run this version of the stored procedure. We have query plans turned on, so we’re going to run it for vote type ID 7.

And if we look over in the messages tab, this will be the query that ran. Look how nicely formatted that is. And this is where we got the 3 equals select 3 version of the query. The query plan that we got for it looks like this.

Simple, elegant, efficient, parallel, nested loops. Ooh, beautiful. I don’t know what accent that is. I apologize if I offended anyone’s nationality.

And then if we run this version of the stored procedure, let’s just go to the messages tab. That’ll print out eventually, I suppose. Do-do-do.

Do-do-do-do. There we go. So vote type ID 2 is a little bit of an outlier, because this one has a lot of votes to it. This one always takes a little bit longer.

But if we go to the messages tab, this did indeed get 1 = (SELECT 1), and it does get a very different query plan from the first one. If I wanted to be a little more fair and not use the 37-million-row vote type — oops — we could do vote type ID...

And I keep hitting — oh, I hit Insert. No wonder everything in my life was going wrong. If we do vote type ID 1, this should run a little bit quicker, because we’re not getting 37 million rows, we’re getting 3.7 million rows.

So this one does run a little bit better. And, you know, SQL Server did misestimate some stuff for vote type ID 2. But that’s okay. We got a little bit of spilling.

Oh, I still have insert turned on. Shame on me. But, you know, there are things that we could do to fix performance generally after that. Right? We could certainly do things that would make even the vote type ID 2 version of the query more efficient.

But the big point is that we are no longer sharing query plans between small amounts of data and large amounts of data. We are sharing plans within the large bucket, within the medium bucket, and within the small bucket — those will still get shared — but across big, medium, and small, we are no longer sharing plans. Right? So we might still have some performance problems to fix for large amounts of data, but that’s a totally different thing that we can fix in other ways.

Right? There are all sorts of other options for that. I’m going to show you two other things that I’ve employed over the years that do similar things.

I’ve used them situationally. Obviously, not every situation has this obvious of a fix for it. I think one very specific problem that you might run into is date ranges.

So one very specific problem that I’ve had to fix quite a bit using dynamic SQL is date ranges. Say the range is either open-ended, where the start date is greater than some value, or closed-ended, with both a start date and an end date.

Sometimes you want to look at whatever parameter value someone passed in and say: hey, if this is like today, do the 1 = (SELECT 1). If this is like a year ago, do the 2 = (SELECT 2).

If this is 10 years ago, do the 3 = (SELECT 3). You can apply that to open-ended ranges like I just described, or to closed-ended ones where you say: hey, if this is a week, 1 = (SELECT 1).

A year, 2 = (SELECT 2). Five years, 3 = (SELECT 3). You can do stuff like that to figure out what breadth of data the open-ended start date, or the closed-ended start and end dates, are going to have to touch.
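As a sketch, the same dummy-AND bucketing keyed off date distance might look like this — thresholds are illustrative, and `@sql` is assumed to already hold the base query:

```sql
/* bucket by how far back @StartDate reaches; tune thresholds to your data */
IF @StartDate >= DATEADD(DAY, -7, SYSDATETIME())
    SET @sql += N'
    AND 1 = (SELECT 1);';  /* recent, narrow slice of data */
ELSE IF @StartDate >= DATEADD(YEAR, -1, SYSDATETIME())
    SET @sql += N'
    AND 2 = (SELECT 2);';  /* medium-range slice */
ELSE
    SET @sql += N'
    AND 3 = (SELECT 3);';  /* ancient, wide slice */
```

For closed-ended ranges, you’d bucket on `DATEDIFF` between the start and end dates instead.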

But there are other ways to do that. One thing that I have done at times — going back to the data distributions where there’s a large amount of data —

so 2, 1, 3, 5, 10, 6: those are the values that had the most data assigned to them. Let’s say that even sharing a plan within those is kind of annoying.

So what we can do is say: if vote type ID is one of those big values, then we’re just going to stick an OPTION (RECOMPILE) hint on the end of the query. If it’s not one of the big values, then we won’t.

What that means is that SQL Server will run this query and use the recompile to do parameter embedding, like I talked about in the second video — come up with a query plan specific to the value that’s in there, and do that.
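The conditional hint is only a couple of lines. The value list below is a stand-in for whatever your big outliers happen to be, and `@sql` is assumed to hold the base query already:

```sql
/* only the high-volume values pay the recompile tax */
IF @VoteTypeId IN (1, 2, 3, 5, 6, 10)
    SET @sql += N'
    OPTION (RECOMPILE);';

EXECUTE sys.sp_executesql
    @sql,
    N'@VoteTypeId integer',
    @VoteTypeId;
```

Small values keep a cached, shared plan; big values get a fresh plan with the parameter value embedded at compile time.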

OPTION (RECOMPILE) — depending on the frequency of execution and the plan complexity — can be an unattractive solution at times. If it is to you and you don’t want to do this, there’s another really neat thing you can do that requires a little bit of string substitution.

That’s what I’m going to show you next. So rather than have an optional OPTION (RECOMPILE) on there, another thing you can do is say OPTION (OPTIMIZE FOR ...), and use a weird string that will never actually show up in the data.

In this case, it’s just open bracket, close bracket, and three at signs. We can’t put a parameter or a variable in there. And because I hate doing string concatenation stuff, I want something that I can easily identify and replace in that string.

So what I’m going to do is every time this runs, I’m going to run the replace command, and I’m going to replace that little canary string with whatever the vote type ID is. This is cool because this will get you plan reuse for each different vote type ID.

This will get you a different query plan for each vote type ID. This has upsides and downsides. And the upside for me is that there are like 16 vote type IDs that we could possibly have.

Right? Which, you know, I mean, sure, we have 16 plans. So what?

16 plans is nothing in the grand scheme of things. If you need to do this for like a lot of different values, then I wouldn’t recommend that. I would recommend this for things with a sort of known quantity, known small quantity of values.
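Sketched out, the canary-replacement trick looks something like this — the canary token, procedure comment, and query are my own placeholders:

```sql
DECLARE @sql nvarchar(MAX) = N'
/* dbo.VoteCount */
SELECT c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION (OPTIMIZE FOR (@VoteTypeId = [@@@]));';

/* OPTIMIZE FOR needs a literal, so swap the canary in before executing;
   the string never actually appears in the data, so the swap is safe */
SET @sql = REPLACE(@sql, N'[@@@]', CONVERT(nvarchar(11), @VoteTypeId));

EXECUTE sys.sp_executesql
    @sql,
    N'@VoteTypeId integer',
    @VoteTypeId;
```

Each distinct vote type ID produces distinct statement text, so each gets its own cached, reusable plan — fine for 16 values, a plan-cache flood for 16,000.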

But if you are completely out of options — maybe the optional OPTION (RECOMPILE) won’t work for you for whatever reason; maybe the 1 = (SELECT 1), 2 = (SELECT 2), 3 = (SELECT 3) thing, which lets you bucket things in a way this one can’t, doesn’t work for you because of plan sharing within those buckets; or maybe just fix your buckets.

I don’t know. We could talk about this all day. But this can be a pretty good solution if you have a small number of known values with a high skew between them, which the vote type ID column in the Votes table certainly does.

Remember: 37 million, then 3.7 million down to a million, and then 800,000 down to like two. Those are pretty good buckets to work in.

But let’s say, for the sake of argument, that sharing plans within the big, medium, and small buckets doesn’t go well. For a small number of values, we can also optimize for a specific value in there.

So those are the three different ways that I’ve, oh, why are you all the way over there? Those are the three different ways that I’ve used dynamic SQL to fix parameter sensitivity problems.

And the reason this is included in the if branches series is: how did we do this? We used if branches to decide what we were going to do, in a correct way, with dynamic SQL — to make sure that not only do we not compile every plan in a stored procedure, but we also compile a good plan for the parameter values we pass in.

This is a great use of if branching. So with that done, thank you for watching. Hope you enjoyed yourselves. I hope you learned something and I will see you in a future video.

I’ve got kind of a fun one about lock timeouts coming up, so you have that to look forward to. This video should be dropping on a Friday, so if you’re watching this on a Friday, have a great weekend.

It’s all I ask of you is have a great weekend. Anyway, thank you.
