How To Write SQL Server Queries Correctly: Apply
Thanks for watching!
Video Summary
In this video, I dive into the world of SQL Server’s `APPLY` operator, specifically focusing on `CROSS APPLY` and `OUTER APPLY`. I explain how these operators can transform your queries to be faster and more efficient, especially when dealing with complex operations like windowing functions or derived joins. Whether you’re looking for top N rows, implementing row numbering, or performing pivot-like transformations, this video covers it all. I also highlight the importance of having appropriate indexes in place to leverage `APPLY` effectively, as well as discuss scenarios where using `APPLY` can lead to better performance than traditional derived joins. By the end, you’ll have a solid understanding of when and how to use `APPLY` to optimize your queries.
Full Transcript
Erik Darling here with Darling Data. Boy, am I hungry. It’s been a long day. In today’s video, we’re going to talk about, continue with my series, How to Write Queries Correctly. In this one, we’re going to cover the usage of apply. That’s cross-apply and outer-apply and different ways you can use cross-apply and outer-apply to make your queries faster and better and, I don’t know, do things like that. I like talking about apply. I like talking about apply. Hopefully, you’ll like me talking about apply. Hopefully, it just works out for everyone, right? Anyway, before we do that, we of course have some monetary concerns. We have to talk about fiscal policy here. If you would like to become a member of this fine channel, and maybe say thank you for the billions of hours of content that I produce, you can become a member. Join like, 30-something other people who have become members by clicking the first link in the video description down over there. It says become a member. If you are, I don’t know, if you are hawkish on your financial policy and you say, four bucks a month, Erik Darling, jeez, I don’t know. You can like, you can comment, you can subscribe. You can make my numbers go up in other places. Hopefully, it’s not going to be like blood pressure and cholesterol, because when those go up, bad things tend to happen.
Anyway, let’s not get grim. If you need help with your SQL Server, I am a SQL Server Performance Tuning Consultant. Some might even say they work at BeerGut Magazine, the best SQL Server Consultant in the world. I am available for all of these things and more. And as always, my rates are reasonable. If you would like to get access to my paid training, my paid SQL Server Performance Tuning training, you can get all 24 plus hours of it for about 150 USD.
And that lasts for life. It’s not a subscription. You have to renew every year. You buy it once and you have it. And then you can watch it and get better. And then you can be good at SQL, like me. Link, coupon code, done. No events. 2024 is done for events. 2025, I’m watching you. If there’s an event near you and you think Erik Darling would be good at that event, well, if you tell me what that event is, I’m not psychic. I have many things. I’m not psychic. I cannot magically guess which event you think I should go to. Let me know what it is.
And maybe I’ll be able to go. Who knows? I don’t know yet. Because you haven’t told me. All right. Now, let’s talk about apply. Now, I end up converting specifically a lot of derived joins, particularly ones that have windowing functions in them, to use apply instead for very specific reasons. Sometimes you need to create an index to support that thing. But mostly you want to avoid the eager index pool.
You need to at least be able to seek into an index on the inner side of a nested loops join to have that make sense. But particularly ones where row number is involved, it makes a lot of sense for reasons that I’m going to explain. All right. You will get a full explanation, but I just want to let you know that that’s usually where, for me, the apply stuff shines.
There are many other great reasons and places and things you can do with it. But that’s the one that I end up fixing the most. Now, when apply is most useful is if you have a small outer table and a large inner table.
Right. Because you want to have a small number of rows on the outer side of a nested loops join. And you can use that small number of rows to get to the inner side of the nested loops. Right. You don’t want nested loops for like you don’t want a big table on the outer side of nested loops.
And you don’t want two big tables involved with nested loops because you’re in for a bad time if you do. If the amount of work that like the query that goes into the apply is rather complex or does something that is computationally complex, windowing functions being one of those things, that’s another very good reason to use apply instead.
If I have very specific query goals that make apply pretty much the smartest way of doing things. Sometimes it’s like, you know, saying top three or offset zero rows fetch next three rows. Other times it is using a windowing function to filter out like where the windowing function is less than or equal to three.
Really, that depends a lot on data distribution density, stuff like that. That would be another good reason to use apply. If I am really trying my hardest to get a parallel nested loops plan, apply is usually a good way to do that.
If I need to replace scalar UDF in the select list with an inline UDF, that might be another good place to use apply. And if I need to use the values construct to do some surgery on one or more columns, that would be another good reason. We’ll talk through most of this stuff.
A lot of it is situational and it does require some practice to get familiar with it and know when the appropriate time to use apply is. Both cross and outer apply can be used in very similar ways to subqueries in the select list with the added bonus that, you know, like we took in the last video about subqueries. We talked about how you can really only return one row with them and you can only return one column with them.
With cross apply and outer apply, you don’t have those limitations. You can return multiple rows and multiple columns. That’s why I like the top end per group thing is really popular for apply.
What you really want to think of when you’re choosing which apply to use is cross apply really should be called inner apply because it’s like an inner join. And outer apply is actually appropriately named because it’s sort of like an outer join, right? Outer apply does not restrict rows.
Cross apply does. So here’s here’s sort of a simple example. Now you could use top three. You can use offset fetch in here. But let’s say that I just wanted to get the top three user the top three posts for a user.
I can do that with with this query pretty easily. Now. These are the these are the results.
Some of them. You will have three. Some of them you won’t. There might not be three for everyone. But for the people who do have three, you will get them. So that’s nice.
Right. There’s not really a great way to like insert a dummy row if if you just want like a third thing to show up for everybody. But no, whatever. Neither here nor there.
You can also use row number to do something similar if you don’t have good indexes in place or if your data distribution just sort of it just makes more sense to to use row number instead. There are some pretty good reasons to use row number if you are on if you’re in a higher SQL Server compat level where batch mode on rowstore is available. Or if you like, you know, can get batch mode involved using a trick with like a temporary table or an empty filtered non clustered columnstore index.
Because you can see the window aggregate from batch mode show up. And that’s a lot faster than the typical arrangement with row mode windowing functions where you’ll sometimes have a sort to put data in order for the partition by order by. And then the segment segment the sequence project.
The window aggregate is a lot usually a lot faster than the like the batch mode window aggregate is a lot faster than the row mode equivalent of those plans. So there are lots of good reasons to use apply depending on like and we’re, you know, we’re using apply in both of these just this one’s with row number and this one is you can again, you can use top or offset fetch. What what really drives the decision here is if this is fast enough, then cool, use this.
If this if that’s not fast enough, then you might want to think about using row number instead of the top with offset fetch. Really, like I said, it’s going to depend on like data distribution density, things like that. And, you know, getting like like top and offset fetch don’t really get like the batch mode benefit that windowing functions do.
So if if batch mode is a goal, then the windowing function will probably be faster there as well. So like for those queries, we’re just getting everyone from the users table who posted a question in the final days of 2013 ordered by when it was created and reputation and some other stuff. But the I guess the point is that this produces essentially a tabular result, right?
This produces like a second table that you’re joining to. And for everything that we find in here, for every row that we find that that meets this criteria, we apply this logic to every row. Right. So we can get multiple columns back.
We can get multiple rows back. We can get lots of stuff back. Now, one thing that I think is probably worth pointing out is like when we were talking about exists and not exists, one thing that I said is that it doesn’t matter if you what you put in the select list of exists and not exists because SQL Server just throws it away. Okay. Sort of in the same vein, it’s okay if you use star in apply or outer apply in the inside of the applied part of the query, because whatever columns you actually pull out of it up here, those are the only ones that the SQL Server is smart enough to realize you’re not selecting every column out of the post table.
SQL Server does some figuring when you first send it a query and it says, oh, even though there’s a select star in here, I know that in the outer select, I’m only getting title score creation date and last activity date. So I know that I like, I don’t actually need to treat this like a select star query. So using the select star inside of this is not a big deal because I’m not using select star in the final outer select slash project front for the results.
So that’s nice there. It’s sort of like how I have a select star here and a select star here, but SQL Server is like smart enough to realize that only these columns are involved aside from, you know, the stuff that I’m using inside of the query. So because the user’s table is correlated from ID to owner user ID in the post table, we do need to make sure that we at least have a good index that leads on owner user ID. So for every trip that we, for every time we apply that query to what’s in the user’s table, we have an efficient way to seek into that index and find the rows that we want.
You’re going to have some additional considerations with the windowing function thing, or if you have a top or offset fetch with a, with an order by in there, because you’re going to want to figure out how to not sort data every time, probably. Right. So just a couple notes on that. Another neat thing you can do with apply is sort of, you can do like a mock pivot and unpivot with apply. Itzik Ben-Gan has a lot of great videos on this. If you’ve never seen him present on apply, I would highly suggest just looking, looking for either, you know, his blog posts about apply or his videos about apply.
Pretty much anything where he talks about T-SQL is magical. But one thing that you can do is you can use this cross apply with a values clause to sort of combine the creation date and last activity date columns like this. And you can use that to sort of like pivot on them like this, like you’re turning each of these, you’re turning each of these, like each of these columns into a single column, right? So creation date and last activity date are two separate columns, but using values, we can pass them in as a single column and we can do, we can mimic the greatest and least functions that SQL Server 2022 added.
So like with SQL, if you’re on SQL Server 2022, we could just use greatest and least to figure out which value is higher or lower, the greater or the lesser. But with older versions of SQL Server that were the greatest and least functions aren’t available, which is weird because they’ve been around in other databases like forever. You can do something like this and this ends up with a pretty neat and nifty query plan.
We get all the stuff that we care about out of here. You’ll just have this constant scan, which you’ll see has just about twice as many rows as this because we need to basically make one long list from creation date and last activity date. And then we just aggregate to figure out the min and the max from those.
So you can use apply with the values clause for a lot of really powerful stuff. The choice to use apply really does depend on the goal of the query and the goals of the query tuner. It’s not always a magic performance tuning bullet, but under the right circumstances, it can really make things a lot faster than doing something like a derived join.
The choice of cross apply or outer apply, of course, comes down to query semantics. If you want the apply to restrict rows or filter rows, you want cross apply. If you want to do the equivalent of like an outer join, then you want to use outer apply.
One important difference in how the joins are implemented is in the optimizer’s choice between normal nested loops where the join is done at the nested loops operator and the apply nested loops, which is when the join keys are pushed to the index seek on the inner side of a join. When you see that, when you get apply nested loops, you can tell because when you highlight, when you hover over the nested loops join, you get the little tool tip that pops up.
Down at the bottom, you’ll see something that says outer references. And those outer references are the seek predicates being pushed into inside of the nested loops join rather than having them apply at the nested loops join. There’s a great post by Paul White about apply nested loops.
Again, if you’re feeling googly, definitely look for his post on Paul White apply nested loops because you’ll learn a lot about that there. Now, the optimizer is capable of transforming an apply to a join and vice versa. It will generally try to rewrite apply to a join during initial compilation because there’s more searchable plan space for that type of join.
If you transform to an apply early on, it may also consider a transformation back to an apply shape later just to figure out what would be cheaper. But just writing a query using apply does not guarantee that you get apply nested loops instead of just regular vanilla nested loops. Having good indexes in place is really like generally what tips the optimizer towards using that.
Now, there are a couple of things that, well, because we was talking about Itzikbengan earlier. There are a couple of cool things that you can do with apply that make life a lot easier. One of them is kind of what I showed you with greatest and least, except you can expand that to do lots of fun things, right?
Like finding the min and max per user. This isn’t a terribly fast query, but that’s okay for this one. We just start off, we start off by doing sort of what we did in the first query where we get sort of an initial min and max from things.
And then we use a slightly, I mean, it’s not even convoluted. It’s just something that we have to do some additional aggregations out here to have that group by ID and display name. Otherwise, we would have multiple rows, right?
We would get multiple rows back. Because, again, not like exists where you just get one thing. You know, like basically for every post in the users table, this is going to generate a row. So users who have multiple posts will generate multiple rows and we want to collapse that down just to get the min and max.
Another really cool thing that you can do with cross apply and continuing to use the values clause is you can sort of like, again, this is something Itzik talks about in his things. I think he’s right that this is a very neat trick, is you can do stuff like get the year that each of these things happened in.
All right. We have like the creation year and the last access date year. And then we could just assemble like the beginning and end span from that.
So we can use date from parts to take a year and just say 0101. Oops, that didn’t go well. Just say 0101 here and say 1231 here.
And we can get sort of like a span of time. So, I mean, some of these are more interesting than others, right? Like, well, like a lot of these are 2008 through 2018.
So it doesn’t really show off how cool this is. But this one, you know, we get the correct 2008 to 2017. Where are some good ones in here?
I don’t know. Some of these query results are just boring. But very neat things that you can do with apply and with values that can actually, and I’m going to talk about this when you get into CTE a bit more. With CTE, you typically have to like keep stacking them.
And, you know, the more complex things you have to do, the worse that stacking gets and like passing results and aliases down from one to another. But with apply, you can generally do stuff a lot more cleanly without having, without the fear of CTE executing the query inside them more than once and causing a giant cascading awful of query plan. So just a little bit about cross apply and outer apply there.
Very, very useful query techniques for all sorts of things. Most common use is sort of a top and per group thing. But, you know, there are lots of other cool uses for them that, you know, really, if I had all day to talk about apply, I could show you a lot of things.
But trying to keep this sort of short and basic so folks understand kind of what they are and how to use them. I don’t want to overcomplicate that and get into the really crazy stuff because I would probably lose a lot of people. So I don’t want to lose anyone.
I’ve lost enough in my life. I don’t know. We don’t need to talk about that. But you know who you are up there. Anyway, thank you for watching. I hope you enjoyed yourselves.
I hope you learned something. And I will see you in the next video, which is going to be about what’s number five here? In and not in.
So we’re going to have some fun things to say about in and not in in that video. So do try to contain yourselves. But I understand why some of you out there just might be orgasmic.
What we’re going to say in the next one. So I will see you in that video. Goodbye.
Not forever. Just until next time. Goodbye for now.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.