Situations
I end up converting a lot of derived joins, particularly those that use windowing functions, to use the apply syntax. Sometimes good indexes are in place to support that, other times they need to be created to avoid an Eager Index Spool.
One of the most common questions I get is when developers should consider using apply over other join syntax.
The short answer is that I start mentally picturing the apply syntax being useful when:
- There’s a small outer table (FROM), and a large inner table (APPLY)
- I need to do a significant amount of work on the inner side of a join
- The goal of the query is top N per group, or something similar
- I’m trying to get parallel nested loops instead of some alternative plan choice
- To replace a scalar UDF in the select list with an inline UDF
- In order to use the VALUES construct in an odd way
Most of this is situational, and requires a bit of practice and familiarity to spot quickly.
Both cross and outer apply can be used in similar ways to subqueries in the select list, with the added bonus that you can return multiple columns and rows with apply, which you can’t do in a normal subquery.
What Apply Does
The way to think about what apply is doing when a query runs is supplying a table-valued result on the inner side of a join for each row supplied by the outer side of a join.
Here’s a simple example:
SELECT u.Id, u.DisplayName, u.Reputation, p.Title, p.Score, p.CreationDate, p.LastActivityDate FROM dbo.Users AS u CROSS APPLY ( SELECT p.* FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id AND p.PostTypeId = 1 ORDER BY p.CreationDate DESC OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY ) AS p WHERE u.CreationDate >= '20131230' ORDER BY u.Reputation DESC, p.CreationDate DESC;
We’re getting everyone from the Users table who Posted a Question in the final days of 2013, ordered by when it was Created.
For every qualifying User, we get a tabular result showing the Title, Score, CreationDate, and LastActivityDate of their question.
You can picture it sort of like this:
Some Users have have more than three results, and some may have fewer than three results, but since the query is self-limited to only the first three, our query sets a row goal and quits once three are found.
More About The Query
Because the Users table is correlated by Id to the Posts table on OwnerUserId, we need to make sure we have a good index to make that access seekable.
Also because we’re only looking for Questions (PostTypeId = 1), and ordering by the most recent ones (CreationDate DESC), it’s also a wise idea to have those in the key.
It’s also worth talking over an interesting point in the query itself: The select from Posts inside of the apply is doing a select star, sure, but the outer query is only pulling a few of the columns. The optimizer can recognize this, which means we don’t need a gigantic covering index to make this query fast. We also… Don’t really need a covering index at all in this case. Just the key columns are good enough.
CREATE INDEX u ON dbo.Users (CreationDate, Reputation, Id) INCLUDE (DisplayName) WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE); CREATE INDEX p ON dbo.Posts (OwnerUserId, PostTypeId, CreationDate) WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Efficiency Unit
An efficient query plan when using apply will look like this:
An inefficient query plan using apply will look like this:
If you’re experimenting with apply, either cross or outer, pay close attention to the query plans. If someone says “cross apply is always slow”, you can bet they stink at indexes.
Choices, Choices
The choice to use apply at all depends on the goal of the query, and the goals of the query tuner. It’s not always a magic performance bullet, but under the right circumstances, it can really make things fly.
The choice to use cross or outer apply depends on the semantics of the starting query. An inner join commutes easily to cross apply, and a left join commutes easily to outer apply.
One important difference in how the joins are implemented is in the optimizer’s choice between normal nested loops, where the join is done at the nested loops operator, and apply nested loops, which is when the join keys are pushed to an index seek on the inner side of the join.
For an in-depth look at that, I’d suggest reading Paul White’s post, Apply versus Nested Loops Join.
The optimizer is capable of transforming an apply to a join and vice versa. It generally tries to rewrite apply to join during initial compilation to maximize the searchable plan space during cost-based optimization. Having transformed an apply to a join early on, it may also consider a transformation back to an apply shape later on to assess the merits of e.g. an index loops join.
Just writing a query using apply doesn’t guarantee that you get the apply nested loops version of a nested loops join. Having solid indexes and easy to search predicates can help push things in the right direction.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.