Followers
Common table expressions remind me of clothes in the 70s. A bunch of people with no taste convinced a bunch of people with no clue that they should dress like them, and so we got… Well, at least we got the 80s afterwards.
The big draw with common table expressions is that they filled in some blanks that derived tables left unanswered.
- The first problem with common table expressions is that most people use them like nose and ear hair trimmers: they just sort of stick them in and wave them around until they’re happy, with very little observable feedback as to what has been accomplished.
- The second big problem with common table expressions is that the very blanks they were designed to fill in are also the very big drawbacks they cause, performance-wise. Sort of like a grand mal petard hoisting.
To bring things full circle, asking someone why they used a common table expression is a lot like asking someone why they wore crocheted bell bottoms with a velour neckerchief in the 70s. Someone said it was a good idea, and… Well, at least we got the 80s afterwards.
Much like joins and Venn diagrams, anyone who thinks they have some advanced hoodoo to teach you about common table expressions is a charlatan or a simpleton. They are one of the least advanced constructs in T-SQL, and are no better or worse than any other abstraction layer, with the minor exception that common table expressions can be used to build recursive queries.
Other platforms, enviably, have done a bit to protect developers from themselves, by offering ways to materialize common table expressions. Here’s how Postgres does it, which is pretty much the opposite of how SQL Server does it.
By default, and when considered safe, common table expressions are materialized to prevent re-execution of the query inside them.
You can force the issue by doing this (both examples are from the linked docs):
WITH w AS MATERIALIZED ( SELECT * FROM big_table ) SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref WHERE w2.key = 123;
Or go your own way and choose to not materialize it:
WITH w AS NOT MATERIALIZED ( SELECT * FROM big_table ) SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref WHERE w2.key = 123;
You don’t get those options in SQL Server as of this writing, which really sucks because developers using other platforms may have certain expectations that are, unfortunately, not met.
Likewise, other sane and rational platforms use MVCC (optimistic locking) by default, which SQL Server does not. Another expectation that will unfortunately not be met for cross-platform developers.
Common Table Cult
The amount of developer-defense that common table expressions get is on par with the amount of developer-defense that table variables get.
It’s quite astounding to witness. How these things became such sacred cows is beyond me.
First, there are times when using a common table expression has no impact on anything:
WITH nocare AS ( SELECT u.* FROM dbo.Users AS u WHERE u.Reputation > 999999 ) SELECT n.* FROM nocare AS n; WITH nocare AS ( SELECT u.* FROM dbo.Users AS u ) SELECT * FROM nocare AS n WHERE n.Reputation > 999999;
SQL Server is at least smart enough to be able to push most predicates used outside of common table expressions up into the common table expression.
One example of such a limitation is when you put a windowing function into a common table expression:
WITH nocare AS ( SELECT v.*, n = ROW_NUMBER() OVER ( PARTITION BY v.UserId ORDER BY v.CreationDate ) FROM dbo.Votes AS v ) SELECT n.* FROM nocare AS n WHERE n.VoteTypeId = 8 AND n.n = 0;
If VoteTypeId were in the PARTITION BY clause of the windowing function, it could be pushed into the common table expression portion of the query. Without it in there, it has to be filtered out later, when the where clause also looks for rows numbered as 0.
But this does bring us to a case where common table expressions are generally okay, but would perform equivalently with a derived table: when you need to stack some logic that can’t be performed in a single pass.
Using a common table expression to filter out the results of a windowing function just can’t be done without some inner/outer context. Since objects in the select list are closer than they appear, you can’t reference them in the where clause directly.
I’m fine with that, as shown in the example above.
Common Stacks
Stacked common table expressions are also “fine” up to a point, and with caveats.
One of my absolute favorite paging methods uses stacked common table expressions.
DECLARE @page_number int = 1, @page_size int = 100; WITH f /*etch*/ AS ( SELECT TOP (@page_number * @page_size) p.Id, n = ROW_NUMBER() OVER ( ORDER BY p.Id ) FROM dbo.Posts AS p ORDER BY p.Id ), o /*ffset*/ AS ( SELECT TOP (@page_size) f.id FROM f WHERE f.n >= ((@page_number - 1) * @page_size) ORDER BY f.id ) SELECT p.* FROM o JOIN dbo.Posts AS p ON o.id = p.Id ORDER BY p.Id;
The reason why this is okay is because each common table expression has a single reference. There are two points in the query plan where data is acquired from the Posts table.
Where things get tricky is when you keep doing it over and over again.
Attack Stacks
Take a query like this, and imagine what the query plan will look like for a moment.
WITH top5 AS ( SELECT p.*, n = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId ORDER BY p.Score DESC ) FROM dbo.Posts AS p WHERE p.OwnerUserId = 22656 AND p.PostTypeId = 1 ) SELECT u.DisplayName, t1.Title, t2.Title, t3.Title, t4.Title, t5.Title FROM dbo.Users AS u LEFT JOIN top5 AS t1 ON t1.OwnerUserId = u.Id AND t1.n = 1 LEFT JOIN top5 AS t2 ON t2.OwnerUserId = u.Id AND t2.n = 2 LEFT JOIN top5 AS t3 ON t3.OwnerUserId = u.Id AND t3.n = 3 LEFT JOIN top5 AS t4 ON t4.OwnerUserId = u.Id AND t4.n = 4 LEFT JOIN top5 AS t5 ON t5.OwnerUserId = u.Id AND t5.n = 5 WHERE t1.OwnerUserId IS NOT NULL;
An utter disaster, predictably:
We hit the Posts table a total of five times, or once for each reference back to the original common table expression.
This is not a good use of a common table expression, and is a pattern in general to avoid when using them.
Think of common table expressions sort of like ordering a Rum Martinez. You might be happy when the results eventually show up, but every time you say “Rum Martinez”, the bartender has to go through the whole process again.
There’s no magickal pitcher of Rum Martinez sitting around for your poor bartender to reuse.
That’s called a Shirley Temp Table.
Pivot Peeve
This particular query could use a temp table to materialize the five rows, and re-joining to that would be cheap and easy, even five times, since it’s only five rows going in.
WITH top5 AS ( SELECT p.*, n = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId ORDER BY p.Score DESC ) FROM dbo.Posts AS p WHERE p.OwnerUserId = 22656 AND p.PostTypeId = 1 ) SELECT t.OwnerUserId, t.Title, n INTO #top5 FROM top5 AS t WHERE t.n <= 5;
You could also also just PIVOT this one, too:
WITH u AS ( SELECT TOP (5) u.DisplayName, p.Title, n = ROW_NUMBER() OVER (ORDER BY p.Score DESC) FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.OwnerUserId = 22656 AND p.PostTypeId = 1 ORDER BY p.Score DESC ) SELECT p.* FROM u AS u PIVOT ( MAX(Title) FOR n IN ([1], [2], [3], [4], [5]) ) AS p;
For all the problems PIVOT can cause when misused, this is a full 19 seconds faster than our most precious common table expression query.
With a half-decent index, it’d probably finish in just about no time.
I’d take this instead any day.
A Note On Recursion
There may be times when you need to build a recursive expression, but you only need the top N children, or you want to get rid of duplicates in child results.
Since you can’t use DISTINCT, TOP, or OFFSET/FETCH directly in a recursive common table expression, some nesting is required.
Of course, we can’t currently nest common table expressions, and to be clear, I think that idea is dumb and ugly.
If Microsoft gives us nested common table expressions before materialized common table expressions, I’ll never forgive them.
WITH postparent AS ( SELECT p.Id, p.ParentId, p.OwnerUserId, p.Score, p.PostTypeId, Depth = 0, FullPath = CONVERT(varchar, p.Id) FROM dbo.Posts AS p WHERE p.CreationDate >= '20131229' AND p.PostTypeId = 1 UNION ALL SELECT p2.Id, p2.ParentId, p2.OwnerUserId, p2.Score, p2.PostTypeId, p2.Depth, p2.FullPath FROM ( SELECT p2.Id, p2.ParentId, p2.OwnerUserId, p2.Score, p2.PostTypeId, Depth = pp.Depth + 1, FullPath = CONVERT(VARCHAR, CONCAT(pp.FullPath, '/', p2.Id)), n = ROW_NUMBER() OVER (ORDER BY p2.Score DESC) FROM postparent pp JOIN dbo.Posts AS p2 ON pp.Id = p2.ParentId AND p2.PostTypeId = 2 ) AS p2 WHERE p2.n = 1 ) SELECT pp.Id, pp.ParentId, pp.OwnerUserId, pp.Score, pp.PostTypeId, pp.Depth, pp.FullPath FROM postparent AS pp ORDER BY pp.Depth OPTION(MAXRECURSION 0);
To accomplish this, you need to use a derived table, filtering the ROW_NUMBER function outside of it.
This is a more common need than most developers realize when working with recursive common table expressions, and can avoid many performance issues and max recursion errors.
It’s also a good way to show off to your friends at disco new wave parties.
Common Table Ending
Common table expressions can be handy to add some nesting to your query so you can reference generated expressions in the select list as filtering elements in the where clause.
They can even be good in other relatively simple cases, but remember: SQL Server does not materialize results, though it should give you the option to, and the optimizer should have some rules to do it automatically when a common table expression is summoned multiple times, and when it would be safe to do so. I frequently pull common table expression results into a temp table, both to avoid the problems with re-referencing them, and to separate out complexity. The lack of materialization can be hell on cardinality estimation.
In complicated queries, they can often do more harm than good. Excuses around “readability” can be safely discarded. What is “readable” to you, dear human, may not be terribly understandable to the optimizer. You’re not giving it any better information by using common table expressions, nor are you adding any sort of optimization fence to any queries in them without the use of TOP or OFFSET/FETCH. Row goals are a hell of a drug.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.