The How To Write SQL Server Queries Correctly Cheat Sheet: Common Table Expressions

Followers


Common table expressions remind me of clothes in the 70s. A bunch of people with no taste convinced a bunch of people with no clue that they should dress like them, and so we got… Well, at least we got the 80s afterwards.

The big draw with common table expressions is that they filled in some blanks that derived tables left unanswered.

  • The first problem with common table expressions is that most people use them like nose and ear hair trimmers: they just sort of stick them in and wave them around until they’re happy, with very little observable feedback as to what has been accomplished.
  • The second big problem with common table expressions is that the very blanks they were designed to fill in are also the very big drawbacks they cause, performance-wise. Sort of like a grand mal petard hoisting.

To bring things full circle, asking someone why they used a common table expression is a lot like asking someone why they wore crocheted bell bottoms with a velour neckerchief in the 70s. Someone said it was a good idea, and… Well, at least we got the 80s afterwards.

Much like joins and Venn diagrams, anyone who thinks they have some advanced hoodoo to teach you about common table expressions is a charlatan or a simpleton. They are one of the least advanced constructs in T-SQL, and are no better or worse than any other abstraction layer, with the minor exception that common table expressions can be used to build recursive queries.

Other platforms, enviably, have done a bit to protect developers from themselves, by offering ways to materialize common table expressions. Here’s how Postgres does it, which is pretty much the opposite of how SQL Server does it.

By default, and when considered safe, common table expressions are materialized to prevent re-execution of the query inside them.

You can force the issue by doing this (both examples are from the linked docs):

WITH w AS MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

Or go your own way and choose to not materialize it:

WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

You don’t get those options in SQL Server as of this writing, which really sucks because developers using other platforms may have certain expectations that are, unfortunately, not met.

Likewise, other sane and rational platforms use MVCC (optimistic locking) by default, which SQL Server does not. Another expectation that will unfortunately not be met for cross-platform developers.

Common Table Cult


The amount of developer-defense that common table expressions get is on par with the amount of developer-defense that table variables get.

It’s quite astounding to witness. How these things became such sacred cows is beyond me.

First, there are times when using a common table expression has no impact on anything:

WITH
    nocare AS
(
    SELECT
        u.*
    FROM dbo.Users AS u
    WHERE u.Reputation > 999999
)
SELECT
    n.*
FROM nocare AS n;


WITH
    nocare AS
(
    SELECT
        u.*
    FROM dbo.Users AS u
)
SELECT
    *
FROM nocare AS n
WHERE n.Reputation > 999999;

SQL Server is at least smart enough to be able to push most predicates used outside of common table expressions up into the common table expression.

One example of such a limitation is when you put a windowing function into a common table expression:

WITH 
    nocare AS 
(   
    SELECT   
        v.*,
        n = 
            ROW_NUMBER() OVER 
            ( 
                PARTITION BY 
                    v.UserId
                ORDER BY 
                    v.CreationDate 
            )
    FROM dbo.Votes AS v 
)
SELECT  
    n.*
FROM nocare AS n
WHERE n.VoteTypeId = 8
AND   n.n = 0;

If VoteTypeId were in the PARTITION BY clause of the windowing function, it could be pushed into the common table expression portion of the query. Without it in there, it has to be filtered out later, when the where clause also looks for rows numbered as 0.

But this does bring us to a case where common table expressions are generally okay, but would perform equivalently with a derived table: when you need to stack some logic that can’t be performed in a single pass.

Using a common table expression to filter out the results of a windowing function just can’t be done without some inner/outer context.  Since objects in the select list are closer than they appear, you can’t reference them in the where clause directly.

I’m fine with that, as shown in the example above.

Common Stacks


Stacked common table expressions are also “fine” up to a point, and with caveats.

One of my absolute favorite paging methods uses stacked common table expressions.

DECLARE
    @page_number int = 1,
    @page_size int = 100;

WITH
    f /*etch*/ AS
(
    SELECT TOP (@page_number * @page_size)
        p.Id,
        n =
            ROW_NUMBER() OVER
            (
                ORDER BY
                    p.Id
            )
    FROM dbo.Posts AS p
    ORDER BY
        p.Id
),
    o /*ffset*/ AS
(
    SELECT TOP (@page_size)
        f.id
    FROM f
    WHERE f.n >= ((@page_number - 1) * @page_size)
    ORDER BY
        f.id
)
SELECT  
    p.*
FROM o
JOIN dbo.Posts AS p
  ON o.id = p.Id
ORDER BY
    p.Id;

The reason why this is okay is because each common table expression has a single reference. There are two points in the query plan where data is acquired from the Posts table.

sql server query plan
uno dos!

Where things get tricky is when you keep doing it over and over again.

Attack Stacks


Take a query like this, and imagine what the query plan will look like for a moment.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    u.DisplayName,
    t1.Title,
    t2.Title,
    t3.Title,
    t4.Title,
    t5.Title
FROM dbo.Users AS u
LEFT JOIN top5 AS t1
  ON  t1.OwnerUserId = u.Id
  AND t1.n = 1
LEFT JOIN top5 AS t2
  ON  t2.OwnerUserId = u.Id
  AND t2.n = 2
LEFT JOIN top5 AS t3
  ON  t3.OwnerUserId = u.Id
  AND t3.n = 3
LEFT JOIN top5 AS t4
  ON  t4.OwnerUserId = u.Id
  AND t4.n = 4
LEFT JOIN top5 AS t5
  ON  t5.OwnerUserId = u.Id
  AND t5.n = 5
WHERE t1.OwnerUserId IS NOT NULL;

An utter disaster, predictably:

sql server query plan
moo 🐮

We hit the Posts table a total of five times, or once for each reference back to the original common table expression.

This is not a good use of a common table expression, and is a pattern in general to avoid when using them.

Think of common table expressions sort of like ordering a Rum Martinez. You might be happy when the results eventually show up, but every time you say “Rum Martinez”, the bartender has to go through the whole process again.

There’s no magickal pitcher of Rum Martinez sitting around for your poor bartender to reuse.

That’s called a Shirley Temp Table.

Pivot Peeve


This particular query could use a temp table to materialize the five rows, and re-joining to that would be cheap and easy, even five times, since it’s only five rows going in.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    t.OwnerUserId,
    t.Title,
    n
INTO #top5
FROM top5 AS t
WHERE t.n <= 5;

You could also also just PIVOT this one, too:

WITH
    u AS 
(
    SELECT TOP (5)
        u.DisplayName,
        p.Title,
        n = ROW_NUMBER() OVER (ORDER BY p.Score DESC)
    FROM dbo.Posts AS p
    JOIN dbo.Users AS u
      ON p.OwnerUserId = u.Id
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
    ORDER BY
        p.Score DESC
)
SELECT
    p.*
FROM u AS u
PIVOT
(
    MAX(Title)
    FOR n IN ([1], [2], [3], [4], [5])
) AS p;

For all the problems PIVOT can cause when misused, this is a full 19 seconds faster than our most precious common table expression query.

With a half-decent index, it’d probably finish in just about no time.

sql server query plan
PIVOT TIME!

I’d take this instead any day.

A Note On Recursion


There may be times when you need to build a recursive expression, but you only need the top N children, or you want to get rid of duplicates in child results.

Since you can’t use DISTINCT, TOP, or OFFSET/FETCH directly in a recursive common table expression, some nesting is required.

Of course, we can’t currently nest common table expressions, and to be clear, I think that idea is dumb and ugly.

If Microsoft gives us nested common table expressions before materialized common table expressions, I’ll never forgive them.

WITH 
    postparent AS 
(
    SELECT 
        p.Id, 
        p.ParentId, 
        p.OwnerUserId,
        p.Score,
        p.PostTypeId,
        Depth = 0,
        FullPath = CONVERT(varchar, p.Id)
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= '20131229'
    AND   p.PostTypeId = 1

    UNION ALL

    SELECT
        p2.Id,
        p2.ParentId,
        p2.OwnerUserId,
        p2.Score,
        p2.PostTypeId,
        p2.Depth,
        p2.FullPath
    FROM
    (
        SELECT
            p2.Id, 
            p2.ParentId, 
            p2.OwnerUserId,
            p2.Score, 
            p2.PostTypeId,
            Depth = pp.Depth + 1,
            FullPath = CONVERT(VARCHAR, CONCAT(pp.FullPath, '/', p2.Id)),
            n = ROW_NUMBER() OVER (ORDER BY p2.Score DESC)
        FROM postparent pp
        JOIN dbo.Posts AS p2
          ON pp.Id = p2.ParentId
        AND p2.PostTypeId = 2
    ) AS p2
    WHERE p2.n = 1
)
SELECT 
    pp.Id,
    pp.ParentId,
    pp.OwnerUserId,
    pp.Score,
    pp.PostTypeId,
    pp.Depth,
    pp.FullPath
FROM postparent AS pp
ORDER BY
    pp.Depth
OPTION(MAXRECURSION 0);

To accomplish this, you need to use a derived table, filtering the ROW_NUMBER function outside of it.

This is a more common need than most developers realize when working with recursive common table expressions, and can avoid many performance issues and max recursion errors.

It’s also a good way to show off to your friends at disco new wave parties.

Common Table Ending


Common table expressions can be handy to add some nesting to your query so you can reference generated expressions in the select list as filtering elements in the where clause.

They can even be good in other relatively simple cases, but remember: SQL Server does not materialize results, though it should give you the option to, and the optimizer should have some rules to do it automatically when a common table expression is summoned multiple times, and when it would be safe to do so. I frequently pull common table expression results into a temp table, both to avoid the problems with re-referencing them, and to separate out complexity. The lack of materialization can be hell on cardinality estimation.

In complicated queries, they can often do more harm than good. Excuses around “readability” can be safely discarded. What is “readable” to you, dear human,  may not be terribly understandable to the optimizer. You’re not giving it any better information by using common table expressions, nor are you adding any sort of optimization fence to any queries in them without the use of TOP or OFFSET/FETCH. Row goals are a hell of a drug.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.



6 thoughts on “The How To Write SQL Server Queries Correctly Cheat Sheet: Common Table Expressions

  1. Hair trimmers, 70s/80s, charlatans and simpletons, what an opening. I always appreciate the wit and creative writing in your posts which is second to none and always enjoyable. After I got done feeling all nostalgic, I realized there’s some solid SQL Server educational content in there as well. Another great post Erik.

Comments are closed.