YouTube Days: A Little About Common Table Expressions In SQL Server

A Little About Common Table Expressions In SQL Server


I’m working on ramping up my recording and streaming setup now that life has settled down a little bit, and publishing the results to YouTube while I work out the wrinkles. Enjoy some free video content!

Video Summary

In this video, I dive into the world of Common Table Expressions (CTEs) in SQL Server and share my perspective on why they often don’t live up to their hype. I explain that while CTEs can be useful for certain complex queries, such as cascading filters or generating row numbers, they are not a silver bullet for improving query readability or performance. Instead, I emphasize the importance of proper formatting and logical query processing in making your SQL code more readable. The video covers various scenarios, including how CTEs interact with indexes, top syntax, and multiple references within queries, ultimately concluding that while CTEs can be handy in specific situations, they should not be relied upon for significant performance gains or to make queries magically easier to read.

Full Transcript

E-Dot Darling here, it’s my cool new rap name, with Darling Data. And today we are going to talk about CTE in SQL Server. The reason we’re going to talk about CTE in SQL Server is because I am absolutely sick of CTE in SQL Server. They don’t make queries more readable. Formatting does. Formatting makes queries very readable. especially if you don’t put leading commas in your queries. And they don’t have any magical performance, impact, benefit. They are not good, great, wonderful, spectacular, superb, extraordinary. They are none of these things. There is not a… they’re just exhausting at this point. They… they… they… they tire me. If you ever wonder why I look or sound so tired, you can probably blame CTE. All right. So let’s run this query and let’s get a query plan just to set a baseline, right? So we get the estimated plan for this. And now this is what I would call a fairly typical user query where we access one index on a table. You may have seen other videos of mine recently where we talked about how multiple indexes on a single table can be used and can be used all together.

to make a query happen. All right. You can have key lookups. You can have index intersection. You can have index union. And all of those things can use multiple indexes on the same table to make… make… make it a query happen. But in this case, we’re just touching one index to get our data for this query. All right. Now, where CTE don’t completely befuddle, baffle, and… what’s another good B word? I can’t think of one. If you think of one, and leave one in the comments, where I don’t hate them is when you sort of like cascade stuff down through a CTE. So like in this case, SQL Server is smart enough to sort of optimize away the fact that we have this thing in the middle. All right. So if we run this query, or let’s just get the estimated plan, we only touch the user’s table once in here. Okay. Fair enough. SQL Server is pretty smart with that.

You can even do slightly more complicated things and still have SQL Server be fairly smart about it. So here we’re going to have one CTE where we select this, another CTE where we select this from the preceding CTE, and then a third CTE, which I just noticed I have misaligned slightly, where we select from the preceding CTE and apply another filter. So did I do that? All right. I did that wrong. Let’s see. Actually, no, that looks a little funny. And we’ll fix that later. So if we get the estimated plan for this, SQL Server is again, smart enough to sort of abstract away those useless abstractions that you have stuck into your query, and just give us one single solitary index seek, where all of the predicates that we are, come on, tooltip, you are being so good there, where all of the predicates that we care about, are applied to that one axis of the index. We have our seek predicate to the ID, right, because that’s the clustered primary key of the table, we can seek right to that. Then we have a couple residual predicates before I lost my finger in a tragic green screen accident, where we look at the date filter, where we look at the date filter, where we look at the date filter and the display name filter that we applied in the other CTE.

Where things change a little bit with CTE is when we introduce the top syntax to our CTE. Now, this one here isn’t going to force SQL Server to rerun the initial syntax multiple times, but it is going to change the query plan in sort of in a weird way. So, if I get the estimated plan for this, it looks kind of weird now, doesn’t it? Looks kind of funny, a bit unexpected. We have this series of top syntax. We have this series of filters where different predicates are applied. Now, when we touch the clustered index, we still apply that same initial seek predicate, right?

We still seek to where id equals 1. But because we introduced top and we set what’s called a row goal inside of each one of the CTE that we executed, or that we, I mean, they all execute together, kind of. We have a top and then a filter and then a top and then a filter and then another top. So, we have the three tops and two filters. The two filters are going to be on the additional predicates that we applied.

And thank you, SQL Server Management. So, I must have made Aaron Stellato a real mad SQL Server Management Studio is stabbing me in the back today. Today, we have the first filter, which hits the predicate on creation date. And then we have another filter, which applies the predicate on display name, where that equals community over there. So, introducing top into your CTE sort of introduces a bit of a fence because of the row goals that get introduced there.

A slightly more verbose example of that would be these CTE that we’re going to look at next. Now, in the first one, we are joining users to comments right there. And then in the second one, sorry, not the third one, there is only two. There is no three, there is only two. We are joining the post table and the votes table, right? And the post table gets joined both to the comments table and the users table from the first comment table expression.

And then the votes table we’re joining to the post table, right? So, if we look at the query plan for this, we’re going to see that SQL Server’s cost-based optimizer, cost-based query optimizer, was free to rearrange the joins to suit its own cost-based needs. All right. So, have this, it joins comments to votes over here. It joins the post table down over here. And then it joins the users table up over here. All right. Now, if we were to go and stick a top in our first comment table expression only, all right, because I don’t want to add a bunch of tops in, too many, too many tops, spoil the broth.

If we add a top in here, but not here, we’ll see the query plan change and the join between users and comments will be fenced off. And what I mean by that is now we have this top operator here and we have the join between users and comments behind that top over here. Now, sometimes it’s tough to get behind a top, but in this case, it worked out pretty well.

All right. So, we have our top operator and we have users and comments over here. And then SQL Server was free to join votes and posts and whatever order it chose in the second comment table expression. So, the top operator will fence off, at least in the current iteration of SQL Server’s cost-based optimizer.

The top operator will fence off things inside of a comment table expression, but neither a comment table expression nor a comment table expression with a top inside of it will materialize the query inside of the comment table expression. Where comment table expressions start running into problems is when you start referencing them multiple times in your query.

Now, I showed you where, like, if you stack them and sort of run down through them, things can turn out okay. But let’s look at what happens when you reference CTE multiple times, sort of without that stacking effect. All right. So, the first execution plan that we get here, back to baseline, where we only seek into the user’s table once.

Now, if we join that CTE to itself, we are now going to have two seeks into the user’s table, right? Because we had to rerun the expression inside of that common table twice. It’s actually not really a table at all.

It’s more like a tabular, right? It’s a common tabular expression because we are not materializing this result anywhere. And if we do that a third time, we will now see that we touched the user’s table three times.

We have the one, we have the two, we have the three. I think that a kick in the head. I think an easier way to sort of get across what happens when you do that is if we were to just union all and sort of explicitly select from the CTE twice, we would go back to seeing the two accesses of the user’s table with the concatenation operator over here, which unioned all of those two results.

So the same thing would happen if we did union. But just think of it as just like this query, union all, this query again, right? Because even though you make this reference once here, each reference out here means you have to rerun this expression.

All right? Now, where CTE or derived tables or anything like that can be useful is when you need to do something in a query that you can’t do in a single step. A pretty common thing would be to generate, like if you wanted to find duplicate results or you wanted to find like the first result, using something like row number is a really common way to do that.

The thing is you can’t filter on row number here, right? You can’t say something like where n equals zero because n is in the select list. And the select list, if you’re familiar at all with logical query processing, happens way at the end.

And the where clause, join clause, stuff like that occurs way earlier when the query is built, when the shape of the query is generated. So we can’t filter on this here, right? We’ll just get an error, an invalid column name n.

So we have to do stuff like this outside of a CTE, right? So if we run this, it is perfectly valid syntax. Now, this sort of gets into other, you know, sort of query tuning dilemmas and things that you might run across in query plans.

But this is one of those cases where when I see a filter operator, I understand why it’s there and I’m not angry at it. Because we have to generate that expression for the row number before we can filter on it. So we have to do all this work to get all the data we want.

And in this case, well, I mean, for various reasons. I’m on SQL Server 2022. I am in compat level 160. And so I’m getting the cool feature batch mode on rowstore automatically.

I mean, that’s been available since SQL Server 2019. But it’s only Enterprise Edition. I’m using Developer Edition.

So I get all the Enterprise Edition features, which is why Developer Edition is like a great gateway drug for Microsoft. Because you start doing all this development in Developer Edition. And you’re like, wow, what great performance I have.

And then you move it over to standard, like your production environments in Standard Edition. And all your queries are complete stink bombs. And Microsoft’s like, well, okay, well, just give us $5,000 more core and you’re all set. Our dreams of a feature flag for Developer Edition to remove Enterprise Edition features is long in the making.

It doesn’t exist, but I’m sure they’re working hard on it so you don’t run into that. This awful conundrum, quandary, conundrum. It doesn’t matter.

Anyway, comment table expressions can be useful when you have to do stuff like this. Because you couldn’t do this all in a single step. Just don’t expect doing this to have any magic performance benefits for you.

Comment table expressions just aren’t helpful in that way. All right. Cool.

Hopefully, I have absolved and alleviated you of all your misconceiverations about CTE. They truly do not help performance in any way just for existing. They don’t make queries any more readable just for existing.

Again, query formatting is more important for making a query readable than jamming a completely unreadable query inside of a CTE. Anyway, I hope you enjoyed yourselves. I hope you learned something.

I will see you in the next video, which temporally could happen at any moment now. It may have already happened by the time I say this, which is really crazy. It’s just impossible to tell.

Anyway, thank you for watching. And I’ll catch smell you later, stinky pants.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.