Join Simplification In SQL Server
Video Summary
In this video, I delve into the fascinating world of joint simplification within SQL Server execution plans. I explore scenarios where joins cannot be simplified due to potential null values and unique key constraints, as well as cases where they can be pruned out entirely when no additional rows are projected from an outer-joined table. The discussion is complemented by detailed walkthroughs of various query examples, demonstrating how the SQL Server optimizer identifies opportunities for simplification based on the query’s structure and requirements. I also share some personal insights about setting up a green screen for video recording, navigating through technical challenges like allergies that can disrupt even the best-laid plans, and the importance of maintaining clear communication with viewers despite any minor distractions.
Full Transcript
Hello! Welcome back! Erik Darling here with Darling Data. I know that you probably can’t tell by looking behind me, but I have a freshly steamed green screen with no weird artifacts, at least at the moment. We’ll see what happens. We’ll see if the lighting in here cooperates. Right now I have basically the equivalent of a full tanning bed blasting light at this thing. So hopefully, aside from some weird artifacts, where my tattoo colors match the color of the green screen chroma key stuff, we can avoid any unpleasantness. Sorry if that’s distracting. Anyway, I wanted to talk a little bit today about joint simplification in SQL Server. I was going to record this yesterday, but I got terribly, horribly sidelined by allergies, which you can probably still hear some remnants of from this absurdly large nose of mine. So, there are of course rules to when things can be simplified. There are a number of other simplification rules and steps outside of joins, but I think joins are probably a pretty interesting one to start with, at least for now. I’m not going to fix that in post. You can just deal with it.
So, one time when joins can’t be simplified, and I’m going to explain simplified as we sort of look at execution plans, is when we, I mean we can just get an estimated plan for this one. Right? So, what happens here is because this is a right outer join, and this could happen in a situation where the optimizer rewrites a query to use a right join rather than a left join. It can reorder and reorganize all the things that it wants. But because we do this, and the results that we get here add nulls to what comes out of the users table, right? This u.id column that we’re selecting.
We get this query plan over here, where we have to touch both the votes table and the users table. SQL Server couldn’t simplify that away because we were going to get additional results back from the users table. So, it added nulls to the output where nulls didn’t exist before. The column that we’re selecting here and the column that we’re joining to here, that’s the primary key. It’s a clustered index of the users table. So, no nulls are allowed to exist in there. Another place where SQL Server is allowed to simplify joins is when we don’t select, we’re not projecting any columns from a table that is outer join to, and we wouldn’t get any duplicated rows.
Now, this is a fake join because this isn’t actually the relationship between posts and users. I’m joining on two unique columns, the id column in both tables, again, primary key clustered index. So, SQL Server knows that there’s not going to be a many-to-many relationship here. If we get the estimated plan for this query, you’ll notice that unlike the query plan for the last query that we looked at, sorry, that jumped around a little bit. And this one we only touch the users table, right? So, that join to the post table is completely taken out of the query optimization steps.
It’s pruned out, as smart people might say. Now, if we look at this query, which is an inner join from the users table to the users table, which I know looks a little weird, but what we’re going to do is run this and get the estimated plan. And SQL Server is actually not free to simplify this one. We touch both tables. The reason why is because we used an inner join, and an inner join might actually eliminate rows. This one’s a little funny because, you know, we’re joining the table to itself on its own primary key.
So, maybe simplification could be a little smarter. I don’t know. I don’t think I like this one very much. But in this example, where we wouldn’t actually eliminate any rows because we have an additional condition here, or rather we’re doing a left outer join here, what we’re going to see is SQL Server only hit the users table once on that one. SQL Server can also apply that to much larger queries.
So, I’m going to run this whole thing. And the only part of this that really, really matters is way down at the bottom. And I do apologize to the greater SQL Server community for having that column up there has a leading comma, but I only do that to make life easier when I need to quote it out. So, you can deal with it just for this one query. And if we get the estimated plan for this one, we get this whole gigantic query plan back.
If we say zoom to fit, you can sort of start to grasp the absolute magnitude of this thing. Every single one of those derived left joins that we do throughout the entire query is part of the query plan. SQL Server hits all of these things. If we change this query just a touch, and again, apologize for the leading comma.
Hopefully no one beats me up at the next conference. We quote that out and we rerun this whole entire thing just like before. And we get the, well, we’re not going to run it, but we are going to get the estimated plan for it. The estimated plan this time around is much simplificationed, simplified, because we only touch the users table this time.
We don’t do all the other joints. We don’t do all the other work. So, SQL Server’s query optimizer, when it’s looking at the query that you send into it, one thing that it’s going to do is try to find things that it doesn’t have to do. The optimizer is lazy, just like me. And if it doesn’t have to do some work because it just doesn’t need to, it’s going to find a way to not do that work.
Query simplification can apply in a lot of places. Part of it would be like contradiction detection, right? So if you have a query that’s like where ID equals one and ID equals two, SQL Server is going to say, well, ID can’t be one and two at the same time. Even if you’re one of those nudge nicks that does like a comma separated list in an ID count, it can’t be one and two simultaneously.
So it would just give you a constant scan and say, guess what? Your query didn’t give me anything. So that’s fun. And I don’t know. I think that’s it. This is sort of another little test run video with the new setup. Make sure audio comes through well.
Make sure video comes through well. Make sure that the green screen is still mostly functioning, except where it looks like I have holes in my arm over here, which is interesting anyway. Well, I don’t know. That’s it. I’m going to go blow my nose. These allergies are awful. Anyway, thank you for watching. I hope you learned something.
I hope you enjoyed yourselves and I will see you soon, soonly in another video, hopefully with my simplified recording setup. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.