How To Write SQL Server Queries Correctly: Joins With OR Clauses Part 1
Thanks for watching!
Video Summary
In this video, I delve into the intricacies of joins with OR clauses in SQL Server, demonstrating how they can sometimes lead to unexpected behavior and performance issues. Starting with a simple inner join setup between comments and posts tables, I illustrate how SQL Server infers join conditions from the WHERE clause, making the query efficient when both columns are involved. However, introducing an OR condition that splits the join across different tables causes SQL Server to struggle, resulting in slower execution plans. To address this, I show a workaround by adjusting the OR clause to involve only one table, allowing SQL Server to seek into indexes more effectively and improve performance. This video is part one of a series on joins with OR clauses, setting up the groundwork for exploring more complex join scenarios like left joins in future parts.
Full Transcript
Erik Darling here with Darling Data. Got a barbell next to a dumbbell. Anyway, today’s video we’re going to talk about joins with or clauses. I’m actually going to do this in two pieces. The first piece just kind of sets it up a little bit and gets some of the easier stuff out of the way. And then the second part we’ll go into more detail. And I tried to do it in one and it just went on too long. Even for me, I actually apologized to my own self for my length for once. So that was a fun change of pace. If you would like to support this channel by giving a dollar sign a finger mustache, you can click the link down in the video description. For as little as $4 a month, you can support me by giving a dollar sign a finger mustache. You can click the link down in the video description for as little as $4 a month. And I’ll see you next time.
Training. You probably need it. If you’re here, you definitely need it. You’re probably not getting enough of it. It’s like vitamins. You can get all of mine for about $150 by using that coupon code at that website up there. That’s about it. There’s also a link for that down in the video description. It’s very good stuff. Paul White Tech reviewed it. So I don’t know. Maybe that’ll convince you. No upcoming events until 2025. Ba-ba-ba-ba-ba-ba. With that out of the way, let’s talk about joins with OR clauses. So SQL Server has funny ways of handling OR expressions sometimes. Sometimes it can infer certain things depending on how you set your query up. And other times it cannot.
So I think a good way to start this off is by showing you this query. The top one is just there to get a quick result back. But we have a join between comments and posts. And the join is on p.ownerUserId equals c.userId. And we have a where clause to say where ownerUserId equals 22656. If we run this query, SQL Server is able to infer that we meant that we only want to see userId 22656 or we only care about ownerUserId 22656.
Because what happens is SQL Server tells us we don’t have a join predicate. We get this stupid little warning right here. And it says, yeah, guess what? No join predicate. Shame on you. You’re an idiot. The number of times I’ve seen this in a query plan and been like, huh, do we really not have a join predicate?
And been like, oh, no, no, no, we totally have a join predicate is pretty high at this point. Microsoft summer interns. What would you do without them? But if we look a little bit more closely at the plan, we will see a predicate on this side of the join that says where ownerUserId equals 22656.
And we will see a predicate on this side of the join that says where userId equals 22656. And so all of the results that come back from the post table and from the comments table have a join condition or rather have a value in their join column that is the same. So there is actually no need to do anything with that.
Right. We don’t need to do any additional join conditioning. We have met the condition of our join via two when right when we touch the indexes. So that’s where SQL Server is kind of good at stuff.
You know, we touch the indexes on both sides. We do the filtering right there. And then everything that we need to mush together is the same value. They’re all it’s two sets of 22656.
And there’s like, what do you join there? It’s already joined for you. We have we have implied the join via the where clause. Where the optimizer is somewhat less good at figuring these things out is when, of course, when you get separate columns involved.
Right. So if we look at this query. Right.
And I have these quoted out for the time being for dramatic reasons. We’re joining comments to votes on c.postid equals v.postid. And we’re going to say where c.postid equals 838 or v.userid equals 831.
Right. And if we run this query. Do do.
It takes a few seconds. We get 42 rows back. SQL Server has chosen the most adaptive of joins. We scan here for about a second and a half. We scan here for about half a second.
And in total, we take about 2.6 seconds to figure this all out. Now, if we quote out these four sequence and we rerun this, SQL Server says, there is no way we could do that. It is impossible.
Why have you tried to do this to me? You are trying to bend the rules of space time. You have divided by zero. You are not fit for this planet.
You think you’ve seen too many Doctor Strange movies. But this is kind of a funny thing because when you look at what SQL Server does, look at the execution plan. And then the entire query is evaluated here at the join.
We have this whole or clause thing in here. Now, what we can do is, since we know something about our data that SQL Server doesn’t, is that if we run this and we say where we flip that so that we say where both of these columns come from the votes table. Right?
Right? Now we’re saying where V is this on both. We get the same results back, right? This is 41. 41. Da, da, da, da, da, da, da, da, da, da, da, da. And this is 41. But much more, a much quick, much more quick 41.
Now SQL Server is able to seek into two indexes on the votes table and a third index on the comments table. And we are able to get back our results. The only change I’m making is that I’m using the post ID column from the votes table instead of the post ID comment from the comments table.
Right? So before it was c.post ID and v.user ID. Now it’s v.post ID and v.user ID.
So when you comment on or you vote on something, the post ID is logged. Right? You’re going to, since we’re joining on post ID, right, it’s going to be the same for both.
We didn’t change the v.user ID. If the v.user ID were the problem, that would not give us correct results. Or rather, that would pose a different set of issues.
Sort of like what we talked about with the where or clause stuff in the last one. But if we just change v.post ID to 138, then from c.post ID equals 138, then we’re getting all the same stuff back because we’re joining on post ID. Right?
That’s all we care about. But for some reason, this or clause getting slightly different tables involved really screws SQL Server up. So when you’re writing queries, these are the little things that you need to tweak and experiment with to make sure that you are not missing anything obvious.
A lot of times when I talk about query complexity, people start picturing that big, you know, monolithic query that spans 17 scrolls of SS through SSMS. And if you make the font in SQL Server Management Studio like 4%, it’s still just like it doesn’t fit on one screen. That, of course, is a different brand of hell.
But complexity in queries can arise in many small interesting places. You know, in clauses, or clauses. Like anything that adds a little bit of drama to your queries can certainly make things somewhat strange in there.
So that’s the set things up. Now, this was fairly simple because this was an inner join with just a where clause that had an or for just, right, a simple inner join, right? Just one table to another.
The where clause was only on a couple columns with one or condition. What we’re going to look at next is more complicated join scenarios with left joins and stuff. So we have that to look forward to.
Anyway, I hope you enjoyed yourselves. I hope you learned something. And I will see you in part two of Joins with Orr clauses when we will get deeper into the subject. And I will show you more awful things that can happen and how you can fix them.
Because what’s the point in just showing you bad things that can happen if I don’t show you how to fix the bad things? It doesn’t seem like a very, it doesn’t seem like very good training if all I do is tell you that bad things happen. It sounds like, it sounds like other training I’ve seen.
Bad things happen. How do you fix it? It’s hard. Yeah, you know, life’s hard. Screw it.
All right. Thanks for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.