Fendi Bendi
Video Summary
In this video, I dive into an intriguing limitation of implied predicates in SQL Server. Typically, SQL Server is quite adept at leveraging a predicate across multiple parts of a query to optimize performance. However, I explore a scenario where the presence or absence of certain predicates can significantly impact the execution plan and runtime. By analyzing a specific example, we uncover how the optimizer’s handling of join predicates within hash joins can lead to suboptimal plans, even when the data filtering seems straightforward. This discussion highlights the importance of residual predicates in query optimization and offers insights into why sometimes adding an index or restructuring a query can dramatically improve performance.
Full Transcript
In this exciting, scintillating, thrilling video, we will be discussing some, well I think is a pretty interesting limitation of implied predicates in SQL Server. Normally, SQL Server is quite good at taking a predicate and using it in as many places as possible to to make a query fast. Let’s take this query for example. Right? Now, the results of this thing are absolute garbage. There is nothing useful about this. But the query plan has a few useful things in it. One, we have a very rare impact of 100 on the nose for a missing index request. Fortunately, this query already finishes relatively quickly and already has a fairly low cost. But one, one might wonder, if we add that index, would it truly cost zero to run this query? Interesting, interesting question. One that we’re not going to answer today. We’re going to leave that there. Another thing that this query plan has going for it is this little red X on the nested loops join. And a warning that comes along with it saying that there is no join predicate.
Boy, oh boy. Boy, oh boy. I know what you’re thinking. Erik Darling has finally lost his damn mind. He has finally gone completely soft after four years of quarantine. He’s finally lost his mind. But no, it’s not me, it’s you. You. What happened? Well, if we look at this side, look at this clustered index scan, we have a predicate where user ID equals 22656. And if we look over here, we’ll have another one. We’re having a stubborn tool tip today. But if we look over here at this clustered index scan, we will also, oh, we were there. We will also have a predicate of 22656. Meaning that any row that leaves here or here will already be user ID 22656. So yes, there is no join predicate here, but every row is going to match anyway. So yes, there is no join predicate here. So let’s take a break. Let’s take a break.
I don’t know. Let’s take a break. So let’s take a break. That’s just because that’s what we were looking for. P. Owner user ID equals 22656. And we’re joined on user ID. So what could possibly go wrong? I appreciate how well intentioned that warning is but I think think maybe someone other than the summer intern should have worked on it because there’s some fault there’s some fault in those stars now that’s good right that’s that’s a good trick yeah we we used a predicate in the where clause to filter both sides of the join that’s that’s good news to me I like I like the way you move SQL Server this one’s a little bit different though so here we’re joining comments and votes and we’re joining them on post ID and we have a filter on where post ID equals 138 or the user ID in the votes table equals 831 now something kind of sneaky here is that user ID exists in both tables and it’s going to be user ID is also a match for between them so that’s that’s cool but let’s look at this query in this query plan and and try to figure out why it runs for like eight seconds to figure this figure this out to count 41 rows well over in the over in the query plan everything is kind of okay here and everything is I mean not great here now we have enough for those those unfortunate repartition streams but everything gets really wonky here why is it taking that long to do a join why our joins slow good lord everything that access consultant said was true joins are slow we should just move to excel files we put them in Hadoop and never have to join them but let’s look at what happened inside of this job we have a hash keys probe on post ID it’s kind of fairly fairly straightforward because that’s what we were joining on but then look at this probe residual that also was on post ID so we had to residually probe we had to make we had to make double extra sure that the hashing algorithm was correct and then we also have another another set of residuals on c dot post ID equals 138 or v dot user ID equals 831 now you might be wondering why we didn’t have those residual predicates over here right because that could be there and that could be there I don’t know I think that’s a little weird but let’s look at what happens if we change this query a little bit so let’s look at what happens when we put both predicates on the votes table so where v dot post ID equals 138 or v dot user ID equals 831 and let’s see what happens now we run this we get our 41 records back and we get them back very very quickly now we don’t have that big embarrassing hash join do we no we have a lot of an index seek into an apparently useful index on the votes table and another index seek into an apparently useful index on the votes table don’t let no one tell you Eric darling can’t index his way into a good demo it’s the one thing one thing I can do is get you like that man cap and then we have a concatenation of those two indexes and then a sort of a but but none of this is none of this is taking any time to do it right now we have a lot of time look at all those zeros look at all those wonderful goose eggs it’s like watching Nolan Ryan pitch doesn’t get much better than that and we have an index seek here more zeros there are zeros all the way to the end and despite all those zeros we still have a cost of go point something so I don’t know what maybe maybe we should look at that that that index on the post table query again I don’t know it might be fun might be interesting so so that’s nice and again I see this in a lot of entity framework queries where it is just doing something very very goofy with the where clause and things are things are turning out badly for people I don’t like it now or conditions kind of get a bad rap in general if this or were inside the join I would I would be nervous as heck and that’s that’s sort of what happened in the slow queries that that or clause got moved inside of the join now normally you might see like if it was written like actually handwritten inside the join you might see like some spooling or something going on in the plan but this one is just has this weird confusing residual predicates at the hash join so be mindful of those when one is query tuning always pay attention to the residual predicates inside of a hash join because you can find some interesting things inside there now the optimizer is generally pretty good at figuring this stuff out just not really in this case anyway I thought that was interesting hope you did too and I will see you over in the next video but whenever that is who knows it’s already been 25 years since I started recording these yes yes yes oh yes
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Related Posts
- The SQL Server Performance Tasting Menu: Performance Issues With Parallel Query Plans
- The SQL Server Performance Tasting Menu: How DISTINCT Queries Can Hurt Performance
- The SQL Server Performance Tasting Menu: How You Compare Date Columns Can Hurt Query Performance
- The SQL Server Performance Tasting Menu: How Multi-Statement Table Valued Functions Hurt Performance