YouTube Days: Predicates In SQL Server Query Plans
Video Summary
In this video, I dive into the world of predicates and how they impact query performance in SQL Server execution plans. Starting off with a lighthearted anecdote about adjusting my microphone settings after receiving feedback on audio volume, I transition to a more technical discussion. I explore different types of predicates, such as seeking directly to primary key values for efficient index usage, versus scanning entire tables when indexes are absent or not well-suited. The video delves into the nuances of equality and inequality predicates, demonstrating how SQL Server handles seek and residual operations in various scenarios. By walking through practical examples and execution plans, I aim to provide a clear understanding of why certain queries perform better than others and offer insights on designing effective indexes that align with query patterns.
Full Transcript
Erik Darling here with Erik Darling Data, fresh off reading some Streamlabs OBS tutorials about how to make microphones sound better in Streamlabs. As someone who spends much of his time wondering why no one does basic Google searches before writing a SQL query to figure out what might be a good, better, or best way to write the query, it didn’t occur to me until today to see if there were any nice, helpful blog posts about how to make Streamlabs microphones sound better and all that. Not that it sounded bad, it was just low. I got some complaints yesterday aside from people who were wrong about commas going first in lists of columns in select query, well any query really.
I got a slight neg from probably some spy at Beargut magazine saying that my audio volume was a little low. So I’ve cranked the gain up, I apologize if anyone goes deaf listening to me. I’ve done my best to screen out breathing noises and other uncomfortable mouth sounds that drive people with misophonia up walls, so hopefully that works out too.
So we’ll find out. Anyway, I am here today live in the flesh, in the present, to talk about different sorts of predicates that you might see evaluated in queries like seeks and selects and residuals and other things like that. When you look at your SQL Server execution plans, the majority of the queries that you’re going to see today are all going to have this little one equals select one tacked to the end of them like so.
And I’ve got a, what is hopefully a helpful blog post about why that’s there here, that will be in the show notes and all that good stuff. So anyway, let’s get going. So first, probably the easiest thing to start with is seeking to a value in your primary key slash clustered index.
If you’re one of those people who has a clustered index and a non-clustered primary key, you might see different results. I’m not saying that’s a bad thing. This is not quite commas first in the realm of bad ideas, but it’s certainly not a typical pattern that you see explored in databases, at least in SQL Server databases, perhaps on my learning path with other database platforms.
I’ll find out differently. But anyway, if we run this query and we look at the execution plan, we are going to see what is, at least I hope, this late day and age in our lives, a clustered index seek.
And we have our seek predicate down the bottom there. I’m working on my weather meteorologist pointing skills here. That’s a little funny because everything is backwards.
This is my left hand even though it appears on my right. Magic, how the magic happens, how the sausage gets made. But you can see that we seek directly to where id equals one, which is exactly what our query asked for.
No bugs in SQL Server at all for that. What happens if we have a where clause where we don’t have an index on the column that we’re searching, in this case this reputation column right here, is we are going to scan, in this case, the clustered primary key of the table and evaluate that predicate.
We don’t have an index that helps us find data. SQL Server is asking us, begging us for an index that would help that. But we don’t have one at the moment.
We’ll get around to that in a moment, I promise. But if we look at the, come on back tool tip. If you look at the tool tip for this now, we have a predicate right up there. Point D weather man.
Where we scan the entire table and as we’re reading data pages, we are filtering the reputation column to find where anyone has a reputation of 56106. So we do have to scan the whole table. That doesn’t take very long in the context of this query.
It’s not a very big table and my computer is pretty good. So we get through all those rows in about 258 milliseconds there, give or take some timing there, drawing up query plans and whatnot. But then if we search an indexed column, right, like the ID column again, it’s a clustered primary key, and also search the reputation table, we get back to doing pretty okay as far as performance and, you know, being able to seek to certain values goes.
This takes absolutely zero time whatsoever because we seek to like one data page pretty quickly. And if we look at the tool tip for this, we have a seek predicate at the bottom that evaluates the filter on ID, and then a residual predicate. It just says predicate up at the top there, but most people will call it a residual predicate because it’s a predicate that you evaluate after the main predicate.
So it’s kind of stuck on the back end. I think that’s Hollywood talk, right? Get some residuals in there. But we seek to where the ID column equals 77, and once we find any IDs, and this, I mean, it’s a primary key, so it’s unique, so there’s only going to be one of those.
But we seek to where we find 77, and then we evaluate to see if the row or ID 77 also has a reputation of 56106. And in this case, because I have very carefully engineered all of these demos, we do find one row of Mr. Darren Kopp. I don’t know if that’s supposed to be a funny name or not.
It might be a really great joke in German or something. And we can do that with multiple different kinds of searches on the ID column, right? Because it’s the leading column of the index.
It’s, again, the clustered primary key. We can do all sorts of searches against that column and still seek to those values, right? How efficient the seek is is going to depend on how many rows get located. Right now, the users table in the Stack Overflow 2013 database has around 2.4 million rows in it.
So searching for anyone with an ID under 78 is pretty quick. If we were to search for anyone with an ID greater than zero, we would seek, but we would seek to everything in the table. We would read all that stuff, and that’s not a very helpful seek, though it is a seek, and most people would fail job interviews for that.
We would say that seek is good. At least if I was giving the job interview. Watch out!
Anyway, we can get this. It’s pretty efficient. Darren Copp is still the only person who shows up. And we have a pretty close result to the last query.
Our seek predicate way down the bottom. We’re just looking for anything less than 78. And again, our residual predicate, beep, beep, beep, tickle, tickle, tickle that little one.
So we’re going to go to the actual equals sign. Searching for where reputation equals 56106. Now, for some reason, when I’m answering questions about indexes, I get a lot of questions about index design, index theory, like what’s the best index for this?
How’s the best index for that? And really, the best index is one that kind of closely matches the queries that you’re running. It’s not like you can just design an index out of nowhere and be like, good index.
Like you can put the most selective column first, always and forever. But if you’re not searching on that column, it just gets in the way. So to kind of demonstrate that, I’m going to create this index. And this index, which I’m going to talk about once I finish hitting the control and E stuff over there to execute it, has account ID first.
Account ID almost matches the ID column. It’s almost a mirror image of it. There are a few discrepancies, but that’s okay. It’s still a unique column, right? There’s still no duplicate values in there.
So we put a very selective column first, and we put reputation, which is mostly not terribly selective, second. The thing is, this index doesn’t help us run our query. I’m going to run a blast from the past, and we’re going to look at the execution plan, where we’re going to use the unhelper index in this case.
And we’re going to try to search for where reputation equals 56106. And we’re going to use that nonclustered index, but we still scan it, right? We look at this.
We still have to scan the entire index and apply that predicate to reputation. We can’t seek to any values in reputation because that account ID column is in the way. So like we could search for account ID equals something and reputation equals something, and that would get us where we’re going.
But just searching on reputation when we have an index that does not lead with reputation doesn’t really help us out all that much. So I’m going to create an index that is going to be helpful for several queries, and we’re going to look at how useful it is.
So if we force SQL Server to use the primary key, the cluster primary key of the table, and then we also force SQL Server to use the nonclustered index, and I just do that to make it very clear which index is going to be used where and what you should be looking for in them.
If we run these two queries, they’re both going to run pretty quickly in all the zeros of seconds. That’s fast. That’s fast. We don’t have any problems here. Great. Fixed it all.
We’ve already looked at that plan before, but let’s look at the plan now for the index with two key columns, and it has both reputation and ID. Now we actually have two seek predicates that get evaluated.
All right, we have a seek to reputation. We have a seek to ID. So SQL Server can evaluate multiple seek predicates within the same query as long as they are the key of the index. Right? Like we have an index on reputation and an index on ID.
We can seek to all the values and reputation and ID really easily. We can find those very quickly with our where clause. Things change a little bit when you get away from equality predicates and start getting into inequality predicates or range predicates.
So like greater than, less than, greater than, equal to, less than, equal to, not equal to, is not null, is also one of them. And if we look at this query, we get a few more rows back. We don’t just get Darren Kopp back.
We also get the lovely and talented Bob and the hopefully also lovely and talented Rex M. Maybe also funny jokes in German. I’m not entirely sure. But if we look at our index seek now, we have a seek to a reputation is greater than or equal to 56.
And then we don’t have the double seek. Then we’re back to having a residual predicate up there to find where anything is less than or equal to 77. The reason for that is because when you create an index and it starts putting values in order, even if you have duplicate values in that index, if you’re searching for greater than or equal to 56106, what happens is Segal Server might cross multiple boundaries once you find anything after 56106.
So we need a residual predicate on less than or equal to 77 to evaluate everything in that range. We don’t have that result in that perfect equality order. So we have to evaluate a range of values in order to find 77.
Things get a little bit weirder if we throw some more inequality predicates at it. So at this time we are looking for where ID is less than or greater than or equal to 78, 77 and less than 78.
And then where reputation is greater than or equal to 56106 and less than 56107, which by weatherman my way over here we can see that’s 7. All right.
So let’s look at the query plan for this. And this is going to throw you off a little bit maybe if you’re not used to seeing these sort of things. We’re going to go back to seeing two seek predicates down here.
We can see where SQL Server is evaluating the predicate on reputation with the greater than or equal to and then with the ID column looking for anything that is over 77. But then up at the top we have a residual predicate that almost searches for the same thing, but that’s where we throw in the less than 78.
So again, crossing those ranges, crossing those boundaries of values means we have to evaluate residual predicates sort of by scanning data after we seek to data. Things get more challenging when you throw or at SQL Server, especially in join clauses.
Or is very difficult for the optimizer to come up with a good query plan with. Right now the optimizer doesn’t have a way to turn that into like a union or union all type thing where you say, you know, where, you know, this predicate matches or this predicate matches with a union all between them to unify the results.
So we end up with a query plan that looks roundabout like this, which is very strange looking. We have all these filters and constant scans that are producing rows. And SQL Server has to join all those results to a scan of our nonclustered index.
And actually has to scan through it twice because we have to join to this and join to this. Well, I technically, you know, but if we look at this, we have just a predicate now. We don’t seek to anything.
Right? No predicates in there. That’s less than or equal to 77. Greater than or equal to 56106. And if we look at the join, we have outer references listed here, which means it was apply nested loops.
And we took all of the rows that came out of this and we joined it to these conditions down here. Each one of these constant scans is going to represent one side of the OR clause. So you can’t really see it in the query plan, but one of them is going to represent the predicate on ID and one of them is going to represent the predicate on reputation.
So that’s fun. We can apply multiple OR predicates combined with AND predicates as well. And if we do that, we get a pretty similar execution plan.
Right? And it’s going to be kind of the same story here. SQL Server is going to evaluate the range of greater than 77, greater than or equal to 77, less than 78. And then over here, it’s going to evaluate reputation greater than 56106 and less than 56107.
Where we start to run into problems with SQL Server queries is when we start applying functions to columns. Now, if we didn’t have these functions here, we would be able to seek pretty normally to all the values that we want. But because we do have functions wrapped around those columns, we have an index scan again.
And if we look at the details of the index scan, we can see these predicates here on ID and reputation. This comes back to the concept of sargability or being able to search cleanly into an index defined rows. The way that I like to conceptualize this, and this is not like, you know, God’s honest down to the scientific detail truth about how databases work.
But when you apply predicates to an index that can be matched cleanly, SQL Server storage engine can read data pages and apply those predicates pretty easily as it reads them. As soon as you wrap a column in a function like this, you sort of change the layer at which that can happen. The storage engine is way down here and running functions through like the expression service happens way up here.
So SQL Server has to apply the function to the column, which happens up here. It can’t happen down here with the storage engine. It would be cool if it could, but at this point in time, SQL Server doesn’t offer that.
Good stuff, right? This is also sort of true of string columns. One thing I should mention up here is that sometimes if you write predicates that are not sargable enough, like SQL Server won’t even be able to apply them when it scans an index.
You’ll have a separate filter operator that goes and filters out rows after you’ve done a bunch of work on an index. So if we create this index, which this one here leads with, well only has display name as a key column. If we create that and we run these three queries, only one of them is going to be pretty reasonably fast.
The other two are going to be slow because we have to match across a whole bunch of bytes in a row rather than just matching the leading bytes in a row. If we look at this execution plan, or these three execution plans rather, we have an index seek up here, an index scan down here, and then just a little bit cut off another index scan down here.
This index seek has a seek predicate where we look for greater than or equal to Jeff at Wood, little d, and then less than Jeff at Wooie with an uppercase E. The SQL Server forms a range where you can find like the little d and then the big E, right? So like if it ended with a, if Jeff at Wood ended with a big D, and no people out there are going to be laughing at me talking about little d’s and big d’s.
But if we ended with a big D, we would still find Jeff at Wood, and Jeff at Wood, but we don’t find Jeff at Wooie. For the other two queries, we don’t have that seek ability because the other two queries have leading wild cards in the searches. So this one here and this one here, our scans look a little bit different.
We just have regular predicates to look for all the Jeff at Wood, and then we have one down here where we just look for trailing Jeff at Wood. And both of these result in an index scan that takes about two seconds. So when you’re designing queries for indexes, there are a lot of important things to think about.
But mainly, you want to tune indexes for queries. And then you also want to tune queries for indexes. So indexes, great things to, you know, key are, you know, where clause columns, join columns, things like that.
Not if you’re doing, you know, these kind of searches, they’re kind of useless for all that stuff. And then you want to design queries that can use indexes, and you want to avoid things like this because it’s not that you can’t use indexes when you do this. It’s just that you can’t use them as efficiently because you can’t hit them where you store them.
You have to run this function before you can apply those predicates. Anyway, this ended up being a little bit longer than I planned on. I don’t know if I talked slow or something.
And also I got some weird green screen creep happening down here under my arm for some reason. I’m not sure why. Gremlins, ghosts, goblins, ghouls, all sorts of things in there. Anyway, thanks for watching.
I hope you enjoyed yourselves. I hope you learned something. I hope you don’t put commas first in your select list. And I will see you in the very next video. Have a good one. Bye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Your volume now definitely better matches what I get from videos from other sources, so I’d call this a win.
Yeah, it’s really a lot better now. There are some newer videos where it’s just about perfect after some more tweaks. Thanks for the feedback!