Why Multiple Seeks Are Slower For Some SQL Server Queries

Why Multiple Seeks Are Slower For Some SQL Server Queries


Video Summary

In this video, I delve into the fascinating world of SQL Server query optimization by revisiting an intriguing blog post from Paul White that has stood the test of time for over a decade. I share my unique take on his example, illustrating how subtle changes in query syntax and indexing can dramatically affect performance. Through detailed analysis and practical demonstrations, I explore why certain index seeks might not be as efficient as expected, leading to surprising execution times. This video is packed with insights that will challenge your understanding of SQL Server’s optimizer and encourage you to rethink how you phrase your WHERE clauses for better query performance.

Full Transcript

Erik Darling here with Darling Data. Still sound like I’m dragging myself out of a cave a little bit, but that’s alright. I’m never going to have to hear my own voice anyway. One of the joys of being on this side of the camera, the microphone. You never have to listen to me. Plug my ears. Today we are going to talk about something exciting, titillating. And it relates back to this blog post by Mr. Paul White, who very, very kindly uses the pastel crayon. Still not sure what to call it. It’s not exactly a drawing. It’s not exactly a painting. I guess it’s a crayoning that I did of him. It’s all his social media avatars. So that’s quite nice. I never, never, never, never thought that I would be a famous artist. But, I mean, kind of funny that this came up when it did because this post is turning, what, 13 this week? 13 years old. Paul’s been smarter than all of us for 13 years. Can you imagine how heavy a burden that is? Being smarter than everyone?

Like, like, 15 years? Same. But the main thing that we’re talking about here is when is a Sikh not a Sikh? And, well, I would love to just read the article to you. I don’t think that that would be a very nice thing to do. The word is Paul’s copyrighted material. So we’re going to try to avoid any litigiousness in this romantic week that we’re all suffering through. All right. I’m going to get right down to the end here. Because at the end, where the answer is. And the answer is, when is a Sikh not a Sikh? That is, when it is 63 Sikhs.

Well, that is correct. And what we’re going to do is, so, I mean, like, Paul’s post has an example in it where, you know, like, if you run these queries a lot, like, the loop that does this thing down here is a lot faster than the loop that does this big in clause here. I’m going to show you a little bit more of a profound single query example. No loops required here. And to get to, just so you know what the setup here is, I have one index on the post table on post type ID that includes score.

And this is really just to get some faster stuff going on the outer side of the cross apply that I’m doing. And then on the votes table, I have two indexes. I have one that’s just on post ID. And then I have one that is on post ID. Well, I mean, just keyed on post ID.

But it includes user ID, bounty amount, vote type ID, and creation date. And then we have another index down here that is keyed across post ID, vote type ID, and creation date. And descending, that’s why that’s called pvcd there, because its creation date is descending.

That’s probably not the best notation for telling someone that a column is in descending order. But we’re just going to deal with it, because that’s what I felt like doing. But this one also includes user ID and bounty amount.

I forget why. I think the query changed a little bit, but I didn’t change the indexes much. So just deal with it for a minute. But what I want to do is run these two queries first.

And while these two queries run, I’m going to talk a little bit about what’s going on with them, because that seems like a nice thing to do for the folks at home who may not just be able to look at a query and figure out exactly what it’s supposed to do.

So I’m using cross-supply. Again, the reason that I use cross-supply for a lot of things is because a lot of interesting things happen on the inner side of nested loops joins.

And cross-supply is a great way to get the optimizer to say, yep, nested loops join. That’s it for me. Just a little bit of a demo writing shortcut there for all the folks at home. And so what we’re doing is we’re selecting some stuff from posts, and then we’re cross-applying out to the votes table.

The votes table, I mean, they scrub all the user ID stuff out of there. So, like, I mean, there are probably other interesting queries that could be written, except, like, the user ID column in the votes table is largely null, except the thing for, like, bounty stuff.

So it’s just not a very, unless you, like, null join demo, it’s not a very interesting user experience. That’s why I’m going from posts out to votes, because that makes sense.

Every, well, not every post, but many posts have many votes. A good, strong correlation in there. And what we’re looking for in the votes table for this first query, I mean, they’re both the same query.

They really just have different index sense in there, because that’s what I want to show you primarily. And so what we’re doing is we are looking to rank posts by which vote type ID was newest.

I mean, that’s kind of dumb, but we’re going to go with it. And then we are filtering down to where vote type ID is in, 1, 2, 3, and where V.creationDate is greater than 2008-0101.

These are very unselected predicates, right? Like, vote type ID 1 is, like, awarding. Like, when the person who asks a question recognizes an answer as being the answer, and they put a little green checkmark on it.

Vote type 2 is upvotes, and vote type 3 is downvotes. Most of the votes in Stack Overflow are either, yes, that’s the answer, yes, that’s a good answer, yes, that’s a good question, no, that’s a bad answer, no, that’s a bad question.

So vote type 1, 2, and 3 really just are the majority of the table. And we’re using the row number function partitioned by vote type ID, which is not in the key of our index, so we’re going to have to sort to deal with that, and then ordered by creationDate, which is also not in the key of our index.

So we’re going to have to index. We would need an index if we didn’t want to sort that, but, you know, we’re using the index up here that’s just on post ID. None of that stuff is in the key. None of those other columns are in the key of the index, it only includes.

So we’re going to have to sort there. And now in the second query, we are doing the exact same thing, except we’re using the index with all of those key columns in there, right? And I think what’s very interesting about these two execution plans is that they end up taking just about the same amount of time.

This one takes about 15 seconds. This one takes about 16 seconds. That’s interesting, right?

Because we would think that with a great index that allows us to not only seek to all the data that we care about in our index and not have to sort the data that we need sorted for our windowing function, that we would be in great shape, but we’re not.

Not in great shape. Now let’s dig a little bit more into these plans, right? Let’s look more closely at these things. So the index seek over here takes about eight and a half seconds. And then the sort here adds, oh, let’s see, what’s eight and a half minus, well, 13 and a half minus eight and a half.

We can forget the halves. Then that’s 9, 10, 11, 12, 13. So we spend five seconds in the sort there, right? So 8.6 seconds seeking into the votes table, additional five seconds in the sort, mostly because the sort spilled, right?

Spilling sort, oh, boy, watch out. 10 dB, ha! But it’s there. Don’t be afraid of it. And then what’s super interesting, at least to me anyway, is when we look at the second query plan that took about 16 and a half seconds, we spend, we don’t have to sort in here anymore because the index fully supports the sort order that we needed for our windowing function.

So we don’t sort and spill, but we just spend a lot of time in this index seek, right? If we compare the two index seeks, sort of, I mean, not exactly side by side, but at least visible on the same screen together, there’s one index seek, there’s two index seeks, we lose just about all the efficiency of not having to sort here, seeking into this index.

Question is why? The answer is, when is a seek not a seek? Well, in this case, when it is three seeks. So let’s bring this way over here so it’s visible on the screen and my big body is not blocking things.

So if we look at this, we can see that under that one seek predicate, we have three seeks.

One, two, three, right there, right? All three of those are seek operations. Three individual seek operations. Right?

Crazy. And if we reframe this a little bit better, you can even see that we’re doing three separate seeks for vote type ID one, two, and three. Crazy, right?

Insane. How is that possible? Well, let’s go a little bit further and let’s run a copy of this query that looks a little bit more, like, syntactically, like what that multi-seek is doing, right?

So let’s execute this. See what happens. What craziness will ensue here?

What baffling things will happen next? We don’t know. We just don’t know. It’s going to take about 15 seconds. Not to spoil it for anyone at home.

All right. There we go. 15 seconds are up. And here is a more sort of accurate visual representation of what that multi-seek looks like.

It’s actually three separate seeks like this. Right? So the way that this query is written is with three separate select queries, union all together, because we’re not going to get any, like, duplicates in here, because we’re looking for vote type ID one, vote type ID two, and vote type ID three.

So we’re getting three separate results. But this is a more accurate visual representation of what a multi-seek query is doing, because we have the three separate seek operators. And we can see, you know, these ones take about, oh, that’s four and a half seconds.

That’s four seconds. This one’s about five seconds. So, you know, we end up, we can more accurately see how the three seeks don’t exactly work out from, like, a timing perspective, because, again, very non-selective predicates, you know, even for, like, you know, each separate post ID, vote type ID one, two, three, and creation date greater than 2008-01-01.

That’s basically, like, the dawn of time for stack overflow. So we don’t, like, that’s, like, everything in the table anyway, right? So, like, we’re getting everything. So these are three big seeks, and they each take between four and five seconds apiece.

So that’s where the 15 seconds goes in the seek and the other plan. Now, coming back to a video that I recorded the other day, I want to show you something magnificent and interesting.

I recorded a video about how a SQL Server is not always very smart about integers. And guess what? We can play a very similar trick on the optimizer with this query, as we did in the other query.

Ready? You ready for this? So we’re going to say, hey, SQL Server, I don’t want the vote type IDs in one, two, and three. I want vote type IDs greater than zero, and I want vote type IDs less than four.

And guess what? When we do this, well, you’re going to notice that this finishes a little bit faster than 15 seconds.

This actually finishes in about eight seconds, which is about what that original query finished in if we didn’t have, this is about how fast the original query would have been if we didn’t have the sort operator in there.

We no longer spend like 15 seconds in this index seek. We have a very fast seek here, don’t we? That’s much better. Then we no longer have the multi-seek issue that we had in the other query plan where we said in.

So when we are tuning queries, you want to pay very special close attention to how we are phrasing our where clauses. We may want to try phrasing our where clauses involving integers in several different ways.

Don’t think that in or like between or less than, equal to, or anything like that is necessarily the best way to phrase your question.

Always try different things because you may find that you get much faster, better query results when you play along with the optimizer’s stupid games. Right?

Anyway, I got a phone call I got to get on. So I’m going to stop this recording. I’m going to say thank you for watching. If you enjoy this sort of hard-hitting, edgy, in-depth SQL Server content, feel free to give this video the old thumbs up.

If you want to get notified when I post these things, you can always subscribe to the channel. And I hope you enjoyed yourselves. I hope you learned something.

And yeah, I don’t know. That’s it. Thank you for watching. Have a good day. I mean, I’m going to have to record something else today because I’ve got a lot going on. But anyway.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.