Thank You For Your Business
In yesterday’s post, we looked at how row goals affected performance when data matching our join and where clauses didn’t exist.
Today we’re going to look at something similar, and perhaps a workaround to avoid the same issues.
Here’s our index setup:
CREATE INDEX whatever ON dbo.Votes(CreationDate DESC, VoteTypeId) GO
Now we’re gonna wrap our query in a stored procedure.
CREATE OR ALTER PROCEDURE dbo.top_sniffer (@top INT, @vtid INT)
AS
BEGIN
SELECT TOP (@top) v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @vtid
ORDER BY v.CreationDate DESC;
END;
Video Killed MTV
You know what? I don’t wanna write all this stuff. Let’s do a video.
Thanks for reading!
Video Summary
In this video, I dive into an interesting scenario involving parameterized `TOP` in SQL Server stored procedures. I explore how SQL Server handles rare data searches and the impact on query performance when using parameters like `TOP`. By creating a specific index and running various queries, I demonstrate how SQL Server’s caching mechanism can lead to suboptimal plans for rare data lookups, showcasing both the challenges and potential solutions. I also touch on some lesser-known behaviors, such as the parameter sniffing issues that arise with `OPTIMIZE FOR UNKNOWN` and dynamic SQL alternatives, providing insights into optimizing your stored procedures for better performance in real-world scenarios.
Full Transcript
Howdy folks, Erik Darling here with, well, with Erik Darling Data. And I wanted to record this as a video because I am far too lazy to blog about this today. It’s Sunday and I don’t want to do all that work screen capping and highlighting stuff and all that other things that go along with a great post. So I’m just going to record this video because I think that’s going to be a little bit easier on everyone. So what I got is something kind of funny to show you about what happens when we parameterize top. In other words, when we write a query, a stored procedure, or whatever you want to call it, and we allow people to pass a parameter like this to top like this. Now, I found some interesting stuff in this. Hopefully you find it interesting too. Now I’ve got an index here. You can ignore that little red squiggle. It’s because I already created it ahead of time because I know how much everyone hates waiting for indexes to create. And it’s on creation date descending and vote type ID. Now that is there mostly to satisfy this part of the query, my where clause on vote type ID and my order by on creation date. I know that for a bunch of the columns I’m selecting here, I’m going to have to do a key lookup. That’s not really what this is about. What I want to show you is what happens when SQL Server has to go looking for rare data. Now in the votes table, the way things break down is there are kind of a lot of rows with a vote type ID of one. There are about 3.7 million. There are far fewer with a vote type ID of four. There’s only about 733. The way that they break down in the index is fairly even. So when we look at the year of the creation date, remember that’s the leading column in the index. Oops, jumped around a little bit there. We can see that things break down fairly evenly across all of the possible years here. There’s not like any great big gaps. We have one in just about every year in the index. Everything’s pretty even there. Now when we run the plan, I’m just going to recompile this thing real quick to make sure I’m getting something fresh out the bag. And I look for the top 5,000 rows for vote type ID one. This runs fairly quickly. This runs in around about 23 milliseconds total. And it does about 15,000 logical reads. So that’s good.
And when we look at the execution plan. What I want to show you first is that when we look at the parameters that got passed in, top just like any other parameter was cached. So we have a parameter compile value of 5,000. And we also have a parameter runtime value of 5,000. If I get rid of this, and we go look at an execution where we have a 1 in as top, exactly what you think is going to happen happens. We hit F4. We go look at the plan properties. Got to jump around a little bit to get to the right one. And when I look at the parameter list. Now I have a compile time value of 5,000. And a runtime value of 1. So just like any other parameter in a store procedure in SQL Server, it gets cached when we run the query the first time.
And then it gets reused when we run the query subsequent times. Now, obviously, this could backfire if we recompile here, and we run the plan looking for a top one first. And then the next execution, some ding dong comes along and looks for a million rows. This is the obvious one. We’re going to reuse that tiny little plan that SQL Server came up with to find one row, and we’re going to use it to find a million rows. That’s the obvious bad part of parameter sniffing. The kind of less bad part, or the kind of more interesting bad part, is what happens when we look for data that is a little bit harder to find, that rare data. So now I’m going to look for a top one for the vote type ID of 4. Remember, this is the one that only has about 773 rows in the table. But when I look for top one, it’s fast.
Because it finds a row really early in the index that it can give back. Again, this is going to be early in the index, it’s order descending. So it finds this row from 2013 and says, good, we’re done. If I say I want the top 5,000 where vote type ID equals 4, this is going to be a little bit less snappy. Now we’re going to spend about two seconds, or actually close to three seconds, scanning the entire index and looking for data.
You can see that we spent an elapsed 2,700 milliseconds, close enough to three seconds for me. And we did, let’s see, that’s 146,534 reads. So we had to scan that entire index looking for that kind of spread out, kind of rare vote type ID for.
We had to start at the beginning, which is going to be the year 2013. And we had to go all the way to the end, which is 2008. So that’s one example.
Now what’s kind of interesting that I found is if I recompile things, if I clear out the plan for that store procedure, and I look for the top 5,000 first. Now even though I know that 5,000 rows don’t exist, that’s not really an issue here because the top will exit when the nested loops join says, I don’t have any more rows to give you. The top says, okay, well I wanted 5,000, thanks a lot.
So if we run this first, the plan changes a little. And it changes in that it goes parallel now. SQL Server says, I think I might have to spend a little bit more time reading this. I’m actually going to go parallel.
I’m going to use multiple threads to find these rows because they’re spread out everywhere. So that’s kind of a cool thing that SQL Server knew to do. And then when I rerun, of course, looking for a top 1, it’s going to reuse that parallel plan. And things are going to be fairly snappy still, right? It’s still okay. We’re still running in under a sec.
We’re still pretty well under a second there. Now, something else interesting that I came across when I was messing with stuff is that if you use optimize for unknown and you use a variable for top, you get kind of a funky guess just like you do with most other unknown guesses. Now, what I’ve done for this run of the store procedure is I’ve quoted out the vote type ID part of the where clause because I don’t want you thinking that the row estimate is coming from the vote type ID that I’ve chosen to pass in.
That’s not where it’s coming from. If I recompile the query without that where clause in it, and just for good luck, I’m going to make sure it’s extra double recompiled. I’m going to run this now looking for the top 5,000 rows.
And when I go look at the execution plan, rather than guessing 5,000, my top has a guess of 100. So when you use optimize for unknown with a parameterized top, you always get this guess of 100. I tried lots of different numbers, and the only time that it was never not 100 is if I recompile it through a recompile hint on that.
So that was a little funny. Anyway, if you use top in a parameterized sort of way, there are some things you might want to explore if you’re running into problems. If you’re recompiling the plan, if unparameterized or even non-parameterized, dynamic SQL might help.
Now, I say that, and I know people are going to crawl out of their skin because unparameterized dynamic SQL can lead to SQL injection attacks. I know that it’s a lesson to a degree because you would be using an integer with top, and likely just using a function like rtrim or something to produce a stringified version of the number to concatenate into the query.
But the risk is still there, so be careful. You might want to say, okay, if I know that under certain conditions I want a different plan for things, do I need to separate code into different store procedures?
Or do I have an optimized for a value? Like if I, rather than doing optimized for unknown, can I optimize for a value like 5,000, 10,000, 2.1 billion, whatever Adam Mechanics phone number is, and get a better plan that way? Do I need to change indexes?
Like maybe the order of the key of the index has been appropriate for the query? Do I need to have some included columns to get rid of that key lookup? Do I need to make an adjustment there? So there’s a lot of stuff that you could look at as ways to solve it, but this is just to let you know that that issue exists, and maybe if you keep reading my blog posts and stuff like that, maybe we’ll talk about all the ways that you can fix it coming up.
Anyway, I hope you enjoyed yourselves. I hope you learned some stuff. I mean, who knows? Maybe you didn’t. If you’re smart, you didn’t learn anything. I hope you were at least entertained if you didn’t learn anything. Anyway, I am Erik Darling with Erik Darling Data, and thanks for watching.
Bye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Related Posts
- SQL Server 2022 CTP 2.1 Improvements To Parameter Sensitive Plan Optimization
- SQL Server 2022 Parameter Sensitive Plan Optimization: Does Not Care To Fix Your Local Variable Problems
- SQL Server 2022 Parameter Sensitive Plan Optimization: Sometimes There’s Nothing To Fix
- SQL Server 2022 Parameter Sensitive Plan Optimization: Does PSP Work With Dynamic SQL?
Apart of the great demo you gave us can you specify what kind of tool are you using for zooming in and drawing?
I use SnagIt for all that stuff.