Why Logical Reads Are A Bad Metric For Query Tuning In SQL Server
To summarize the video a little bit:
- High average or total logical reads isn’t a guarantee that a query is slow
- Looking for high average CPU and duration queries is a better metric
- You may see logical reads go up or down as you make queries faster
- For I/O bound workloads, you’re better off looking for queries with a lot of physical reads
The more query tuning work I did, the more I came to realize that logical reads fall into the SQL Server 2008 mindset, like PLE, fragmentation, page splits, and other metrics that don’t necessarily indicate a performance problem.
Thanks for watching!
Video Summary
In this video, I dive into the often-overlooked issue of logical reads in SQL Server query tuning. I challenge the notion that reducing logical reads is a key metric for improving query performance and argue instead that physical reads are far more indicative of real bottlenecks. By walking through several examples, including queries with high logical reads but fast execution times, I demonstrate why focusing solely on logical reads can be misleading. The video also explores how to identify and address queries with excessive physical reads using tools like the SP Pressure Detector and Query Store metrics, emphasizing that making these queries more efficient will not only speed up your workload but potentially reduce cloud costs by minimizing CPU usage.
Full Transcript
Erik Darling here with Darling Data, and in this video we’re going to talk about the utter pointlessness of using logical reads to tune queries. I know that’s going to be a difficult pill for some of you out there who love logical reads to swallow, but the thing is that that’s why God invented, I don’t know, pick a drink you like. It makes pills easier to swallow. Now, in my entire query tuning life, no one has ever come up to me and said, hey, thanks for getting logical reads down by 10% or 20%. You really, really made a difference on that, doing fewer logical reads. It’s amazing. Most of the things that people say, hey, good job on, are like, this query went from 30 seconds to one second, because now end users don’t have to spend 29 seconds of their day staring at the end of the day. And then, you’re going to be staring at a spinny wheel waiting for something to happen. And a lot of my early performance tuning, my earlier performance tuning days, I would look at logical reads quite a bit before I really realized how much they were leading me astray, because I would find queries they were doing.
I would be like, let’s look at logical reads, and I would find queries that were doing a lot of logical reads, but happening very quickly, because logical reads happen from memory. If you’re already reading stuff from memory, it’s going to be pretty fast. Physical reads are a different thing, because that’s actually going out to your crappy storage, and that’s no good either, right? Going out to storage is the enemy of your database workload. You want to avoid storage as much as possible, especially in the cloud. Now, the other end of that, speaking of the cloud, is we have end users you can make happy by getting queries to go faster.
And then we have the idiot suits at your company, who got swindled into moving to the cloud, because it was going to be cheap and easy and fast and reliable and secure. And it kind of never was great at any of those things. The cloud is sort of like a pizza place that keeps raising its prices, except every time you order a pizza from them, the thing that shows up is a half-eaten ham sandwich. The prices keep going up, and you keep writing those checks. If you want to spend less money on the cloud, getting your queries to go faster is a great way to do that, because then you have less sustained CPU usage.
You don’t have queries that take 30 seconds. You have queries that take one second. Even if they use more CPU to take one second, you’d rather deal with that than anything else, because that’s a bleh. That’s not a bleh. And so you do that, and not only do you make your end users happy, but you can also reduce your cloud infrastructure and spend less money on it. So let’s look at this query, just to sort of illustrate the point about how high logical reads are not necessarily the mark of a slow query.
Run this thing, and it will do 10 million reads against the post table. Let’s put some commas in there. There’s a comma at the end, but… So Microsoft cares about grammar, but not about your sanity.
And we can… Let’s put a comma in here, too. Look at that. So we do 10 million logical reads against the post table, and 46,000 logical reads against the users table, but this query only takes 827 milliseconds to finish.
You would… Now, I know mentally it probably feels like, 10 million logical reads, how can it only take 827 milliseconds? This is coming from memory.
If your problem is stuff not being in memory, your problem is physical reads, not logical reads. Logical reads is memory. Physical reads is out there, wild blue yonder. Let’s look at two queries back to back.
Now, I’m going to drop clean buffers between these, not because it makes a gigantic, profound difference, but it nudges the numbers for both of them up just a little bit higher, which is good, because you get to see things at their worst, right?
So we talked about going out to disk. No one likes going out to disk. And let’s look at how fast these ran. Let’s look at some of the metrics in here.
So if you were query tuning and you are only looking at logical reads as the success metric for your query, I’ll tell you one of the first mistakes is making one single metric that you can’t rely on. Your goal and your finish line for query tuning is a bad idea.
So this first query does about 7,000 logical reads from the user’s table and about 15,000 logical reads from the post table. And it takes about 880 milliseconds to finish. The second query, well, there are some work tables and stuff in there, but there’s not too much action on those if you look closely.
This does 15,000 logical reads here and 21,000 logical reads against the user’s table. If you compare that up here, it’s like 3x the logical reads, but it’s a faster query. It’s 881 milliseconds versus 167 milliseconds.
So reducing logical reads doesn’t always make a query faster. Increasing logical reads doesn’t always make a query slower. Cool. We’ve got that figured out.
Now, let’s look at a metric that would come into consideration, and that is physical reads. So I’m going to drop clean buffers, and I’m going to run this query, and then I’m going to drop clean buffers, and I’m going to run the second query.
And the first query is going to run demonstrably slower than the second query. We have this first query, which does, and this is the real fun part. This query does 5 million logical reads, and the second query does 5.1 million logical reads.
But look at the difference in CPU and elapsed time. All right, so we did, I don’t know, like not quite a million, nearly a million more logical reads in the second query, but it finished a lot faster.
That’s because the first query had to go out to disk, and we did that many read-ahead reads. That’s a lot of read-ahead. We had to physically go out to disk and bring stuff in.
You may have caught glimpses of read-ahead reads in some of the other statistics I.O. output, but this is the good one to concentrate on. So you can have queries that do a high number of logical reads that happen quickly when they’re from memory, but happen slowly when they’re from disk.
So if you want to concentrate on a metric that involves reads, finding queries that routinely have to do a lot of physical reads is a good place because those are the ones that are most likely going to be missing some index in there that would make life better.
Maybe not 100% of the time, but a lot of the time, queries that routinely have to do a lot of physical reads usually have to scan some big old thing because they don’t have a better index to use.
So let’s look at what that looks like in life. So if I were trying to diagnose I.O. stuff on a server, I would, of course, start with my wonderful, lovely, talented store procedure, SP pressure detector, and I would look at weight stats.
And if these page.io underscore something something weights were a high percentage of my server’s uptime, then I would be concerned about physical reads because that’s queries going to disk, get stuff, and bring it into SQL Server.
Same thing with page.io.exe, right? So the sh weights are for reads, the ex weights are for writes. If I had to bring a lot of data from disk into memory all the time, I’d be concerning myself with physical reads, not logical reads.
Now let’s come over here into this window because this one is a little more interesting. Let’s throw 10 seconds on that. And let’s just run this thing.
So I’m going to, as quick as I can, start that and then run that. And, oh, wait, I ran the whole wrong thing. I got all excited to highlight stuff and look what happened to me.
I’m just going to have to wait a few seconds there. There I was on a real roll. All right.
So in the real results that I care about, which are down here, which are SP pressure detector sampled for some amount of time, 10 seconds, as it says in the sample seconds column.
In that 10 seconds, I spent 51 seconds waiting on page.io latch.sh. Okay. I mean, fine.
There’s a little bit of a blooper reel moment there, but this is the kind of stuff that I would pay attention to. So if I’m looking at the server uptime as a whole, and a lot of page.io latch underscore sh and or ex waits are happening, and I just don’t have enough memory, I’m going to concern myself with queries that are doing physical reads, not logical reads.
All right. So if you’re looking at your server, and you say, wow, the server’s been up for 100 hours, but it’s done like, you know, like 50, 60, 70, 80, 90, or even more hours of page.io latch weights, and your problem is physical reads, not logical reads.
Physical. Going to disk. That’s slowing your queries down. So that’s one way to look at stuff, right? So you can run SP pressure detector, look at everything, or you can, you know, look at a sample of your workload and see just how much page.io latch is coming out of that, coming out of there.
Now, if I look specifically at query store for, oh, get over here, you silly goose, for this query, and I look at the metrics for it, we have a whole bunch of stuff in here about, sorry, a little bit further.
There we go. We have a whole bunch of stuff in here about average duration, total duration, last duration, and then these two columns are important. We have min duration and max duration. So what this would tell you is that sometimes this query runs for 713 milliseconds because data is already in memory, and sometimes this query runs for 8.2 seconds because data is not in memory, then we could come over and look at some other metrics, and this is where I think things are going to get even more interesting for you.
So logical reads never change, right? In total, we’ve done 120,000. The last logical reads were 40,000.
The min logical reads were 40,000. The max logical reads were 40,000. Okay, close enough in there, right? We’re off by like 400, but not even 400 megs, like, I don’t know, some small number of megs.
I don’t want to do math in front of everyone. It makes me nervous. But the logical reads never change. What changes dramatically are the physical reads. So average, 25,000 megs of physical reads.
Total, 77,000. Last, 39,000. Min, zero. Max, 39.
So for workloads that exhibit high amounts of page I.O. latch weights, which indicate you’re not reading data from memory, you want to be looking at physical reads, not logical reads. And then, you know, we’re going to look at the quickie store in a couple different ways.
Like, say, this will show stuff by average CPU. And this, the non-expert mode, cuts a few of the columns out. So the results are a little more succinct.
We can come and look at this stuff. And we can see stuff like average duration. And this is going to be, so this is ordered by average CPU.
But if we put a few of these columns together, right? So there’s that. And there’s that. We kind of zoom in on this.
There’s not really a great correlation between how many logical reads you’re doing and, like, how much other work you’re doing. Like, these two up here, like, there’s 300,000 megs of logical read difference between the two.
But the one on the bottom is almost twice as fast as the one on the top, right? We have 53 seconds versus 21 seconds. If you look in this bunch right here, this is a particularly interesting bunch.
There’s 13,000 logical reads here. But the average duration is 15 seconds, right? So, like, this is just not a good indicator that, like, you’re really blowing anything out of the water with logical reads.
And then down here at the bottom, there are a few in here that are using almost as much logical reads as the top. But, like, these numbers just don’t come close to, like, these numbers. All right?
So there’s a big gap in there. And that’s why I stopped looking at logical reads because I kept finding queries that were generally fast, even if they did a lot. Another good way to look at query store data is by duration. Now, this will order by average duration.
So if we come over here and we look at, we line those columns up the same way again, right, we are doing things that are kind of all over the place. Average logical reads, 2679. Average duration, 54 seconds, right?
And not all of these numbers line up well between what’s slow and what’s doing a lot of logical reads. If you want to care about a read metric, care about physical reads. Look at those query plans.
Figure out how you can do fewer physical reads. How you can make better use of the memory in your server because that’s a much, much better goal. All right? Physical reads are the absolute devil.
Logical reads are stuff that’s already from memory. So please, when you’re tuning queries, stop using logical reads as the sole metric for figuring out if a query tuning adventure was successful because that’s not going to be what users feel. All right?
You want to look at how long the query takes, how much CPU the query uses, how many physical reads the query is doing because those are things that will not only make solving those problems will not only make your users happy, it’ll make your servers happy. It’ll maybe even make the idiot suits who are spending all of your bonus money on cloud infrastructure happy. So, cool.
All right. So this video is a twofer. You got a lot of good information. I got a good blooper reel moment for TikTok, I guess. That’ll be fun.
And maybe pause on the moment where I realize my face recognizes that I didn’t have the one thing I wanted to run highlighted and I ran like six things. That’s okay. Anyway, it’s good to know that everyone hurts sometimes.
Anyway, thank you for watching. If you enjoyed this video, including the blooper reel highlight, I would adore you for hitting the thumbs up button. If you like this sort of SQL Server content or you’re just really into blooper moments, you can subscribe to my channel because I guarantee you, well, not in every video, but every once in a while, there’s a pretty good blooper moment.
And when I’ve been recording for like 10 minutes already, I’m probably not going to start over. So you’re just going to watch me deal gracefully with it. And graceful failover from the blooper reel.
Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you’ll stop just looking at logical reads as success for making a query faster because there are many other things to consider in your environment. I hope that you will pay more attention to physical reads and pay more attention to queries that do a lot of physical reads because making those queries do fewer physical reads will make those queries go faster too.
I think that’s it. Yeah, I think we’re good for today. All right, cool.
Cool. I’m going to go do something else entirely that has nothing to do with computers. So you can leave it up to the imagination what that is. I hope you don’t have too vivid of an imagination because it’s probably not going to be as vivid as you’re imagining if it is.
Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
“Count the clock that tells the time” is the best advice I ever read. Harlan Ellison should have been a performance tuner. Instead he turned out to be a great writer, an irritating SOB and a fantastic role model for idiots like myself.
I always count what the user experiences because they’re my customer.
PS: Billy Shakespere wrote much the same this that was even before punch card so maybe he was just a few hundred years early.
https://storiedbeginnings.blogspot.com/2009/11/count-clock-that-tells-time-by-harlan.html
Ha ha, nice! What other books of his would you recommend? I’ve never picked one up, but I do try to keep my reading list up.
Picking books up is kind of necessary if you intend to put it down with great force. Especially on an empty cranium.
But Harlan could churn out short stories that cut right to the point for those with short attention spans.
How about an audio production of one of his stories?
https://www.youtube.com/watch?v=dgo-As552hY