A Little About Using Windows Performance Recorder
Video Summary
In this video, I delve into using Windows Performance Recorder to uncover deeper performance issues in SQL Server queries. After running a set of demos where certain queries took significantly longer than expected, I couldn’t pinpoint the exact cause based on traditional query plan analysis and weight stats. To get to the bottom of it, I turned to Windows Performance Recorder, which provided valuable call stack information that led us to discover the impact of read ahead reads, specifically trace flag 652, in this particular scenario. This video is a great example of how sometimes you need to go beyond the query plan and use additional tools like performance recording to truly understand what’s happening under the hood.
Full Transcript
Erik Darling here with Darling Data. And today’s video is an interesting one. So this video is going to talk about using Windows Performance Recorder to track down like a deeper level of performance problem in SQL Server. This is actually based on a demo that I had recorded a little before I went on family vacation and it bothered me the whole time. And when I got back, I started looking at it again. And, uh, like, really couldn’t figure it out. And so, like, I showed a few people and they were like, I don’t get it either. So I was like, okay. So I used Windows Performance Recorder and found some stuff. And then it was actually, uh, my, my good dear friend, Forrest McDaniel, who reminded me about a trace flag that we could use to fix this one. But, uh, this is an interesting one. So I hope everyone’s buckled up. Anyway, uh, if you look down at the video description, there you’ll see all sorts of helpful links where you can hire me for consulting, buy my training, become a member of the channel, and support the stuff that I do here to bring you this content. Uh, ask me office hours questions for my office hours episodes. Those usually come up Monday where I answer five of your questions. And of course, if you enjoy this content, but you don’t want to pay me to do anything, you can always like subscribe, tell a friend and help this channel get bigger and better in other ways. Uh, of course, I still have some speaking events coming up. Just finished, uh, pass on tour New York City, but there is still events in Dallas, uh, coming up in September where I have a half day, pre-con and a regular session. Uh, and then there is the one in Utrecht where I have just about the same thing. Then of course, there is pass data community summit coming up in Seattle, November 17th to 21st. Uh, I have two days of T-SQL pre-cons with the wonderful and talented Kendra Little. And I look forward to seeing you at both of those, not just one of those. You should go to both of those because that’s how you maximize your knowledge and learning. So with that out of the way, let us get to the database party.
Now, just to sort of, uh, bring things back to where, where they were, right? Uh, I had this set of demos where, um, there were a couple of queries that I ran. And in one of the queries, I used this, uh, option use hint, disable optimizer row goal. And what ended up happening in these query plans was, uh, the first query, uh, with row goals, uh, enabled ran for about five and a half seconds and 5.6 seconds. Good enough there. But the other query ran for 40 seconds. And all 40 of those seconds were sort of spent in a place that didn’t make any sense to me. So I’m just going to shuffle these query plans around a little bit. And the biggest time suck in these query plans was right in this branch. So this one took 3.386 seconds, which isn’t bad, right? This is the fast query plan. This one took 37 seconds, which is bad, right? That’s, that’s like a 10 X slowdown for this one operator. And I couldn’t for the, at first, when I listened to this happen, I couldn’t for the life of me figure out what was going on with it.
Because when I looked at the weight stats for the queries, so if we, uh, get the properties over here and, uh, we open up weight stats, there was not a whole lot of weight stat evidence about what might be going on in here. Right? Like, like, like the weights for the, the quick query were of course, you know, not really anything interesting, but then the weights for the slow query, let me just make sure that’s the right one there. The weights for the slow query really didn’t add up to anything either. Like, like, like just looking through these, like nothing in here said, this is why I took 30 seconds, right? This is none of, none of these weights are doing very much.
Uh, like there’s very, very little weight wise going on in here. So what I did was I fired up my friend, the windows performance recorder, which since this is a VM I had to do over here. So, uh, what I’m going to do is show you sort of what I do, uh, to get windows performance, uh, windows performance recorder stuff. Now, of course, the first thing you have to do is actually like run it to collect things, but I’ve already done that because I don’t want you to sit there and wait for a long time for stuff.
So, uh, once I have, uh, once I had things recorded for the two queries, uh, what I usually do is go into the CPU stacks and then I’ll go into SQL Server and I’m going to make this bigger and do some zooming. So you can all see what, what, what I get up to. So the first thing you want to do is select everything in here and then you want to say, look up symbols.
So this is going to go out to Microsoft and this is going to look up symbols and this is going to resolve the names of any public symbols that it can. This is going to take a couple of seconds to run. Uh, maybe you should have done this before I did the video, but you know, I’ve got pretty fast internet, so I’m not too, too worried about it.
So after, uh, while, while this is running, what I’m going to do is there are a few things in here that I like to change, uh, for the group paths and the fold percentage. So I usually just delete that and hit enter. And then for the fold percentage, I put in zero here and I hit enter.
And this will sort of expand everything to what we care about. Now, uh, looking at the stuff in here, I’m just going to slide this over. So it’s a little bit more clear.
Oh, that didn’t, that didn’t help much at all, did it? So like, if we look at what’s going on in here, we have, um, you know, like this locate column internal thing, but there’s not really like a lot of time spent in this one where we care about stuff. Where this be, where this sort of becomes more useful is when you start comparing one, uh, set of call stacks to another, but we’re good, but we need to sort of open up the other one and do stuff there first.
So the CPU stacks here just show some stuff like locate column internal. I think I actually have to squish that a little bit so that I can, uh, use zoom it to zoom in on things. Otherwise, uh, it doesn’t really cooperate with me, but for some reason, zoom it doesn’t want to cooperate when this thing is up anyway.
So we’re just going to skip that part, but we don’t really have anything too, too interesting going on in here. All right. This is the, this is the fast plan.
So let’s go look at the slow one and let’s open, crack open this bad boy. And we’re going to go back into CPU stacks and we’re going to go back into SQL Server and let’s make this as big as we can. And what we have in here is, uh, some interesting stuff.
Once we get rid of the group paths and once we get rid of the, once we do the full percent zero. And what we’re going to see is that we have a new thing in here. Uh, and we, the couple of new things that we have in here are read ahead helper and read ahead.
And this is where, I’m sorry. There’s a, I think this one too was interesting. So if we were to compare, uh, one set of call stacks to another, um, like this is really where we had way more stuff going on.
Right. This is where like we spent way more CPU time. And this is the only way that I can figure out what happened.
Cause again, just looking at the like weight stats and other stuff for the query, there was no sign at all that this stuff was coming up to the forefront. But then if we sort of bring these percents over, excuse me, uh, we’ll see that these numbers, uh, you know, were like much like a, like these things like were present in this call stack trace where they weren’t in the other one really. And they have like fairly high percentages, uh, as far as like, like stuff that executed.
Um, there’s all, all sorts of other stuff that you could look at in here, like make, look at flame graphs of things, but I don’t really, you know, I don’t, I don’t really, I’m not, I’m not enough of a developer to understand those fully. But, uh, so this is how I was like, huh. And then like, you know, as I was sharing this stuff with the people who I was just like, what, what, this is a weird thing.
Uh, again, my friend Forrest McDaniel was like, Hey, there’s a trace flag to disable read ahead reads, uh, 652. I think Paul White’s blogged about it, but, uh, Forrest remembered it. So Forrest gets full credit on that one.
But if we, uh, run this query and we turn on this trace flag 652, 7418, uh, just changes how operator times show, uh, their time in the query plan. Uh, uh, trace flag 7418 changes things. Um, so in a, in a row mode plan, the operator times are cumulative going from right to left.
Uh, but in batch mode plans, uh, they are per operator. So 7418 makes row mode plans act like batch mode plans and just has the operator times, uh, show as per operator. So this one isn’t part of what solves a problem.
652, which disables read ahead reads for the query. This is what solves the problem. So if we rerun this query with those trace flags enabled, and we look at the execution plan without read ahead turned on. And like what I don’t want you to think is that read ahead is a bad thing.
Uh, read ahead is generally a very, very good thing. It turned out to be a negative thing in this plan because of how many of them we had to do. Like there’s two kinds of killer latencies.
Typically in computer systems, there’s the stuff that of course has a long latency, right? You might see that for like lock weights and other things that, uh, you know, when they drag on for too long, other things start having problems. But then there’s the type of latency where it’s sort of like the death by a thousand cuts swarm of mosquitoes latency where you have to do one thing.
And even though that one thing is really fast, if you have to do a lot of it, well, you can end up in a pretty tough spot. So if we look at the execution plan for this with read ahead disabled, this actually ends up running, uh, pretty quickly even compared to the fast plan before. It’s about 500 milliseconds faster.
Not that like that’s the meaningful difference, but really it was with the, the difference between this one and the, uh, row goal being disabled query. That was where it got really interesting. So, uh, if you, when you’re a troubleshooting query performance, you might find all sorts of good things to look at, uh, contextually for the query.
Uh, the query execution plan is always going to be like your first source of truth for like, you know, where, uh, time was spent in the, in the, in the query plan. And then like getting the actual execution plan for that, then you might find where, um, you might look at weight stats for the query plan and see interesting things in there. But sometimes, or like, uh, you might even look at the sys.dmexec session weight stats for your session and see if there’s anything in there.
Because like I, I’ve talked about a few times before, Microsoft screens certain weights out of actual execution plans. It doesn’t report on them. So you might look at the sys.dmexec session, uh, weights, uh, dmex session weight stats view, and you might look in there to see if there’s anything additional that would be useful for troubleshooting.
Other times you have to go deeper and figure out like, you know, stuff that a query plan would never report on. And call stacks are one of them. And windows, windows performance recorder is a much easier way to just record general call stacks for when a query is executing.
Than, say, like something like using the debugger and having to like attach to a thread and like, you know, like click through and wait until you like, like set breakpoints and stuff. Because that, that’s where it gets really difficult. And of course, the nice thing about windows performance recorder is that if you find interesting call stacks in here, you can actually use something like window bug to set breakpoints on the interesting things you find in here.
So you’re just not hunting through this haystack of things and like, you know, just like hitting enter and letting the debugger make progress after each new step. So this was an interesting one. Um, again, readaheads generally not bad.
Like don’t go disabling readahead. I don’t want you to take that away from this video. I don’t want you to think that readahead reads are awful. Generally, they’re very good things and they help performance quite a bit. But, uh, in this case, they ended up being the problem for this one particular query and tracking down what the problem was, was to me very interesting and very fun.
And of course, uh, I, I do always have to thank my, my dear friends for helping me remember the things that I cannot possibly keep, uh, all contained in my head at once. So anyway, thank you for watching. I hope you enjoyed yourselves.
I hope you learned something and I will see you over in, uh, the next video, which should be an office hours episode because the next video will be on a Monday. Today being Friday. All right.
Cool. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.