How I Use sp_BlitzCache To Find Poor Performing SQL Server Queries To Tune
Video Summary
In this video, I delve into various ways to use sp_BlitzCache, a powerful stored procedure originally created by Jeremiah Peschka and maintained by many happy users over the years. I discuss how to navigate the different sort orders and parameters within sp_BlitzCache, such as average CPU, unused memory grant, and executions per minute (xpm), to identify queries that need optimization. Additionally, I highlight the importance of considering query plan instability in the plan cache, especially compared to the more robust Query Store, and share practical tips for managing memory grants and identifying non-inlineable scalar UDFs and frequently executed code.
Full Transcript
Erik Darling here again. Darling Data. No one from Microsoft has yet tried to acquire me based on the terms and conditions laid out in my last video, so I guess I’ll just soldier on here and keep pumping these things out. I hope that someone out there says, boy, that fellow looks useful. Let’s hire him. Let’s do some stuff to our SQL Servers. Or let’s just watch all my videos and learn all my tricks. Who knows? Who knows what will happen? It’s a cliffhanger. So this video is about different ways that I tend to use sp_BlitzCache, originally written by the lovely, talented, hyper-ultra-cool Jeremiah Peschka, and of course maintained over these many years by thousands of happy people. Okay? Maybe like a dozen happy people. I don’t know. I guess how much of the query plan XML they had to dig into would dictate whether they are indeed truly happy people or not. Because I can tell you, when I was adding a lot of XML parsing to this, it was some of the roughest days of my life.
So, again, there hasn’t been a lot going on in my demo server, so there aren’t really going to be any big fireworks if I run this to get results. I’m just going to talk through some of the different ways I use sp_BlitzCache and some of the caveats and gotchas. Now, I’m just going to say it loud and clear: the plan cache is a terribly unstable place. Oftentimes when I go look and see what’s in there, there’s a little warning that pops up in sp_BlitzCache about plan cache instability. This is one thing that I will show you.
It’s going to be a line like this here. Because my plan cache is apparently relatively stable, there’s not a lot going on in my server. You can kind of tell by this: a lot of just the system queries that have been running on here. My kid put a Band-Aid on me. She pretended to give me a shot, actually two of them. So that’s why I have Band-Aids on me. I just forgot to take them off. And you know what? I rather like them, so I’m going to keep them on. So depending on how unstable your plan cache is, this might move up higher in the priorities.
But this line here will tell you what percentage of plans were created within a few time frames. They’re not terribly scientific: just last 24 hours, last 4 hours, and last hour. A lot of the times when I see this on servers that are really active, you’ll see some really high percentage of plans created in the last one to four hours, and almost a hundred percent created in the last 24 hours.
And that just doesn’t give you the type of real historical view of which queries ran on your server and caused problems at various points. It all seems very short term to me, for a lot of reasons. The plan cache clears out for memory pressure, restarts, some settings changes, people dorking around with stuff.
And a lot of the servers that I look at tend to have pretty severe memory pressure issues, so we’re lucky if the plan cache even has a decent number of plans that are an hour or more old. It’s a little tough to love the plan cache these days. Back when it was all we had, we made the best of it.
But now that we have Query Store, I tend to spend more time in there than anything. Now, one of my go-to sort orders for sp_BlitzCache, and even for looking at Query Store with my stored procedure sp_QuickieStore, is average CPU. And the reason for that is because when I find queries that have a high average CPU, I have pretty good confidence that those are queries I can make meaningful adjustments to.
The thing is, sometimes when you sort by average CPU, you find stuff that doesn’t matter a lot. It might only execute once a day or once a night; it might be some background task that doesn’t really have any impact on end users, aside from the fact that it took up resources the end users could have used while the query was running.
And that can be a little frustrating, especially if you’re working with people who can’t point you to which queries they want fixed. It’s up to you to go and find them. And if your initial search turns up a bunch of meaningless queries that don’t really do a whole lot to affect the general user workload, then you haven’t really found queries worth tuning.
People might say those queries aren’t doing anything bad when they’re actually doing a lot of bad stuff. But, again, you tend to want to focus on the things that users are really feeling up front. So, average CPU is my favorite sort order, but sometimes you need to add some caveats to that.
Now, one good caveat to add is that queries need some minimum number of executions in order to be considered part of the regular user workload and worth tuning. Generally, the more frequently something runs, the less chance there is of it having a super high average CPU. It very well could, but a lot of the times when I go look for this stuff, as I bump the minimum execution count up, the average CPU tends to trend down.
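Put together, that combination might look something like this. The parameter names here assume a recent First Responder Kit build; run `sp_BlitzCache @Help = 1` on your install to confirm them:

```sql
/* Sort the plan cache by average CPU, but only consider queries
   that have executed at least 100 times, so one-off background
   tasks don't crowd out the real user workload. */
EXEC dbo.sp_BlitzCache
    @SortOrder = 'avg cpu',
    @MinimumExecutionCount = 100,
    @Top = 10;
```

The execution count threshold is a judgment call per server; start low and raise it until the noise drops out.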
There are going to be some outliers in there, but for the most part, that’s what happens. So you do have to strike a balance between finding long-running queries and finding long-running queries that end users are the ones executing. The other thing to keep in mind, something I’ve brought up in a number of videos, is that not every query is going to be properly parameterized.
If you have unparameterized queries, or even queries that are parameterized but not strongly typed (so things like strings get passed in at all different lengths), SQL Server is going to interpret those as brand new queries, and you’re going to get brand new query plans for almost all of them, unless you happen to match something exactly. You might see a couple get reused, but each might only show a single execution, even though there are many variants of that query that all performed the same way. I’m not even going to call the execution plan bad; I’m just going to say the query didn’t perform well. So what looks like one query that didn’t perform well might actually be one variation out of, like, 50 queries that all took a long time but just had a couple few literal values different in them.
So sometimes you don’t want to set that minimum execution count too high, because you’ll miss out on a lot of the stuff that could be painful to end users but just may not appear to be, because of the execution count you see from the plan cache. Usually when that’s the case, though, you’ll see five or six queries that all look like the same beginning of a select statement or something.
And then as you get further into the query text, you’ll find the literal values that caused SQL Server to interpret them as different plans. But, again, all things you learn over time. One particularly interesting sort order that sp_BlitzCache has is unused memory grant.
Unused memory grant is good because the plan cache actually has something that Query Store doesn’t: the plan cache tracks both how much memory a query was granted and how much memory a query used. Query Store only tracks how much a query used. It doesn’t track what the grant was.
I would imagine this has something to do with all the memory grant feedback stuff, maybe. I’m not sure. It just seems weird to me for Query Store not to have that; I don’t understand what led to that design choice. But these are good things to track down. By default, SQL Server will grant a single query up to 25%, well, 20 to 25%, of your server’s max server memory setting.
Right? The more memory you have in a server, and the higher your max server memory setting goes, the higher that memory grant can go. And there are lots of reasons why SQL Server might not come up with a great estimate.
You could overestimate the number of rows by quite a bit, and overestimating the number of rows will inflate your memory grant. I’m actually working with a client now that has a couple terabytes of memory in their server.
And there are regularly queries that ask for in excess of 300 GB for a memory grant but only end up using like 200 or 300 MB of it. In their case, it’s a slightly easier thing to take care of in one swath, because you can use Resource Governor to lower that max query memory percent to a number more suitable to servers with a lot of memory, which reduces query memory grants globally. But if you’re on Standard Edition, you’re stuck using the MAX_GRANT_PERCENT hint, if you can.
Query Store hints will be really useful for that, because we’ll have a little bit more control over stuff. I don’t quite trust the memory grant feedback mechanism to take care of things the way it should in all cases. So we have that to contend with.
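As a sketch, hunting down those oversized grants and then capping them might look like this. The workload group name and percentages are just examples, and the sp_BlitzCache parameter names assume a recent First Responder Kit build:

```sql
/* Find queries with the biggest gap between granted and used memory. */
EXEC dbo.sp_BlitzCache
    @SortOrder = 'unused grant',
    @Top = 10;

/* Enterprise Edition: cap the per-query grant globally with
   Resource Governor (shown here against the default workload group). */
ALTER WORKLOAD GROUP [default]
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);
ALTER RESOURCE GOVERNOR RECONFIGURE;

/* Standard Edition: hint individual queries instead.
   Table and columns below are hypothetical. */
SELECT o.OrderId, o.OrderDate
FROM dbo.Orders AS o
OPTION (MAX_GRANT_PERCENT = 10);
```

Lowering the Resource Governor setting affects every query in that workload group, so test it before rolling it out broadly.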
Another good way to look at your plan cache, and a great way to find non-inlineable scalar UDFs, cursors, and other things like that with a really high frequency of execution, is the sort order xpm.
That’s executions per minute. I don’t know why it’s XPM and not EPM. I could ask Jeremiah that someday. He’s the one who came up with that rambunctious little shortcut.
I’m not quite sure. You know what? I like it. XPM sounds cool. XPM sounds exciting. So that’s a good way to find really frequently run code.
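A minimal call for that sort order, again assuming current First Responder Kit parameter names:

```sql
/* Surface the most frequently executed plans -- handy for spotting
   non-inlineable scalar UDFs, cursors, and chatty application code. */
EXEC dbo.sp_BlitzCache
    @SortOrder = 'xpm',
    @Top = 10;
```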
You might also just find a lot of queries that like run hundreds of thousands of times for like a millisecond. Could be an application bug. Could be something that could be cached somewhere.
Who knows? Lots of weird little things like that you might run into. One thing I do find myself using quite a bit, though, is using sp_BlitzCache to only look for certain SQL handles in the plan cache. This actually comes back to something I covered with sp_QuickieStore, where you might be looking at the blocked process report or the XML deadlock report.
When queries block and deadlock, it’s not always obvious from the query text in those reports what the plan would be and why they end up blocking and deadlocking so much. But those reports have SQL handles in them for the queries SQL Server thinks were involved in the blocking and deadlocks. That can sometimes be misleading, but that’s a story for another day.
So I find it useful to look in the plan cache by SQL handle for the queries that come out of there. It’s been my experience that, for some reason, finding things by those SQL handles in Query Store is a lot more misses than hits. I’m not sure why; it might be the auto capture mode, or it might be some other internal thing that’s keeping queries out of there.
But I find that I have much better luck finding an execution plan in the plan cache for a lot of the blocking and deadlock report queries that come up, so I end up using that quite a bit. And finally, of course, if you only care about plans from one database, you can zoom right into that single database and have a long and fulfilling day of laughing at query plans.
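Both of those lookups can be sketched like so; the SQL handle below is a made-up placeholder you’d replace with one pulled from a blocked process or deadlock report, and the database name is hypothetical:

```sql
/* Find the cached plan for a specific sql_handle taken from a
   blocking or deadlock report (handle value is hypothetical). */
EXEC dbo.sp_BlitzCache
    @OnlySqlHandles = '0x0200000012345678901234567890123456789012345678901234';

/* Or only look at plans from a single database. */
EXEC dbo.sp_BlitzCache
    @DatabaseName = 'YourDatabase';
```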
That’s usually the stuff that I do with it. There are some other kind of interesting parameters too, I guess, if I had to pick some others that might get you all excited.
Skip analysis is a good one, especially if servers are really overloaded, or you have really big execution plans, or you just want quick results. Skip analysis skips all of the XML stuff, so you won’t do all the XML shredding and parsing to find particular issues in there. Which is sometimes fine, honestly, because sometimes you don’t need it.
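A quick sketch of that call, with the parameter name again assumed from recent First Responder Kit builds:

```sql
/* Skip the XML shredding and parsing for faster results on busy
   servers or servers with very large cached plans. */
EXEC dbo.sp_BlitzCache
    @SortOrder = 'avg cpu',
    @SkipAnalysis = 1;
```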
Sometimes you just need to get results back, and you can figure things out on your own. Using the skip analysis parameter is a good way to get results back much faster. So that’s generally the stuff that I end up doing with sp_BlitzCache.
sp_BlitzCache is a very good stored procedure. I think a lot of the warnings it gives you about what’s going on in execution plans are really valuable, especially for some of the stuff that’s not quite as easily visible to people who aren’t used to looking at execution plans. All the warnings that show up in the warnings column are a really good, sort of high-level overview of stuff that ain’t good.
It’s also a good set of stuff to have in there, because there are query anti-patterns it picks up on that people might not know can be bad. Seeing them show up in the warnings, they might do a little bit of research and dig deeper into just why they’ve made a bad choice in the way they’ve written the query. Table variables, scalar UDFs, things like that.
All good things to know about. So, I just ate lunch before this one, so I’m a little sleepy. I’m going to have some espresso before I do the next video, and then, I don’t know.
It’s also hot in here. These recording lights, I feel like a rotisserie chicken. Sometimes I feel like I kind of look like one. I get this pink hue, this unnatural pinkish, orangish hue that’s a little disturbing. I don’t know.
Anyway, I’m going to go make some espresso. I hope you enjoyed yourselves. I hope you learned a thing or two about sp_BlitzCache, about the plan cache, about, well, I don’t know. Love, life, dancing, caffeination.
I will see you in another video. Don’t forget to do the old like and subscribe. I record as often as humanly possible.
All right. Cool. Thanks for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.