Why You Shouldn't Use The Plan Cache To Figure Out CPU Load In SQL Server

Why You Shouldn’t Use The Plan Cache To Figure Out CPU Load In SQL Server

Thanks for watching!

Video Summary

In this video, I delve into why the CPU per database query can be misleading when trying to identify which databases are using the most CPU on a SQL Server. I share my frustration with the endless stream of regurgitated advice on LinkedIn and other platforms, particularly around SQL performance tips that seem to lack substance or context. The video explores common pitfalls of this query, such as plan cache disturbances from various factors like restarts, memory pressure, and recompile hints. I also highlight how recompile hints can make the query even less useful by removing plans from the cache. To address these issues, I suggest adding more detailed information to the query, such as the oldest and newest plan ages, total execution counts, and worker time statistics. Additionally, I share a humorous yet instructive story about my interaction with ChatGPT, which turned out to be hilariously wrong about optimized for ad hoc workloads queries showing up in DMExec query stats. This experience underscores the importance of validating AI-generated information before relying on it.

Full Transcript

It’s your friend Erik Darling here with Darling Data. And in today’s video, we’re going to talk about why that CPU per database query is pretty stupid. I didn’t realize until I signed up for LinkedIn how much awful content there was out there about like, I mean, well, SQL Server in particular, because I can sniff that out. But just data stuff in general, like every day, my feed is just absolutely inundated. I’ve been fascinated with these chat GPT posts that are like, like, top 10 rockstar SQL performance tips, avoid select star, dude. And you’re like, what? Like, I, if I ever catch one of these people in person, I’m going to smack them. Like, it’s just the dumbest, most regurgitated advice that I’ve ever heard in my life. And I don’t know, I don’t understand how it gets so much engagement. Like, I feel like LinkedIn has an algorithm that looks for, its own, like, like, like, its own, like, like, AI nonsense and boosts it. Like, anything that has, like, like, crappy emojis to as bullet points. It’s like, the world needs to see this. So check out this AI advancement. You’re like, oh, God. This is why I start drinking early. Anyway, before we talk about that, let’s, let’s do the housekeeping stuff.

If you feel monetarily motivated to support my channel, you can sign up for a membership. There were, there were some comments in the comments recently about how someone couldn’t see a way to sign up for memberships. And it really, so if you’re watching it, unlike, you’re watching it, like, using the YouTube app on a device, you won’t see it. Apparently, you have to be using a web browser. So what I’ve done to circumvent all these issues is I scourge the depths of Google, and I found the URL that you need to use, which is apparently your channel URL slash join. As a database person, I feel particularly stupid that I did not try to join. But that, that link will go in all of the videos going forward. And I also am going to backport that to older videos so that everyone can use the join sign up link and be happy.

If you, if you, if you find yourself in a situation where, oh, you don’t have an extra four or so dollars a month in your pocket that you feel like sharing with little old me, you can, you can support my channel in other ways. You can like posts and you can, videos, video posts, and you can comment on them. And you can also subscribe to the channel and join now 4,300 data darlings out there in the world who also subscribe to my channel and get pleasant little tinkly notifications every time I publish something.

If you are in need of SQL Server help, if you need a health check, performance analysis, hands-on tuning, if you are having a SQL Server emergency, or you want me to train your developers so that you have fewer SQL Server emergencies, as always, my rates are reasonable. If you want just some training, you can get a whole bunch of it for about 150 bucks US. If you go to that URL or click on the link in the video description and you use the discount code SPRINGCLEANING, you can get all that for 75% off, which is pretty, pretty good.

Once again, this image is generated by ChatGPT. I did not make spelling mistakes. I think it’s funny how bad ChatGPT is at things. And I will talk some about something that I caught ChatGPT being wrong about this morning when we’re talking through the demos. But if you want to see me live and in person, and you are the type of person, you can see the dates and places there.

And you are the type of person who needs bribes to show up to things. I have Darling Data pins, which are a little hard to see because of the glare and the bright lights. And I also have these hot SQL action pins, which are very hard to see because of the glare and the bright lights.

But I can assure you they are absolutely magnificent. I also have sparkly Darling Data stickers. Ooh la la. Look at that. Look at the sparkles.

So if that didn’t hypnotize you into coming to see me, I don’t know what will. So I have pointy bribes and I have sticky bribes. And I hope at least one of them works on you.

Anyway, now with my brand new redesigned ready to party slide, let’s do this. Got sick of all the white space and decided to spruce things up a little bit. So I want to make sure everyone out there knows that says let’s party.

These are not fencing videos, so it does not say let’s parry. And these are not baking videos, so it does not say let’s pastry. We are going to party.

We are not going to parry or pastry or any mix of the two. All right. Anyway, at some point in your long, glorious database career, you probably will have seen a query that looks something like this.

Where you go and look at sys.dmexec query stats. And you look at sys.dmexec plan attributes to get the database ID out. And you group the CPU by the database ID.

And then you do some fancy math to figure out what percent of the CPU on the server is occupied by query plans that do this stuff. Right?

Or that did stuff in a database. This can be very weird for a lot of reasons. And we’re going to talk about all of them because it needs to be talked about lest you think that this is a good idea. I guess if there’s nothing else for you to do, you’re that bored in your life.

I guess if you’re like the average no-lock enjoyer, this might be an okay query for you to run to get a crappy idea of which database uses the most CPU. This can be wrong for a number of reasons. If anything has disturbed the plan cache, either in total or in part, you know, restarts, memory pressure, changing settings, queries with recompile hints, someone running DBCC free proc cache, updating statistics and having plans get invalidated, and all sorts of other things, they can really hide stuff.

So this really isn’t a great query. Actually, another thing that can make work look weird is if queries originate in a different database than the database they’re querying that can also cause some issues. So this is, you know, not a very good query practically for most people to use.

If you’re on a server and you really have no idea which database uses the most CPU, that’s a little bit silly. On my server, rather on my laptop, it’s a rather stable place. You know, I don’t, at least at the moment because I’ve been preparing for this, I don’t have a lot to do on here that would mess things up.

But we could add a few things to this query to make it a little bit more useful. I still probably wouldn’t use this as like the ultimate arbiter of truth about which database uses the most CPU, but you could at least spruce things up a little bit so that there’s some better information in there.

Stuff like getting the oldest plan and the newest plan and the total query execution count for each database can get you somewhat better information about what’s going on, like on the server as a whole. Or at least give you some contextual information to help you, you know, figure out if this is actually a good piece, a bit of information for you.

Excuse me. Don’t know where that came from. I may have swallowed a fly. So for this, we have the total CPU.

We have the total query execution count. I don’t know that this is necessarily the order I would present this in if I were trying to, you know, make this a pretty report, but it’s pretty okay, at least for this video.

And, you know, it might help a little bit to figure out, you know, it’s like, well, do I have a few really hard-hitting queries? Do I have a lot of queries that just kind of do stuff?

Maybe help you figure that out. But then this plan cache age column up here, that’ll tell you the difference in, well, days, hours, minutes, seconds, and milliseconds between the oldest plan and the newest plan in your cache.

If that number is very big, you might have a little bit more confidence than if that number is very small. But there’s one thing that can always mess this up, and that is recompile hints or stuff getting removed from the plan cache. So if I run this, this should remove one of the plans from the cache for the Stack Overflow Clean database, at least if I did my math right.

And now Stack Overflow Clean, which used to be second up here, is now way down here with almost nothing on it. So a plan getting removed from the plan cache can really pull that out. Where that’s also true is if you use recompile hints.

So if I run this query with a recompile hint, right, and this thing will take about four seconds, but it uses a pretty good chunk of CPU. That was the big chunk of CPU that was in the results before that’s not there now.

And then we look here. Oops, you know what? I forgot. Okay, actually, this is illustrative. So this is all the crap.

I actually meant to show you this just later. But this is all the crap that you get in the plan cache that this query will count as being responsible for the databases using CPU. So there’s just like a whole bunch of like background system stuff that can happen in there.

But focusing in on the query that we care about, which is the one that has that funny looking GUID in the text, if we run that query with a recompile hint, it will not be in the plan cache. And we will not know that anything ever happened with it, right?

So if we come up here and we look, and please don’t fail me, please don’t fail me now, and we look at this, Stack Overflow Clean, even though we just ran a query that took four minutes of wall clock time and about 23 seconds of CPU time, that doesn’t show up in here. So recompile hints alone can make this query pretty useless.

If we come down here and we run this without the recompile hint, we just run this thing. And that’s another four seconds. And if we look at the query plan, I’ll show you what I mean about the, oops, go away, tooltip.

We don’t need you. We need the properties. And we look at the query time stats. I said query time stats. There is about 22 seconds of CPU right there, which with a recompile hint is totally unaccounted for.

If we come and look at this query now, where we’re searching for that funny looking GUID thing in there, now it’ll show up and we’ll see the execution count and we’ll see the worker time. But without the recompile hint on it, that’s fine.

But with the recompile hint, that all disappears. So coming back up here and looking, we should see our friend up in here. Now Stack Overflow Clean has an additional 22 seconds of CPU time associated with it.

But again, if we remove that query from the plan cache, it’ll all disappear. So this is not a great query for actually figuring something out. Again, if you have absolutely nothing better and you just, and you want like the worst, again, like the average NOLOCK enjoyer style of data, return data.

If you really just don’t care that much and you just want like a bad idea of which database uses the most CPU without anything else contextually around it, you can use that query. Then you can probably get a bunch of wrong information back, just like the average NOLOCK enjoyer.

Now, I promised you a funny story about ChatTPT, so here it is. While I was working up to this, one of the things that, you know, because I love picking on optimized for ad hoc workloads, one of the things that I was looking into was if the compiled plan stub queries would show up in DMExec query stats, which, you know, they did.

But since it was something that I had to like sort of answer for myself because I had never specifically looked at it, I decided to say, hey, ChatGPT, do queries, if I turn on, in SQL Server, if I turn on optimized for ad hoc workloads, do the queries that get a plan stub, will their CPU, like, will their resource usage show up in sys.dmexec query stats?

And ChatGPT was like, no, won’t do it. They won’t be in there until the second execution when they get a full-blown plan, and then they’ll be in DMExec query stats, and then you’ll be able to see them.

And I’m sitting here staring at SQL Server, completely contradicting that. And I was amused by it, and I was like, hey, ChatGPT, can you write me a query that would prove that? So it prints out this query that looks at, you know, sys.dmexec query stats and cache plans, and, you know, sort of like up above, it gets the database ID out, and then in cache plans, it matches the plan handle there to sys.dmexec query stats and the database ID for both of those.

And it was like, yeah, here you go. And so I ran it, and of course I had a bunch of hits between cache plans and query stats. And so I came back, it was like, hey, ChatGPT, your query’s returning rows.

I think they do show up in there. And ChatGPT was like, you’re right, I was totally wrong about that. Cache plans, compile plans, steps do show up in there.

Here’s a query that’ll prove it, and I reprinted the query that it just showed me when I asked it to prove that they wouldn’t show up in there. So for God’s sake, stop caring about AI.

It’s always wrong. It’s always wrong about these things. You should think of any AI as being sort of like, pretend you have an administrative assistant, and you might ask an administrative assistant to write up a short summary of something for you, or take some notes, but then you would also validate it.

If you asked your administrative assistant to say, hey, come up with a report that shows me these things, you would probably want to make sure that you wouldn’t grab that report and then run to a board meeting with it. You would probably check the numbers first.

So if you’re out there in the world really sweating AI and thinking that it’s going to change your life and it’s going to take your job, it’s not quite there yet. You don’t have a lot to worry about.

It’s actually a rather sad state of affairs. So anyway, that’s my amusing slash depressing ChatGPT conversation from this morning. And of course, ChatGPT is what backs Copilot.

You know, there’s other ones out there like Claude and stuff. And Claude is a little bit better, but Claude got this one wrong too. So, you know, AI.

What can you do except laugh at people who think it’s good for anything? All right. Well, that’s about enough here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. For the record, I will not be publishing any of the queries from this because I still find them equally useless. I should actually, before I wrap up, I should give you a couple ways that are better at doing this.

If you’re on Enterprise Edition of SQL Server, you can use Resource Governor, assuming that different databases have different logins associated with them. And that would be one way to sort of get a rough idea of which workload groups for which database are using the most CPU because it does track that. The other thing that you could do is if you are on a reasonable version of SQL Server, and by reasonable, I mean it’s at least 2016, you could also use Query Store.

And Query Store is a bit more historical than the plan cache. And you could go to each database and, you know, SP, whatever, in each, for each, around each, some preposition each database. And you could hit the Query Store DMVs to figure out which ones have the most kind of CPU usage in there.

That would give you a much closer idea to reality because, you know, Query Store usually generally has, like, I think by default, like 30 days of data in it, which is probably a much better indication of which databases are using a lot of CPU. So the plan cache query these days, real stupid, old and busted. You know, I think, you know, resource governor and Query Store are to much better options.

So with that finally out of the way. Once again, I hope you enjoyed yourselves. I hope you learned something. I hope that you will join the 4,300 other data darlings who subscribe to this channel and get notified when I drop videos because that’s – it’s probably a pretty good group.

It’s probably better than the group of people who generate AI content for LinkedIn and all that. People who think that they can use that plan cache query to figure out which databases have the most CPU work on the server because ship of fools, isn’t it? Ship of fools.

Anyway, thank you for watching. I have other videos to record, as you can see by the numerous tabs up top. So I’m going to close this depressing one and we’re going to do something else. I don’t know what’s next.

I haven’t decided yet. You’ll find out when we get there, I guess. All right. Thank you for watching. My rates are reasonable. Thank you.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.