Memethread
Thanks for watching!
Video Summary
In this video, I delve into a critical issue related to QueryStore and its impact on CmemThread weights in SQL Server. Initially, my investigation led me to incorrectly attribute the problem to the plan cache, but further analysis revealed that it was actually due to QueryStore being enabled. I walk through setting up various configurations, including turning QueryStore on and off, optimizing for ad hoc workloads, and forcing parameterization, to demonstrate how these settings affect CmemThread weights and query store size. The video highlights the potential pitfalls of relying solely on QueryStore without proper management, emphasizing that optimized for ad hoc workloads does not address this specific issue. I also provide practical solutions and insights into managing query store effectively to avoid excessive memory grants and plan cache bloat.
Full Transcript
Hello! Erik Darling here with Erik Darling Data and I am here to apologize and make right on a video that I recorded yesterday about CmemThread and the plan cache. I was devastatingly, horribly, horribly wrong. And I didn’t realize it until later that I had a setting on on my server, on my laptop, on my database called QueryStore turned on. And the CmemThreadWeights were not the fault of the plan cache, they were the fault of QueryStore. And I’m going to show you exactly what happened. So right now, I’ve got some queries up on the screen. This query here will tell me if I have optimized for ad hoc workloads turned on, if I have parameterization set to forced or not, and the state of my QueryStore thing for the Stack Overflow 2013 database. I’ve also got some queries here to turn QueryStore on and off and to clear it out. So my capture mode is going to be read-write, my max storage size is going to be 100 megs, my capture mode is going to be auto, meaning I don’t capture absolutely everything. I only capture stuff that SQL Server thinks is important. And I’ve got size based cleanup mode turned on, meaning when QueryStore hits 100 megs, size based cleanup should kick in and clear everything right out. I’ve also got some queries here to look at how many cached plans I have, and to look at if I’m hitting CmemThreadWeights, and to look at the current size of QueryStore for the Stack Overflow 2013 database. I’ve got some other utility queries down here. This is to clear out the plan cache and to clear out weight stats on my server so that I can refresh things and between runs to show you differences. I’ve got my O stress command here set to run 20 copies of my randomizer store procedure, which I’ll talk to you about in a minute. And that does some stuff over there. And then I’ve also got queries to turn optimized for ad hoc workloads on and off and just set parameterization to simple and forced. My randomizer query looks like this. It grabs some weird values from the sys.messages table, six minute cursor, six minute cursor, and basically iterates over that cursor. And depending on what seed value this thing has, let me stick a semicolon there to make sure that I have everything correct. I would hate to not terminate a statement properly.
But depending on what number comes up here, I’m going to select from a different table in the Stack Overflow database, depending on that. So if it’s one, badges, two comments, three posts, four users, five votes. And I’m going to pass the two little funny values that I get from sys.messages up here, which looks something like this, into the end clause down here. So that’s what that randomizer store procedure does. It’s pretty simple. Pretty simple stuff, right? Good stuff there.
If we come back over here, what I’m going to do is I’m going to look at what I currently have going on. And in my data’s, various data’s, I have 169 plans in the cache. I have no seam emptied weights. And query store, because it’s off right now, has a size of zero megs. Now what I’m going to do with query store off, and this is to set up a baseline, clear that out.
And what I’m going to do is I’m going to kick off that O stress command, and I’m going to look at what happens. Now we’re running that query across 20 threads, just to simulate like, you know, sort of a heavy ad hoc workload. And as we hit F5 on these, the plans in the cache will start going up, and Cmem thread weights will stay relatively low.
Right now we have 10,000 plans in the cache and no Cmem thread. So if I keep running this and looking at things, I don’t know why that changed context, Tommy. That was a little bit weird. But if I keep running these, now I have 51,000 plans in the cache.
21 things have waited on Cmem thread, and that’s not a lot. I go over here and I keep running these. 80, about 73,000 plans in the cache. Still relatively low Cmem thread.
Only 70 things have waited. I’m not worried about this kind of Cmem thread weight. If I just keep doing this, Cmem thread is not going to go up a whole lot. That’s the baseline. That’s to show you that it’s not the plan cache’s fault that our ad hoc workload is generating a bunch of Cmem thread.
It’s not the plan cache. Whoever says it’s not the plan cache out there, I agree with you 100%. It’s not the plan cache’s fault. Now, I want to show you what happens that’s different when I turn query store on.
So I’m going to hit this, and I’m going to set query store to on for this database. And so we are now capturing data. If we go back up here and rerun this query, we will see that query store is set to read write.
And so the desired and the actual state are both in read write mode. So cool, we have query store turned on. We should be rocking and rolling at this point.
Let’s go rerun our workload again. F5 there. Not F5. Hit enter there. Sometimes I get confused.
But now when I start running this, Cmem thread weights are going to start jumping up really, really quickly. So I hit this, and we’re at 12,000 plans. And as we go up, Cmem thread weights are going to go up and up much faster than they do without query store turned on.
So you can see that I’m hitting some weights here. I’m already up to 1,000 Cmem thread weights. And you can see the size of query store steadily increasing.
We’re up to 63 megs. We’re up to 69 megs. And as we keep going up here, Cmem thread is going to keep increasing. Cmem thread is going to keep going up and up and up.
So this was not happening with the plan cache. This is only happening with query store turned on. So now I’m running all this stuff. Now we’re up to 4,200 Cmem thread weights. And now the query store size is bigger than the max size that I set it to.
If I keep hitting this, query store is going to keep getting bigger and bigger and bigger. And our Cmem thread weights are going to keep going up and up and up. Right now the current size of query store is 166, 173.
This is just going to keep going up. And Cmem thread weights are going to keep going up with it. Query store is now at about twice the max size that I set it to.
Size-based cleanup should be kicking in and clearing stuff out. I just don’t think it can keep up right now. So I’m sticking all these queries in.
I’m hitting this. And you can just see everything piling up. Query store is now nearly 250 megs. And Cmem thread weights keep on going up and up and up and up and up and up.
Now, this can be a real problem for a lot of people. If you have an ad hoc workload like this, you might be very concerned that, you know, the query store is going to overfill, that, you know, cleanup is never going to quite catch up.
And now you’re going to have all these funky Cmem thread weights going on in your server. At this point now, query store is nearly three times the size of the max size that I set it to. And cleanup doesn’t really seem to be doing its thing.
We’re getting up really close to 300 megs there. So what I’m going to do is I’m going to kill off of this. And I’m going to talk about a couple options that I thought I would walk through to try and fix this.
Now, the first one that I tried was optimized for ad hoc workloads. Now, if we look at, to turn optimized for ad hoc workloads on, if you’re in Azure SQL DB, you can do this with a database scope configuration.
If you’re on regular SQL Server, you can. It would be nice if that made it there. I tried it last night and I just kept getting errors. And then I looked more closely at the documentation. It was like, oh, yeah, only for Azure SQL DB.
So right now we can only turn this on at the server level for SQL Server 2017. So I’m going to do that. I’m going to turn on optimized for ad hoc workloads. And I’m going to give one more check-in up here to see what things look like.
And you’ll see that now that our query store size is nearly 400 megs, size-based cleanup is not doing its thing. So we’re going to do its job for it.
And hopefully this will work. Hopefully. We might be sitting here for a while. Oh, there we go.
That only took 13 seconds. Good for us. Okay. So let’s see how big query store is currently. All right. Zero megs. Good. We cleared all that stuff out. And let’s go check our stuff over here. So now we have optimized for ad hoc workloads turned on.
That is at 1. Sweet. Now, with that at 1, let’s go clear out all that stuff that we had before. And let’s check back in.
Let’s make sure our queries are returning close to zero. All right. So four plans in cache. I can live with that. That’s some system stuff going on in the background. See memthreads at zeros across the board. And query store is off.
So we’re at zero there. Because we have a baseline with the ad hoc, with the workload running without any special settings and see memthread being low, I’m not going to repeat it with query store off and ad hoc turned on.
There’s no point. That’s not helping us here. That’s not helping the effect that we’re having. The effect that we’re specifically having is that with query store turned on, we hit a lot of cmemthread weights.
So I’m going to turn query store back on. So now if we go check in on this, we should see query store set to read write. And we should see the value in use set to 1 for ad hoc workloads.
Okay. With that in place, let’s go back over here. Let’s hit F5. Let’s see. We have 10 plans in the cache now. Good heavens. We still have no cmemthread.
And, of course, query store size is zero. Because why would it be anything else? We just turned it back on. Let’s hit F5 there. Or enter or up arrow, enter or whatever you want to call it. Whatever.
O stress. And then I’m going to start running these. And when I run these, plans in the cache are going to start piling up. And so is cmemthread.
And so is the size of query store. And if I keep doing this and we keep looking at stuff. And, again, this just keeps bizarrely changing context on me. We’ll see kind of the same pattern start to happen.
Where my mouse refuses to click and work in the right way. I don’t know what it is. I’m going to stop spending $100 on mice that stop working after a few months. It’s pretty ridiculous.
But if I keep running these queries, we’re going to keep seeing. We’re going to see the number of plans in the cache pile up. We’re going to see the cmemthread weights pile up. And we’re going to see the size of query store just keep going up and getting bigger and bigger. The two important things to take away from here is that optimized for ad hoc workloads is not fixing this specific problem.
Where we’re hitting cmemthread weights. And the size of query store is going up way above the max that we set it at. Size-based cleanup is also not doing its job in cleaning this thing up fast enough.
So we’re actually facing the exact same problem. Optimize for ad hoc workloads isn’t helping us here. I’m not going to let that run for as long because we already know kind of what’s happening.
So let’s get rid of query store. We already killed our workload. Going to get that off. Okay, cool. Now let’s turn off optimized for ad hoc workloads. I want to make sure that I have a fresh start here.
And let’s turn on forced parameterization. So forced parameterization is turned on. And let’s clear out all of this stuff. And let’s see what we have here.
Oh, wrong button. It’s a morning of wrong buttons. All right. So four plans in the cache. No cmemthread weights. Query store is, of course, at zero megs because we have it turned off right now. So I’m going to go turn query store on.
Bada bing, bada boom. Let’s go make sure that we have all the stuff we want here. Perfect. So read, write, read, write. Parameterization is set to forced.
All right. Let’s check in on these numbers. Make sure we don’t have anything too weird. We don’t. Nine plans in the cache. No cmemthread. Query store is at zero because we just turned it on. And we’re going to go hit whatever damn, whatever you want to call that button, we’re going to hit it.
I’m going to run this thing. Run it. Come over here. Highlight you. And let’s start running this thing.
All right. So what we’re going to notice with forced parameterization turned on, at least if this demo works as it has worked all morning, which it very well may not. Who knows?
As soon as you’re recording, every bad thing happens. Everything goes wrong when you start recording. Anyway. So what we see here is the number of plans in the cache is skyrocketing. We’re up to 100 and something thousand, 157,000, but very low cmemthread.
The other thing I want you to notice is that the size of query store is zero megs. It’s not that we’re not collecting anything. It’s just that when SQL Server is doing a pretty good job of only putting one copy of the parameterized plan in there.
So as we go through and we look at this, we can see that query store size is not ballooning. Cmemthread weights are staying relatively low compared to other times when we ran this. But the number of plans in the cache is sort of astoundingly high.
Now, I’m going to keep running this. And basically what we’re going to see here is that forced parameterization does help us avoid the majority of the cmemthread weights and the ballooning size of query store. For some reason, that’s not helping us with the plan cache.
With the plan cache, we’re still not getting very good plan reuse. So that’s an issue, but that’s an issue for another day. This video is pretty specifically aimed at cmemthread weights as they relate to the query store and how big the query store gets and how it doesn’t clean up and all the other kind of crazy stuff.
I had a much, much worse time with this last night when I was originally running through the demo trying to get it worked out. At one point, query store was 700 megs, and the only way I could get it to turn off and clear out was by restarting my entire laptop. And that was not pleasant because in my head I’m thinking, my God, if you’re doing this in production, you don’t want to have to restart production just to get query store to clean out.
Anyway, my name’s Erik Darling, that one, with Erik Darling Data. And thank you for watching. I hope you got some useful information out of this.
And I will see you, I don’t know, maybe in a week or so because it’s the weekend and I feel like doing other things. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Hey Erik-
What version and build are you running with here? I didn’t hear it at the beginning of the video and in looking at SSMS I believe it’s SQL Server 2017 RTM, but just wanted to confirm.
Thanks,
Erin
Yep, it’s 2017, but CU15. SSMS says RTM at the bottom no matter which CU you’re on ?
Microsoft SQL Server 2017 (RTM-CU15) (KB4498951) – 14.0.3162.1 (X64) May 15 2019 19:14:30
Thanks!
Ooh, could be! I checked for weird error log messages and stack dumps, but I didn’t see anything.
Thanks!
Ok, thanks for clarifying, I suspect you’re running into this issue:
https://support.microsoft.com/en-us/help/4501205/fix-non-yielding-scheduler-occurs-when-you-clean-up-in-memory-runtime
Which isn’t fixed in a 2017 CU yet (hopefully soon).