SQL Server Performance Office Hours Episode 48
| Hi Erik, Happy New Year! On my SQL Server production system, the top waits include CMEMTHREAD and NUMA waits, accumulating a few hours of wait time. I couldn’t find much information online, but I believe that this may be related to parallelism or bad MAXDOP settings. What should I check next? My MAXDOP and CTFP are set to reasonable defaults, and I want to determine if these are causing unseen background issues. |
| Hi Erik, Happy New Year! On my SQL Server production system, I see that the VDI_CLIENT_OTHER wait is at the top of the wait list. Is this connected to slow AGs? Can I ignore it safely? |
| We increased MAXDOP from 4 to 8 and the query stopped spilling to tempdb and actually ran faster. Isn’t more parallelism supposed to make things worse? |
| We replaced a CTE with a temp table and the query got faster, so CTEs must be inherently slower. Should we avoid CTEs for performance? |
| We captured the execution plan when the query was fast and when it was slow — they’re identical. Since the plan didn’t change, this can’t be a query problem, right? |
To ask your questions, head over here.
Video Summary
In this video, I dive into some interesting SQL Server questions submitted by the community. Erik Darling from Darling Data shares insights on topics ranging from CMEMTHREAD waits and trace flag 8048 to the performance implications of parallelism and CTEs versus temp tables. The discussion is both technical and practical, offering real-world advice for managing SQL Server environments. Whether you’re dealing with AG-related issues or optimizing query plans, there’s plenty to learn that can help improve your database management strategies. I also explore the nuances of parameter sensitivity and data distribution, providing a deeper understanding of how these factors impact query performance. By sharing this knowledge, my goal is to empower SQL Server administrators and developers to make informed decisions about their workloads and configurations.
Full Transcript
Erik Darling here, Darling Data, home of the most reasonable rates on the internet. It is Monday, unfortunately, and I apologize in advance, but at least on this Monday you get some office hours. You get to hear me answer five community submitted questions. If I ever get a palm tattoo, maybe I’ll get the number five tattooed on my palm. I don’t know. Some Roman numerals. Binary for five or something. Something dorky, right? Five community submitted questions to me, yours truly, about SQL Server or whatever. I do say it’s SQL Server questions, but people have asked me about things I never dreamed anyone would care to ask me about. So, that out of the way. Down in the video description, you’ll see all sorts of helpful links. If you haven’t clicked on these links in a while, or ever, you might be surprised to see that I have redone a lot of things in there. There are all sorts of updated consulting packages, some of them at lower price points for those of you who need different types of consulting than a typical bucket of hours might cover. Down in the training section, too, there have been some updates. I have re-bundled the everything bundle to actually be all of my SQL Server performance content in one helpful package. And you now have the ability to buy that with a subscription, so you can choose a yearly subscription or a monthly subscription.
If you want to check out that training, you can. You can also, for as few as four US American dollars a month, throw a little support towards this YouTube channel for all of the incredibly high quality content that I provide here. You can also find the link to ask me office hours questions, and of course the usual shindig where we like, subscribe, and tell our friends about the wonder and glory of the Darling Data YouTube channel. I will be leaving the house quite a bit. There are going to be between one and three more of these showing up in the near future. I will be prosecuting my advanced T-SQL training at all three of these events. I have pre-cons at Data Saturday Nashville, Data Saturday Chicago, and SQL Day Poland, starting in March of 2026.
So, I will be there dressed festively, you know, festooned to the gills and all sorts of things. Poland might be the first place where my all-over Adidas gear is finally appreciated. So, we’ll see about that.
But anyway, it is, I mean, I don’t know. I think, you know, New Year’s is starting to wear off a little bit. I don’t know if I’m feeling the 2026 vibe.
So, I’m going to figure out something new for this. I got to figure out a good background image here. Anyway, let’s get out of this.
And let’s go over to our wonderful Excel file. Our Excel file of hopes and dreams and wonders. Where we will attempt to give the people what they desire. Anyway, here we go.
Let’s put up our wonderful ZoomIt highlighting here. Our very highly accurate ZoomIt highlighting. Hi, Erik. Happy New Year. Happy New Year to you too. On my SQL Server production system, the top waits include CMEMTHREAD and NUMA waits.
Accumulating a few hours of wait time. Your top waits have a few hours of wait time? There’s a lot of questions I have here.
Like, I mean, obviously, I don’t know. You’re looking at waits since server startup, but how long has your server been up where a few hours is concerning enough that you need to ask me about it?
Anyway, I couldn’t find much information online. That’s weird, because I’ve posted about CMEMTHREAD waits. That’s a weird one.
But I believe this may be related to parallelism or bad MAXDOP settings. What should I check next? Well, good question. So, trace flag 8048 can be helpful in reducing CMEMTHREAD and NUMA waits.
Way back in SQL Server 2016, Microsoft had this series of blog posts called It Just Runs Faster.
And, you know, some of them were like, all right, cool, yeah, it does run faster. But other ones, like there was one about DBCC CHECKDB running faster. And everyone was like, how’d they do it? And they were like, we skip a bunch of checks now.
And you’re like, oh, okay. Oh, that’s not cool. So, you know, how do you run faster? Well, I got in a car. Like, I took a bus and a train.
That’s why I ran faster. But one of these It Just Runs Faster posts covered this: back in the old, pre-2016 days, if you had more than eight CPUs or something and you ran into this CMEMTHREAD wait (I’ll talk about what CMEMTHREAD is in a moment),
there’s a trace flag called 8048. You can enable it globally, like right now on your server, but the most effective way of turning it on is when the server starts up, because there’s a whole bunch of memory stuff that has already been allocated and accumulated that turning 8048 on in flight won’t help with.
So if you make it a startup trace flag, it affects way more things. The reason is that CMEMTHREAD is effectively about memory structures having exclusive latches taken on them, and other things trying to take latches on those structures can’t get them.
So CMEMTHREAD is sort of like a memory safety wait. It’s not really to do with parallelism. I’ve seen a lot of stuff around it with the plan cache and Query Store.
And it’s essentially, just think of it in the context of the plan cache: something’s putting a plan in the cache while other things are trying to figure out if a plan is in the cache, and stuff like that.
So it’s a whole mess of latches. You could try trace flag 8048 to see if that alleviates it, regardless of whether you are on 2016 or newer.
The whole deal with 2016 was that if you hit certain thresholds, or reached a certain level of contention, then SQL Server would try to change, on the fly, the way that memory is partitioned, from per-NUMA-node to per-CPU, so that there was less contention between different accessors of things in memory.
So it kind of worked. But if you just keep approaching the threshold, or keep approaching the level of contention it worries about, without ever actually getting there, then the It Just Runs Faster part doesn’t kick in.
So what you really have to do, or rather the easiest thing to do, assuming you have access to SQL Server’s settings, is go into SQL Server Configuration Manager, add -T8048 as a startup trace flag, and reboot your SQL Server whenever you feel like it, because, you know, who cares?
And you’ll be good from there, or at least better from there. But, you know, CMEMTHREAD and NUMA waits, you really have to be hitting quite a bit of that.
There were some spin locks at one point that made sense to look at, but I’ll be damned if I can remember those hieroglyphics off the top of my head. But, yeah, yeah, trace flag 8048 should straighten you out.
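If you want to put those wait numbers in context before touching trace flags, a rough sketch of the sanity check might look like this. Wait stats are cumulative since the last restart, so uptime matters:

```sql
/* Sketch: see how much CMEMTHREAD wait has accumulated, and since when.
   A few hours of wait on a server that has been up for months is a very
   different story than a few hours on a server that restarted yesterday. */
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms / 1000. / 60. AS wait_time_minutes
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type = N'CMEMTHREAD';

/* How long has the server been up? */
SELECT dosi.sqlserver_start_time
FROM sys.dm_os_sys_info AS dosi;
```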
But, you know, what you’ve presented me with here is, is this the kind of question I can generally answer?
I can’t tell you if turning on trace flag 8048 and alleviating some or any of the CMEMTHREAD waits that you’re currently experiencing will massively improve your SQL Server performance experience.
It might make you feel better, right? It might be like a CXPACKET thing, where people are like, I changed MAXDOP to one.
CXPACKET’s gone. Oh, I can sleep at night. Like, I don’t know if it’s going to make your life better. If you really want to get rid of it, or just wanted to know some SQL Jeopardy stuff, there you go.
But, I don’t know if it’s going to make your life better there. I don’t know if it’s really going to improve anything. Let’s see here. Oh, we got another happy new year. Oh, we got two happy new years.
Hi, Erik. Happy New Year. On my SQL Server production system, I see that the VDI_CLIENT_OTHER wait is at the top of the wait list. Is this connected to slow AGs? Can I ignore it safely?
I have never seen that thing matter. I can’t for the life of me figure out why it would matter. I guess it could be slow AGs. I remember it maybe being connected to the initial direct seeding of databases.
So, again, going really far back in the time machine: 2016 introduced a feature for AGs called direct seeding, where anytime you added a database to SQL Server, it would be like, hey, new database.
You’re going in the AG. Now we’re just going to take a backup, single-threaded, and throw you on the other server. And for big databases, that could not be fun. So, I guess direct seeding could be related, but just general AG activity, no.
I would ignore that one, unless you are actively direct seeding databases. And then maybe, maybe not. Okay.
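If you want to confirm whether any direct (automatic) seeding is actually happening before writing the wait off, a sketch like this against the seeding DMVs should tell you:

```sql
/* Sketch: look for in-flight or recent automatic seeding operations.
   If this comes back empty, VDI_CLIENT_OTHER is probably just noise. */
SELECT
    ag.name AS ag_name,
    adc.database_name,
    has.current_state,
    has.start_time,
    has.completion_time,
    has.failure_state_desc
FROM sys.dm_hadr_automatic_seeding AS has
JOIN sys.availability_groups AS ag
    ON ag.group_id = has.ag_id
JOIN sys.availability_databases_cluster AS adc
    ON adc.group_database_id = has.ag_db_id
ORDER BY has.start_time DESC;
```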
Let’s see here. We increased MAXDOP from 4 to 8, and the query stopped spilling to tempdb and actually ran faster. Isn’t more parallelism supposed to make things worse? I mean, all right.
So, it’s a fair question. For a single query, there is usually a point where there is a limited return on investment for more parallelism, and certainly a point where there is no more return on investment for increased parallelism.
The point at which you hit that, in your MAXDOP setting, or MAXDOP hint for the query rather, will vary from query to query. You know, the more rows your query processes per thread, probably the more it slows down, kind of.
So, if you have more threads available to process rows and break those units of work up into smaller bits, it will eventually hit a point. I’m not saying you’ll find, like, the perfect DOP for every query, but, you know, more parallelism is not generally bad or worse.
I mean, more parallelism could be generally bad for the workload, if you were just like, you know what, every query gets MAXDOP zero, let’s just see what happens.
Like, that’s probably not a great idea. But, yeah, these are the kinds of things I like experimenting with, because sometimes you’ll find a little sweet spot for MAXDOP when you’re working on this stuff.
But, no, in general, more parallelism won’t make things worse. And you may find that in your case, the reason you stopped spilling to tempdb was that the memory grant was a little more effective per thread.
Maybe you had some imbalanced parallelism, where one or more of the threads were getting way more rows than the other threads. Because, of course, every query plan in SQL Server starts off as a serial plan.
It’s only if your query meets the cost threshold for parallelism that parallel plans are explored. And, of course, if SQL Server finds a cheaper alternative parallel plan, then it will use that, right?
But what happens is the memory grant for a query is assigned based on the serial plan. And that memory grant gets split up amongst your DOP threads if a parallel alternative is chosen.
So, it is entirely possible that the reason you stopped spilling to tempdb was that the row distribution across threads got better and the memory grant per thread was more effective.
So, that is most likely the explanation for why it stopped spilling to tempdb. But to answer your second question, no, more parallelism doesn’t generally make things worse for a query.
You may find that it does not improve things as much as you would hope past a certain point, like if you went from 8 to 16 or something.
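If you want to run that experiment without changing the server-wide setting, a per-query hint is the low-risk way to do it. A minimal sketch, where the table and column names are hypothetical stand-ins:

```sql
/* Sketch: pin DOP per query and compare actual plans. In the actual
   execution plan, check each operator's per-thread row counts and
   any spill warnings to see if the grant per thread improved. */
SELECT
    o.CustomerID,              -- dbo.Orders is a stand-in table
    COUNT_BIG(*) AS order_count
FROM dbo.Orders AS o
GROUP BY o.CustomerID
OPTION (MAXDOP 4);             -- then re-run with OPTION (MAXDOP 8)
```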
But, you know, that’s probably good enough there. All right. We replaced a CTE with a temp table and the query got faster.
Many such cases. So, CTEs must be inherently slower. Should we avoid CTEs for performance? Well, I mean… It’s an interesting question for me.
I generally dislike CTEs because too many people like them. That’s really it.
But CTEs in SQL Server have limitations, or rather a lack of features, compared to other databases.
But let’s see. How far do I want to go on this? All right. We’re going to go all the way. So, yeah. Generally, a CTE is not materialized in any way in SQL Server.
So, dumping data into a temp table gets you a materialized result set, and the optimizer can generally make much better choices off a materialized result set than off whatever query was in the CTE, inlined into the remainder of the query.
Especially if things get quite large and complex, the optimizer tends to break down in all the usual places. So, you know, replacing a CTE with a temp table can in many, many cases get you improved query performance.
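As a sketch of what that rewrite looks like in practice, with made-up table and column names for illustration:

```sql
/* CTE version: the optimizer inlines the CTE's text into the outer
   query and has to estimate everything in one shot. */
WITH top_posters AS
(
    SELECT p.OwnerUserId, COUNT_BIG(*) AS post_count
    FROM dbo.Posts AS p            -- hypothetical table
    GROUP BY p.OwnerUserId
)
SELECT tp.OwnerUserId, tp.post_count
FROM top_posters AS tp
WHERE tp.post_count > 1000;

/* Temp table version: the intermediate result is materialized once,
   with real row counts and statistics for the optimizer to use. */
SELECT p.OwnerUserId, COUNT_BIG(*) AS post_count
INTO #top_posters
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;

SELECT tp.OwnerUserId, tp.post_count
FROM #top_posters AS tp
WHERE tp.post_count > 1000;
```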
But if the query performance is fine with it, then there’s no need to go tinkering with things. I used to think it would be great if Microsoft would give us the option to materialize CTEs.
But right now, the only facility available for materializing CTEs would be a spool. And spools in SQL Server are a complete nightmare. So, you know, we don’t want any more spooling in our query plans than we currently have.
And so my thinking is that materialized CTEs in SQL Server would be pretty crappy, unless Microsoft were to update the code for spools beyond its current state, which hasn’t changed since SQL Server 7.
But of course, Microsoft is busy providing us with groundbreaking, or just breaking, products like Fabric. It hasn’t actually broken any ground, but it sure has broken a lot.
So… Yeah. All right. Hmm.
Hmm. Hmm. We captured the execution plan when the query was fast and when it was slow. They’re identical. Since the plan didn’t change, this can’t be a query problem, right?
Well, I mean, you know… Let’s see. You’re maybe right that it’s not a query problem, but what it most likely is, is a parameter sensitivity problem. Or, if you want to think about it slightly differently, a data distribution problem.
All right. Because your query ran fast when it had to process, let’s say, this amount of rows and then…
Actually, let’s say this amount of rows. And then your query ran slow when you had to process this amount of rows. Now, this two could be literally two. Could be 200. Could be 2,000.
And this five, well, this five could be 50,000 or 500,000 or 5 million rows, right? So what you have is a parameter sensitivity issue. What you might want to try doing, just to see it, just to make sure…
Like, I’m not saying this is your long-term fix. Of course, there are many ways to fix a parameter sensitivity issue or a data distribution skew issue. But you could try running your query with a recompile hint and seeing if, like, the plan changes and seeing if some of those plans capture lots and lots of rows and some of them capture very few rows.
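A minimal sketch of that experiment, with hypothetical table, column, and parameter names:

```sql
/* Sketch: OPTION (RECOMPILE) gets a fresh plan per execution, so you
   can see whether different parameter values really want different
   plans, and how many rows each value actually touches. */
DECLARE @OwnerUserId integer = 22656;  -- made-up example value

SELECT p.Id, p.Score, p.CreationDate
FROM dbo.Posts AS p                    -- stand-in table
WHERE p.OwnerUserId = @OwnerUserId
OPTION (RECOMPILE);
```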
It could also not be that, right? It could also be that the query got blocked when it ran one time. Because if you’re looking at wall clock time, and one run you’re like, oh, that query took one millisecond, and another run you’re like, oh, that query took 10 seconds, but the query actually only used that one millisecond of CPU time or something, right?
And something else got in the way, right? So it could be blocking. It could have been some other resource that was under contention at the time when it ran. You know, it could have been CPUs overloaded. It could have been memory grants overloaded.
It could be that the data was in memory the first time it ran and not in memory another time when it ran and it slowed down because you had to read a bunch of data from disk into memory. So, you know, the execution plan not changing.
I mean, you know, everyone loves a nice stable execution plan, right? Yeah. But, you know, it could be lots of stuff that was different at the time that the query ran when it was slow.
It is your job to figure out what was different, whether it was something within the query, like the number of rows that the query had to process, or something external to the query, like the query was blocked or the query was under some other type of resource contention.
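One way to tell those cases apart is to compare CPU time to elapsed time, since a big gap means the time went to waiting on something rather than working. A rough sketch against the plan cache:

```sql
/* Sketch: queries whose elapsed time far exceeds their CPU time spent
   most of their lives waiting: on blocking, memory grants, or disk. */
SELECT TOP (10)
    deqs.execution_count,
    deqs.total_worker_time / 1000 AS total_cpu_ms,
    deqs.total_elapsed_time / 1000 AS total_elapsed_ms,
    (deqs.total_elapsed_time - deqs.total_worker_time) / 1000 AS wait_gap_ms,
    dest.text AS query_text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY wait_gap_ms DESC;
```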
So, there we have that. Anyway, that is, let’s just make sure here, one, two, three, four, five questions asked and answered.
That will wrap up this office hours. Wow, nearly 20 minutes. That’s a good one. That’s a good office hours. There was lots to say, I guess.
I hope I didn’t repeat myself too much. All right, let’s go. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. Tomorrow, and for the remainder of this week, we will be back to talking about the vector-y things.
We have some interesting new stuff to talk about. We have some new material. We will be doing some groundbreaking material, unlike Microsoft working on Fabric.
So.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.




