SQL Server Performance Office Hours Episode 63
To ask your questions, head over here.
Chapters
- 00:00:00 – Introduction
- 00:02:34 – What Metrics Do Monitoring Tools Tend to Mislead People the Most With?
- 00:07:46 – Why Increasing MAXDOP Can Help with Spills
- 00:11:58 – Conclusion and Next Video Topic
Full Transcript
Erik Darling here, your old friend, your old monitoring tool mogul, ErikDarling here with Darling Data, and it is officially a Tuesday now, we are on the Tuesday office hour schedule, and so this is where I go ahead and, well, I do my best to answer five, not one, not two, not any number less than five community submitted questions. For free, because I still have better answers than AI, I think, most of the time. Down in the video description, if you would like some answers that are better than AI about your specific, I don’t know, database, query, whatever, you can hire me to do that sort of consulting.
It’s a wonderful process. I highly recommend it. And you can also buy my training, which is, again, well, not to keep beating that horse, thank that one. No way that one’s winning the Kentucky Derby.
You can also become a supporting member of the channel, and you can find a link to ask me office hours questions, just like the nice folks who have asked me five questions this week have. And of course, if you don’t have a question, or you don’t have any money, you can do something else that’s nice.
You can like, subscribe, and even if you’ve got one or two, you can tell a friend about the channel. And maybe they will also like and subscribe and tell a friend. And you see how that cascades into something just absolutely wonderful for everyone.
If you need free SQL Server performance monitoring, I’ve got it. Believe it or not, that link is also baked into the video description, so crazy times, right? Totally free, totally open.
Open source. Just runs a bunch of good T-SQL to capture a bunch of good data. Puts it into pretty charts and graphs for you. It’s got a whole dashboard and cool stuff.
And if you’re, you know, really into the robots, and no one can talk you out of talking to robots, there are built-in MCP tools in the server so that you can begin to have the robot friends look at your performance data and maybe tell you what to do.
What’s going on. I can’t vouch for whether they will be right or wrong. All I can tell you is that they will have been pointed at the data that I have collected. And, you know, whatever they surmise from that is on them.
I have no control over that part. So, anyway. Stuff coming up. I actually get to take a couple of these off the chart soon.
Pass on Tour Chicago. That is soon. It’s like the end of this week. Holy crap. I have to go do something.
And then I will be in Poland May 11th through 13th. Flying directly there from Chicago from little Poland to actual Poland. Then I will be home for a little bit prior to Data Saturday Croatia.
And then home for a bit longer until past Data Community Summit in Seattle, Washington November 9th through 11th. At least, you know, as far as what I can tell you. I have advanced T-SQL pre-cons at Pass on Tour Chicago, SQL Day Poland, and Data Saturday Croatia.
We’ll just have to see what awaits us for Past Summit in Seattle, Washington. And, yeah. So this is still ChatGPT’s take on May in New York.
I don’t quite get it. But, you know, the robots, huh? Anyway.
Let’s answer you some questions here. Let’s see. Is SOS scheduler yield always CPU pressure or can it indicate other problems? So I don’t want to say that SOS scheduler yield always means CPU pressure.
But it does indicate CPU stuff. Because it is queries voluntarily yielding time on a scheduler. So other queries can get some time on there.
Be fair to everyone. Feel the warm embrace of the CPU collectivism, I suppose. And then, so, like, if this gets up, you know, real high compared to server uptime or, you know, you know, it often, like, when you say that, it quite often does accompany incorrect or at least ineffective parallelism settings.
So high CX weights and that stuff. Because, of course, parallel queries use multiple threads. CPUs harder than single threaded queries.
You know, you could also have a vaguely or, I don’t know, perhaps monstrously untuned workload that puts a lot of CPU pressure on the system. But, yes, if you’re concerned, if you’re seeing SOS scheduler yield as a very prominent weight on your server, then certainly I would investigate CPU intensive queries.
You know, prominent being, you know, either for sampled periods of time. It is quite high. Or perhaps, you know, if you’re looking at the server since startup.
And you’re seeing, like, you know, SOS scheduler yield is, like, near or equal to or higher than, like, server uptime. Then that’s probably where I would start. Let’s see here.
How do you tell? When blocking is actually the root problem versus just slow queries piling up. Well, suppose we could come back to weight stats on this one. If you’re experiencing high LCK weights.
Either from the perspective of total amount of time that queries are spending on those weights. Or if the average milliseconds per weight on those LCK weights are on the high end. Then that would be a good indicator.
If I were you, I would most likely be looking at it. To start, you know, assuming that you don’t want to, like, get emotionally invested in anything. You could start by turning on and analyzing the block process report.
I’ve got a free store procedure that does that. It’s called SP human events block viewer. You could also look at the system health deadlock extended event. Or you could, or sorry, the system health extended event deadlock stuff.
SP blitz lock will do that. And you could also spin up a dedicated deadlock extended event. Which I do prefer because at least when you are able to set one up in a way that creates and stores data in a file target.
Then you have a bit more fidelity in the deadlock data you are collecting. So first stop, look for high LCK weights. Either total hours of uptime or high average milliseconds per weight.
First stop, you know, turning on the block process and XML deadlock stuff. Looking through those. But of course, you know, you could also just maybe, I don’t know, do something crazy.
Like point a completely free open source SQL Server performance monitoring tool at your server. And have all of that stuff get collected for you. Ta-da! Look at that.
Wouldn’t that be nice? All right. Is there ever a case where having fewer indexes actually helps performance overall? Yes, I would say that there are many cases.
As long as you still have good indexes that fit your queries. I’ve got a store procedure called SP index cleanup that will, again, totally free, totally open source. That will help you identify unused and duplicative indexes.
And get those all merged in together so you can clean those up. There are three primary things that, let’s just give it the blanket term over indexing. I’m not saying this, like, because I think there’s a certain number of indexes that will, like, cause a problem.
I don’t want to put any magic numbers in your head. But what I would say, like, under the blanket term over indexing, the primary things you’re going to see are, you know, buffer pool competition. Because you have all these different objects competing for space in the buffer pool.
Most SQL Servers that I see. Do not have an adequate amount of memory to deal with the amount of data that they are lugging around. You would also see an effect on the transaction log.
Every index that gets modified as data in the base table gets modified has to write their changes to the transaction log. So the more of that you have to do, the busier your transaction logs become. And, of course, from the point of view of lock escalation.
Because the more indexes you have, the more locks you’ll have to take. And the more, well, I mean, only attempts at lock escalation. But, certainly, you know, having to lock all those different objects, even if it doesn’t result in a successful lock escalation attempt, can be unpleasant.
So, sure. So there’s all those things. But also, like, you know, sort of consider, like, the workload patterns.
You know, like, anything that’s remotely data warehousey or staging tabley is almost always going to be slowed down to some manifestly, you know. And that’s a thing. dismal degree by having more indexes to insert data into on let’s just call it a bulk load that’s always that you know that’s one thing to consider if your workload is not just standard stock and standard OLTP you know I guess there’s there’s maybe some other like let’s just call it optimizer stuff you know the how it might take its time or perhaps you know run out of its budget for steps to optimize a query maybe not pick the best like a better available index you know the optimizer just like you and me the more choices more chances you give it to screw up the more choices you give it the more likely it is to have an issue there all right what metrics do monitoring tools tend to mislead people the most with that’s an interesting one I don’t I don’t have a a firm answer on this I think my main thing with monitoring tools is that they they tend to sort of like just sort of saturate you with information and expect you to be able to parse it apart I wouldn’t call it necessarily misleading I would I would maybe call it more like misdirecting so like if you’re if you’re looking at a performance issue and your monitoring tool is just like well 1300 people logged in and 1500 people logged out watch out for that you know it’s just like it’s not really getting you to where you’re going you know like you know monitoring is monitoring though right monitoring is just collecting a bunch of metrics storing them and letting you decide monitoring is not analysis monitoring is not you know getting you helping you get to a root cause necessarily it’s just sort of collecting data for you to do something with that’s why when I was building my performance monitoring tool there were some things that I wanted to do with it that would make your life easier you know like putting the MCP tool server stuff in there was one of them because it lets you have you know one of your robot friends look at the performance data and you know help you find stuff in there you can ask whatever questions you want of the data that gets collected so like you know like maybe it’ll look at it and maybe it won’t find the particular thing that you’re interested in but you can always point it to that right prompt engineering I believe they call it right something like that uh and you know that’s also why you know like all the you know over time I built the tool up to really help you like when you see a spike on a chart or a graph be able to get to the queries that were most likely involved with that spike in the chart or the graph so that you have an easier time of just you know like figuring out what was going wrong in the server at times where you might have been concerned about it so again I don’t think it’s necessarily that they mislead you I think that they they tend to misdirect you with over collecting and um stuff like that all right oh let’s see uh finally something this is an interesting one we increased Mac dot max stop not Mac dot Mac duple uh and spills went away this seems backwards why would that help well there’s actually uh I think a fairly simple explanation for this let’s say you have a query that runs at max.4 and I don’t know let’s say one gig of memory right because all plans in SQL Server start as serial plans uh that’s when a memory Grant gets assigned to them and if SQL Server later chooses a parallel plan that memory Grant is divided amongst your dop threads that are executing in the plan um it it doesn’t even necessarily have to like it could be one of two things right it could be like that four gig memory Grant uh was not enough for uh split into the one gig grants across your four threads that that could have been too small uh to begin with right for any one of them but then also you know not all parallelism is perfect not all parallelism is created equal so if one of your threads gets a lot more rows it still gets that same amount of memory SQL Server doesn’t say oh well uh you know you got 10 million rows and everyone else got one million rows we better give you more of that memory Grant you know let’s let’s be fair here uh so uh you know you could either have like all your threads spilling or just one of your threads sort of getting an unfair number of rows and spilling and if you increase max stop you spread those rows out further so that perhaps the same memory Grant um might uh might alleviate some of those spills uh that would be that would be like the first place I go with it um sometimes increasing dot does sort of get you out of parallel skew problems so that’s where I was going with that but anyway uh that would be my guess um you know all of those threads uh having fewer rows on them so the memory Grant goes a little bit further that would be that would be my sort of thing there anyway that is five questions right two three four five six yeah there we go uh thank you for watching I hope you enjoyed yourselves I hope you learned something and I will see you in tomorrow’s video where we’re going to talk about a little kind of a fun little thing that I did with dynamic SQL recently um and so there’s that anyway thank you for watching
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.