A Reintroduction To sp_PressureDetector
Video Summary
In this video, I wanted to celebrate my recent accolades from Microsoft by reintroducing SP Pressure Detector, one of my open-source scripts that has been approved by Microsoft Customer Support Services. This script helps identify performance issues in SQL Server by monitoring various aspects such as CPU, memory, disk, and blocking. I spent a lot of time updating the README file to provide detailed information on how to use the script effectively, including explanations for each parameter and their default values. The video walks through running SP Pressure Detector with all parameters set, highlighting key sections like weight stats, virtual file statistics, tempdb usage, memory consumers, and query memory grants. By sharing this knowledge, I hope to help others better understand and troubleshoot performance issues in their SQL Server environments.
Full Transcript
Erik Darling here with Darling Data. And I got a nice email from someone in Microsoft Customer Support Services, that’s Microsoft CSS, for those acronym-loving folks playing along at home, thanking me for my open source scripts, because apparently they were able to use them to help them troubleshoot an issue today. So from here on out, I’m just going to say that my scripts are Microsoft Customer Support Services approved, or CSS approved, Microsoft CSS approved, at least until the lawyers catch up with me, at which time I’ll fight a lawyer live on TV. It’ll be like Musk versus Zuckerberg, except everyone will be rooting for me instead. So the purpose of this video, in celebration of my recent accolades, in celebration of my recent accolades from Microsoft, is to sort of reintroduce SP Pressure Detector. And the reason that I want to do that is because I realized, I was looking at my GitHub repo, and what I saw was something that I had forgotten about ages ago, is that my GitHub repo has a README file. And that README file was hopelessly out of date and sort of a bit malnourished looking.
So I spent a lot of time redoing the README file to document as much stuff as I possibly could in there. And I was rather happy with it, much happier than I am currently with my green screen, which is showing sporadic shadowing. I guess I got to stand a little bit closer to you. Hope you don’t mind. Hope I don’t smell too bad. It’s Friday. You know what happens on Fridays. See, if you look at your internet at all times. Oh,降 όikkajå Anyway, the best place to start with any of my scripts is with the help parameter.
Let me actually make this a little bit bigger so it’s a little bit more readable. Like putting a query inside of a CTE. I’m going to zoom in so it’s more readable.
All right. So the best place to start with any of my store procedures, if you’re unsure how to use them, if you’re unsure about what the parameters are or what they do, is to use the at help parameter. It is a very carefully designed set of information to help you get the most out of my free, open source, Microsoft approved store procedures.
I usually start off with a little introduction. All right. Tell you kind of what the purpose of the script is, what it does.
I don’t like anyone to feel like they’re left in the dark about the intent of the script. And then some scripts have a little bit more in them than others. The more complicated ones like SP human events and SP quickie store will have some example executions in there.
But this one, so usually I don’t have to mess with… with settings too much or parameters too much in this one. It’s pretty good to just run right out of the box.
But I do realize using this… After using the scripts live for like client troubleshooting and for training, that sometimes it would behoove me to add a little bit of configurability in there so that there’s less sort of jumping around and things.
So they’re mostly there for me. Except for one thing, which is there for a nice person who left a YouTube or blog comment about the CPU utilization threshold. But we’ll talk about that more in a minute.
So this section here will tell you what the parameters available for the store procedure are and what a reasonable value is to set that parameter to and also what the default values are. So what to check will define if you want to check for everything or just CPU pressure or just memory pressure.
If you want to skip looking at queries that are currently executing on the server, you can use that. If your server is under such duress that getting query plans back causes hangups with SP pressure detector, you can skip getting plan XML in the queries that get executing queries.
If you want to… So minimum disk latency applies to the section of the code that goes and looks at the DM virtual file stats, DMV, and you can set your own custom latency there.
By default, I look for disks or files that have over 100 millisecond read or write latency, which I think is pretty a fair starting place to look at for things that might be not great with disks, but you can set that higher or lower depending on your needs.
The second one down here is a CPU utilization threshold. Now, in the full result set, there are some XML clickable columns, and there are two of them that have to deal with CPU.
One of them goes out to this funny little set of XML that records CPU utilization in SQL Server, and in there, there used to be a hard-coded filter of 50 for looking at… for getting data out of these XML bits for when CPU utilization passed a certain point.
This is now configurable. So if you want to bump it up to 70 or 80 or 90, depending on, you know, depending on what your server is used to running at.
Some folks are awesome, and they realize that if their CPUs are consistently spending under 30%, that they are probably giving Microsoft too much money, unless they have some, like, occasional workload bits that just fly off the handle and push things up to 100%, which is fine.
You know, you just know that most of the time you’re paying Microsoft too much. So there’s that. So if you want to only look for when CPU is over 80% or over 90%, then you are free to use that parameter to do that.
There’s also an option here to skip the section of SP pressure detector that looks at weight stats. Especially if you need to sort of run this, like, rapid fire and, like, you know, see when and where things are changing, you might not want to look at weight stats every single time, right?
Because, like, like, weight stats might not change that much over, like, the second and a half that you wait between executions. And on top of that, you probably won’t remember all the numbers anyway, unless you’re paying very specific attention to one.
So those are the parameters and the full explanation of the parameters. This is, of course, fully free MIT license. And if you go over to the Messages tab, you’ll see the full MIT license when you enter the help section.
So I’m not going to execute this with every single different parameter. I’m just going to execute this with everything so that I can walk through all of the different sections in here.
Now, my laptop hasn’t been up to a whole lot lately, which is okay. You know, I’ve been busy with stuff. I don’t have 24 hours a day to write and run demo workloads on here.
But this first section up here will show you the weight stats that I consider important for performance. There are a whole list of other ones.
Sometimes they’re useful to look at. Sometimes they’re not. SP Pressure Detector focuses in on a few very specific areas of weight stats around parallelism, CPU, disk, memory, and blocking.
So those are the sections that I focus on because those are the sections that represent pressure for SQL Server. In this section, you’ll see hours uptime.
That’s how long your server has been up. You’ll also see hours of CPU time. This is only available in Enterprise Edition at the moment. And I was going to say something important here.
Standard Edition is kind of a waste. If you’re on Standard Edition, you might as well just be using Postgres. It’s kind of funny.
There’s the weight type for all of the ones that we’re interested in. There’s a description of the weights in there. So if you’re not sure what a weight is, there’s a description column. You have how many hours we’ve waited on that weight.
It’s useful to compare this to total uptime and total CPU time. So if your server is in a crazy 24-7 workload, it might be a little bit more.
It might be a little bit. If your server has been up for like 3,000 hours, but you only have like 1,000 hours of CPU time, then it’s usually a little bit more wise from a performance analysis point of view to compare hours wait time to hours CPU time and hours uptime.
If your server is just constantly flying off the rails, then hours uptime and hours wait time are pretty good to compare each other to. We’ve got the hours of signal wait time.
I don’t put a percentage in there because I’ve never found it terribly useful. Either the numbers are close or they’re not. For SOS scheduler yield, they’ll pretty much always be one-to-one. There might be slight differences in there, but for the most part, the timing for schedule yield will be the timing for the hours of wait time on schedule yield would be pretty even with the hours of signal wait time.
We have how many tasks have waited on that wait, and then the average milliseconds per wait. Now, this is useful because we want to know the number of tasks that waited on something, fine, but we also want to know on average was there a lot of latency in waiting for that.
This can be particularly useful with lock waits, so we know ballpark how long queries are waiting to get locks on average. The next section down is hours of uptime.
Hours of uptime. And I just read the column name. I injected a word in there too. This section is the one I was talking about with the virtual file statutes.
Virtual file statutes has the parameter where you can set the latency number for it. I came back with a whole bunch of tempdb files in here. Then you can see the average write stall on those was over 100 milliseconds.
So that’s why these got picked up. If I change that parameter to be a lower number, I might see some other stuff in here. The read stall on the file is fine. The write stall, you know, it is what it is.
It’s tempdb. It’s the only thing that you can do is to get the data out of the file. Unless this is like in the seconds all the time, I’m not going to worry too much about it. All sorts of stuff uses tempdb.
You don’t know when these averages sort of got driven up. It could be checkdb. It could be, you know, some weird one-off event that will never happen again. You just, it’s impossible to tell without digging deeper.
There’s also the total amount of data in gigabytes read and written, the total read and write count, and that’s about the end of it there. I don’t spend too, too much time talking about disk.
Unless you’re in some platform as a service offering in Azure, where you were recently forced to wash Microsoft’s feet because they gave you transaction log throughput equivalent to an SD card from 2003 plugged into an SD card reader plugged into like a USB-A port, 200 megs a second.
Then I’ll talk a lot about incomplete thoughts that are very expensive. Next section down is a clickable. If you want to get some basic information about tempdb, it’ll tell you how many files you have, the min and max size of each file.
So if you have one file that’s like a gig, or like if there’s one, if the min size is like a gig and the max size is like 20 gigs, you might need to do some adjusting there. The growth increment in gigabytes and the total number of CPUs you have.
So you get a sense of if the number of data files you have lines up with the number of schedulers you have on the server. There’s also some information about space used in tempdb, how much is free, how much is taken up by user objects, version store objects, and then internal objects, and then what the current tempdb activity is.
So if there’s a session using tempdb, then we will get information about that back, along with some other information about the size of the allocations. I tend not to go below gigs in here because like 800 megs is just not sufficient for me to worry about.
This is not doing it for me. Next section down is memory consumers. So all the things in SQL Server that may consume memory, the buffer pool, different caches, query memory grants, that will all show up in here.
You’ll get this top line, which will show you the buffer pool. You’ll have this second line, which will show you how much memory has been stolen from the buffer pool. And then the next section down is sort of the top five. I mean, I don’t even have five additional things in here right now.
I only have four, I think. Well, let’s count that. That was squint one, two, three, four. I do have five. I did it.
Check me out, ma. So I do have five in here. And these are pretty small, but they are ordered by which five have the highest memory consumption. Next section down, we’ll grab XML to show if SQL Server was reporting any low memory conditions.
Sometimes worth looking into those if your server is deemed to be under memory pressure. Seeing lots of resource semaphore weights, resource semaphore query compile, which I’ll show you in the weight stats up top. Or if you have a lot of page IOLatch weights, you might find that your server is under pretty consistent memory pressure because you are constantly swapping data in between RAM and disk.
The next section down will give you some information about, well, a few different things. It’ll give you the total database size. It’ll give you how much physical memory is in the server.
It’ll give you what max server memory is set to. I have mine dropped down a little bit right now because I was trying to show, like, lower memory conditions. 96 gigs fits a lot of the Stack Overflow 2013 database in it, so this is brought down a little bit lower so I could cheat a little bit in that demo. Don’t tell anyone.
It’ll tell you if you have locked pages and memory enabled, which is a setting I highly recommend, especially on servers with more than 64 gigs of memory in them. So it’s generally a pretty good idea to have that in there. Let’s you bypass all sorts of virtual memory address space and go right to allocating physical memory.
It’s a pretty handy setting, I think. Also, I’ve seen it prevent non-yielding schedulers in a lot of some recent cases, too, where schedulers were getting all mixed up and messed up because of the constant swapping between disk and memory. Having locked pages and memory enabled magically fix that.
You’ll get some information about target server memory, how much memory SQL Server likes, how much SQL Server wants, how much total memory we have available. How much memory we have in total, sorry. How much memory we have available.
So this is how much memory we have available to loan out to queries for query memory grants. It’s about 75% of your max server memory setting. This is how much we’ve granted out to queries.
Nothing running on my server right now, but, you know, if there were, you might see some memory grants getting dished out and doled out, and you might see that get populated. We have how much the queries have used out of that grant. Over here, we have slightly different stuff.
So this would be how many queries have been granted at memory grants. This would be how many queries we have waiting to be granted a memory grant. If there have been any timeout errors, if there have been any forced grants.
SQL Server can sometimes force queries to run with a lower memory grant. That’s that serial desired memory part of the query plan. The total reduced memory count.
So how many times queries have been forced to run with a reduced grant, which is strangely five when this is zero. Maybe I had a demo that went behind the scenes and messed something up there. I don’t know.
Like min grant percent does something with that. This section down here, well, not too many fireworks. If there were something running asking for a memory grant, that would show up in here, which my server is blessedly slow right now because it’s Friday. Friday, as some folks say.
This is the section that I was talking about with the CPU details. So if we look in here, we’ll get some information about what our CPU configuration looks like. You’ll notice that offline CPUs is right at the top.
If you have installed the wrong version of SQL Server, they can only use 20 cores. Or if you have terribly misconfigured your standard edition VM for SQL Server to use like one socket per core for 8, 12, 16 sockets and a whole bunch of CPU cores are offline. I want that to be the top of the list there.
The rest of this is just sort of stuff that I can get. So I decided to keep it nice and informational in this output. This CPU utilization column, this is the one that I talked about having the parameter for up here a little bit.
That’s the CPU utilization threshold. If you want this to be higher or lower than 50, you can now configure that via the magic of parameters. Down here is general state information.
How many workers are currently active? How many workers are currently active? How many are currently used? How many you have available? How many threads are, I think, waiting for CPU is an important one.
How many requests are waiting for threads is a good one to look at. How many workers are currently active? The total active request count. That’s another good one to keep an eye on.
See these numbers all jumping up together. We can be reasonably sure that our server is getting pretty well hammered on CPU. If you have any requests that are stuck waiting for things, this will show up with a number in it.
How many tasks are currently blocked will show up in there. How many active parallel threads you have. And then these last two columns are, if you have a lot, if you have like, let’s say, 20 queries running.
And five of them are pegging away at CPU. And 15 of them are, like, those five queries are running. And then you have like 15 queries that are runable.
That’ll show up here. And that’s a pretty good way to figure out if you have CPU pressure on your server. That’s queries like, I’m ready to use a CPU, but I can’t use a CPU because everyone else is, I’m at the back of this line here. Like they’re going to see Taylor Swift and they’re the last ones to show up.
And they’re just in the back of the parking lot, hoping. Hoping that someday they’ll get inside. If there are any current thread pool weights on the server, they will show up in here.
This is much spicier when there’s a demo involved. But right now there’s no demo involved. So we don’t have anything.
But one thing that’s important to keep in mind is that SQL Server does not, like when it spins up, you get a certain number of worker threads. So you might see some threads based on how many CPUs you have, unless you override that by being goofy, changing the default setting. But if like SQL Server doesn’t create 506 worker threads and then leave them there.
SQL Server will grow and trim the worker thread set as it needs. So you might see some thread pool weights. As long as they’re relatively short, I’m not too, too concerned.
Like if there are like three or four in there that have been waiting for like a few milliseconds, I’m not worked up. But if I start seeing queries waiting like 100, 500, like a full second to get worker threads to run on, that’s when I start to get a little bit more nervous. And then down here, this final section, if there were anything fun running on here, we would see a bunch of queries ordered by, well, not ordered by, but a bunch of queries with CPU usage.
Down here we get query text, query plans, how long they’ve been running, a bunch of useful stuff, a bunch of useful metrics about the queries. Oh, that’s a nice graphical bug in SSMS over there. You can’t really see it.
I went the right way first time. Yeah, look at it. Oh, green screen. What have you done to me? Let’s just lean in here. Have a little chat. You and I, this is how I stand at the bar. To lean on an elbow.
Why I’m very popular. That’s nice. Graphical glitches. What is all that mess?
Maybe it’s not SSMS’s fault. Maybe it’s me. Maybe SSMS just wants to have nice forehead wrinkles like I do. I don’t know.
I don’t know. I couldn’t tell you. Anyway, that is the basic output that you will get from SP Pressure Detector. If you see anything here that you have questions about, I’m going to head over to my GitHub repo, code.ericdarlingdata.com.
That will get you a short URL to my GitHub repo where you can ask me questions. If you email me a question, I’m going to tell you to go use GitHub. Because then everyone can see it.
You can really think hard about the question you’re asking before you ask it. It’s really easy to send in private email to someone and ask questions that may not look good in the light of day. You might think really hard about the question you’re asking.
You might find yourself starting to write something and then realizing you can go investigate things a little bit on your own. Keep in mind that since these two procedures are all completely open source, completely available to you, free for download. Even Microsoft customer support is free to download and use them.
You can read through all of the code and you can see exactly what it does. And you can try to figure some stuff out for yourself. You might even be super nice and kind and open up a pull request with a contribution of your own.
I’m very grateful for the folks out there who do that. Anyway, this was a reintroduction to SP pressure detector. All in one video.
Kind of short and sweet. At least I tried to make it short and sweet. We’ll see how we did. Thank you for watching. Hope you enjoyed yourselves. I hope you learned something. If you like the video, well, hit that little thumbs up button down there.
It always makes me feel joyful inside when people like my videos. If you like this video quite a bit, you can even subscribe to my channel so you get notified when I release new ones, which I try to do pretty often.
Schedule has been a little weird lately. Summer vacation travel, kids, work. So I haven’t been able to record quite as much as I wanted to, but I do it as much as I can.
Anyway, that’s enough of that. Again, thank you for watching. And I will see you over in the next video where I will talk about something else entirely.
keep a screen to watch a maneira it’s��� very radical and funny.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.