A Little About SOS_SCHEDULER_YIELD Waits In SQL Server

Video Summary

In this video, I dive into the fascinating world of SOS scheduler yield in SQL Server, providing insights that can help identify when your server is under CPU pressure. I share a personal anecdote about my vacation and how it influenced today’s content, making the explanation more relatable. We explore the states of queries—running, runnable, suspended, and sleeping—and use analogies like going to a bar to order drinks to explain these concepts in an easy-to-understand manner. The video also delves into practical examples using my demo server and a smaller VM called Server Tuning, showcasing how SOS scheduler yield accumulates under different query scenarios. By the end of this session, you’ll understand why high SOS scheduler yield is often indicative of CPU contention and learn strategies to mitigate it.

Full Transcript

Erik Darling here with Darling Data, the darlingest data there ever will be. And in today’s video, we’re going to talk a little about, because I like doing those little about videos, we’re going to talk a little bit about SOS scheduler yield. Fresh back from vacation, tan, rested, ready, haircut in full effect, feeling good. I tried to record this video a number of times and just been sabotaged by various things that were really too annoying to go into detail on, so we’re going to skip that. And we’re just going to talk about our good friend SOS scheduler because I think it’s an interesting way. It can be a pretty decent indicator that your server is under CPU pressure if SOS scheduler, just like a number of correlating factors in there, but like if SOS scheduler, is under CPU pressure if SOS scheduler is under CPU pressure pressure when SOS scheduler yield is constantly way at the top of your wait stats, and the total number of hours of SOS scheduler yield is always like at or near your server’s uptime or even over your server, especially over your server’s uptime, it’d be a pretty good sign that your query’s under CPU pressure because what SOS scheduler yield sort of indicates is that queries are waiting for CPU attention. The more time queries spend waiting to get CPU attention, the less time that means that they’re actually like running and like being productive little queries. But SOS scheduler yield basically means is that your query is ready to go. It’s ready to run, ready to rumble. I didn’t say the whole line so Michael Buffer can’t sue me, but it’s basically your query is good to go. It’s not waiting on like reads, writes, locks, pages, memory, disk, anything else, CX stuff. It’s just waiting for a CPU to say, come on, come exhaust your quantum with me and we’ll have a grand old time together. So the three, the four main states that your query can be in, there’s running, which just means your query is on a CPU doing stuff. It’s active, it’s going, right? Runable is when you see SOS scheduler yield weights. Suspended is any, basically any non-SOS scheduler yield weight, like any other weight type you see is going to show a suspended status, right? If you’re SOS scheduler yield, you runable, you’re not waiting on any other resources. Any other resource weight, your query will show it suspended while you go get, like you’re not allowed to get near a CPU until you have all the stuff you need. And then there’s sleeping, which means your query is finished and you should probably close some connections, fix connection pooling, whatever you did to make your query go to sleep and not, not close things out, hold locks and block a whole bunch of people and you know, that whole thing. Anyway, SOS scheduler yield weight, I usually like to explain this as like you going into a bar and trying to order a drink. Now, you’ve got this sort of list of points up here and one thing that I want to point out about this list of points, point about the points, is that it has been a real long time since I’ve been carded for anything. You know, I started drinking in bars in my early teens because my voice changed early, basically had the same voice since like 1993. I could, I started being able to grow sideburns really young and I started smoking cigarettes really young. So I quit obviously because responsible thing to do, but you’re a 13 year old with a deep voice and sideburns. You walk into a bar and you smack down a pack of cigarettes and a Zippo and you just order like whatever cheap domestic beer. No one really questioned you. I just kind of act like a jerk. Bartender’s like, well, he’s been here before. So some advice for the young kids out there, all the youths who watch my SQL Server video. But it’s sort of like, you know, you are the query.

You walk into a bar. The bartender is a CPU. Your job is to have everything you need to get that bartender’s attention and say, this is what I want and have the bartender start working on it. So various things might get in the way, needing an ID, needing money, bouncers, or a long line of patrons who are also trying to get to the bartender to get their attention, to get them to make their drinks, right? And the last thing you want is a bunch of drunk people waiting a long time for drinks.

That’s a recipe for a fight, right? Cutting in line, elbowing, that scene. So one thing about SOS Schedule Yield is it’s always going to have high signal wait time because it is a CPU signal. It is your query waiting for the CPU to say, yep, come on over, do your quantum dance with me, and you and me will get these results out to SSMS faster. Who would return results anywhere?

Export to Excel, I guess. Dump it into an Excel file like everyone else does. So when you have a lot of queries running and not enough CPUs to service them all, you may end up with really long lines of queries waiting to get CPU attention, which is when you’ll see SOS Scheduler Yield Waits really spike up, which is when your server might be under a lot of CPU pressure. Some interesting things that I find slow queries down and also keep CPUs very busy are things that have frequent executions, right? Like scalar UDFs, at least when they’re not in lineable, they get called once per row that they have to process, cursors, loops, stuff like that.

Those things will generally keep CPUs very busy and interfere with query processing stuff. Just keep the lines really long. It can be annoying. One thing that you don’t want to do in response to, well, actually you should step back a little bit. So one thing that you might also see if SOS Scheduler Yield Waits like really get high is you, not like, you know, casual drug use high. I mean, like time high. High on time. One thing that you might see eventually is as these lines grow is you might start hitting thread pool weights. And thread pool means that you’ve run out of worker threads.

And some people have the tendency to throw a whole bunch of, to manually increase the number of worker threads that SQL Server creates, is willing to create as queries execute. And what you’re going to find then is that SOS Scheduler Yield Waits are going to get driven even higher because you’re going to have way more worker threads vying for CPU attention. So that’s usually not a good response to thread pool and high SOS Scheduler Yield Waits because it’s the analogy I always use. It’s like adding like thousands of screaming kids to a classroom without adding more teachers to help the screaming kids. It’s a bad idea. So what we’re going to look at is sort of how SOS Scheduler Yield Waits accumulate in a vacuum. And then we’re going to look at how they accumulate when we have a bunch of queries lined up waiting for them. Now, for the initial stuff, we’re going to be using my normal demo server, SQL Server 2022. It’s about 100 gigs of memory. It’s about 16 CPUs on my fabulous new Vinovo laptop workstation beast. And then we’re going to use a much smaller VM called server tuning. And server tuning, I didn’t want to connect. I’m sorry. It’s not what I actually wanted. Go away. I just wanted to look at the settings. Server tuning is a much more modest VM.

It only has about 14 gigs of memory and two CPUs. So what we’re going to look at is when we throw a whole bunch of single threaded queries at that and look at how SQL Server sort of manages them. So we’re going to stick this into a new window. And this store procedure, well, this is just temporary store procedure. I actually have to switch this out. I’m not sure what happened.

I installed, like this used to run for about 10 seconds with the old number, but I like installed a bunch of like system updates and stuff. And all of a sudden, all of my, like it takes way more iterations for that to get up to 10 seconds of execution time. So I don’t know what happened, but it happened. And apparently there was like some like CPU fix that made them faster or something.

I don’t know. Maybe Windows 11 just got better at dealing with the P core and E core thing. I, I genuinely don’t know. But what this is going to do is some is sort of like the effect of like, if you have a scalar UDF or if you have a loop or a cursor or something where we’re just going to like, this is like attack, like swarm of gnats, stinging the crap out of you, biting your butt.

And what we’re going to do is dump some weight stat data about SOS schedule yield into a temp table, run the procedure, and then diff it. The reason why I use temporary, well, store procedures, but temporary store procedures here for this is to avoid async network IO weights. It can get annoying.

So let’s kick this, run this whole thing. And this should take about 10 seconds. If I lift up the right arm, we’ll see the time ticking away right about here. And 10, 11, close enough. So in 11 seconds of just incrementing this up, we have about 2,700 weights on SOS scheduler yield, and only about three milliseconds of total wait time on it. Two of those three seconds, probably because of some rounding stuff, were on signal weights. So that’s kind of what I was talking about. SOS schedule yield always going to be on the high side of signal weights. But what’s interesting to me, I think, here is how much work we’re able to get done in this loop while it’s executing before we hit an SOS schedule yield. So this is actually a very productive loop, right? It’s like we get a lot of, like, iterations done before the query’s like, okay, step off, my good, right back on, right? So cool, right?

It’s good stuff there. Now let’s look at when we have just one big slow. This should also run for about 10 seconds. I had to make some adjustments to this one too, to have it run for 10 seconds, because, you know, my CPUs just got awesome overnight. So we’re going to do the same thing here. We’re going to run this big query that just does a bunch of cross joints to a tiny table with a few extra union alls. We’re going to make it run at max.1, and then we’re just going to get a max of all this stuff. So let’s just kick this whole thing off. And this should also run for about 10 seconds, right about here. 10, oh, 10 on the nose. So this one here, just about the same number of waiting tasks in that time on, or waiting, like, waits on SOS schedule yield, and also just about the same wait time and signal wait time, right? Cool. So single threaded query running for 10 seconds, completely unfettered by anything else doing anything. There’s no other, nothing else is using CPU on my home laptop. Generally racks up a small amount of SOS scheduler yield, because we’re just off and right back on, right? We’re not, like, waiting in a big, long line to get back on, to get CPU attention, to get back on a CPU to run. So single threaded, right? Let’s do this same thing, except let’s, oh, wait, we’re in the wrong tab there. Sorry about that. Let’s do the same thing, except we’re going to run this at DOP 8. And this is just to show you that parallel queries, because we have multiple CPUs engaged, we’re scheduling DOP worker threads, they’re going to rack up a whole lot more waits on SOS scheduler yield. And that’s really the point of this thing. So we’re going to do all this. This should also run for just about 10 seconds. 10 on the nose. Look at that. Amazing, amazingly timed query. So what do we end up with? Well, we end up with a lot more waiting tasks, right? We have 20,000 there rather than like 2,300, 2,400 with the single threaded query.

And we also have a lot more wait time. And of course, because we have more wait time, we have a lot more signal wait time waiting to get CPU attention. So all those little parallel queries will drive up both the number of waits on SOS schedule yield, because we have DOP threads able to like get scheduled on CPUs and they all have to drop on, hop off, hop on, hop off, do their thing. And yeah, that’s about that there. So parallel queries are going to necessarily drive up SOS scheduler yield because they have more active requests on CPUs. And those active requests all have to get co-scheduled. So you have DOP threads getting on for four milliseconds about, you know, exhausting the quantum, stepping off, stepping back on. And depending on how lucky you are making, like hopefully all your parallel threads end up in different places and you’re not trying to like co-schedule stuff around, that would be fortunate. But yeah, that’s parallel queries. You’re going to see more SOS scheduler yield waits. Now, what we’re going to do is we’re going to, well, I already have that done, actually. We’re going to use SQL query stress by the lovely and talented Eric EJ on Twitter and GitHub and everywhere else. He took over maintaining SQL query stress when Adam Mechanic said, I’ve had enough of you SQL Server people. I’m going to go use Postgres like a normal human being.

And yeah, so Eric EJ, thank you for all, thank you for all you do. So I have this query already in SQL query stress and I have this window with SP who is active ready to go. I’ve sort of trimmed down the column list so we can just get to the stuff that I really want to show you. But if I run this and this is running at DOP1 and I keep hammering away on who is active, because this server only has two CPUs assigned to it, what we’re going to see is usually there’s going to be two requests active, right? So these two are running, these are doing stuff and the rest of these are runnable, right? So they’re like ready to go. They have everything they need to execute, but they just don’t have CPU attention. And if we run this a bunch of times, sometimes there will be one, sometimes there will be two. But in general, what we’re going to see is the same pattern where a bunch of queries are like, please give me CPU, please, I need it. Why are you making me wait for the CPU? And one or two queries just actively able to use CPU. Now, some of this does depend on timing and query placement on CPUs.

Like when there was only one active in there, it was probably just very, very bad luck with like where SQL Server chose to throw one of those queries. I can kill this off. I think I’ve pretty much exhausted the point on that. But this is when you see very lots of queries that are runnable, right? Like this, like if you have like, you know, constantly have like a long list of queries running and that most of that long list is runnable and very few things are actually executing, it’s a good sign that there’s some pretty serious CPU pressure, at least in that moment on your server. You can correlate that to more, to very high SOS schedule yield weights to figure out if it’s something that constantly happens or if it’s just something that happens once in a while when like things really heat up and you have like a really busy time on the server. So, you can look at SOS scheduler yield weights in general. I prefer SP pressure detector. I forget what, you know, something’s weird there. But if you run this and we look at the weight stats, I think I updated it recently to fix an issue. But if you look at weight stats, well, there’s SOS scheduler yield. Despite my, despite my fantastic demos, it is not very high on the list. The server has been up for a while, but I just haven’t been doing much to it. We have no hours of wait time, but what can you do? I suppose if I really wanted to burn my CPUs out, I could, I could do that. But I’d like this, I’d like this to last a little while. Not in a rush to spend another like five grand on a laptop because I burned the CPUs out after a few months. So, if anyone out there has like lots and lots of Azure credits they want to donate to me, I’d be happy to show you what CPU burnout looks like on Microsoft servers. So, when you have very high, so let’s say you’re in a situation where you constantly just have high SOS schedule yield weights, or when you hit your busiest times on the server, you have very high SOS schedule yield weights, you have lots of queries that are runable, but not run in, they can’t get CPU attention. The options that I would generally prescribe, well, getting faster CPUs is kind of tricky. Most folks are in the cloud or on a VM where, you know, you can’t really pick and choose that stuff. Back in the good old days when you get to build your own servers, you could be like a real gamer type and get like a lot of really fast cores.

So, you could have like, you know, like three, four, five gigahertz cores rather than like the older generation, like one to two point something gigahertz cores. That could help like queries get faster, like do more work within the time that they got on a CPU. You could also just get more CPUs when expand the number of CPUs available to work on queries. The more of those you have, the less time queries are going to be spent, the less time queries are going to spend vying for CPU attention in the crowd, waving 20s and 50s and 100s, trying to get the bartender to make them a drink.

You could also make sure that, you know, you have your various settings are in the right place. Parallel, parallelism settings can help with SOS schedule yield weights. Turning max stop down can sometimes help with that because you have fewer parallel threads spinning and trying to get CPU attention. Turning cost threshold for parallelism up can sometimes help because then you have fewer queries that are eligible for parallelism. It’s not always a great fix because, you know, you just like long-running serial queries are just going to, it’s unpleasant, right? You can cause a lot of performance regressions changing those settings. Those are global settings. You probably, your better bet would probably be to add hints to specific queries that are messing you up at those times.

You could also do some good old-fashioned queried index tuning, right? The faster queries can run naturally, the better off you are, the less time, less CPU time they’ll need. It’s all good for you. It’s all up. And again, stuff that stresses single CPUs, long-running serial queries, user-defined functions that can’t be inlined, cursors, other types of loops. And another big one that came up, actually it was something that we talked about with Joe Obisch on the last episode of Bit Obscene, is scheduled tasks and jobs that all occur simultaneously. You have a bunch of jobs that run every hour, 10 minutes, but on the hour you just end up busting your CPUs for no good reason. All right. Anyway, I have a dinner reservation. I’ve been talking for 20 minutes, parched. And I’m just happy to have finally gotten through this video without sabotaging. So if you liked the video, give it a thumbs up. If you like this sort of SQL Server content, subscribe to my channel. You’ll get notifications when I post these videos.

It’s good for you. Strong teeth and healthy bones, they say. And as always, I hope you enjoyed yourselves. I hope you learned something, and I hope that you’ll keep watching. So thank you. Thank you for watching. Thank you. You’re a great person. Follow me for more tips about underage drinking.

Probably don’t. Not a good idea. Don’t let your kids watch this. Unless you’re cool with that. I don’t know. Anyway, I’m gonna go now. Goodbye. Bye.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.