Why Some Long Running Queries Don't Have Wait Stats In SQL Server

Why Some Long Running Queries Don’t Have Wait Stats In SQL Server

Thanks for reading!

Video Summary

In this video, I delve into a fascinating aspect of SQL Server query performance that often goes unnoticed—how serial plans with eager index pools can hide the true nature of their execution. I explain why some queries mysteriously run for a long time without showing any significant wait stats or weights in the plan. By comparing a serial and parallel version of the same query, I demonstrate how the parallel plan provides more transparency compared to its single-threaded counterpart, which can be misleading due to the eager index spool operator. This video aims to highlight the importance of scrutinizing execution plans, especially when dealing with large tables, as eager index spools can significantly impact performance without obvious indicators in wait stats or query costs.

Full Transcript

Erik Darling here. Surprise! With Darling Data. Recording a little bit on the early side today because I have a lot of stuff to do later and I don’t want to interrupt my great streak of recording a video every day because, just to be honest with you, the YouTube videos, they get a lot of action on them. They get a lot of views, they get a lot of thumbs going in the right direction, straight up in the air, and they get a fair amount of comments. And, you know, written blog posts covering the same material have not nearly generated the same amount of engagement or interest. So, you know, I think investing more time in the videos and the recording stuff is, you know, kind of, I think the direction of the video is, you know, the direction that I have to go in order to get people to listen. I don’t know, maybe it’s getting to see this handsome mug smiling at you that keeps people in there. Who knows? It’s certainly not the slightly remaining, remainders of bed head that I haven’t quite worked out yet. Actually, it’s more like a headset head because I was wearing my headset earlier and now I have this, like, dent here. And then everything kind of poofed around it. Just in case you were wondering.

Why my head looks a little extra lumpy today. I got headset head. Or headset hair. Anyway, in this video, we’re going to talk about why some queries sometimes mysteriously run for a long time but don’t really tell you why. Now, there are going to be other peripheral reasons why you might see this, but they’ll be represented by weights. So, like, if a query gets blocked, you’ll see that that query spent a long time waiting on an LCK to take a lock and do something, right? If a query spent a long time reading pages from disk, you’ll see, like, page.io latch underscore some two-letter combination weights. Now, there’s a lot of stuff that when a query waits on things, it shows up for a reason. There are a few reasons. We’re going to talk about just one of them today. There are a few reasons why a query might run for a long time but not actually show you any particular weight that it waited on.

Of course, there is the scenario where, you know, you might have a query that runs a long time but the whole time is just burning away on CPU. Now, in a parallel query, you’ll at least see a lot of CX weights that represent that time. In a serial query plan, you know, the SOS scheduler weights just might not be crazy enough to account for all of the time that we spent, you know, doing a little merry-go-round on the CPU and that can be rather confusing.

But in this one, we’re going to look at how serial plans with eager index pools in them kind of hide the work that they do, like where that time in the query is spent in a weird way. So, I’ve got the same query, essentially twice run here, and one of them is limited to max.dop1 and the other one is allowed to go as dop as I allow, which on this server is 8. And if we look at the query plans for these, and I’ve already, I’ve taken the opportunity to run before, before recording the video so that I made sure everything looked right, made sure everything looked right when I did this.

Both of these queries, despite one being serial and one being parallel, run for nearly the same amount of time. Right? That’s not a very big difference for a serial execution plan versus a query that ran with a degree of parallelism of 8. Ocho, as they say in my favorite restaurants.

So, like, where are the differences? Well, they’re slight, but they’re there. And I think an interesting one is over here.

So, the serial scan of the post table took just about 5 seconds even. But, for some reason, a parallel scan of the post table only got, took 200 milliseconds more. That’s weird, right?

Why would a parallel scan take 200 milliseconds more? That’s kind of strange. But, the funny thing about both of these is that at least this one is honest, right? This is a serial execution plan.

We have a serial operator here. There is no indicator of parallelism because this query didn’t go parallel. This one is, of course, a filthy stinking liar. Right?

And if we zoom in, I’ve talked about this a little bit before. Or if we zoom in on the properties of the index scan on the post table, we get this kind of funny thing where all the threads, all the rows end up on a single thread. Right?

How 17 million rows end up on thread 3 this time. If we ran this query 8 times, they’d probably end up on a different thread every time just based on however SQL Server is feeling at the moment. So, like, the first, you know, first sign that the query plan is lying to us a little bit is in that, you know, a parallel scan takes 200 milliseconds longer than a single threaded scan.

And, of course, this is a limitation imposed by the eager index spool. You’re only allowed to build an eager index spool on a single thread. You can read from it in parallel.

Like, if we look at the properties of the eager index spool over here, we can see that all 8 threads had some stuff happening over here. Right? So, that’s just one sign of the dishonesty.

But that’s not really what the video is about. I’ve talked a lot about eager index spools. So, what I want to talk about in this one is how the serial plan is hiding what it did exactly. If we look at the properties of the select and we look at wait stats for the parallel execution plan, we can see that we spent 266 total seconds.

Remember, it’s milliseconds. So, you chop off those last three digits for savings and you get to seconds. So, about 266 seconds waiting on exec sync.

Now, in the context of this query, exec sync is happening while these threads synchronize and build this eager index spool. Right? So, we had eight threads that needed to synchronize, build that spool up in tempDB, load the 17 million rows in it, and then have SQL Server read the 4,390 rows that we cared about from it.

Which, you know, is at least honest. Right? Seeing that 266 seconds was spent in there, that’s, you know, eight threads times whatever the time was.

Maybe that’s eight times 38 or something. I don’t know. I’m not terribly good at math. But the single-threaded plan up here tells us absolutely nothing.

Right? If we look at this, we have three wait stats registered to this query plan. And not a single one of them shows us anywhere near the almost 38 seconds that this query executed for.

Right? We have 44 milliseconds of memory allocations. We have 12 milliseconds of SOS scheduler yields.

And we have 8 milliseconds of reserve memory allocations. That does not, that’s nowhere near 30 seconds. Right?

These are all definitely milliseconds. And so, and I find this quite dishonest. Now, there are other times when either a serial or parallel plan might exhibit this type of thing, but you would still have a wait stat associated with it. And this is something I’m going to record a different video about, how wait stats for sort spills and hash spills at least surface themselves in some ways in either query plan wait stats or wait stats in general.

But I’m going to not spoiler that whole thing just yet. I’m going to leave a little something on the table for when I record that. But, so what’s interesting about these sessions is, and this is something that I tell people to be careful and to keep an eye out for all the time, is when pure select queries do a bunch of writes.

Right? So, this is indeed session ID 115. Oh, that’s the wrong armpit.

There we go. Oh, wait, maybe that was the right armpit. Yep. You know what? I just can’t get out of the way of that thing. Trust me, this is session ID 115. We had two select queries, and both of those select queries combined did quite a few writes.

Now, we weren’t inserting into, I mean, we weren’t instructing SQL Server to insert anything into a temp table, but it did build those spools up in tempDB. So, that accounts for the writes here. But, like, if you were doing, like, select and an insert, or select and something like insert into a temp table, or select into a temp table, or something else that were, like, there was obviously a reason for writes to occur.

It would be quite strange to see, or it would not be strange at all to see a select query involved with writes. But these are just two queries that selected data and returned it. There’s absolutely no reason for them to write anything, except those spools.

You know, another reason for select queries causing writes could be stats updates. It could be spills. It could be query store related.

There are other reasons why you might see it happen. But, always, like, it’s something that I really tell people to be on the lookout for, because, especially if that query is slow, then those writes could absolutely be the root cause of why it’s slow. So, you know, figuring out what caused the writes is, you know, an exercise in executing the query, getting the actual execution plan, like we did in this tab.

But it’s certainly one that’s generally worth doing. So, what do I think about this? Well, this is more just like an FYI video.

I really do think it would be helpful if Microsoft did add some weight stats to account for things like this, because it can be quite opaque to end users. Exactly why a query was slow. Now, like, let’s just take a step back, and let’s just pretend that we found these queries either in the plan cache or query store or some other monitoring tool where we didn’t have the benefit, the absolute luxury of an actual execution plan.

You know, we would see this. And, you know, if you’re typical, if you’re one of the normies out there, you might start looking at query costs, and you might say, oh, 82%, oh, 87%, and you might completely ignore these. Because these, with a cost of 11%, remember the actual execution plans, we spent about 30 seconds apiece in those eager index spools.

And this is, that’s where the majority of the time was spent. But, you know, you would say, oh, but the cost is so low. It’s only 11%.

How bad could it be? How bad could it be, man? Well, this is how bad it is. 30 seconds apiece. So, you know, always be on the lookout for this stuff. Eager index spools are a particularly nasty query plan operator to run into, especially if they are latched on to rather large tables.

Off small tables, they’re not really of consequence. But, you know, big tables, they’re definitely of consequence. So, just, you know, be very mindful of that.

And if you’re looking at, you know, if you’re running any script, whether it’s spquickiestore or spblitzcash or, you know, whatever dustbin junk drawer script you have that interrogates one of those or the other, maybe you’re using the query store GUI to investigate things. If you have, if you see a select query that is causing a bunch of writes, pay special attention to operators in the query plan that might be responsible for them.

Whether they’re spools or whether they’re memory-consuming operators like sorts or hashes that might be spilling. Because if it’s ending up on the sucker board of things that you need to tune, those might be good places to start. So, anyway, I have to go start the rest of my working day now.

Hopefully, wear some headphones to mask the headphone scalp that I have currently. Thank you for watching. Hope you enjoyed yourselves.

I hope you learned something. I hope that you will, you will deign to give this video a thumbs up or leave a nice comment. You don’t, you don’t have to say anything about the hair. We can skip the hair this time.

I don’t know. Maybe just, maybe some, say something nice about my keen fashion sense, I guess. If you like this video and you like this sort of content, you can, you can subscribe to my channel and you can join nearly 3,653 other happy YouTubers out there who get notified, who get a knock right in the head every time I drop, every time I publish one of these videos. Wouldn’t that be, wouldn’t that be nice for you?

So, anyway, time to go work. Thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.