A Follow Up On HT Waits, Row Mode, Batch Mode, and SQL Server Error 666
Video Summary
In this video, I delve into the intricacies of SQL Server query tuning and the importance of batch mode versus row mode execution plans. Starting off by discussing my recent accolades from BeerGut Magazine, I highlight how these awards reflect both the unique challenges and successes in managing a consultancy with a static team. The primary focus shifts to explaining the differences between hash join weights in batch mode and their absence in row mode query plans, emphasizing why reducing batch mode weights can lead to faster queries by minimizing the number of rows processed. Additionally, I explore real-world scenarios where multiple distinct aggregates in row mode queries can lead to significant performance issues, such as hitting system-generated unique value limits, which can result in errors even when only running a SELECT statement. This video aims to provide practical insights and solutions for optimizing query performance across different SQL Server editions.
Full Transcript
Erik Darling here with Darling Data. Recently voted by BeerGut Magazine to be the SQL Server Consultancy with the best employee retention rate, which was coupled with kind of a strange additional accolade in that it was also the slowest growing SQL Server Consultancy on the planet, because I haven’t hired anyone in five years. which I’m never going to, so it’s not that I wouldn’t enjoy working with someone, it’s just that I can’t imagine anyone in their right mind ever wanting to be managed by me. It would be a nightmare for you. Nightmare for anyone. So this is a bit of a follow-up video to the last video I recorded about the HTDeleteWeights. And I need to do two things here.
I need to show show everyone the row mode version of the query that I talked about. And I also need to expand a little bit on what exactly was going on with the, well, a more, a little bit closer to what was actually going on in reality with a client query that I was tuning. Now, the HTDeleteWeights, or the HTWeights in general that I was talking about in the last video, I don’t think that they are necessarily a scalability problem. I don’t think that it was their fault that the query was slow. I think that much like any other wait, the CX weights, LCK weights, page latch, latch weights, page IO latch weights, anything like that, you can use them to figure out what’s going on when your queries might be slow and what you might need to do to fix them. They are not on their own. It’s not like spin lock contention or anything weird like that. It’s just, you know, something that you might see in a query plan. You might wonder, what the hell’s going on here?
And I wanted you to know what to look for, which is the hash joins and hash match aggregates and running in batch mode and how you might be able to reduce those weights to get a faster query, namely by having fewer rows go into the joins and the aggregates, your batches going into them, because the fewer batches and stuff you have going into them, less of those weights you have, the faster your query goes. So that’s the first thing. The second thing is the row mode version of that query plan will of course not have any of those HT weights in them. You will just have a whole lot of sort of weird parallel CX type weights in them because obviously HT weights don’t show up in row mode plans and parallel exchanges, right?
Like, you know, repartition streams and, well, there’s another repartition streams. There’s a gatherer streams over there, which is, you know, maybe not the most interesting thing in the world. I don’t think there’s a distribute streams in this because we don’t go from a serial zone in the plan out to another parallel zone in the plan anywhere.
So there’s no need to distribute streams. We just need to repartition streams and gather streams at the very end. So this whole thing runs for, I don’t know, 30, 40 seconds more.
Well, actually, well, thanks operator times. That’s 236. That’s 241. That’s 232. These are all 232. That’s 234 over there.
Do we have anything else interesting going on in here? No, not particularly, I don’t think. Anyway, parallel exchanges and row mode plans can make operator times look really weird. And I think one good sign that you should start getting batch mode involved in queries, which is what I did to make that query faster than it was before, was get batch mode involved because, you know, when you’re processing lots and lots of rows, aggregating lots of rows, batch mode is usually way, way better than row mode, unless you’re on standard edition, because Microsoft has taken you out at the knees by limiting batch mode’s degree of parallelism to two, no matter what.
You can put all the max.p hints you want. You can say max.p 1 billion, and you still get a maximum degree of parallelism of two for your batch mode queries in standard edition. So thanks, Bean Counters.
I guess the $200 billion a year wasn’t enough. You need to squeeze people on standard edition even harder. Then you squeeze people on enterprise edition.
That’s just really sweetie. Real great folks. Anyway, so this is the row mode version of the query that I ran in batch mode where I talked about the HT to lead weights.
The weights for this one are, you know, whatever. There’s a lot of stuff in here that you might see and might be a pretty good indicator that, you know, your query was, like, contentiously parallel or something.
You might want to do some stuff to try and help SQL Server out. Maybe, you know, temp table, early grouping. Like, I showed two different ways to sort of address the HT weights in the batch mode query.
You could do the exact same thing for the row mode version of the query. Like, you could, you know, do what I did and dump some stuff into a temp table first. Or you could do what Joe Obisch did and have sort of two separate grouping queries to reduce the rows that end up falling from one parallel thing to another.
Either one is a valid choice. I don’t see anything wrong with either one. It’s just about whatever works better for the query that you’re trying to tune.
So, in real life, in real actual life, the query that I was running had some other stuff going on in it that made it really, really bad in row mode. Namely, it had a couple distinct aggregates in it. We had one here where we’re counting distinct post IDs and one down here where we’re counting distinct comment IDs.
And in the query plan for that, boy, howdy, let me tell you, things got real ugly. So, in row mode, when you have multiple distinct aggregates, SQL Server does some funny business in the query plan. Now, you can see this, oh, wait, that’s the wrong one.
There it is. This query, or this query plan, ran for a really long time. The operator times that you see here are misleading.
I forget if the query time stats for this ended up being anywhere near accurate for exactly how long it was running, but that’s what that was. But the real big problem here is that this query actually errors out. So, when I run this query on its own, I get an error message.
A big, nasty error message. All this stuff in red here. We’re at the maximum system generated unique value for a duplicate group was exceeded for index with partition ID, whatever.
Try dropping and recreating the index. Might fix it. Otherwise, try another clustering key.
What’s interesting here is what I’m not doing in this query, I’m not inserting data anywhere. It’s just a select. But in the query plan, we have spools.
And you can see that this spool didn’t really get much of anything. And, well, we had this big old thing over here. But this spool, when SQL Server, I guess, was trying to build it or trying to do something with it, we actually hit that error message.
This query failed while this spool was building. Which is crazy, right? Insane to think about.
So, the real value of batch mode here is in batch mode. And I blogged about this before. I’ll find the post and stick a link to it in there. When you use row mode to calculate multiple distincts, you end up with these nasty parts in your query plan here.
And if you tried to build too big of a result set, you might hit that same 666 error that I hit. This was not intentional. This is just a query that runs for a really long time and then fails in row mode.
Runs a little bit longer in batch mode. But, I mean, not longer than this. Didn’t run for an hour, 22 minutes, and 47 seconds. Which is down here.
If I move far enough out of the way, you can… Oh, green screen. You’re a jerk. Casting shadows, I suppose. At least you know I’m not a vampire now because the shadow that I cast on my green screen just ruined this entire video.
Thanks a lot. And we can see over in query store that this particular… If we use my wonderful, beautiful, gorgeous store procedure that IntelliSense is picking on, SP underscore quickie store, and we look for queries that have run and failed with an exception, we’re going to have a couple in there.
One of them is a different thing that I was actually working on today that I’m going to record a different video about. But this query up here is the one that hit that 666 error. And if we look at how long it ran for, of course, the hour and 22 minutes.
Well, I mean, that’s almost 5 million milliseconds of wall clock time. And just about, well, a little bit less of CPU time. And the waits for this are really weird, too.
Like in query store, well, using SP quickie store, I show you the waits that the query waited on while it was running. And you can see those up here, right? All this stuff it was waiting on.
50 million milliseconds of parallelism, 893,000 seconds of idling. Which doesn’t seem like a good time to me. And then, you know, 16 seconds of CPU time, 12 seconds of memory, and 7 seconds of other disk.
What other disk? I don’t know. The D drive?
Was it the H drive? Was it the F drive? Was it the T drive? We have no idea what drive it was. All we know is that this query died a very dishonorable death building up a spool and a query plan in row mode. So, really, the value of batch mode in tuning this query, like what I showed you was like the finished product.
I didn’t show you kind of where this started because I already had enough to talk about with the finished product. So, if you’re, I mean, if you have row mode queries that calculate distinct, especially multiple distincts in them, you would be doing yourselves a big, big favor to, if you do what you can to pre-aggregate data.
So, like, if you’re on enterprise-y edition of SQL Server, you just might want to get batch mode going because calculating multiple distincts in batch mode doesn’t suck. If you’re on standard edition, then you might want to do some pre-aggregating of the data so that your final query, when you calculate multiple distincts, doesn’t have to do as much crappy work because query plans for multiple distinct aggregates would be pretty awful in row mode.
I think that’s about it on this one. I think that’s all I had to say here. This is the first time in a long time that I have such a backlog of interesting SQL Server stuff to talk about that I’m going to be recording videos and writing quite a bit over the next week or so.
So, if you like this sort of thing, here’s the big sales pitch. If you like this sort of thing, you like this sort of free SQL Server performance tuning content, you do me a huge favor by liking this video, little thumbs up icon somewhere along the bottom here.
I don’t know exactly where. If you want to keep getting notified when I release these magnificent, perfect videos that prove I’m not a vampire, you can subscribe to the Darling Data channel.
If you’d like a subscription to Beer Gut Magazine, you can send me money. I don’t know how much. Just whatever you have.
Whatever’s in your pocket, just empty them up. It’s worth it. We’ll pass around a little collection tray on that. But anyway, I hope you enjoyed yourselves. I hope you learned something.
I hope that you’ve had a great day. It’s Monday. I hope you’re coming off a great weekend. And I will see you in another video probably tomorrow when, after I’ve drank enough blood to stain me. Anyway, thank you for watching.
I’m going to hit that stop button now. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.