Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
A Little About Hash Aggregate Spills and SLEEP_TASK Waits In SQL Server
Thanks for watching!
Video Summary
In this video, I dive into the world of hash spills within SQL Server, specifically focusing on hash match aggregates and their behavior in row mode. With a bit of a reflective tone, I share my experiences as Erik Darling from Darling Data, discussing how these spills can significantly impact query performance during what should be a relaxing Friday afternoon. The video covers various scenarios, including the differences between batch and row modes, and delves into the disappointing lack of detailed wait statistics provided by SQL Server for these operations, highlighting the frustration with Microsoft’s decision-making in this area. Through multiple queries and detailed analysis, I explore how adding more columns or text data can exacerbate hash spills, leading to substantial increases in execution time and the volume of pages spilled to disk. The goal is not only to understand these issues better but also to advocate for clearer insight into query performance bottlenecks.
Full Transcript
Erik Darling here with Darling Data. Feeling real bubbly and effervescent on this joyous, I think probably final, Friday of June 2024. Where did this year go? What happened? This year slowly, slowly disappeared on us. Anyway, my eyes feel weird. My left eyelid won’t stop twitching. I think I have some form of irritation in there. But as you can see from the giant floating head of green text, in today’s video we’re going to talk about hash spills.
Because what the hell else are we going to do on a beautiful summer Friday except talk about hash spills. Yesterday we talked about sort spills. At least I think it was yesterday. It was probably yesterday. If anyone can remind me. What we did yesterday. That’d be wonderful.
Alright, so before the Don Julio shows up and everything changes, we’re going to talk about hash spills. Now, this video is only about hash match aggregates. This is not about hash joins. Hash joins are going to be in the next video. Not today. Another time.
Because what’s really interesting with hash joins is looking at extended events for the hash bailouts and recursion stuff. That’s where you can see the spectacularly bad performance that can come out of hash joins when they spill. Now, what follows is specifically for row mode.
For batch mode, hash and sort spills make me very nervous. Because if you remember the sort spill video from yesterday, the batch mode sort took forever. It was like five minutes. Whereas the equivalent query in row mode only spilled for a few seconds or something.
So both hash and sort spills in batch mode make me incredibly nervous when I see them. In row mode, hash spills tend to make me more nervous than sort spills. Because, as you’ll see in this video, that holds even when we cap memory at the same amount.
Sorts usually involve a lot more data. Because remember, you’re sorting all the columns you’re selecting by the columns you’re ordering by. But a sort spill is usually a much closer to size-of-data operation.
Whereas with hashes, when those spill, whatever algorithm is responsible for the hash spills tends to really beat performance up. Even for similarly sized spill amounts.
So, like, in pages, right? Let’s look at this first query here. We have query plans turned on.
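The actual demo script isn’t shown in the transcript, but the shape of the query is clear from context: a row mode hash match aggregate grouping one integer column. A sketch, assuming the StackOverflow Votes table that comes up later in the video:

```sql
/*
A sketch of the first query: a row mode hash match aggregate
over one integer column. The table and column names are
assumptions based on the StackOverflow database mentioned
later in the video.
*/
SELECT
    v.PostId,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY
    v.PostId
OPTION(RECOMPILE); /* fresh plan per run, like the demos */
```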
Thank God I didn’t forget that. If I had forgotten it and had to restart recording this video, I don’t know what I would do. Alright, so this query takes around four and a half seconds. And, you know, I’d really like Microsoft to recall the summer intern who did the operator times here.
Because these make no gosh darn sense, and I’m not entirely sure how to interpret them. Because in a row mode plan, the times are supposed to be cumulative.
So the repartition streams should technically contain the time from the clustered index scan.
It should be at least 1.626 seconds plus whatever time gets spent in the operator itself. I don’t know where to put that 1 in the accounting. I don’t know where that 1 goes.
I don’t know if we add it to the 1.6. I don’t know if it’s part of the 1.6, or if the 1.6 is part of the 1. I just don’t know what to do with it. Alright?
So, let’s just say that the hash match aggregate ran for 2 and a half seconds or something. Right? Or 2 seconds. I really… These numbers are too depressing for me to think about too much. How we ended up here, I don’t know.
I like the batch mode version better where each operator is its own time. Because then I don’t have to worry about whatever mess this is. Alright.
So, relatively simple. Relatively straightforward. Now, we’re going to look at these next two queries in other windows. Because these next two queries took more time than I want to fill up dead air for.
While I’m waiting for them to finish. Alright? So, what we’re going to do in each of these windows is run the query and then look at our session wait stats for what happened when these queries ran.
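The session-level wait stats being checked here can come from a DMV scoped to the current session. A minimal sketch, run in the same window immediately after the query finishes:

```sql
/*
Session-scoped wait stats for the current connection.
Counters accumulate for the life of the session, so check
them right after the query of interest, or use a fresh
session per test.
*/
SELECT
    dews.wait_type,
    dews.waiting_tasks_count,
    dews.wait_time_ms,
    dews.max_wait_time_ms
FROM sys.dm_exec_session_wait_stats AS dews
WHERE dews.session_id = @@SPID
ORDER BY dews.wait_time_ms DESC;
```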
And there’s going to be some real disappointing stuff happening here. Alright? And it’s not just related to the timing here where once again this number is lower than this number for some reason.
And I don’t know how to munch those numbers together into some sense. So, let’s just say that the hash match aggregate took, I don’t know, about 25 seconds. With the spilling.
Right? So, in the query where it didn’t spill, it took like two, two and a half seconds. In the query where it did spill, it took a lot longer. Right?
About ten times as long. Alright? Sort spills don’t usually hit you for ten times as long. I guess hash spills are different in that way. This hash spill looks about like so. Spill level two!
We had to make two passes of the spills. All eight threads did that and about 290,000 pages spilled out to disk. So, I don’t know.
That seems pretty slow for 290,000 pages, to be honest with you. I don’t really know what to say there. Whatever. That hash spill is not having a good time internally. But what’s really disappointing is something that we’ve seen before in here, where our old friend Sam, someone at Microsoft, that glorious idiot, decided to hide information from us.
Because the top waits that you see for this query over here are CXSYNC_PORT, CXPACKET, and SOS_SCHEDULER_YIELD. But they don’t really tell you where the time went in this query, do they? They don’t really account for the 30 seconds that we spent in here.
What comes a lot closer to accounting for the 30 seconds we spent in here are these things up top, like SLEEP_TASK and CXCONSUMER. And our good friend Sam said, it’s for your own good.
It’s in your own self-interest. All you wanted was a Pepsi, but you don’t get these wait stats. You can have the Pepsi. You’re not getting these wait stats in your query plan.
So the SLEEP_TASK and the CXCONSUMER waits don’t show up in your query plan. Because Sam is an idiot. We don’t like Sam, do we?
Sam is not our friend. All right. So looking at this same query, essentially, but this one running in batch mode. So again, I’m using the query optimizer compatibility level hint to get batch mode on rowstore.
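The compatibility level hint mentioned here is presumably the documented USE HINT form. A sketch of coaxing batch mode on rowstore that way, assuming SQL Server 2019 or later and the same guessed Votes query:

```sql
/*
Batch mode on rowstore via the query optimizer compatibility
level hint. The query itself is a guess; the USE HINT name is
the documented one for pinning optimizer behavior to a given
compatibility level.
*/
SELECT
    v.PostId,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY
    v.PostId
OPTION
(
    RECOMPILE,
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150')
);
```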
And I’m making this thing spill a whole bunch. All right. Spills are the name of the game.
The word of the day is spill. They’re also the number of the day. They’re also the special of the day. They are everything.
Everything for us. Now in this one, let’s look at the wait stats here first. So now we have some new kinds of waits, don’t we? We have some HT waits. All right.
These are related to batch mode. These are very batch mode-y waits. But we still have a bunch of SLEEP_TASK. And we still have a bunch of CXCONSUMER. Now let’s go look at our query plan and see what happened in here.
All right. This thing spilled for about 10 seconds. And this one isn’t too bad. Where they get really bad is with the bigger spills, which we’re going to take a look at in a minute.
But if we look at the waits over here. And we look at the wait stats. There we go.
That’s the button. We get HTBUILD and we get HTDELETE. Apparently those ones were considered important enough to show. But we still get no SLEEP_TASK. And we still get no CXCONSUMER. Even though those are at the very top of our waiting query game.
Right. So to recap, we get this and we get this. We do not get this or this. We are all quite sad about that.
Especially because, you know, if you look at some of these wait times, these max wait times, that’s almost 11 seconds on CXCONSUMER. Right. And if you look at these total wait times in here, that’s a lot of time to not account for in an executing query. Right.
Kind of. It’s not cool, Sam. It’s not cool at all. We don’t like being lied to, Sam the man. All right. So let’s move on a little bit.
And let’s see what happens when our hash spills involve more columns. Remember yesterday, when we had sort spills: the more data we added to those sort spills, the worse they got, from a time perspective and from a wait perspective.
Like they just dragged on and on and on and on and on. These do too. Even for the non-spilling query here, we add some time. Right.
The first one was four and a half seconds. We’re essentially doing the same thing, just selecting more columns, and this took about a second and a half longer. Right. And everything in here, despite, like, holy cow.
It worked on this one. I don’t know what happened. I don’t know what magic happened, but look, the time is actually cumulative on this one. That wasn’t true for any of the other ones.
What happened in here? I don’t know. I don’t know. Sometimes it works. Sometimes it doesn’t. Race conditions, I guess. What’s happening inside your head?
It’s like trying to figure out what a toddler is thinking. It’s amazing. But anyway, this one did take a little bit longer. Right. This one did.
It took a little bit longer than the one where we were just selecting one integer column. In this query, we’re selecting one, two, three, four integer columns and one datetime column. All right.
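The wider query isn’t shown either, but four integer columns and a datetime is a close match for the Votes table. A sketch, with the exact column list as an assumption:

```sql
/*
The wider version: four integer columns and one datetime
column in the group by. The column list is an assumption
based on the StackOverflow Votes table schema.
*/
SELECT
    v.PostId,
    v.UserId,
    v.BountyAmount,
    v.VoteTypeId,
    v.CreationDate,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY
    v.PostId,
    v.UserId,
    v.BountyAmount,
    v.VoteTypeId,
    v.CreationDate
OPTION(RECOMPILE);
```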
So now let’s go look at this one in another window. I’ve already pre-run this because, again, I don’t want anyone sitting around bored. And now look what happens here. That went from taking 30 seconds to taking one minute and 33 seconds.
This one took a full minute longer selecting more columns. All right. More stuff in here spilled because we are selecting more stuff.
This one got to spill level three. Right. This is a full level higher. A full spill level higher than the one before that. And the number of pages is also about tripled.
Right. This one went from like about 200 and something thousand to 832,000. Right. So a lot more stuff spilled, though, because we had a lot more in the hash to spill. So when it was just one column we were grouping by, we didn’t have a lot.
I mean, we still ended up messing things up pretty good. Good job, us. But this one, because we have more columns that we need to group by and all the other stuff, ends up doing way, way more work.
And in the results, we have way more SLEEP_TASK and way more CXCONSUMER than we did in the other query. Now, since this one is in row mode, we don’t have the HT waits, which is OK.
Like, we don’t need to know that. But again, these waits aren’t going to show up in the query plan wait stats, because Sam needs to get talked to by someone.
Sam needs a talking to. Sam, Sam, Sam. So now let’s look at what happens when we start messing with text columns.
All right, so in this query, we are going to, if I recall correctly… I don’t know, again, some of these queries were written yesterday. Just kidding.
They weren’t written yesterday. I just can’t keep everything in my head all the time. Anyway, now we’re selecting a text column and grouping by it: the Text column in the Comments table.
I don’t know if you remember the sort spill video, but we looked a lot at the average length, at how SQL Server estimates memory for these things, and especially at how even the text column stuff in row mode tended to make spills worse, because you’re dealing with larger data when it spills off to disk.
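A sketch of what grouping by that text column might look like. The Comments table in the StackOverflow database has an nvarchar Text column; the aggregate itself is a guess:

```sql
/*
Grouping by a (n)varchar column makes every row in the hash
table much wider, so spills write far more pages. Table and
column names are from the StackOverflow database; the exact
demo query is an assumption.
*/
SELECT
    c.Text,
    COUNT_BIG(*) AS records
FROM dbo.Comments AS c
GROUP BY
    c.Text
OPTION(RECOMPILE);
```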
All right. So this query takes about 8.8 seconds. And this one doesn’t spill. And somehow, miraculously, the repartition streams timing is working here.
Maybe it just takes more data to make the repartition streams timing work. Maybe something has to really slow down in order for it to work the way it should.
I don’t know. It’s really weird. But at least it’s cumulative. I don’t know if it’s right, but at least it tracks, right? At least it’s logically cumulative going from here to here to here. At least we have that going for us.
But we know that by the time we get past the hash, we’re a few seconds ahead of where we were when we were not messing with any text columns. Right?
So now we’re going to look at this thing running in batch mode and spilling. And, oh wait, this is the one I was supposed to close. Ah, nuts.
Here we go. So this is what happens when batch mode hash match spills a text column. Look at that.
2 minutes and 24 seconds. Ain’t that something? That’s crazy, right? That’s nuts. It’s right up there with how bad the batch mode sort was. 2 minutes and 24 seconds.
Can you imagine waiting 2 minutes and 24 seconds for this? Now, the spill level on this is back to spill level 3. But there are a lot more pages in this one, right?
Because we had that text column involved. And there are definitely some differences between the Votes table and the Comments table. The Votes table is about 53 million rows.
And the Comments table is like 25 million rows. So with the Comments table, even though it’s smaller, because we’re spilling that text column out, the data pages that we’re spilling out are way bigger.
I mean, not way bigger individually, but there’s a way bigger number of them, because the text column adds more space, right? So when we’re dealing with a whole bunch of narrow data types, even though we did spill a lot and it took a long time, it’s still not quite as disastrous as when we spill out anything that involves text data.
The level three, eight-spilled-thread hash spill from the Votes table was like 800-something thousand pages. This is like 2.6 million, almost 2.7 million pages. So that text column adds a lot more page volume to the spill and really messes things up.
And of course we have our friends in here: SLEEP_TASK, HTBUILD and HTDELETE, and CXCONSUMER. And again, for this query, the HT waits will be available in the query plan XML, but not SLEEP_TASK and CXCONSUMER, because our enemy Sam at Microsoft doesn’t want us to see those waits.
They are not going to be in the query plan. And a lot of the time that would account for what went wrong with this query, a lot of things that would help you figure stuff out when you go and examine a query plan, are just not in there, right? So it’s always good to know where this stuff comes from.
Now, you know, I do a lot of experimentation with running queries and seeing what their waits are, and that’s sort of how I figure this stuff out. And that’s why in sp_PressureDetector, the list of waits that I have in there, with a sort of description on each of them, will decode some of this stuff for you. So if you were looking at this query plan on your own: look, the spill is visible, there’s an exclamation point on it, the operator time is visible, you can see how long that thing spilled for.
But when you look at the waits, you don’t get the full story of what waits show up when these things happen. And that’s what you kind of have to know, because when you’re looking at a server from the top down, like when you just get on a server and use whatever script you want to look at wait stats, hopefully it doesn’t screen any of these out because someone at Microsoft is a jerk. But maybe it would show you these waits in total.
And if you saw the HT waits, and the CXCONSUMER waits, and the SLEEP_TASK waits, and the IO_COMPLETION waits from yesterday’s video, it would give you a better indicator of where queries might be struggling as a whole. Right? And maybe that gives you a place to focus.
Right? Maybe it helps you figure out where the stress and strain on the server is. Especially if you see a lot of these spilly type waits, like IO_COMPLETION, SLEEP_TASK, the HT stuff if it’s batch mode, and you also see a lot of the PAGEIOLATCH_whatever waits, that’s a pretty good sign that there’s a constant battle going on between the buffer pool and query memory grants. And that server probably doesn’t have enough memory in it in general.
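A top-down check like the one described might look like this. The wait list mirrors the ones called out in the video; the numbers are cumulative since startup, so they’re best compared over time:

```sql
/*
Spill-adjacent waits at the server level. Cumulative since
the last restart, so sample twice and diff the numbers to
see what's happening right now.
*/
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms,
    dows.max_wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type IN
      (N'SLEEP_TASK', N'IO_COMPLETION', N'HTBUILD', N'HTDELETE')
   OR dows.wait_type LIKE N'PAGEIOLATCH%'
ORDER BY dows.wait_time_ms DESC;
```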
You know, it might, you know, like, there might be all sorts of other ways you could go to try and get those numbers under control. But like, it just might be a sign that the server is completely underpowered. And that’s where you need to start.
Like, that’s where the quickest performance win is just like, just get some more memory in this thing if you can. Right? So, let’s go look at one last query in here. And we’re going to close this out.
And this one is particularly interesting to me because this one will get a hash match flow distinct. And that hash match flow distinct will spill. And we’re playing kind of a weird trick on SQL Server here with top.
And the bigger you set this number, the worse this spill is. I had to find something in the middle. And then we’re going to say OPTIMIZE FOR top equals one. And even a recompile hint can’t figure that out.
Or rather, it is still under the spell of the OPTIMIZE FOR hint. And if you look at what happened in here, we of course have a number of things going on. Once again, a whole bunch of SLEEP_TASK up at the top.
10,991 milliseconds of it. And I think the reason why this one is helpful to look at is because this one is single threaded, and a lot of the other ones run in parallel. So it’s a little more clear where the waits go in here.
And so, if you look at the results, where we have fully 10 seconds of sleep tasking, we can probably figure out just how much time was spent actually spilling on that single thread in there. Right?
So, once again, if we look at what happened here and we look at the wait stats for the query, the only thing we will see is 4 milliseconds of SOS_SCHEDULER_YIELD. All right?
There it is. 4 milliseconds of SOS_SCHEDULER_YIELD in a query that ran for 25 seconds. All right?
So, we took 6.7 seconds here, and 25 minus about 6 is 19. And we spent about 10 of those 19 seconds in this operator spilling to disk. Isn’t that exciting?
Isn’t that exciting to know about? And, of course, the spill level for this is spill level 5. One spilled thread. Anyway, that’s about enough about hash spills.
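For reference, the flow distinct setup described a moment ago probably looks something like this. The table, the TOP value, and the column are all assumptions, and whether it spills depends on your hardware and data:

```sql
/*
The hash match flow distinct trick: a variable TOP plus an
OPTIMIZE FOR hint that tells the optimizer we only want one
row. The tiny estimate buys a flow distinct with a small
memory grant; the real runtime value makes it spill. Even
OPTION(RECOMPILE) stays under the spell of OPTIMIZE FOR here.
*/
DECLARE @top bigint = 5000000;

SELECT DISTINCT TOP (@top)
    v.PostId
FROM dbo.Votes AS v
OPTION
(
    RECOMPILE,
    OPTIMIZE FOR (@top = 1)
);
```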
Now, again, this was purely about hash aggregate spills. Tomorrow’s video, or actually, no, tomorrow’s Saturday. So, probably not tomorrow’s video and probably not Sunday’s video.
Maybe Monday’s video will be about hash join spills. So, I hope that you’ll join me for that. Anyway, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something. If you ever meet someone at Microsoft, I hope you have a good talk with them about the wait stats that they’re including in these query plans. That’d be nice.
You know, we deserve better. Us people paying… well, I mean, I don’t pay per core, but you probably pay per core. So, you deserve better.
Apparently, I deserve whatever I get. That’s okay. Yeah, if you enjoyed this video, I do like thumbs ups and I like encouraging comments. Up to and including “you go, girl.”
If you enjoy this sort of SQL Server content, you can join. Let’s see. Let’s make sure we have this refreshed up to the absolute most current number. You can join nearly 3,790 of my other data darlings by subscribing to this channel and getting a notification every time I publish one of these.
And I would just like to apologize to anyone not in an East Coast time zone who gets this notification late at night, like someone in Europe maybe. Or even further away than Europe. Past Europe.
I don’t even know what time it is in New Zealand right now. Australia? Who can tell? So, I’m not sure if anyone else subscribes to me from further away than that. Probably not.
Anyway. I’m gonna go start Friday-ing, because it is Friday and it is time to Friday. Thank you for watching and I will see you in the next video about hash join spills. It will be just as exciting and riveting.
I promise you. I would never lie to you. I’m not from Microsoft. Or the government.
A Little About Sort Spills And IO_COMPLETION waits In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into the fascinating world of sort spills in SQL Server, explaining how they can impact query performance and offering insights on when to address them. Starting off with a bit of personal frustration over Microsoft support’s suggestion to shut down an Azure instance to save costs, I highlight why better support is available through services like mine at Darling Data. The video then focuses on demonstrating sort spills using two queries—both with hints to ensure one spills while the other doesn’t—and explores why the spilled query can sometimes run faster than its non-spilled counterpart due to factors such as local storage speed and the complexity of sorting multiple columns, especially those containing text data. Through detailed analysis and practical examples, I aim to provide viewers with a deeper understanding of sort spills and their implications for database performance optimization.
Full Transcript
Erik Darling here with, as you may have guessed, Darling Data. In this video we’re going to talk about sort spills. Now, if you’re wondering why I sound a little bit low right now, oh boy. I just got off a very depressing client call where Microsoft support suggested, out loud and with a serious face, that one way they could save money on their Azure bill would be to turn off the managed instance that runs their e-commerce site overnight, so it didn’t accrue any spend. So yeah, we’ll just turn the website off at night. It’s like the early 90s, when you would leave the office and turn off the lights and it would also turn off all the servers or something. So yeah, anyway, that’s the thing. That stunk. No one was happy. So once again, if you would like better support than Microsoft is willing to offer you, my rates are reasonable and I am available for hire.
Erik Darling of Darling Data is in fact available for hire for these kinds of things, so you don’t have to be abused by Microsoft financially and mentally. Now, you might notice that something I’m doing up here is setting compat level explicitly to 140, because that is the 2017 compat level. The reason I’m doing that is that if we use the 150 or 160 compat level, we will get sort spills that neither of us has the patience to sit and wait for during one of these videos. So here’s what I got from these two queries. It’s really the same query twice. But this is what happens when a batch mode sort spills. That is nearly five minutes. And remember, this is a fully batch mode plan.
So all of the operators in this plan are responsible only for the time that is spent in them. Down here in the row mode plan, things are a little bit different. Things do get weird, because you might see that this ran for almost 2.2 seconds. Repartition streams says, no, I only ran for 1.3 seconds. And the sort says, I ran for 4 seconds. So that’s a little misleading. But anyway, the reason for this is all explained in great detail in this wonderful post by Mr. Paul White, a.k.a. Pablo Blanco. And it was based on this demo. And my name actually appears in the blog post, which is a magnificent thing.
I never thought that I would see the day. And if you feel like Microsoft should probably work on this scalability issue, there’s also a feedback item that is actually under review. To Microsoft’s credit, I will click on this so you can see. This feedback item, opened by yours truly nine months ago. Oh, man, this one’s ready to pop.
It’s under review. And I actually got a thank you from the company. That’s as good as a gold watch, isn’t it? All right. Getting a thank you. A comment thank you. Anyway, let’s get back to regular row mode sort spills. So I’m going to run these two queries. And they’ve got all sorts of hints and stuff on them to recompile and clear out the procedure cache.
And this one up here, I’m using the MIN_GRANT_PERCENT hint to ensure that this thing gets the minimum amount of required memory to not spill. Because I want this one to not spill. And I’ve got this query down here using the MAX_GRANT_PERCENT hint, ensuring that it most definitely will spill.
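A sketch of the two hints being contrasted. The query and the percentages are placeholders; the Votes table and the single PostId column match the sort that gets described later:

```sql
/*
MIN_GRANT_PERCENT props the memory grant up so the sort
doesn't spill; MAX_GRANT_PERCENT caps the grant so it
definitely does. Percent values here are guesses; tune them
to your hardware and data.
*/

/* enough memory: shouldn't spill */
SELECT
    v.PostId
FROM dbo.Votes AS v
ORDER BY
    v.PostId DESC
OPTION(RECOMPILE, MIN_GRANT_PERCENT = 50);

/* starved for memory: definitely spills */
SELECT
    v.PostId
FROM dbo.Votes AS v
ORDER BY
    v.PostId DESC
OPTION(RECOMPILE, MAX_GRANT_PERCENT = 1);
```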
Built to spill, if you’re into that kind of music. We here at Darling Data are not. We like hard goth. Hard goth only. Anyway, just kidding. We like a wide variety of music.
Depends on what the mood is. When it’s a bad mood because Microsoft support is awful, we listen to the hard goth. So let’s look at the query plans for these.
Because that’s what we do, isn’t it? We’re the data darlings who stare at query plans. And let’s be moderately surprised when we look at these two query plans. And we see that the plan that didn’t spill.
Oh, that was terrible framing by me. We’re going to do some sit-ups after this one. And the plan that did spill. We see way over here, these sort operators. The sort operator way…
Oh, my hand. I look like I’m doing something awful to that sort operator. The sort operator way up top did not spill. And that took five seconds. The sort operator right here…
Ooh, that’s nice. That’s right there. Man, that’s good framing on my part. That took 3.4 seconds. But why? Why?
Why, my data darlings, did the query that spilled take less time than the query that didn’t spill? Now, this is something that I never used to really catch well in query tuning before Microsoft introduced operator times into query plans.
Because you would see a spill, and all you would have to go on is: well, crap. Spills are pretty slow, right? You should try to get rid of spills. If I fix the spill, maybe it’ll be faster.
It didn’t always turn out that way. Now, the reason why my spill is faster, I mean, first and foremost, is because I am on fast local storage, right? So this is Crystal Disk Mark hitting my fast local SSDs on this computer.
Again, these are SSDs plugged directly into all the same parts and components encased in a beautiful Lenovo laptop right next to where all the other hardware and stuff is.
Because, you know, you don’t have that probably, though. Because you work for knuckleheads. And you work for knuckleheads who dragged you kicking and screaming into the cloud where storage is awful.
Generally awful. And if it’s not the storage that’s awful, then it’s the path that the data has to take getting to the storage. It has to get way over here, right? Your data is nowhere near your SQL Server.
It’s miles away, probably. Miles of network cable away. So I get the benefit of fast local storage that I don’t have to go across miles of wires to get to. You probably don’t have that because you work for knuckleheads.
I work for one knucklehead. But the one knucklehead I work for bought one nice laptop to do demos on. So that’s why this sort is fast for me. It probably wouldn’t be fast for you.
I realize some cloud instances do have, like, a local storage with, like, you know, hyper drives on them. And you could get stuff fast there, too, probably. But most people don’t have that.
So they get really screwed up by this stuff. So you will probably want to fix sort spills. I probably don’t need to fix sort spills. But I’m going to show you in a minute how you know if you need to fix sort spills.
Aside from, like, just seeing that you have a lot of sort spilling going on. So, one thing that’s sort of interesting about sort spills, at least in parallel execution plans, and we’re going to hope that this query works correctly the first time, because this demo is a little weird.
Sometimes it’s, like, great the second I run it. Other times I have to tinker with the memory grant percents. And it’s not fun when I have to tinker with the memory grant percents.
And what do you know? I’m probably going to have to tinker with the memory grant percents. Let’s change this one to, like, 13. Because, you know, what’s funny is it worked three seconds ago when I ran this before recording the video.
Don’t take it out on me. I’m still better than Microsoft support. There we go.
That’s what I wanted to see. So if you look at this top query up here, right? This query, when it spilled, it only… So this query, just to make sure everyone understands, this query is running at DOP 8. That’s this many fingers.
And this query spilled at level 1 and spilled 7 threads out to disk, right? That’s this many fingers.
8 threads is this many fingers. And so one of these threads is showing that it did something, right? So if we come over here and we look at the properties and we look at this, we will see one thread with 1,435 rows on it. It looks like it did some stuff.
But this is just a weird query plan timing issue, right? This is not actually what happened. It’s sort of what happened. Both the thread stuff and the operator time stuff, as we saw in the previous demo with the row mode plan, where the repartition streams was not in the realm of reality of what the other operators around it were doing.
The operator timing and the thread stuff can also be nowhere near reality. It’s sort of like me after 8 p.m. Me and reality are not shaking hands anymore.
But so this query down here, which spilled to level 1 and spilled, oh, why did you disappear? You were right there. All you had to do was not leave like my dad.
So this is level 1 and spilled all 8 threads. So this sort, even though almost 53 million rows from both of these go into this sort, and both of these sorts sort 53 million rows, this one looks like it didn’t do anything. All right, it’s just, it’s all zeros in there, all right?
Like my report cards. So, again, the reason why, like, the sort spills are generally faster is because… I have nice local storage, which you don’t have, probably.
I hate whispering. Sorry. Sorry about that. Sorry about that. So sort spills can get worse as you have more columns to spill. So just, you’ll allow me to go back in time one moment.
If we look at this sort, we have one column in the output list, that is post ID, and one column in the order by, which is post ID descending. Okay? So if we run these two queries now, and again, I have my little hints here just to make sure everything happens the way I want it to.
That should be… Is that… Those are the right two queries?
I didn’t highlight the one above it, did I? That was rather foolish of me. Rather foolish. Oh, Eric. Where does your foolishness cease? Ever.
So we’re going to run these two, and this is important because you should understand this about sorting data in SQL Server. Right? And this one is still a little bit faster.
Not as crazy faster as the other one. Right? Six point… Oh, man. ZoomIt is all over the place today. 6.2 seconds versus 5.7 seconds. But now the sort operators are going to look a little bit different than they did in the previous demo.
And they’re going to look different because we have, if the tooltip ever graces us with its presence, we have way more columns in the output list now. Right? We’re still only ordering by post ID.
But what SQL Server has to do is all the columns in the output list, those also have to be put in order. Right? Like, that’s what a sort does.
It sorts all the data that you’re outputting by the column that you’re ordering by. So, you know, again, I’ve probably gone over the Excel analogy a few times where when you’re using Excel and you click that button in the top left-hand corner and everything gets highlighted. And then you click sort and you choose a column and everything in the spreadsheet flips to match the sort order of that one column.
Or Excel kind of yells at you and is just like, are you sure you just want to sort this one column and not everything around it? Because you’d look kind of stupid if you did. So, that’s what SQL Server kind of has to do in memory too.
It has to flip all the output columns to the order of the order by column. So, that’s why queries that select more columns and need to order those columns need more memory. Right?
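To make that concrete, here’s a minimal sketch of the idea, using the Stack Overflow Posts table from these demos. Treat it as an illustration, not the exact script I’m running:

```sql
/*
  Both queries order by Id only, but the second one has to carry every
  output column through the sort, so it asks for a larger memory grant,
  and it spills harder when that grant gets capped.
*/

/* Narrow output list: only Id has to be sorted. */
SELECT p.Id
FROM dbo.Posts AS p
ORDER BY p.Id DESC;

/* Wide output list: every selected column rides along through the sort. */
SELECT p.Id, p.OwnerUserId, p.CreationDate, p.Score
FROM dbo.Posts AS p
ORDER BY p.Id DESC;
```

Compare the memory grant on each plan’s root operator, and the spill warnings on the sorts, to see the difference.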
So, that’s one thing to keep in mind there. And one thing that will exacerbate those issues is when you have text data, or not just like the data type text or ntext, I mean like any string data really. Anything that is not, like all the columns that we’ve been dealing with before, these are all integers or dates.
I guess they’re all integers and there’s one date time. So, these aren’t like, you know, big honking columns with like variable lengths and stuff where SQL Server has to guess how much data is in them. Right?
So, if we look at the Comments table and we run this query, right, what I want to show you is that a lot of columns in here don’t, like… just to make sure we understand what we’re talking about here: the text column in the Comments table is an nvarchar(700). Right?
Nvarchar: double-byte encoded text. And so that’s why I have DATALENGTH divided by two. Also, DATALENGTH tends to be a little bit faster than LEN when we do these things. So that’s the divide by two there.
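If you want to poke at the byte lengths yourself, here’s a quick sketch of the kind of query I’m describing, assuming the Stack Overflow Comments table:

```sql
/*
  DATALENGTH returns bytes, and nvarchar stores two bytes per character,
  so dividing by two gives the length in characters.
*/
SELECT
    average_length_in_characters = AVG(DATALENGTH(c.Text) / 2.),
    max_length_in_characters = MAX(DATALENGTH(c.Text) / 2.)
FROM dbo.Comments AS c;
```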
So, if you look at all the, the stuff in here, like, a lot of the comments that, you know, just, listen to this top section, don’t have very long length, byte lengths, compared to the maximum byte length of the column. But, what’s really interesting, ready, like, so, I’m going to show you this and I’m going to talk a little bit about memory grant stuff, is when we run this now, we have this average column length of 302 bytes. And this is actually, this actually plays pretty well into how SQL Server does memory grants for string columns.
Right? Because what it does is it guesses that every row that is produced, that needs to be sorted, for a string column, that row data will be half full. Right?
So, for an nvarchar(700), having 302 bytes in there is actually pretty close to half. Right? So the average comment length in here actually works pretty well for that algorithm. It might not work well for every data set.
You might have nvarchar(700) columns where, legit, everything really is only, like, 50 bytes full or something. And you would just get really excessive memory grants for that stuff. So, if we run this query right here.
And this query does not select the text column. And we look at what it does. This one still spills a little bit.
But it’s pretty quick, right? Two seconds. Like, no one’s really going to gripe about two seconds. But now, if we run this query, which is capped with a max grant percent of one, with the text column involved, here’s what we’re going to do.
We’re going to come over here, and we’re going to run sp_PressureDetector with a sample of 12 seconds, which might help you understand exactly how long that query that I just highlighted is going to run for. So we’re going to kick that off.
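If you want to play along at home, the call looks roughly like this. sp_PressureDetector is my free diagnostic procedure; I’m writing the parameter name from memory here, so double check it against the current version:

```sql
/*
  Sample server activity for 12 seconds: long enough to switch windows,
  kick off the spilling query, and still catch it in the sample.
*/
EXEC dbo.sp_PressureDetector
    @sample_seconds = 12;
```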
And then we’re going to run this. And what we’re going to see at the end is that SQL Server spent way more time spilling when there was text data involved. Because we have way more pages to spill out.
We have bigger data size to spill out because of that text column. Right? So, this whole thing takes about 10 seconds, which, if you’re wondering why sp_PressureDetector was set to 12 seconds, it’s so I could run it and have a second or two of grace period to come over here and execute this one. Right?
So, if we look at what sp_PressureDetector tells us about the 12 seconds that this ran for, what you’re going to see way up at the top is this IO_COMPLETION wait. And if you notice the helpful description column that I’ve put into sp_PressureDetector, just for you, just for you, because I love you and I care about you way more than Microsoft support does. So, this is the wait type.
This is the wait type that you will see from queries when they are spilling sorts a lot. There are different wait types, which we’re going to look at in future videos, that happen when hash operators spill, from both a hash aggregate and a hash join perspective. But the IO_COMPLETION wait, you can associate that pretty decently with row mode sort spills.
So, if this were a batch mode query, you would see BPSORT as the batch mode sort wait that was happening when things were spilling and getting awful. So if you’re looking at a server as a whole, or if you’re looking at wait stats for a query, and you’re wondering what IO_COMPLETION means: well, if you have a lot of slow queries that are doing a lot of sorting, and they’re doing that sorting in row mode, there’s a pretty good chance that they are spilling lots and lots of stuff out to disk. And since you work for knuckleheads who put you on the cloud, you’re probably going to have to fix those, because that actually can meaningfully slow a query down right there.
So, IO_COMPLETION waits: if you see those associated with running queries a lot, there’s some spilling going on. Whether that spill is the root cause of why the query is slow, you’re going to have to determine that, or hire me to do that, because I’m happy to tell you. Either way.
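If you’d rather eyeball the raw numbers without sp_PressureDetector, here’s a minimal sketch against the wait stats DMV (server-wide and cumulative since the last restart or stats clear):

```sql
/*
  IO_COMPLETION piling up alongside slow, sort-heavy row mode queries
  is a decent hint that sorts are spilling to disk. BPSORT is the
  batch mode sort equivalent.
*/
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    wait_time_seconds = dows.wait_time_ms / 1000.
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type IN (N'IO_COMPLETION', N'BPSORT')
ORDER BY dows.wait_time_ms DESC;
```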
But that’s, that’s what you would have to do there. Anyway, my, my wife has been texting me for 20 minutes. So I should probably respond or something. But before I do, before I go, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something about sort spills. If you like this video for whatever reason you like it, it doesn’t have to be the content. It could be, it doesn’t have to be what you see in SSMS.
It could just be my bright, sunshiny presence here on the screen. I like the thumbs ups on the videos. And I like, you know, the you go girl comments in the videos.
You can even say you go girl. I won’t, I won’t be offended. So there’s that. If you like this sort of SQL Server content in general, please subscribe to my channel and you can join drum roll. Let’s hit this refresh button.
Make sure we’re totally up to date. You can join nearly 3,779 other data darlings who get notified every time I publish one of these videos that mean so, so very much to me. So once again, from the bottom of my heart, thank you for watching.
Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Everything You Know About Isolation Levels Is Wrong
You’ve been told that NOLOCK hints are bad, so you look at all the queries your developers write and hang your head in shame.
A NOLOCK hint here, a NOLOCK there, a NOLOCK hint seemingly everywhere. Like termites, eating at the foundation of your well-being.
But in the real world, how are you supposed to remove all those yucky hints without blocking and deadlocking causing huge problems?
I’m Erik Darling, a world class NOLOCK hint removal expert, and in this demo-heavy session, I’ll change your mind about every isolation level. You’re going to learn why:
Read Committed is nearly as weak as Read Uncommitted
You don’t need to worry about tempdb’s version store
No isolation level is perfect for every workload
At the end, you’ll have the confidence and knowledge to start turning on optimistic isolation levels and stop hanging NOLOCK hints all over your queries like Christmas tree ornaments.
Session Prerequisites: Basic understanding of locking and blocking problems, some familiarity with isolation levels.
Get A Deal On Ticket Prices
If you want to get a deal on registration — and you should hurry up and do that because birds of earliness prices expire on July 9th — head over here.
When you’re registering, use the discount code DARLINGE24 for $150 off the regular price for the three regular session days, Wednesday – Friday.
Because I’m teaching on my birthday, and if you don’t come we are NOT FRIENDS ANYMORE!
Thanks for reading, and see you in Seattle!
To get the PASS Summit discount, head over here and use the discount code DARLINGE24 to get $150 off the regular price for the 3-day convention.
Thanks for watching!
Video Summary
In this video, I dive into the appropriate use of string split in SQL Server queries, highlighting common pitfalls and best practices. Erik Darling from Darling Data shares insights on how using string split directly within a query can lead to poor performance due to inaccurate cardinality estimations. He demonstrates this with a practical example, comparing it to his previous video on table variables, showing that both methods result in similarly suboptimal performance. To avoid these issues, he emphasizes the importance of dumping the results into a temporary table before using them in your query, which allows SQL Server to make better estimates and optimize the execution plan effectively. By following this advice, you can write more efficient queries and even take advantage of his special discount code DARLINGY24 for PASS Summit, saving 150 bucks while enjoying some well-deserved drinks with Erik on his birthday!
Full Transcript
Erik Darling here with Darling Data. Gosh darn it. And in today’s video, we’re going to talk about two things. The main point of the video is the appropriate use of string split in SQL Server queries. The first thing, probably the most interesting thing, is I have a discount code for you for PASS Data Community Summit 2020… 2024. That is, of course, coming up this November in Seattle. Remember, one of those days in there is my birthday, where you are contractually obligated to buy me a drink. Everyone at the conference is contractually obligated to buy me one drink on my birthday. And I am contractually obligated to drink that drink on my birthday. I’m going to die. But if you are on the fence, or if your employer is cheaping out on you about coming to PASS Data Community Summit this November 4th through 8th in Seattle, Washington, I have a discount code for you. So if you use, and this will be in the show notes for the Darling Data Radio podcast for Data Darlings, if you use the discount code, and I hate this, DARLINGY24, you can get $150 off the three-day conference ticket. That’s this thing. So if you’re bad at math like me, and you use SSMS as a calculator, the regular price is $1,895. With the discount, it’s $1,745. Now granted, that probably isn’t coming out of your pocket. Your boss or your company is probably paying for that. So if you want to save your boss or your company $150, they can… I don’t know what a company does with $150, to be honest with you. It’s always blown my mind that $150 would make a difference
in a company budget. But anyway, if you want to get that discount, when you sign up to go to PASS Data Community Summit, you can plug DARLINGY24 in there and receive your $150 off. So that’s cool, I think. Anyway, let’s talk about the appropriate use of splitting strings in SQL Server queries. Now I’ve created a couple opportune indexes here on the Posts table and on the Users… the Comments table, sorry. And what I’m going to do is I’m going to show you two queries that hit these tables. Now this demo might look very familiar to you. If you remember my smash… I’m going to actually… I’m going to start running this because…
it runs for a little bit. So if you remember my New York Times best-selling YouTube video on table variables, splitting strings and using table variables have two very big things in common. SQL Server has no idea what’s in them. SQL Server has no idea what’s in your table variable. It might know how many rows are in there, maybe, under the right circumstances, but it has no idea what data makes up those rows. And you would have seen that in my video called, for the last time it doesn’t matter how many rows you put in a table variable.
And the same thing goes for a string. So this demo might look very familiar to anyone who is… who watched that video. But here’s what I’m doing. Wait, let me… I’m sorry, I’ve messed things up a little bit. I did not frame things appropriately here. I want to make sure… ZoomIt is very laggy on me today, I’m not sure why. ZoomIt is once again rebelling against me. So right up here, I am declaring some IDs, or I’m declaring an ID variable, not a table variable.
Right? Just in the… For the naysayers out there, it’s only a varchar(8000); it’s not a varchar(max), not a varchar maximum data type. So there’s nothing to freak out about there, just a varchar(8000). Still a pretty long string, right? It’s a pretty long letter. If someone wrote 8,000 bytes to me, I probably wouldn’t read half of it.
But then what I’m doing is… exactly like what I did in the table variable demo, to show you how… SQL Server cannot do good cardinality estimation with table variables. It can also not do good cardinality estimation with split strings.
So I am doing exactly the same thing, where I am inserting the ID for Mr. Jon Skeet. At least the last time I checked, he was still a mister. I don’t know if that may have changed. It’s difficult to keep up with Jon Skeet. And then I’m also sticking in the top 99 nobodies.
Sorry if you’re in this group. The top 99 nobodies in the Stack Overflow database. These are users with a reputation of one who have created their accounts most recently. And we do not have high expectations of these new recruits, these cadets, because they have not had enough time to make an impact yet.
Jon Skeet has been around from the beginning, making big impacts, tyrannosaurus foot impacts on things. And so, you know, we don’t expect them to have much data. And so what I see clients do a lot is something like this, where they will extract values from a list of things.
And then they will either do like, you know, with an in or exists or a join. I’ve chosen a join in this case because it was the easiest way to convert the old demo to this one. Pure laziness on my part. Nothing funny going on, I promise.
But this is what I do, where I extract the value from the string split. And then I join that to the post table. And I join the post table to the comments table. And then down here, I show you how wrong of an idea this is.
This is definitely not a good thing to do. This is the bad way of doing things. This is the inappropriate use of string split in a SQL Server query.
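The shape of the bad pattern looks something like this; a simplified sketch, not the exact demo query:

```sql
/*
  Joining directly to STRING_SPLIT in the query. SQL Server never looks
  at what values are actually in the string, so cardinality estimates
  downstream of the split are usually way off.
*/
DECLARE @ids varchar(8000) = '22656,12,345,6789'; /* 22656 is Jon Skeet */

SELECT
    records = COUNT_BIG(*)
FROM STRING_SPLIT(@ids, ',') AS ids
JOIN dbo.Posts AS p
    ON p.OwnerUserId = CONVERT(int, ids.value)
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;
```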
It’s not how you want to do things, because you will end up with very bad performance. If I move my armpit a little bit, oh, that’s my hand too. You might see two minutes and 13 seconds under there.
And that two minutes and 13 seconds was largely eaten up by just this query here by itself. Right? Because this is the crappy way of doing things. If you do things in this way, my rates are reasonable.
You can hire me to fix these things. I will fix them all day long, happily. I’ll whistle. I’ll draw you doodles of things. I’ll bring flowers.
Maybe we’ll go on a picnic. I don’t know. We could do all sorts of stuff. But really what you want to be doing is something more like this, where you can split your strings into a temp table with the appropriate data type already assigned to it. One thing that it doesn’t…
So it can certainly make things messier, but it’s not like the biggest issue with string split. And I’ll show you in a minute. But by default, the value column that comes out of STRING_SPLIT is, like, some big nvarchar column.
But SQL Server is able to do a CONVERT_IMPLICIT on what comes out of the STRING_SPLIT function if you’re using it like I am, where you’re joining it off to something. And I’ll show you that in a minute in the query plan. But then if we put that into a temp table with the correct data type, SQL Server can do this a lot better.
Because two reasons. One, you know, again the correct data type, minor point there. But when you put that data into a temp table, SQL Server is then free to figure out what makes up that data.
You get a statistical histogram on the temp table, so SQL Server can do better cardinality estimation. This is almost the exact same thing that happened with the table variable. SQL Server doesn’t know what makes up the data in the table variable.
SQL Server also doesn’t know what values, what comma separated values make up the string that you’re splitting. It does not pause every time it extracts a value and say, oh, it’s this value, cardinality estimate. Or it doesn’t like extract, it doesn’t fully extract the values and say, oh, this is what I’ve got.
Let’s do cardinality estimation based on these values. It just says, uh, crappy guess. Got a crappy guess for you?
Want a crappy guess? Got a crappy guess. Got lots of crappy guesses. Handing them out, making it rain crappy guesses. So, if you’re gonna split on that string, what you want to do is not this, where you immediately join to it in your query. You want to do that and dump it into a temp table.
Right? And I think the proof here is in the query plans. Now, we return exactly the same result for both of these. That’s a big number and all, but if we look at what goes on here, right?
We have our initial string split, right? Or, sorry, this is our initial string assignment, right, up here. And this is quick, about 45 milliseconds in total.
And then down here, this is our first attempt at the count query. And this ends up performing just about as poorly as the table variable query for just about the same reasons. Now, I could have made this worse, but I chose to not make it worse.
I chose to make it a little bit easier because if I made it worse, it might have looked a little unrealistic. But, I mean, if you look what happens here. Oops, come on, zoom it.
This query takes 2 minutes and 12 seconds in total. And SQL Server guesses 50, but we get 100, right? So we have a 200% misguess here.
But then SQL Server’s estimates down here are also awful. We are off by 5, wow, that lined up really nicely. I didn’t do that on purpose.
If I tried to do that a million times, I would not be able to do that. That is a very nice… Look at this, look at the symmetry on that. God bless. Wow. Anyway, SQL Server makes a very bad guess here, right?
We get almost, we’re not even close to reality. We get roughly 5,804% wrong. But granted, since we have a good index, it happens pretty quick. The trouble is that, I mean, A, the guesses are off.
And then SQL Server has chosen this particular plan shape, where we just go from nested loop to nested loop. And you’ll notice that we spend quite a bit of time in here, where this estimate is off by some number that can only be conveyed with an ellipsis at the end. That is 81704…
I don’t know where that number ends. I do know, looking at this, that if we pay attention to these numbers right here, and again, I’m not good at math as a high school dropout. A public education received student.
But, I can tell you that 1-3-0-4-0-0-9-0-6-7, that is a 10 digit number, is a lot bigger than 1-5-9-6-0, a 5 digit number. How much bigger? You figure it out.
81704 something percent bigger. A lot bigger, right? Orders of magnitude. If anyone knows how to order a magnitude, please let me know. I’m willing to pay extra.
So, this thing, this whole thing takes 2 minutes and 12 seconds. Now, if we look at the rest of what happened in here, when we go, and we look at our ingenious method of, dumping data into a temp table from the string that we are splitting, right?
We dump this in here, and we look at the query plan down here. It’s going to look a lot like the query plan that we got when we used a temp table in the, for the last time, it doesn’t matter how many rows you put in a table variable video, where SQL Server makes slightly better guesses.
So, in here, SQL Server actually makes the right guess. It knows 100 rows are going to come out of here. It even goes to the trouble of sorting those 100 rows, so that we match the order of, if we had a clustered index on the IDs table, we wouldn’t need to do that, but that’s okay.
For this, I’ll forgive the zero second, the zero anythings that it takes to do that, right? And then we, granted, this guess is not fantastic, but you know, the default cardinality estimator is not fantastic, so we don’t get an impossibly good guess here, but we don’t necessarily, you know, the nested loops join thing is fine here, okay?
The big deal here is that SQL Server sticks a stream aggregate between one nested loop and the other, right? So we have this combination of query plan operators in this plan that we don’t have in this plan, right? Remember I said from this one, it just goes from nested loop to nested loop.
For this one, SQL Server says, you know what? I’ve thought about it, and I think we need to trim this down. And so SQL Server has to do far less work in here.
This goes from taking, let’s go look at that again. This goes, that took, ah, get out of the way tooltip. No one needs you.
We’re all hitting F4 and getting properties when we need to see that stuff. So this goes from taking one minute and 28 seconds to taking one millisecond. Now granted, this cardinality estimate is still off by a little bit, but this aggregate saves our buttocks in a very serious way.
And so this query working off the temp table only ends up taking 10 milliseconds. It’s a big difference, all right? So, I beg of you: if you are writing queries in SQL Server, and you are passing comma separated or anything separated lists, even if you’re passing XML or JSON or anything like that, and you need to relate that blob of string text to something in a query, in a join or where clause, or a subquery, or an exists, or really anything relational, anything with relational meaning, please, I beg of you, do not do the string splitting in the query.
Take the XML or the JSON or the comma separated, anything separated list of things you’re passing, and dump them into a temp table.
You can use any string splitting facility you want for that. You can use the built-in one that Microsoft gave you in 2016. You can use the one that you copied and pasted that’s a multi-statement table valued function with a while loop in it. You can take that one that you copied and pasted from the internet in 2002 and use that, and it will not profoundly affect your query because you’re dumping it into a temp table.
And then you’re going to use that temp table in your query instead of doing the string split in the query, and you’re going to look so smart in front of everybody. You’re going to look so smart that your boss will be like, Hey, you know what?
You get to go to PASS Summit this year, and you get to use Erik Darling’s discount code DARLINGY24 to do it, because we’re going to save 150 bucks, and I’m going to put 150 bucks cold hard cash in your hands to buy Erik Darling drinks for his birthday. That’s exactly how that’s going to go.
So, for the sake of all of us, so that you get to come to PASS Summit and celebrate my birthday with me, and you get 150 bucks cold hard cash from your boss to buy me drinks, you’re going to not put string split where it would matter in your queries.
You’re going to dump your split string results into a temp table, and then you’re going to use that temp table in your query. All right? Cool.
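For reference, the good pattern from this video boils down to something like this; again a simplified sketch, not the exact demo script:

```sql
/*
  Split the string into a temp table with the correct data type first,
  then use the temp table in the query. The temp table gets a statistics
  histogram, so SQL Server can make much better cardinality estimates.
*/
DECLARE @ids varchar(8000) = '22656,12,345,6789';

CREATE TABLE #ids
(
    id int NOT NULL
);

INSERT
    #ids (id)
SELECT
    CONVERT(int, ids.value)
FROM STRING_SPLIT(@ids, ',') AS ids;

SELECT
    records = COUNT_BIG(*)
FROM #ids AS ids
JOIN dbo.Posts AS p
    ON p.OwnerUserId = ids.id
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;
```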
So, with that out of the way, thank you for watching. I hope you enjoyed yourselves. I hope you learned something about string splitting, the appropriate uses for it in SQL Server queries.
If you like this video, you know what I like? Thumbs ups. I like nice comments.
I like motivational comments. Those are cool, too. If you like this sort of SQL Server content, you are free to subscribe to this channel and join, let’s see, let’s get the most current tally on this, nearly 3,775 other data darlings out there by subscribing to the channel and getting a delightful notification, very helpful notification every time I publish one of these videos.
And you know what that gets you? 150 bucks off PASS. And the pleasure of my company.
And, I don’t know, cool way to kill 15 minutes? I don’t know. Maybe you’re like, you’re in the bathroom, you got nothing else to do. You finish playing Wordle.
You finish playing Words with Friends. And you’ve gone through your RSS feed. And you’re just like, what am I going to do for 20 minutes? Earbuds in.
Watch my video. Soothing. It’s meditative, practically. Reduce blood pressure. Give you a good laugh. I don’t know.
Whatever. It’s a cure for what ails you, basically. It’s like drinking a Guinness in the morning. Right? The smart thing to do. Anyway. That’s enough for me.
I’m going to go back to work. Maybe go to the gym. Give this Adidas t-shirt a workout. Right? Get sweaty a little bit.
And I don’t know. I’m going to think about what to record next. Because I’m always thinking for you. Anyway. Thank you for watching.
Some Notes On Query Tuning With Batch Mode In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into some interesting aspects of query tuning in SQL Server, particularly focusing on batch mode execution and parallelism. Starting off with a humorous anecdote about re-recording due to a weekend brain fog, I share the excitement of being voted “the classiest SQL Server consulting shop ever” by BeerGut Magazine—a moment that adds a light-hearted touch to what can sometimes be complex technical discussions. The video then explores how batch mode execution works and its limitations, especially in achieving full rowstore parallelism. Through detailed examples, I demonstrate techniques like using temporary objects and query hints to optimize performance, highlighting the nuances of SQL Server’s heuristics and the importance of understanding these mechanisms for effective query tuning.
Full Transcript
Erik Darling here with Darling Data, re-recording the video that I uploaded Friday where it sounded like my throat was full of weird bugs. So we’re going to try this again and hopefully I remember to say all the right stuff, because it’s been a weekend, and after a full weekend everything that was in my brain the week before disappears. So we’re celebrating a very, very special day here at Darling Data. We were voted by BeerGut Magazine to be the classiest SQL Server consulting shop ever. We’ve got some drivers apparently Intel is telling me about. Thanks Intel. Get right on that. And yeah, we’re going to talk today about, I don’t know, some batch mode stuff, some parallelism stuff. You know, some general query tuning stuff. There’s nothing overly specific about this video. It’s kind of a funny thing about it. It’s just the stuff that I find interesting, and you’re just going to have to deal with that today. You might find value in, you know, a few different places here, but there is no overriding theme or point to this video aside from the fact that, you can see here,
I find it interesting. So let’s look at a query. Let’s do that. All right. We’re going to make sure we’re doing all the right stuff there. We have query plans turned on and now let’s run this. And what we’re going to do is in about eight seconds, we are going to see the most offensive query plan ever. Highly offensive. I might have to blur it out. That’s how bad it is. Because Tipper Gore is going to be all over this channel when you see this thing. Look at this. Behold. Behold the terror, the tragedy that is this query plan. Look at this. Look at this thing.
So we have two, it’s so awful. We have a parallel merge join, which of course should not exist. Right? That’s just ugly. And for some reason, SQL Server decided that it wanted to sort a 53 million row table to make that parallel merge join happen. And no, just no SQL Server. You’re bad. You’re awful. Why? Why? I’m downloading Postgres as we speak. Postgres would never do this to me. Actually, I’m lying. It probably would.
But any database engine can forsake you like this. None of them are, you know, none of them are beyond this level of offense. But holy cow. It’s a wild one, isn’t it? It’s a real wild one. Now, whenever I’m tuning store procedures or queries and I see stuff like this, one of the first things that I always try to do is just see, like, what will happen if I use batch mode. Because oftentimes, batch mode will help us avoid calamitous query plans like that. And, you know, for queries that do a lot of CPU work, have a lot of CPU effort, batch mode can be very useful.
Because we work on batches of rows. We don’t work on a single row at a time. Sorry, something is itching my forehead and I don’t know what it is. It’s a bit distracting. Anyway, so what I usually, you know, first try to do, and, you know, if I’m using a SQL Server version that has in-memory tempDB turned on, where I can’t use a temp table with a columnstore index because I’ll get an error, which you don’t have to worry about in managed instance because it’s, like, apparently 15-year-old software and that option isn’t available in it.
Don’t use managed instance. But this is one of the first things that I’ll try here because this will, you know, give me at least some hope of maybe getting a batch mode plan that’ll be helpful. Right? And so what I’ll usually do is something like this, where I’ll, you know, create a temp table.
There will be a clustered columnstore index on it. And then I'll do sort of a funky, tricky, little lefty join-y thing like that just to, you know, let the optimizer get a little whiff of a columnstore index so that it'll be a little bit more lenient about using some batch mode operators.
Now, one thing that I’ve been sort of lazy about showing in demos when I do this is I usually give the column in whatever, you know, either temp table, columnstore helper object, or if I can’t use a temporary object because of in-memory tempDB, and I have to create, like, a dbo.t or dbo.more descriptive name, like columnstore helper table, I’ll always give this column kind of a weird name because if we were just, if we were being real lazy, and let’s say we were doing select star here, or we were doing, like, we had a CTE, we were doing, like, select into, we would get potentially duplicate column name errors.
Or if, like, we were working on just a real shamble of a query where no one was aliasing stuff, we could get, like, weird, ambiguous column name errors. If, like, this column was just, like, called ID and we were selecting just, like, ID up here, right?
Or we were joining to just ID down here without an alias. So it just, it kind of helps you to avoid weird stuff if you give that column a weird name. So there’s that.
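The demo script itself isn't visible in the transcript, so here's a rough sketch of the columnstore helper trick being described, against the Stack Overflow Posts and Votes tables that come up later. The query shape, names, and column list are my guesses, not the actual demo; the deliberately weird column name follows the advice above.

```sql
/* Columnstore "helper" temp table: it never holds any data, it just
   exists so the optimizer will consider batch mode operators. */
CREATE TABLE #batch_mode_helper
(
    /* deliberately weird name to dodge duplicate/ambiguous column
       errors in lazy SELECT * or un-aliased queries */
    batch_mode_helper_id bigint NULL,
    INDEX ccsi CLUSTERED COLUMNSTORE
);

SELECT
    p.OwnerUserId,
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
LEFT JOIN #batch_mode_helper AS bmh
    ON 1 = 0 /* never true: the join contributes nothing to the results */
GROUP BY
    p.OwnerUserId;
```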
Again, another weird little point along the way. Something for you to marinate on later. So let’s see what we get now because we’ve got this all set up. At least I think I created this temp table.
We’re going to find out. If this throws an error, it’s fine. If it doesn’t, then it would have thrown an error when I ran this query. So either way, you were going to see some, you were probably bound to see some red text if I didn’t do that, right? If I didn’t preemptively do that.
And the problem we run into here is that this trick doesn’t work here. SQL Server still chooses this godawful plan where, again, we have a parallel merge join, which is just grotesque beyond compare.
And SQL Server has once again decided to sort 53 million rows to get us, to allow for this, right? And that is, this is just no good. All of this, bad.
If I were a better artist, I would draw like a thumbs down thing here. But it’s terribly difficult to draw with Zoomit. And, you know, I think this is bordering on obscene. So I’m going to stop that.
But anyway, so this is one of those kind of funny things that happens is we’ve actually costed ourselves out of a better execution plan. So let’s run these two back to back, right? We’re going to have our eight-second monstrosity.
And then we’re going to have a slightly faster plan below it. And the reason we’re going to get a slightly better plan below it is because we have told SQL Server, please, no more any kind of join other than hash join.
And when you force SQL Server to use the hash join plan, it is also far more likely to engage batch mode because batch mode and hash joins, well, they go together like Forrest and Jenny, right? They make quirky little babies.
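The hint being described is a query-level join hint. On a made-up aggregate query shaped like the demo (table and column names are assumptions), it looks like this:

```sql
SELECT
    p.OwnerUserId,
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY
    p.OwnerUserId
/* no join in the plan may be a nested loops or merge join now */
OPTION (HASH JOIN);
```

Worth knowing: the query-level OPTION (HASH JOIN) restricts every join in the query, while the per-join form (INNER HASH JOIN) additionally forces the written join order, which is an even bigger hammer.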
And we’re going to, okay, so now let’s look at the query plans. All right. So we have our same monstrosity up here. Il monstro, we hate you.
You are awful. If I saw you in real life, I would tell whoever was running SQL Server to get a refund. And down here, we have a slightly better version of the query plan where we have these two operators, which I’ll show you in a minute.
But these two operators, we got our hash join and we have a sort here. But the important thing here is that we didn’t get a parallel merge join. And the reason we were getting parallel merge joins before is because costing is stupid and broken.
And if we look at the estimated subtree cost of this query at $3,945 query bucks, and we look at the estimated cost of this query at $4,512.88 query bucks, we can at least understand why SQL Server chose this awful monstrosity of a plan.
It thought that the cost would be much lower. But it wasn’t. Well, I mean, it was.
I mean, the estimated cost was much lower. But it was still wrong. So, like, one thing that I’ve said in a billion videos at this point is, you know, you can have queries with a very high estimated cost that are fast, and you can have queries with a very low estimated cost that are slow.
So, cost really has nothing to do with how fast or slow a query is. Right? Cost is just a bunch of estimates that the optimizer uses to assemble a query plan and hope it’s the best one.
The problem is that all those estimated costs are a lot like lottery numbers. If you knew what the actual numbers were going to be ahead of time, the results would have been much better. Like, I think that if SQL Server understood what it would take to run this query, the cost would not have been so prohibitive that it chooses this awful monstrosity up here.
So, that’s that. Now, the thing that I want to show you in this bottom query plan, actually, is that we have a sort that runs in batch mode.
And, like I said, we have the hash join that runs in batch mode. But that’s it. Right? This clustered index scan on the post table, I believe. Yeah, post.
And this clustered index scan on the go away SQL prompt on the votes table, these both run in row mode, which isn’t how it has to be. Now, if you’re…
Okay, so here’s, like, the list of, like, things that have to align perfectly for you to get batch mode on rowstore. You have to be on SQL Server 2019 and up.
You have to be in compat level 150 (the SQL Server 2019 level) or better. And you have to be on Enterprise Edition or an equivalent SKU, which could be, you know, Developer Edition, which I wouldn't blame you if you wanted to use.
Because that’s how Microsoft screws you into buying Enterprise Edition in production. When, like, you’re developing stuff on developer edition and you’re like, wow, all this stuff works great and is fast.
And look at this nice thing that SQL Server does. And then you, like, release it on standard edition and everything kind of falls apart. And you’re like, wait a minute, what happened? That’s what happened.
So, you know, I sort of sympathize with people who are like, oh, well, we’re just going to use developer edition. You know? Who am I to argue?
Screw job like that? Get your revenge. So, anyway. So, like, it doesn’t have to be that bad. So, what you can do on newer versions of SQL Server is, let’s say you, like, that turned out pretty all right, right?
Where you have 3.6 seconds, like, down from 8 point, I mean, at worst, this thing was like 9 point something seconds. So, we got it down to 3.6 seconds just by getting a couple things in batch mode and using a hash join instead of a merge join and not sorting 53 million rows and spilling and, God almighty, what’s wrong with you?
But the thing that’s interesting to me is that batch mode tricks do not get you the full batch mode on rowstore experience that you could get, right?
So, if we, what you can do is you can use these compatibility level query hints to get you higher compat levels than what your database is set to, right? So, this is, like, a good way to sort of override some things and be like, hey, I got a better idea.
So, let’s run this query now. And, unfortunately, we still need the hash join hint there because if we don’t put that there, SQL Server will still cost itself into that crap hound of a merge join plan.
So, batch mode on rowstore has problems because it is all based on heuristics. And those heuristics might kick in, they might not. And even with me doing my batch mode trick there in the higher compat level, I still don’t get batch mode where I should.
I still get costed into that awful merge join plan. So, now if I look at this execution plan, this one’s a full second faster. And where it gets faster is on this clustered index scan on the votes table.
For some reason on the post table, SQL Server, oh, come on, give me the tooltip. I don’t need a missing index in. So, for some reason, SQL Server on the post table is like, no, we’re going to scan this in row mode still. That’s whatever.
And down here, come on, go away. Zoom it, catch up. Do try to keep up, zoom it. This actually does get scanned in batch mode. So, this ends up faster. But there’s something interesting going on in this query plan.
And then, so if I go and look at the properties here, this is something that we have looked at before, you and I, all of us lovely folkses together.
And this is something kind of interesting, right? Another, again, something interesting along the way. Hmm. Neat stuff.
So, we have eight threads that cooperatively scanned the post table, right? All these numbers, I promise you, will add up to this number. But only one thread, oops, I hit the wrong button there. Only one thread produced any rows.
Produced 47,575 rows. So, only thread two. This lucky devil right here did any real work. All right? Well, it produced any real work. All right?
So, you know, this thing goes into the hash join. And then, I think down here, it’s sorted out pretty well. I kind of forget, though. Oh, so only, yeah. So, down here, two threads end up doing some work. So, this is, again, some sort of a bit of lopsided parallelism.
Unfortunately, batch mode stuff does not fix these sorts of parallel skew issues that can come up. Now, one thing that you can do to get around that is you can use a temporary object, right? Because you can't always rewrite a query to get it to do exactly what you want and fix every single problem.
So, sometimes using a temporary object is the right choice. So, let’s dump our post table stuff into a temp table. And notice, this is a whole heck of a lot faster now, right?
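A sketch of what dumping the Posts side into a temp table might look like; the column list and any filtering are my guesses, since the demo script isn't visible:

```sql
/* Materialize the Posts rows first...
   ...plus whatever filtering the original query did. */
SELECT
    p.Id,
    p.OwnerUserId
INTO #p
FROM dbo.Posts AS p;

/* Then join to the temp table instead of the base table. */
SELECT
    p.OwnerUserId,
    records = COUNT_BIG(*)
FROM #p AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY
    p.OwnerUserId;
```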
Remember, there was like 1.7-something seconds. Now, that’s only 500 milliseconds. So, we save almost a second and a half just by doing this. And what’s nice about this now is that rather than just ending up with all our rows on one single thread, we actually spread things out quite a bit.
So, we don’t have that same parallel skew problem just dumping that into a temp table. So, we’re doing well here, right? I think we’re doing better, right?
We’ve made this a little bit faster. But now, watch this, right? This is, you’re going to get a kick out of this. I know you’re going to get a kick out of this because you watch my videos. And if I get a kick out of something, you’ll probably get a kick out of it too.
Now, watch what happens here. Now, we get a batchy, modey plan, right? I think, yeah, batch mode, batch mode. Okay, so that turned out pretty good.
You know, our left join to the little temp table over there did us some good. And now, we’re working up this temp table. But watch what happens down here now, right? This one’s okay, right?
This one’s kind of okay. This number sticks out a little bit further than the rest. But now, look at, watch, ready for this? Ready for the filter? Actually, let me select this so this stays.
You ready for this filter operator? Bam! We just shifted the parallel skew problem over here. All right, that’s interesting.
Okay. So this thing, all in all, this takes 1.8 seconds. 1.889 seconds in full right there. All right.
All right. So now, batch mode, you know, for all the great good that it does to queries in SQL Server, sometimes it doesn't do as much as it should when you're only using the batch mode tricks. Right?
And is there a reason why? I suppose if there is one, it's because, you know, we are using the older compat level where batch mode on rowstore isn't available. Right?
So if we were in compat level 160 or better, like, we would get, like, a better batch mode experience, but only if batch mode on rowstore, like, chooses us. Like, just using a batch mode trick doesn’t necessarily guarantee you batch mode on rowstore. Now, let’s race the batch mode version of this query versus the row mode version of this query.
Right? Because, like, the only two operators in here that get batch mode are these two. You know, this, all this, all the rest of this stuff still happens in row mode.
And, of course, the parallel exchange operators, they don't have any batch mode support. So they're row mode as hell anyway. Now, look down here, though.
That’s 1.9 seconds now. This one is just one second. Our fully row mode hash join plan is only one second. Pretty wild, isn’t it?
Now, it’s not because one of them gets a bitmap and the other one doesn’t. This filter operator is an optimized bitmap. There we are. Optimus bitmap-ius. And so we have a bitmap down here.
Remember, in these query plans, bitmaps, they get made up here, but then they get used down here. Right? So they go this way and get used to filter out rows here, usually. Sometimes they’ll get stuck at the partition stream.
Sometimes they’ll get stuck somewhere else. But most of the time, when you have a nice, effective bitmap, they’ll get pushed down from here to there. Thereabouts. You know, you can figure it out when you look at your query plan.
But this is a lot faster. And we don’t have the sort of same weird skew issue. At least I don’t remember having the same weird skew issue on this one. This one spreads all out.
So this one, the parallelism works better in the row mode version of this plan. That might not always be the case, but it's certainly the case here. Now, if we run this query and we ask for this to happen in compat level 160, we should get a more batch mode-y version of this plan, where now we get actual batch mode on rowstore. So now we're actually able to read once again from the votes table in batch mode.
And this gets us down to about under a second. So if we were to recap this a little bit, the query with just batch mode is about 1.8 to 2 seconds. The rowstore version of that query is 1 second.
But the full batch mode on rowstore version of that query is about 800 milliseconds. So that turns out a lot better. Now, one of the reasons why Enterprise Edition is so important for batch mode workloads is because Microsoft (once again, I wouldn't blame you if you use Developer Edition) hobbles the hell out of you in Standard Edition, where batch mode stuff is limited to a DOP of 2 in a parallel plan.
So a lot of times when I’m tuning stuff on Standard Edition, if you have a pretty big aggregation query that’s running at DOP 8, and you’re like, oh, we can try batch mode, you know, you’ll be pretty lucky if you end up with it running for like an equivalent amount of time because you lose like six DOPs if you’re running at DOP 8. So there’s a pretty big chance of a downgrade.
If you’re only running at like DOP 4 and you get downgraded at DOP 2, maybe less. But if you’re running at DOP 8 or higher, the DOP downgrade on Standard Edition with batch mode can be pretty painful. That’s why when like tuning Standard Edition stuff, it has to do big aggregations.
I’m far more likely to lean towards like indexed views and stuff like that because you don’t have those DOP limitations on indexed views. Granted, you have an indexed view, so you have a different set of problems. But, you know, an indexed view that has good indexes to help it, you know, get maintained is, you know, in the words of my wine distributor in New Zealand, it is no more overhead than another nonclustered index.
So, you know, hopefully that works out. Now, I say there are all sorts of things, but there are really just two things you can do to trick SQL Server into using batch mode.
One of them is what we did up above where we had a temp table with a clustered columnstore index on it. And we did a left join on 1 equals 0 to that temp table. And SQL Server smelled a columnstore index.
And it was like, ah, maybe batch mode. Another thing you can do if you’re not free to rewrite queries in that way is to create an index like this, non-clustered columnstore index that’s filtered.
And if you filter it to something that cannot possibly exist, then this index will not actually contain any data, but it will exist on the table as a columnstore object and give SQL Server some, you know, some impetus to use batchy, modey stuff.
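The second trick, as an example; the index name and the impossible filter here are arbitrary choices of mine:

```sql
/* A filtered nonclustered columnstore index with a predicate that can
   never be true: it stays empty forever, but its mere existence on the
   table gives the optimizer a reason to consider batch mode. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode
    ON dbo.Posts (Id)
    WHERE Id = -1 AND Id = -2;
```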
You know, and if you’re, you know, tuning sort of regular things, you know, supporting indexes, you know, they’re good things. They’re there for a reason.
You know, they can help you with your joins and they can help you with your filtering and other stuff like that that are useful. And sometimes creating indexes might be, like, you know, the final answer, but the batch mode stuff is usually pretty useful as well.
So anyway, just a few notes on, you know, using batch mode, testing different stuff out when you’re trying to tune kind of bigger, crazier queries and query plans. Granted, this one wasn’t big and crazy, but that parallel merge join plan, man, that just set me off.
And I had to do the rest of this stuff because I feel like if I don’t talk about how awful parallel merge joins are, no one will. They’re truly despicable creatures.
I wish that they would be removed from query optimization, just to be honest with you there. So anyway, a little bit of a hodgepodge there, but again, stuff that interests me. Hopefully it interests you.
If it doesn’t, you don’t need to watch this. You can go do something else with 20 minutes of your life. It’s okay. Either way, my feelings won’t be hurt. So anyway, I hope you enjoyed yourselves.
I hope you did find at least some parts of this interesting. I hope you learned something, maybe a few things, because this was, again, a potpourri of knowledge. Yeah, what was I going to say?
Anyway, if you liked this video, thumbs ups are good, as well as, you know, comments that don't make me cry. I mean, tears of joy are fine, but tears of sadness? We don't want sorrow on these videos.
If you like this sort of SQL Server content, you can hit the subscribe button. It’s apparently a bell-shaped object. And you can join nearly 3,762 other data darlings.
Never going to be able to say that with a straight face. Who have subscribed to this channel and apparently find me interesting enough to keep getting notified when I’ve talked about SQL Server. So anyway, thank you for watching.
I do apologize for the audio issue on Friday’s video. There was no fixing it. Like, once it’s done, it’s done. So I just had to re-record the whole thing. I had lots of funny jokes about Justin Timberlake getting arrested in a tuxedo.
And you should have been there. It was great. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
In this video, I delve into the world of dynamic SQL security and how to keep your data safe from potential threats like SQL injection. I walk through a practical example where we create a table called `DropMe`, insert a row, and then execute a query that inadvertently drops the table due to SQL injection. This serves as a stark reminder of why it’s crucial to parameterize dynamic SQL properly. I also discuss how to safely handle cases where you need to pass in column names or other schema-related elements using SQL Server’s system views like `sys.columns`, `sys.tables`, and `sys.schemas`. By leveraging these views, we can ensure that only valid objects are used within our dynamic SQL queries, significantly reducing the risk of accidental or malicious commands being executed. Additionally, I touch on some practical tips for maintaining security, such as using string functions like `STRING_SPLIT` (available in SQL Server 2016 and later) to safely parse input parameters and logging invalid inputs for further review if needed. Overall, this video provides a comprehensive guide on how to secure your dynamic SQL queries against potential threats while ensuring they remain performant and efficient.
Full Transcript
Erik Darling here with Darling Data, and my microphone is on, but the sound bar just looks funny. Okay, well, we'll deal with that, hopefully. Anyway, today was my oldest daughter's fifth grade graduation, which I know sounds a little weird. A lot of places just have a kindergarten one and then an eighth grade one, but, you know, in New York they shuffle you from a K-5 to a 6-to-8 school, so fifth grade is apparently a big deal celebration here. And, of course, being here in America, they do the Pledge of Allegiance and they sang the national anthem, the Star-Spangled Banner, and it kind of struck me, sitting there in the audience, that there are a few words in the Star-Spangled Banner that aren't common words anymore, and people might not know the meaning of them. Like, Gallantly or Rampart or even Spangled might send some people running for the dictionary. Who knows? Might need to throw some TikTok lingo in there if we want the Star-Spangled Banner to remain a true top 40 hit. So, anyway, in this video, from Erik Darling of Darling Data, we're going to talk about how you can keep your Darling Data safer in the face of dynamic SQL. And I'm going to show you a few things that I will sometimes do when I'm working with dynamic SQL to make sure that there's no hijinks, there's no shenanigans, there's no tomfoolery going on when queries get executed. So the first thing we're going to do is a small primer on the potential perils of the data.
So, we're going to create a table called DropMe. We're going to insert a row into DropMe. And what we're going to do is just run a very simple query where someone has SQL injected a drop table command onto an input. And we're going to execute that. And we're going to see that, at the very end of this, we no longer have our table called DropMe. In fact, we have an error message saying invalid object name DropMe. And the reason we have that error message is because this is what our executed SQL string looked like. We had a totally valid select query, which executed, and I'll show you that in a second.
And then after that ran, we had a drop table command. Also run. That's unfortunate. If we go over to the results, though, we will see, well, there's the result of our first select from DropMe up there. And then here's the result of our, oh, that jumped around a little bit. You Dickens, ZoomIt. There's the result of our first query. And there's the result of our database name search. And you can see all the Stack Overflows that I have on my server, right? Obviously not good.
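The demo script isn't visible in the transcript, but the setup being described is roughly this (names other than DropMe are my guesses):

```sql
CREATE TABLE dbo.DropMe (id int NOT NULL);
INSERT dbo.DropMe (id) VALUES (1);

/* The "user input", with a command smuggled onto the end of it */
DECLARE @database_name sysname =
    N'S%''; DROP TABLE dbo.DropMe; --';

/* Unsafe: the input is concatenated straight into the query string */
DECLARE @sql nvarchar(MAX) =
    N'SELECT d.name FROM sys.databases AS d ' +
    N'WHERE d.name LIKE ''' + @database_name + N''';';

PRINT @sql; /* the SELECT, then the injected DROP TABLE, then a comment */
EXEC sys.sp_executesql @sql;
```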
And obviously the solution is to parameterize the dynamic SQL. And so if we rerun that portion of the script again, when we rewrite our dynamic SQL in a safe way, we will get no error messages, but more importantly, we don't get any search results, which is a good thing, right? Why is it a good thing? Because we didn't execute an unsafe SQL command, right? I mean, I don't think it's a very prodigious way of working through a SQL Server
if you're okay with user searches running and returning results and also executing out-of-bounds commands. So if we look over in the messages tab now, there's a printout of what we put into our database name parameter, right? It's just S percent wildcard, you know, any old thing that follows after that, and a drop table command.
But in here, this looks different now, doesn't it? We just have the parameter name. We don't have that string sitting in there. And that's great for things that you can actually parameterize.
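The parameterized rewrite being shown is roughly this; with sp_executesql, the input can only ever be a LIKE pattern, never executable code:

```sql
DECLARE @database_name sysname =
    N'S%''; DROP TABLE dbo.DropMe; --'; /* the same hostile input */

DECLARE @sql nvarchar(MAX) =
    N'SELECT d.name FROM sys.databases AS d ' +
    N'WHERE d.name LIKE @database_name;';

/* The parameter rides along separately, never concatenated in */
EXEC sys.sp_executesql
    @sql,
    N'@database_name sysname',
    @database_name;
/* No rows come back: no database name matches that whole weird string */
```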
There are lots of things in SQL Server that you can't parameterize, at least not in a meaningful way, that you might need to pass into dynamic SQL.
Like, you might need to pass in column names or a schema name or a table name or even a database name. And none of those things can really be supplied as parameters to dynamic SQL. It's not allowed.
So, here's some stuff that I do to make sure that, for cases where that does have to happen, we stay safe. SQL Server has all sorts of views in it that enumerate different objects. Like, we can enumerate databases from sys.databases and half a dozen other views.
We can enumerate schemas from sys.schemas. We can enumerate tables from sys.tables. And we can enumerate columns from sys.columns.
So, if I'm going to allow someone to pass in a schema and a table, or a list of columns that they want to return from a user query, I need to make sure that those are all safe as well. So, what I'll do in those cases is declare some placeholders in here.
And then I'll do stuff like look up the schema name and table name that got passed in. And since this isn't dynamic SQL, we don't have to worry about any filthy, dirty inputs in here. And then I'll set the safe schema and table names with QUOTENAME, because QUOTENAME is very important.
I was working with a database name that had a space and a dot in it earlier this week. And that was a real adventure in a number of ways. Found myself using a lot of parentheses for that one.
And this will make that safe. And then what's nice to do, if you're going to use something like this over and over again, rather than constructing this constantly, is to just have one thing where you concatenate the schema name and the dot.
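Putting the sys.schemas/sys.tables lookup and QUOTENAME together; variable names and the error handling here are my own choices, not necessarily what's in the demo:

```sql
DECLARE
    @schema_name sysname = N'dbo',   /* pretend these came in as parameters */
    @table_name sysname = N'Users',
    @safe_object nvarchar(300) = NULL;

/* This isn't dynamic SQL, so filthy inputs can't do anything here:
   either the object really exists in the metadata views, or we get NULL */
SELECT
    @safe_object = QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE s.name = @schema_name
AND   t.name = @table_name;

IF @safe_object IS NULL
BEGIN
    RAISERROR(N'Invalid schema or table name.', 16, 1);
    RETURN;
END;
```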
And, you know, one of my big gripes about dynamic SQL in SQL Server is that there's not enough typing for it. I mean, not keyboard typing, but there are no types built into SQL Server to handle stuff like this.
So it would be cool if there was, you know, like an object type or something, where if you passed in a table or a schema or a database or whatever else, you would get back a constructed object with the dots in the right places. That'd be cool. It'd be nice if there was some constructor for that.
Like QUOTENAME: you pass something into that and it puts quotes around it. It'd be nice if there was something to put dots in the right places too, so that you didn't have to do foolish things like this and realize that you forgot a dot somewhere. Or you left a dot in and you didn't mean to, and it becomes quite a hassle.
There's a reason why dynamic SQL is a true art form, practiced by few. A lot of hacks out there, a lot of hackers out there, but not a lot of true dynamic SQL artisans. And then, because, you know, I'm living in the future here on SQL Server 2022, we have this STRING_SPLIT function, which came around in SQL Server 2016.
And we have this STRING_AGG function, which came around in SQL Server 2017. The problem is, and this is where Microsoft really screws up again, that you have to be in the database compatibility level that the function was released in, in order to use it. So to use STRING_SPLIT, you have to be in compat level 130 or better.
To use STRING_AGG, you have to be in 140 or better. And it doesn't help. Now, Microsoft was cool from a query optimization standpoint, where it was like, oh, we're going to give you all these use hints to specify which compatibility level you want this query to be optimized at in this database. Which allows for, you know, some neat stuff to happen.
Like in 2016 compat, you can do a parallel insert select into a temp table, or any other table, but a temp table is what it's particularly helpful for. And in 140, 150 and up, you can get all sorts of batch mode on rowstore type stuff and other neat things, right? Like, you know, batch mode memory grant feedback and other intelligent query processing features, which are lovely.
But not these, for some reason. For some reason, changing the compatibility level of the query doesn't unleash these functions. So if you are using a really old version of SQL Server or a really old compatibility level, you are going to have to go the old school route: rather than using STRING_AGG, you're going to have to use some XML hijinks to construct whatever comma separated list of things you want.
And if you are not using the 2016-or-better compatibility level, you will have to work out your own string splitting scheme. Now, normally I would be pretty picky about this, right? Because this is not a good SARGable thing.
I see a lot of queries with STRING_SPLIT in them, or some string splitter in them, not necessarily the built-in one, where it's a meaningful query, not just hitting some dynamic management views. And, you know, it's in a WHERE clause or a join or something else.
And that's not a good idea for performance, because SQL Server has no idea what's in there. And it makes no attempt at creating a statistics histogram for the results of a string split. So you can get some really bad cardinality estimates when you start getting that stuff involved in a WHERE clause.
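Tying STRING_SPLIT and STRING_AGG to sys.columns for the column list; again a sketch, with assumed names:

```sql
DECLARE
    @column_list nvarchar(MAX) = N'Id, DisplayName, Reputation', /* pretend input */
    @safe_columns nvarchar(MAX) = NULL;

/* Only names that really exist as columns on the table survive the
   join; anything bogus (or injected) in the list simply falls away */
SELECT
    @safe_columns = STRING_AGG(QUOTENAME(c.name), N', ')
FROM STRING_SPLIT(@column_list, N',') AS ss
JOIN sys.columns AS c
    ON c.name = LTRIM(RTRIM(ss.value))
WHERE c.object_id = OBJECT_ID(N'dbo.Users');
```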
But since this is just a very simple query against the dynamic management view, um, I’m not gonna get picky here. Right? This is, like, I can deal with this.
Right? We’re not performance tuning in this one. We are, we are trying to keep our query safe from the hackers. We’re concerned about the security of our stuff. Right?
Not performance tuning in this one. And then, if any of these come back as null, someone clearly did something either devious or mischievous or maybe just wrong. There's a quote about malice and ignorance and all that.
I suppose that could apply here. I'm not sure how I feel about it. But, you know, just some general handling.
And if you wanted to take this further, especially for, like, the column list, you could even have a table variable or some other logging thing happen, where you logged a list of things that someone passed in that were incorrect. Right?
If you really cared about it, you could totally do that. I don't care enough to do that here. It's just gonna make the code longer. But you could absolutely, you know, stick a list of the stuff from the column names parameter that were invalid column names in there. Right?
It's a totally doable thing. And then down here, this is what I kind of meant by some things not being passable as parameters. You couldn't directly pass this to dynamic SQL as a parameter; SQL Server would say, I don't know what you're talking about.
Likewise, you can't pass in things like database, schema, and table names without SQL Server saying: no, can't do it, not happening. And what this allows for is that if someone wants to select the top 100 rows from the Users table, they can do that.
And the query will look like this. One thing that is very nice for you to do in all your dynamic SQL is to put a comment with the stored procedure name in there, so people know where the hell these things come from. It's not just some buck-naked query running around in your plan cache or Query Store that no one can identify.
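A hedged sketch of that tip (the procedure and variable names are assumptions; it assumes @SafeColumnList, @SchemaName, and @TableName have already been declared and validated against the DMVs):

```sql
/* Sketch: tag dynamic SQL with the calling procedure's name, and
   QUOTENAME any validated object names before concatenating them. */
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = N'
/* executed by dbo.dynamic_search_proc */
SELECT TOP (100) ' + @SafeColumnList + N'
FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';';

EXECUTE sys.sp_executesql @sql;
```

The comment survives into the plan cache and Query Store, so whoever finds the query later knows exactly which procedure produced it.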
But what's cool about this is that if we then run our stored procedure with something devious, mischievous, and malicious in it, like that, it does not show up in there. Right? Our messages tab still just shows the three columns that were actually valid for the Users table, plus the comment.
So, good tips on keeping your dynamic SQL extra safe. Right? And if we look at our drop-me table, it still exists, despite someone's best efforts to drop our poor drop-me table.
Right? Now, this is just one kind of example, with object names: schema, table, database, column names, stuff like that. And I grant you, I grant you every grudge you have against this stored procedure for not actually needing dynamic SQL.
But you might have a more involved case, and without doing something genuinely complicated, this is just a good general example of an alternate thing you could do that might make life better, might actually perform better, might cut some crap out of your query plan, stuff like that. Let's say we have a stored procedure where we want to search display names in the Users table of the Stack Overflow data. You probably won't be working on this particular stored procedure, but you might let people search like this.
And you might have dynamic SQL, again, unsanitized dynamic SQL, that looks a little something like this, where you're just concatenating some bare-naked parameters into a thing like this. Which is, obviously, a horrible idea. You shouldn't trust anyone this much.
I wouldn't trust anybody with something like this. But one thing that you might find useful, rather than having all this stuff in your dynamic SQL, is something like this, which might perform better and keeps you safer: you create a temp table, and then you enumerate just the IDs of the things you care about and stick those into the temp table. And again, this part doesn't matter much, right?
If you want to throw a recompile hint on here, whatever you want to do to tune up this query, you can do it. But for the context of what we're doing, this is pretty quick and fast. Just dump the contents into a temp table. Nice little primary key on there, too, right?
Look at our primary key. Look at our nice clustered primary key on this table. Isn’t it gorgeous? Most beautiful thing you’ve ever seen? I think so.
And then we can use that temp table within the dynamic SQL block, because we still have that temp table created in this context, and we can execute dynamic SQL that references a temp table created outside of it. Now, there are funny things about creating temp tables inside of dynamic SQL. The first one is that this is not reciprocal. So if you create a temp table inside of dynamic SQL, you can't reference it outside of the dynamic SQL.
But if you create a temp table outside of the dynamic SQL, you can reference it inside of dynamic SQL, which is pretty cool. I like it. I’m into it.
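A small sketch of that scoping rule (the table names are made up for illustration):

```sql
/* A temp table created outside dynamic SQL is visible inside it. */
CREATE TABLE #ids (Id integer PRIMARY KEY);
INSERT #ids (Id) VALUES (1), (2), (3);

EXECUTE sys.sp_executesql
    N'SELECT i.Id FROM #ids AS i;'; /* works fine */

/* But not the other way around. */
EXECUTE sys.sp_executesql
    N'CREATE TABLE #inner_only (Id integer);';

/* This would fail with an invalid object name error, because
   #inner_only went out of scope when the dynamic SQL finished:

SELECT i.Id FROM #inner_only AS i;
*/
```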
I dig it. I can get on board with that. The other funny thing about creating temp tables inside of dynamic SQL is that it makes them not cacheable, which might be good or bad. Paul White has three magnificent, wonderful blog posts about temp tables and temp table caching in stored procedures.
I end up sending people to those quite a bit because, as Paul is wont to do, he has written wonderful, detailed, comprehensive blog posts about these topics. So you might have an issue where you create a temp table, SQL Server caches that temp table, and part of what it caches is the statistics histogram. Then the next time you go to use that temp table, or run that stored procedure, SQL Server reuses the histogram for that temp table, which is a wild thing.
Absolutely wild. But if you create a temp table inside of dynamic SQL like this, it is not cacheable, and neither the histogram nor anything else will be cached along with it. Now, that, of course, has upsides and downsides, right?
If you're running a stored procedure a ton, that temp table caching might be a good thing. You might find a lot less contention in tempdb if you're reusing cached temp tables. You might even find that performance isn't that big a deal, because you meet the thresholds to invalidate the cached statistics, or whatever you want to call it, and update the statistics on that temp table.
But, you know, in general, this is kind of a stranger angle on performance tuning an issue with temp table caching. Fun nonetheless, because sometimes you get to pull these weird tricks out and, boy, do they work like gangbusters. I don't really like that phrase that much.
I don’t know. It’s kind of, again, it’s kind of old-timey. It’s like a, it’s like a gallant rampart. If my gangbusters on this gallant rampart are all spangled.
Anyway, that's about all I had for this one. I think there are some useful tips here on making dynamic SQL safer in your database. Again, if you're letting people pass in database, schema, table, or column names, you're much safer if you look those up in the dynamic management views based on what people pass in, and give them the yes or no based on that.
If you're using other stuff, right, if you're not using object names and you're still writing dynamic SQL like this, you are doing yourselves a bad, you are doing your company a bad, you are doing your data a bad. And you should really start writing parameterized dynamic SQL instead. You can follow along with the examples up above if you need to.
And if you really want to start performance tuning stuff in dynamic SQL, you might find that simplifying the amount of work done in the dynamic SQL, taking the larger, more complex parts out, and dumping simple lists of keys into a temp table that the dynamic SQL uses instead, gets you far better performance. So anyway, thank you for watching. Hope you enjoyed yourselves.
Hope you learned something. If you like this video, thumbs upsies and, you know, helpful, praising comments are always welcome. If you think I messed something up or I'm wrong about something, you can tell me that too.
But I didn't mess anything up, and I'm not wrong about anything, so you might be unsatisfied with the response there. If you like this sort of SQL Server content (usually it's performance tuning, but this is sort of a hot dynamic SQL tips video, because here at Darling Data, we believe in hot SQL action), stick around.
You know, I've got all sorts of useful stuff up my sleeve. Don't make a judgment on account of my sleeves being short. I can stuff a lot of useful stuff in here.
But if you like this sort of SQL Server content, you can join nearly 3,700 and, hold on, let's wait for it, 29 other dedicated, observant, faithful SQL Server (or whatever you are) professionals by subscribing. And I like subscribers, because then I get to keep saying bigger numbers every time I do one of these. So that's nice, that's nice for me.
It's a good pat on the back for ol' E-Darling of Darling Data. Anyway, it's starting to get hot in here, and I want to open the door and let the air conditioning in. So I'm going to can this one, upload it, and hope and pray that you watch it, that you follow my instructions here, and that you don't end up in the newspapers for bad reasons.
So anyway, thank you for watching.
In this video, I dive into a scenario where my client upgraded from SQL Server 2014 to 2022, taking advantage of the new adaptive join features. I explain how these intelligent query processing capabilities were previously hindered due to insufficiently wide indexes and the additional overhead of potential lookups or sorts. By creating more comprehensive indexes that cover all necessary columns, we successfully enabled adaptive joins for certain queries, demonstrating their importance in optimizing performance. This video is packed with practical examples and insights into what’s required for SQL Server 2022 to leverage these advanced features effectively.
Full Transcript
Erik Darling here with Darling Data. In today's video, we're going to talk about a situation where a client of mine (you know, the nice people who pay me; you could be one of those too, if you paid me) upgraded to SQL Server 2022. They went from 2014 to 2022, so it was a pretty big jump for them. Part of the reason they wanted to go there was because they wanted all of these crazy, fancy, new, intelligent query processing features. One of them was adaptive joins. And the reason for that is because there were a number of queries we found where adding a join hint, like either hash join or loop join, or sometimes just something to exclude merge joins from the picture (because screw a merge join), was helping query performance. And then there was a problem we found once we got to SQL Server 2022 and upped the compatibility level. This is Enterprise Edition, of course, because Microsoft only gives the good stuff to people who spend money. I give the good stuff to everyone for free, even though I was recently accused of having useless drivel in my videos. The thing is, the nice folks at Beer Gut Magazine pay me by the minute to do these, so I do try to stretch them out a little bit. Maybe there is some useless drivel in here, I don't know.
If you want some real useless drivel, I'll tell you that this spot on my shirt is from my morning protein shake, because I can't figure out how to pop the lid, shake up the thing, and then drink from the flappy lid, or from the shaker, without getting protein stuff on the flappy lid and having it drip on me. So this is my fitness credibility badge right here, this little protein shake spot on my shirt. Alright, so that's all the useless drivel in this video. The useless drivel about a protein dribble, right there.
So one of the things we found was that SQL Server was not choosing adaptive join plans, and one of the reasons why is because we did not have sufficiently wide indexes. To give you a little background on adaptive joins: they were introduced in SQL Server 2017, and they require some sort of batchy mode-y thing to happen, either because you have a columnstore index on your table, or because you're using the fancy-pants Enterprise Edition of SQL Server and you're getting batch mode on rowstore. Or maybe you create a temporary table, or just a helper table in your database, that has no rows in it but has a clustered columnstore index on it, and you do a fake left join on 1 equals 0 to that thing to get some of the batchy mode-y intelligent query processing features to kick in for you.
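A sketch of that fake-join trick (the table name is made up, and this is the general shape, not the client's actual code):

```sql
/* An empty table with a clustered columnstore index. The left join
   on 1 = 0 matches nothing and returns no rows, but its presence
   can make the plan eligible for batch mode processing. */
CREATE TABLE #batch_mode_helper
(
    dummy bit NULL,
    INDEX ccs CLUSTERED COLUMNSTORE
);

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
LEFT JOIN #batch_mode_helper AS b
    ON 1 = 0;
```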
So, that wasn't happening. And then there's the lookup problem: a potential lookup adds an extra sort of level of costing that makes the join ineligible for an adaptive join. SQL Server doesn't want to choose between a hash join and a nested loops join where, on the inner side of the join, you might have to do a lookup to get rows out.
At least, I've never seen it choose an adaptive join where there was a lookup involved. So, you know, maybe you can find an example of it and prove me entirely wrong, but I kind of doubt it. Anyway, let's look at a couple examples of this, right?
All batch mode adaptive joins start out as hash joins. And there's this threshold for the join where, if the row count on the outer part of the join passes that threshold, it will remain a hash join. But if it doesn't pass that row threshold, then it will switch to nested loops.
All right? So, we have... I think I already created this index. Let's make sure.
I'm not a fool. And actually, you know what? We're going to do a little prep work, because there's another index down here. I'm just going to make sure that the second index definitely isn't there, and that the first index definitely is there.
And that will make the demo go a lot smoother. All right? That will at least prolong my SQL Server career.
That will keep me out of the fitness industry for, like, another week or so, I think. All right? So, we’ve got this index.
Now, this is a simple example with just a single key column index. In real life, I know your indexes are probably a bit wider, and your queries are probably even wider than that. So, this is kind of a common thing that can happen.
So, what we've got here is a query that does a couple of left joins, and we're going to force optimizer compatibility level 160 right here, so that we are totally and completely eligible for batch mode on rowstore. And if we run this query with query plans turned on, it's not terribly slow, but the point of this demo isn't to show you a slow query and then a dramatic improvement with adaptive joins.
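As an aside, forcing the optimizer compatibility level per query can be done with a hint like this (a sketch only; the tables and join columns are assumptions about the demo query):

```sql
SELECT
    u.Id,
    p.Score
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
LEFT JOIN dbo.Comments AS c
    ON c.UserId = u.Id
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));
```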
The point of this demo is to show you what your sort of requirements are for adaptive joins kicking in. All right? So, in this query plan, we get a nested loops join, which was not inappropriate, but this plan also features a key lookup.
And so, the adaptive join is not a thing here. All right? So, we do get a batch mode hash join later on, but SQL Server does not make an adaptive join choice here.
Okay? And that’s all because SQL Server doesn’t want to have to make that choice with the additional overhead of a potential lookup involved. So, let’s create an index that covers everything in the query.
Remember, we do a lookup down here, and what we're doing in the lookup is: we have a predicate on the Score column, and we output the Score column. So, if we have the Score column in the index... we're going to create a brand new index here, right, one that has OwnerUserId and Score in it. We're going to make a brilliant indexing choice, maybe, and rerun this query.
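One possible shape for that index (a sketch; the index name and options are my own, and whether Score belongs in the key or as an included column depends on the exact query):

```sql
/* Covering index so the key lookup on Score disappears,
   making the join a candidate for an adaptive join. */
CREATE INDEX posts_owner_score
ON dbo.Posts (OwnerUserId, Score)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
```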
Now, we get an adaptive join between the Users and the Posts tables. We still don't get one for the Comments table. Now, you can have more than one adaptive join in a plan.
SQL Server just doesn't go for it here; it doesn't think it's worthwhile to invest in an adaptive join for that one. But now we get the adaptive join between Users and Posts that we didn't get before, because the Posts table did not previously have an index that adequately covered all the columns we needed from it.
Now, I did tell you earlier that merge joins were not part of this consideration. And the reason merge joins aren't part of the consideration is because merge joins expect sorted input. And when we need to sort input, like, say, for this join, SQL Server doesn't want to think about, oh, we needed to sort that.
In the same way that having to do a lookup is an additional cost from the optimizer's perspective, having to sort data to employ a merge join is also an additional costing thing. So merge joins are not part of the picture.
And, of course, because I hinted it, even though we don't need to sort for this part between Users and Posts, I have a merge join hint for the whole thing, so it shows a merge join here. I just wanted to show you the merge join plan, to show you why merge joins aren't a consideration for the adaptive join.
Now, what's interesting is that neither merge joins nor nested loops joins can operate in batch mode. They can only operate in row mode.
So if an adaptive join resolves to a hash join, that will be in batch mode. But if an adaptive join resolves to nested loops, that will be in row mode, right? Okay, so that's all good there.
And I do want to show you one kind of funny thing about this query. Let’s see if I can get a somewhat better execution plan for it. It looks like last night it was a little bit better.
But this query plan features some very odd operator timings. Very odd indeed. And if we look at what happens in here, right? Where’s the funny part?
Where’s the funny bones? It’s kind of right at the end here. If we look at the very end of this query, we get 565 milliseconds and then 1.1 seconds here. But then if we go and look at the query time stats, the CPU time and the elapsed time agree with the operator before the gather streams.
So that’s amusing to me. Anyway, apparently the operator time code could use some work. I would imagine with the influx of summer interns, now that we’re entering June, July, and August, Microsoft will have some top interns on top of this operator timing code.
Any day now, it’ll be all fixed. Anyway, that’s all I had to say for this one. Thank you for watching.
I hope you enjoyed my useless drivel. I hope you learned something, even from the useless drivel. I hope you enjoyed yourselves. If you like this video, as usual, thumbs up and nice comments. Comments supportive of my useless drivel are always appreciated.
And if you like this sort of SQL Server content, you can join the… Hold on. I want a freshly up-to-date… I want a read-committed up-to-date number here.
You can join nearly 3,719 other SQL Server professionals or some sort of… I don’t know. I actually don’t know the makeup of the audience.
I say SQL Server professionals. It could be just derelicts off the street who like to watch my videos. Maybe they’re like, I’m going to get my hands on his Adidas t-shirt someday. I don’t know.
But yeah, you could join nearly 3,700 and… Yeah, it hasn’t changed. 19 people who subscribe to this channel and get helpful little bonks on the head. Every time I publish one of these…
Well, I guess you could interpret them as either Beautiful Gems of Wisdom, or Beautiful Gems of Wisdom Drizzled in Useless Drivel. Drizzled in...
Protein Shake. I don't know. However you want to call it. Anyway, thank you very much for watching.
Efficiently Finding Aggregate Values In SQL Server: Row Mode vs Batch Mode vs Indexes!
Thanks for watching!
Video Summary
In this video, I delve into an interesting exploration of finding maximum values in SQL Server queries, showcasing multiple approaches and their performance implications. Erik Darling from Darling Data shares his insights on using cross apply and derived joins versus more traditional methods like CTEs, emphasizing how a well-designed supporting index can significantly enhance query performance. The video also covers the nuances of batch mode and its impact on execution plans, providing practical examples that highlight both the benefits and potential pitfalls of different query writing strategies. Whether you're a seasoned SQL developer or just starting out, this content offers valuable insights into optimizing your queries for better performance.
Full Transcript
Erik Darling here, with Darling Data. No longer dying. I want to say just about a full week with no antihistamines, which is wonderful, because I no longer feel like I’m dying, which is nice, right? Because, you know, you record these videos and there’s no live audience, so you have no idea if you’re dying on, say, on stage or not. It’s almost impossible to tell. Just have to wait for that one person who always comes along and downvotes my videos. There’s always one.
Just have to wait for them to pipe up someday and maybe let me know why they didn’t like the video. Maybe it’s me. Maybe they’re jealous. Maybe they’re in love with me. So there is that. But I have an exciting announcement. I have the first bit of Pass Data Summit swag. And in honor of the fact that Kendra Little and I are co-presenting two days of action-packed SQL Server performance tuning content, I’ve got these temporary tattoos made. They’re a little hard to see because they’re glossy and the lights in here are bright. And if I put them really close to the camera, it doesn’t necessarily make the situation better.
And it would also eventually make my green screen start to freak out. But they say hot SQL action. Got kind of like a cool Led Zeppelin vibe. Because that's what all the kids are into these days. Cool Led Zeppelin vibes. That's what I see on TikTok all the time. Kids just jocking that Robert Plant vibe thing. Anyway, in this video, I should probably mention what I'm going to talk about, right? In this video, we're going to talk about the best way to find... well, all the queries are going to be looking for max values. But, you know, this could also be a min or an average or something else.
You could do any of those things. And the reason I want to record this is because I often see queries written one way: someone figured out one way to do this, when there are many ways to do this, and they put that one way everywhere. And it's not necessarily the best way to do it. There may be many different ways to write this query that are faster than the one way you found and keep repeating. So we're going to talk about different ways to do it. We're going to talk about both using cross apply and derived joins. We are not going to be using CTEs.
If you're the sort of reprehensible dork who white-knights CTEs by saying that they make things more readable, well, you're not going to see any here. You would find that if you did write a CTE to achieve this, you would get the exact same performance as if you wrote the derived join type of query. We're also going to look at the difference between row mode and batch mode, and how indexes make things better.
Usually. Isn’t that the line on indexes? They make things better? That’s what I keep hearing anyway.
So we got that going for us. So anyway, let’s dive right in, finally. And we’re going to look at the same query written in four different ways. We’re going to look at a very simple way.
And you’re going to see these again, both with cross-apply and derived joins, because that can change things a bit. So we have a cross-apply with a max in there, correlated on owner user ID to the ID in the user table. Makes sense, right?
We're going to try out TOP (1) over here. We're going to try out ROW_NUMBER in here. And remember, this is a correlated ROW_NUMBER. Because this ROW_NUMBER is correlated inside of the CROSS APPLY, we do not need to PARTITION BY the owner user ID.
Which is a little bit different from this query, because this one has an extra level of nesting on it, where we are selecting from this, and we are generating a row number over the whole table in here, and then we are correlating the row number outside.
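The four shapes described above can be sketched roughly like this (against assumed Stack Overflow tables; these are illustrations of the pattern, not the demo's exact queries):

```sql
/* 1. Simple MAX in a correlated CROSS APPLY. */
SELECT u.Id, ca.max_score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT MAX(p.Score) AS max_score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS ca;

/* 2. TOP (1) with an ORDER BY. */
SELECT u.Id, ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS ca;

/* 3. Correlated ROW_NUMBER: no PARTITION BY needed,
      because the apply is already per-user. */
SELECT u.Id, ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Score,
           ROW_NUMBER() OVER (ORDER BY p.Score DESC) AS n
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS ca
WHERE ca.n = 1;

/* 4. Uncorrelated ROW_NUMBER over the whole table,
      partitioned, then correlated outside. */
SELECT u.Id, ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.OwnerUserId, p.Score,
           ROW_NUMBER() OVER
           (PARTITION BY p.OwnerUserId ORDER BY p.Score DESC) AS n
    FROM dbo.Posts AS p
) AS ca
WHERE ca.OwnerUserId = u.Id
AND   ca.n = 1;
```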
The things here. It’s exciting stuff, isn’t it? Very exciting. All sorts of different ways to skin that cat. Which is probably similar to something I…
Many different ways to feed that cat. Many different ways to pet that cat. Something like that.
I’ve never actually skinned a cat. Never actually heard of anyone skinning a cat. Or needing to skin a cat. So, I don’t know who came up with that phrase. It’s a little gross, actually.
Cats are a little gross. So anyway, let's look at query plans. The first query, just using the simple MAX, does pretty okay, I think. SQL Server chooses a hash join.
A lot of the time when you use CROSS APPLY, SQL Server will optimize it as nested loops. But SQL Server here said, nah. We got a hash join, and the whole thing finishes in about 700 milliseconds.
Which is all right. It's all right with me. This query benefits quite greatly from a bitmap. Remember, that bitmap gets built up here and applied down here.
And that bitmap goes to work. That's why this cardinality estimate looks crappy: SQL Server came up with the cardinality estimates first, and then the bitmap got applied later and did some good work, so a lot fewer rows streamed out.
We did not get as many rows as we thought we would passing through here, because that bitmap gets applied as a predicate right there. So the bitmap did some good work, and this query ended up being really fast.
The second two queries stink on ice. They stink in winter, because SQL Server did not do what it did up there. SQL Server did not say, I should just use a hash join and throw up a missing index request.
SQL Server built an eager index spool twice. These each take around 27, 28 seconds. They are completely uncompetitive and useless, at least without a good index in place.
The third query, somewhere in the middle. A little bit closer to the fast one, but not quite as fast. It’s about four and a half seconds slower.
The reason why it's four and a half seconds slower is something I caught in, I think, two or three Adam Machanic sessions from back in the day: when you're doing anything with computers, particularly sorting, it's preferable to do many little sorts than it is to do one big sort.
And so we're kind of in the middle on this one, where SQL Server probably did the smart thing by choosing a hash join instead of a nested loops join, but it also did this whole thing where it sorted 17 million rows.
Now, you know, part of the problem with doing the one big sort is that you might not get enough memory. You might spill to disk. This one spilled a little bit to disk. It’s not the end of the world here, right?
It still took about a second and a half to do the whole sort, but, you know, it’s a lot slower to do one big sort than lots of little sorts. And that’s what you get when you do nested loops joins.
You get a whole bunch of little sorts. SQL Server grabs a row, puts it through this side, does the sort for just the data for that row, does that, comes back, does the next one, does the little sort.
And that would be a lot faster if we, you know, had a way to seek into this index, right? Even if we still had to sort stuff in here, it would be faster to do a bunch of little sorts than one big sort.
So now let’s look at things where there’s an index involved. All right, so we created this index and this is highly specialized index on owner user ID and score descending, not only sets us up for success with how we are correlating the post table to the user’s table, but also fully supports our windowing function where we need to partition by owner user ID and order by score descending.
So we have done ourselves a huge favor here by creating this level 1000 IQ index. So we have the same four queries.
We have the simple MAX. We have the TOP (1). We have the correlated ROW_NUMBER cross apply, and then the uncorrelated ROW_NUMBER cross apply. These are our four queries.
Let’s look at how these plans did. With an index in place, this one kicked butt again, right? Went from about 700 milliseconds to a little under 200 milliseconds. A very simple nested loops plan with a little top in there.
Teeny weeny little top. And then we have nearly the same query down here, with a teeny weeny little top of its own, right? What is kind of amusing is that SQL Server transformed both the MAX query and the TOP query into what is essentially the same top-one shape, right?
Like this: they shared the same execution plan. And they're both fast, right? They're within a few milliseconds of each other. If I ran these a whole bunch of times, they would probably end up just about even, maybe going back and forth a little bit.
So, you know, decently, well competitive. You could write this query either way with a good supporting index. The third query down, the correlated row number, does a little bit worse.
This is, I mean, just about as good as the MAX query from the first round with a serial plan. But this is another kind of good example of how doing a bunch of little things is faster than doing one big thing, right?
Because this one takes 800 milliseconds. There’s about 500 milliseconds spent in here. You know, of course, you know, this is one of the things that makes me nervous about missing index requests.
SQL Server is like, we can impact this by almost 77%. But, I mean, that's not where we spent our time. The costing in this plan is kind of a joke, right?
87% on one operator, while the operator where we spend all our time is costed at 10%, with a couple of zero percents in there. Good. I don't know if that's so smart. And then here's where doing the one big thing sucks a whole lot more than doing a bunch of small things.
So the difference between these two plans, right? The bunch of small things plan, like we talked about, like I mentioned earlier, is nested loops. The one big thing plan uses a merge join.
And look how much slower this whole branch is doing the merge join than it is doing the nested loops join, right? We scan the whole index. We segment and sequence project to build the row number over the whole thing.
We filter stuff out later. And like, you know, the filter does the same thing in both places, just to different degrees. But then this whole thing takes just about three and a half.
Oh, that’s a very devilish number. 3.666 seconds. I did not do that on purpose. I did not have the power to do that. I wish I did.
Maybe someday I’ll work out how to attach and detach a debugger just fast enough to get query time to whatever devious numbers I want.
Not today. Not today. All right. So this is all those four queries with a good supporting index in place. If you ask me what my preference is, a good supporting index, you can write this query in a couple different ways and still get reasonably good results.
Supporting index will help just about, you know, anything that you do. So you should have those. You should make one.
The next set of queries, these are a little bit different. So we got rid of the supporting index for these. And what we’re going to talk about in these is just how batch mode affects things. Now, for the first query, and we do get batch mode in the first query, right?
We have batch mode on the hash match and we have batch mode on the join. But for the first query, it doesn’t get a whole lot faster. Right? At 602 milliseconds, it saved about 100 milliseconds from the row mode run.
The second two queries where, I don’t know, God has forsaken us yet again and we build a couple more eager index spools like this, these are totally uncompetitive. There’s no batch mode going on in here.
Where batch mode does make a difference is with the uncorrelated row number, right? So if we come down here and look, we got batch mode in a couple different places. We have it on the sort, we have it on the window aggregate, which I don’t, which is brand new for all of these things we’ve talked about and we have it on this hash join.
Now, one piece of SQL Jeopardy that you can absorb and, I don’t know, do whatever you want with. You can chew it a little bit and spit it out, or you can digest it and have a wonderful belly full of knowledge. Under normal circumstances, batch mode sorts can sort data across DOP threads, but unless they are the child operator of a window aggregate, they can only be read from single threaded. In this case, we get the window aggregate, so we can read from the sort across our DOP 8 threads, and this thing ends up being fairly efficient.
Not quite as good as the simple max in the cross apply, but it’s still much better than it did before. Without the index in place, this thing was like five seconds, so getting it down to one second is a pretty good win for batch mode, right?
So batch mode can make a difference with these things when you write queries in somewhat strange ways. All right. So now, the only thing we have left to look at is taking cross apply out of the picture and using derived joins instead.
With the index in place, that ends up just about the same as the other ones, so we’re going to look at these without the index.
And the two ways we’re going to look at it are with and without batch mode. So one thing that’s really important to point out before we get too far into things is that one of these queries is actually logically incorrect if we just use a derived join, and that’s this one right here.
And if you’ve been writing queries for, you know, 10, 15 minutes, you’ll probably see why. Because with derived joins, we can only do the correlation outside of the derived table, SQL Server is going to give us an incorrect result for this.
We’re just going to get the top one post ordered by score descending in here, and that gives us an incorrect result. If we come over to the results pane, that second query only returns one row, when it should be returning 600-something rows.
So this one is out of the running. We’re not even going to spend time on it because it’s such an idiot. All right. So this one, bleh. You don’t do the right thing, you are gone, you are stricken from the record.
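To make the logical difference concrete, here’s a minimal sketch of the two shapes, assuming a Stack Overflow-style Users/Posts schema (the names are illustrative, not the demo’s exact queries):

```
/* Correlated: TOP (1) is evaluated once per user -- correct. */
SELECT u.Id, p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS p;

/* Derived join: TOP (1) is evaluated once, before the join, so you only
   get the single highest-scoring post in the whole table -- wrong. */
SELECT u.Id, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT TOP (1) p.OwnerUserId, p.Score
    FROM dbo.Posts AS p
    ORDER BY p.Score DESC
) AS p
  ON p.OwnerUserId = u.Id;
```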
But the derived join actually ends up just about the same as the cross apply, within, you know, 10 or 11 milliseconds of each other, so not a big difference here. And then for the second two queries, you’ll notice something: what happened to the uncorrelated row number in the first example I showed you now happens to the correlated row number here.
These both end up with very close execution plans. Right? And again, the problem here is that we are doing the one big scan and sort.
Right? So again, in row mode, this is fairly painful. Right? So we didn’t get the nested loops, we had to do the big sort for both of these, and they both end up with just about the same amount of time.
There’s apparently about a 300 millisecond difference; I don’t know why. Maybe I was staring at the computer funny when they ran. But using the derived join, nothing ends up remarkably better. Aside from the query that returns wrong results, there is the lack of eager index spools, which I guess is a blessing.
You know, this was one of the queries that would have had an eager index spool, but here it’s just gone. We don’t need you. And the one that didn’t eager index spool ends up the same as the uncorrelated row number one.
So I don’t think that’s necessarily a great arrangement. Right? Because again, we’re doing the one big sort, and we want to avoid doing the one big sort. And then finally, the same derived join scheme, but this time with batch mode in play, we’re going to get pretty decent results, I think.
Right? Again, the top query with just the simple max does the best at about 600 milliseconds. The second query which returns wrong results, we don’t care about you.
Again, that returns just the one row, and that one row is not the correct result; we need about 600 rows from all of these. And then the second two queries that do benefit from batch mode end up just about identical, at just about one second each and somewhere between 30 and 43 milliseconds apart, which again is not enough of a timing difference for me to care much about.
So anyway, some key points and takeaways from this video. One, supporting indexes really do make a huge difference when you’re trying to write these kinds of queries.
Row number is very frequently not the best choice. Row number can be much more competitive when you have batch mode in place.
And if you don’t have a good index in place, you really want to avoid the top one thing and the correlated row number thing because you will most likely end up with a nasty eager index spool on the inner side of your nested loops join and you will suffer tremendously both personally and professionally for the remainder of your days on this planet.
And the other thing is that when you need to do this kind of stuff, batch mode makes a huge difference, and so does gearing your queries towards doing lots of small sorts versus one big sort, or generating lots of small row numbers even when you don’t need to sort. Even if you’ve been a whopping genius and you’ve created the right index, generating that row number over a lot of rows is a lot slower than generating a bunch of row numbers over smaller sets of rows.
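As a rough sketch of that takeaway, again assuming an illustrative Users/Posts schema (my names, not the demo’s), compare numbering every row once to generating one tiny sorted result per correlated value:

```
/* One big row number: number all ~17 million posts, then filter. */
SELECT u.Id, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, p.Score,
           ROW_NUMBER() OVER
           (PARTITION BY p.OwnerUserId ORDER BY p.Score DESC) AS n
    FROM dbo.Posts AS p
) AS p
  ON  p.OwnerUserId = u.Id
  AND p.n = 1;

/* Lots of small sorts: one tiny TOP (1) per user via CROSS APPLY. */
SELECT u.Id, p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS p;
```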
So if you find yourself dealing with slow queries that look like this, as much as you can, either try to get batch mode involved or try to get a good index and try to get lots of little sorts and lots of little row numbers involved because generating a row number both with a sort and without a sort over, you know, in this case 17 million rows is incredibly painful but, you know, getting batch mode for this made it a whole lot less painful.
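If you need a nudge to get batch mode involved on rowstore tables before compatibility level 150, one well-known community trick (not something from this demo; the index name is made up) is an empty filtered nonclustered columnstore index:

```
/* The contradictory WHERE clause means the index never holds rows, but
   its mere presence makes batch mode available to the optimizer. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode
    ON dbo.Posts (Id)
    WHERE Id = -1 AND Id = -2;
```

On SQL Server 2019 and up, with compatibility level 150, batch mode on rowstore can show up without the trick, subject to the optimizer’s heuristics.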
Cool. All right. So, with that, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. If you like this video, you can give it a thumbs up, leave a nice comment.
If you’re feeling extra charitable, I’ll accept the one person who always gives me a down vote and never leaves a comment. You’re still welcome here.
Just wish you were a nicer person. That’s all. If you enjoy this sort of SQL Server performance tuning content, you can subscribe to my channel and you can join.
Hold on. Let me get the most up-to-date number here. You can join nearly 3,708 other lucky subscribers in getting a bonk on the head every time I publish one of these.
And if you need help with this sort of thing, my rates are reasonable. So, thank you for watching.
Why Some Long Running Queries Don’t Have Wait Stats In SQL Server
Thanks for reading!
Video Summary
In this video, I delve into a fascinating aspect of SQL Server query performance that often goes unnoticed: how serial plans with eager index spools can hide the true nature of their execution. I explain why some queries mysteriously run for a long time without showing any significant wait stats in the plan. By comparing a serial and a parallel version of the same query, I demonstrate how the parallel plan provides more transparency than its single-threaded counterpart, which can be misleading due to the eager index spool operator. This video aims to highlight the importance of scrutinizing execution plans, especially when dealing with large tables, as eager index spools can significantly impact performance without obvious indicators in wait stats or query costs.
Full Transcript
Erik Darling here. Surprise! With Darling Data. Recording a little bit on the early side today because I have a lot of stuff to do later and I don’t want to interrupt my great streak of recording a video every day. Just to be honest with you, the YouTube videos get a lot of action. They get a lot of views, a lot of thumbs going in the right direction, straight up in the air, and a fair amount of comments. And written blog posts covering the same material have not generated nearly the same amount of engagement or interest. So I think investing more time in the videos and the recording stuff is the direction I have to go in order to get people to listen. I don’t know, maybe it’s getting to see this handsome mug smiling at you that keeps people in there. Who knows? It’s certainly not the slight remainders of bed head that I haven’t quite worked out yet. Actually, it’s more like headset head, because I was wearing my headset earlier and now I have this, like, dent here, and everything kind of poofed around it. Just in case you were wondering.
Why my head looks a little extra lumpy today. I got headset head. Or headset hair. Anyway, in this video, we’re going to talk about why some queries sometimes mysteriously run for a long time but don’t really tell you why. Now, there are going to be other peripheral reasons why you might see this, but they’ll be represented by waits. So, like, if a query gets blocked, you’ll see that that query spent a long time on an LCK wait to take a lock and do something, right? If a query spent a long time reading pages from disk, you’ll see PAGEIOLATCH underscore some two-letter combination waits. When a query waits on things, that usually shows up for a reason. But there are a few reasons why a query might run for a long time and not actually show you any particular wait that it waited on. We’re going to talk about just one of them today.
Of course, there is the scenario where you might have a query that runs a long time but the whole time is just burning away on CPU. Now, in a parallel query, you’ll at least see a lot of CX waits that represent that time. In a serial query plan, the SOS_SCHEDULER_YIELD waits just might not be crazy enough to account for all of the time that we spent doing a little merry-go-round on the CPU, and that can be rather confusing.
But in this one, we’re going to look at how serial plans with eager index spools in them kind of hide the work that they do, like where the time in the query is spent, in a weird way. So, I’ve got essentially the same query run twice here: one of them is limited to MAXDOP 1, and the other one is allowed to go as DOP as I allow, which on this server is 8. And if we look at the query plans for these (I’ve already taken the opportunity to run everything before recording the video), I made sure everything looked right when I did this.
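The demo query itself isn’t shown in the transcript, so here’s only the mechanism for the two runs, sketched with a made-up query body:

```
/* Same statement twice; only the degree of parallelism hint differs. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.Score > 100
OPTION (MAXDOP 1); /* forced serial plan */

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.Score > 100
OPTION (MAXDOP 8); /* allowed to go parallel */
```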
Both of these queries, despite one being serial and one being parallel, run for nearly the same amount of time. Right? That’s not a very big difference for a serial execution plan versus a query that ran with a degree of parallelism of 8. Ocho, as they say in my favorite restaurants.
So, like, where are the differences? Well, they’re slight, but they’re there. And I think an interesting one is over here.
So, the serial scan of the Posts table took just about 5 seconds even. But, for some reason, the parallel scan of the Posts table took 200 milliseconds more. That’s weird, right?
Why would a parallel scan take 200 milliseconds more? That’s kind of strange. But, the funny thing about both of these is that at least this one is honest, right? This is a serial execution plan.
We have a serial operator here. There is no indicator of parallelism because this query didn’t go parallel. This one is, of course, a filthy stinking liar. Right?
And if we zoom in on the properties of the index scan on the Posts table (I’ve talked about this a little bit before), we get this kind of funny thing where all the rows end up on a single thread. Right?
All 17 million rows end up on thread 3 this time. If we ran this query 8 times, they’d probably end up on a different thread every time, just based on however SQL Server is feeling at the moment. So the first sign that the query plan is lying to us a little bit is that a parallel scan takes 200 milliseconds longer than a single threaded scan.
And, of course, this is a limitation imposed by the eager index spool. You’re only allowed to build an eager index spool on a single thread. You can read from it in parallel.
Like, if we look at the properties of the eager index spool over here, we can see that all 8 threads had some stuff happening over here. Right? So, that’s just one sign of the dishonesty.
But that’s not really what the video is about. I’ve talked a lot about eager index spools. What I want to talk about in this one is how the serial plan is hiding what it did, exactly. If we look at the properties of the select, and we look at the wait stats for the parallel execution plan, we can see that we spent 266 total…
Remember, the plan reports milliseconds, so you chop off those last three digits for savings and you get seconds. So, about 266 seconds waiting on EXECSYNC.
Now, in the context of this query, EXECSYNC is happening while these threads synchronize and build this eager index spool. Right? So, we had eight threads that needed to synchronize, build that spool up in tempdb, load the 17 million rows into it, and then have SQL Server read the 4,390 rows that we cared about from it.
Which, you know, is at least honest. Right? Seeing that 266 seconds was spent in there, that’s, you know, eight threads times whatever the time was.
Maybe that’s eight times 38 or something. I don’t know. I’m not terribly good at math. But the single-threaded plan up here tells us absolutely nothing.
Right? If we look at this, we have three wait stats registered to this query plan. And not a single one of them shows us anywhere near the almost 38 seconds that this query executed for.
Right? We have 44 milliseconds of memory allocations. We have 12 milliseconds of SOS scheduler yields.
And we have 8 milliseconds of reserve memory allocations. That does not, that’s nowhere near 30 seconds. Right?
These are all definitely milliseconds, and I find this quite dishonest. Now, there are other times when either a serial or parallel plan might exhibit this type of thing but you would still have a wait stat associated with it. That’s something I’m going to record a different video about: how waits for sort spills and hash spills at least surface themselves in some way, either in query plan wait stats or in wait stats in general.
But I’m going to not spoiler that whole thing just yet. I’m going to leave a little something on the table for when I record that. But, so what’s interesting about these sessions is, and this is something that I tell people to be careful and to keep an eye out for all the time, is when pure select queries do a bunch of writes.
Right? So, this is indeed session ID 115. Oh, that’s the wrong armpit.
There we go. Oh, wait, maybe that was the right armpit. Yep. You know what? I just can’t get out of the way of that thing. Trust me, this is session ID 115. We had two select queries, and both of those select queries combined did quite a few writes.
Now, we weren’t instructing SQL Server to insert anything into a temp table, but it did build those spools up in tempdb, and that accounts for the writes here. If you were doing a select plus an insert, or an insert into a temp table, or a select into a temp table, or something else like that, there would obviously be a reason for writes to occur.
It would not be strange at all to see that kind of select query involved with writes. But these are just two queries that selected data and returned it. There’s absolutely no reason for them to write anything, except those spools.
You know, another reason for select queries causing writes could be stats updates. It could be spills. It could be query store related.
There are other reasons why you might see it happen. But, always, like, it’s something that I really tell people to be on the lookout for, because, especially if that query is slow, then those writes could absolutely be the root cause of why it’s slow. So, you know, figuring out what caused the writes is, you know, an exercise in executing the query, getting the actual execution plan, like we did in this tab.
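For the plan cache side of that exercise, here’s a hedged sketch of one way to hunt for queries doing writes, using sys.dm_exec_query_stats. The TOP count is a rough assumption, and you’d still need to eyeball which results are pure selects, since the DMV tracks statements of every kind:

```
/* total_elapsed_time is reported in microseconds by this DMV. */
SELECT TOP (20)
    qs.total_logical_writes,
    qs.execution_count,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.total_logical_writes > 0
ORDER BY qs.total_logical_writes DESC;
```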
But it’s certainly one that’s generally worth doing. So, what do I think about this? Well, this is more just like an FYI video.
I really do think it would be helpful if Microsoft added some wait stats to account for things like this, because it can be quite opaque to end users exactly why a query was slow. Now, let’s just take a step back and pretend that we found these queries in the plan cache or Query Store or some other monitoring tool, where we didn’t have the benefit, the absolute luxury, of an actual execution plan.
You know, we would see this. And if you’re one of the normies out there, you might start looking at query costs, and you might say, oh, 82%, oh, 87%, and you might completely ignore these operators with a cost of 11%. But remember, in the actual execution plans, we spent about 30 seconds apiece in those eager index spools.
That’s where the majority of the time was spent. But you would say, oh, but the cost is so low. It’s only 11%.
How bad could it be? How bad could it be, man? Well, this is how bad it is: 30 seconds apiece. So, always be on the lookout for this stuff. Eager index spools are a particularly nasty query plan operator to run into, especially when they’re latched on to rather large tables.
On small tables, they’re not really of consequence. But, you know, big tables, they’re definitely of consequence. So, just be very mindful of that.
And if you’re running any script, whether it’s sp_QuickieStore or sp_BlitzCache or whatever dustbin junk drawer script you have that interrogates Query Store or the plan cache, or maybe you’re using the Query Store GUI to investigate things: if you see a select query that is causing a bunch of writes, pay special attention to operators in the query plan that might be responsible for them.
Whether they’re spools or whether they’re memory-consuming operators like sorts or hashes that might be spilling. Because if it’s ending up on the sucker board of things that you need to tune, those might be good places to start. So, anyway, I have to go start the rest of my working day now.
Hopefully I can wear some headphones to mask the headset scalp that I have currently. Thank you for watching. Hope you enjoyed yourselves.
I hope you learned something. I hope that you will deign to give this video a thumbs up or leave a nice comment. You don’t have to say anything about the hair. We can skip the hair this time.
I don’t know. Maybe say something nice about my keen fashion sense, I guess. If you like this video and this sort of content, you can subscribe to my channel and join nearly 3,653 other happy YouTubers out there who get notified, who get a knock right in the head, every time I publish one of these videos. Wouldn’t that be nice for you?
So, anyway, time to go work. Thank you for watching.