Join Me At Data Saturday Dallas Sept 6-7

Spring Training


This September, I’ll be presenting my full-day training session, The Foundations Of SQL Server Performance Tuning, at Data Saturday Dallas.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of streaming on-demand content.

Get your tickets here for my precon, taking place Friday, September 6th, 2024, at Microsoft Corporation, 7000 State Highway 161, Irving, TX 75039.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for my precon!

Register for Data Saturday, on September 7th here!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Hash Join Spills And Bailouts In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the complexities of hash join spills in SQL Server, providing a detailed walkthrough of how these operations can lead to significant performance issues. We start by examining a query that selects from a table containing large text data, which quickly demonstrates the severe impact of hash joins on system resources when dealing with large datasets. Through extended events and a series of tests, I illustrate how even minimal data volumes can trigger extensive recursion and bailouts, leading to painfully long execution times. The video also touches on batch mode and row mode operations, highlighting the consistent unpleasantness both bring to query performance during hash join spills.

Full Transcript

Erik Darling here with Darling Data. Still, uh, feels like a perpetual situation. I’m not sure, I’m not sure that this can ever be resolved. At least not peaceably. Anyway, in today’s video, we’re going to talk about hash join spills. Because, oh, I don’t know, those seem pretty important. Because, uh, much like hash aggregate spills, if these really start piling up, they can ruin your day. Not in a good way. Not like, uh, if your car breaks down in front of a bar, and you’re like, I’ll just go inside and call AAA from the payphone. And you end up having a great day inside the bar. Because you’re like, well, I don’t have any quarters. And the bartender points to a sign that’s like, no change. With a payphone. And so you have to buy a beer that costs, 75 cents to get a quarter to call the, to call the tow truck company. Then you just decide to hang around for a while.

That’s my kind of day. Anyway, yeah, hash join spills. Sorry, I got a little carried away there. I actually got lost in that moment in my head — I can picture the bar and the bartender. Did something for me. Did something special for me. So, hash join memory grants. There is a lot to say about them, and thankfully, I don’t have to say all of it. There’s a link that I’m going to put in the show notes, written by Craig Freedman, who actually played the part of the bartender in that scenario we just talked about.

He will tell you in great — well, I don’t know, is that great detail? It’s pretty good detail — how memory grants for hash joins are calculated by SQL Server, and what happens when they spill. There’s all this great stuff that you will learn from Craig. I’m not going to repeat it all, because that would be weird plagiarism. I’m just going to tell you that it exists. If you want to pause and read it, type in the URL from there — just know that my source is cited. All right? I’m not claiming that this green text is mine.

Definitely not. I would definitely never write all that stuff. So, like I promised in the video about hash aggregates, we are going to use an extended event, which is over here. It’s going to show us when we hit hash warnings — when a hash operator spills out, starts going through different levels of recursion, and then hits a bailout point.

And the bailout point is when the hash join switches over to a sort of naive nested loops join. So it’s not a fun time, I promise you.
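For reference, a session along these lines will catch those events — a sketch, using the sqlserver.hash_warning extended event and a ring buffer target; the exact payload fields vary a little between SQL Server versions:

```sql
/* A sketch of a session for catching hash warnings.
   The hash_warning event fires on hash recursion and hash bailout. */
CREATE EVENT SESSION hash_warnings
ON SERVER
    ADD EVENT sqlserver.hash_warning
    (
        ACTION (sqlserver.sql_text, sqlserver.session_id)
    )
    ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION hash_warnings
ON SERVER
    STATE = START;
```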

So what I’ve done is pre-run four queries, and I’m going to show you what the queries are, because they relate to what is up here — the size of the data going through all the hashing stuff has a big part to do with how much of a memory grant the hash join queries need. So I’ve got two queries here.

They both do just about the same thing. They join from Votes to Comments, but they join on some really low selectivity columns. Right.

The PostId column in the Votes table and the PostId column in the Comments table are like — there are only eight possible numbers in there — so there are a lot of matches. These are not unique columns where there are very few rows per bucket of matches.

Right. And some of the buckets of stuff are going to be way bigger than other buckets of stuff because there’s way more of certain post types than others. That’s something that we’ve looked at a million times in these videos.

So, uh, I ran these two up here without any memory grant hints on them. Right. So these ones get the full memory grant that they want to run. And then there are two down below that are, that are capped.

It’s essentially the same two queries in the same order, just with caps on them. Now, if you remember from the hash aggregate video — aside from the row count and aside from the intent of the table — the crucial difference between the Votes table and the Comments table is that the Votes table has like five or six integer columns and a datetime column.

And the Comments table has like four or five integer columns, a datetime column, and then an nvarchar(700) text column — a string column. Those texty string columns inflate memory grant needs way higher, because of the way SQL Server estimates column fullness. And because it’s a string, and strings are a mistake, and you shouldn’t put strings in databases.

It just screws everything up. So I’ve got these four queries already run and we’re going to examine the query plans just a little bit. So this first one where we select just from votes takes 7.7 seconds.
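For reference, here’s roughly what those query shapes look like, reconstructed from the description — I’m assuming the Stack Overflow sample database’s dbo.Votes and dbo.Comments tables, and the tiny grant cap mentioned later:

```sql
/* Uncapped: gets the full memory grant it asks for. */
SELECT
    v.*
FROM dbo.Votes AS v
JOIN dbo.Comments AS c
    ON c.PostId = v.PostId; /* low-selectivity join column */

/* Capped: a tiny MAX_GRANT_PERCENT forces the hash join to spill. */
SELECT
    c.*
FROM dbo.Votes AS v
JOIN dbo.Comments AS c
    ON c.PostId = v.PostId
OPTION (MAX_GRANT_PERCENT = 0.1);
```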

The one where we select from comments, right? We see the C dot star here and the V dot star here indicating the, the alias of the table we selected from. Okay.

Got that. And the one that selects from comments does take a couple seconds longer. You know, no real reason other than the chunkiness of the data. Right.

So, uh, we can see that happening sort of all throughout the plan, uh, where, you know, the stuff takes longer. Right. And, um, if we look at the memory grants for these, the one that just selects the columns from the votes table gets about a 4.5 gig memory grant, almost 4.6 there. And the one that selects from the comments table gets nearly a 10 gig memory grant.

Right. 9,855 megabytes. It’s about 9.85 gigabyte. Uh, yeah.

Almost 10 gigs. Yeah. 9.8 gigs. Yeah. Close enough. There we go. Math. I can do that. Sometimes. Sometimes I remember things. So, obviously, uh, like SQL Server’s memory grants here, nothing spills from either of these. The hash joins are fine here, which means that, and then also another good thing to point out is that there are no warnings on the selects.
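If you want to see grants like these while the queries are running, rather than in the plan afterwards, sys.dm_exec_query_memory_grants shows them live:

```sql
/* One row per query that has (or is waiting on) a memory grant. */
SELECT
    session_id,
    requested_memory_kb,
    granted_memory_kb,
    used_memory_kb,
    max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```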

So, sometimes, if SQL Server detects after query execution that a memory grant was either way too big or way too small, it’ll throw up a warning on the select operator. It’ll tell you that the memory grant was too big or too small, and if you have some sort of memory grant feedback mechanism in place, it’ll start adjusting that. If not, it’ll just twiddle its thumbs and stare at you and be like, hmm, guess you should have paid for Enterprise Edition, hmm?
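For what it’s worth, that feedback mechanism can be toggled per database on versions that support it — a sketch; ROW_MODE_MEMORY_GRANT_FEEDBACK needs SQL Server 2019 and compatibility level 150, where it’s on by default:

```sql
/* On by default at compatibility level 150 and up;
   shown here only to make the knob visible. */
ALTER DATABASE SCOPED CONFIGURATION
    SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON;
```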

Hmm. Oh, you’re not using the newest compatibility level? Hmm. Weird. Weird for you.

Yeah. Oh, that’s too bad. Hmm. Hmm. Yeah, I’m just, I’m, I’ll be over here if you need me. All these, all these features and capabilities, duh, you’re not in the right compat level, or you didn’t pay $7,000 a course, so I’m just gonna hang on over here and wait for you. Someday you’ll get there.

Real, real helpful, real nice, real cool. Psyched on that. So, obviously, stifling the query that selects from the comments table is gonna hurt way more from a memory grant perspective than stifling the query that selects from the votes table, because that string column in the comments table is gonna really whomp things up.

So, if we scroll down and look at what happens to the two sort of nerf-balled queries, uh, SQL Server begs for an index on this one, right? It has not begged for an index previously.

And if you look at the hash join operations, the one for this query — the whole thing spills for nearly a minute. But the one where we select from the comments table spills for over four minutes. Nearly four minutes and fifteen seconds.

Nearly. One second off. And if we look at the spill levels on these, this one spilled to level three, and, of course, all eight threads spilled. And keep this number in mind: 663,800, right?

So that’s how many pages got spilled. Now, if you were to look at the hash warning events for these queries, the spill level would match the recursion level that SQL Server notes in the hash warning output. And then, if we look at this one — oops, this thing keeps reframing, and that messed me up a little bit.

This one spilled to level four, still with all eight threads spilling, but that’s way more pages, right? The last one was, like, 663,000. This one is 3,119,488.

That’s a seven-digit number. I only have these fingers left. So, that’s 3.1 million pages. So, that’s pretty tough there, right? We spilled a lot more because that text data takes up way more space on the pages.

You need way more pages to hold on to it. So, this is obviously not a very good situation. But neither of these queries, even when we nerf them down to a 0.1 max grant percent, hits the hash bailout. Now, the first thing I want to show you is that hash bailout is not just for hash joins.

So, you may remember this query, which runs for about 30 seconds, from the video about hash aggregates. We’re going to run this again, and we’re going to watch the extended event that I have over here. And, about 5 or 10 seconds in, this will start showing stuff.

And, uh, we’ll see it go through the different levels of recursion and then the bailout. There we go. There’s recursion one.

Eh, no, this thing runs for 30 seconds. And then you have to wait for extended events to, like, you know, get its act together and put the stuff in there. Ah, there’s two.

And now it finished. Hey, look at that. All right. So, what happens in here? Let’s, let’s open this up and let’s take a slightly closer look. There we go.

So, here are our threads. Let’s just do the top one, so there’s only one. And, I mean, kind of awkward, isn’t it? We have a bailout and then another recursion. And then some recursions here and a bailout, and then some more bailouts.

So, I don’t know. Maybe it showed up out of order, or maybe things are just weird. But anyway, you can totally get recursion and then bailout with just a hash aggregate.

You don’t need a hash join for it. But now, let’s behold the real majesty here. Oh, not that one.

This one. This, this one, this one’s some real good majesty. So, we’re going to take our, our really crappy query that selects from the comments table. Right? And we’re going to run this one.

And I’ve, I cleared out the data in here. So, there’s nothing in there anymore. And if we run this, uh, this thing will, uh, almost immediately, uh, start recursing and recursioning and bailing and outing. Uh, it does not take much for, for this one to kick in.

Now, I’m going to come back to that one in a second. I had run this same query with batch mode going for it. This one fared okay.

You know, the wait stats — there’s one kind of new one in here. When we saw the hash problems with just the hash aggregates, I believe it was HTBUILD and HTDELETE that were way up top. With the hash join, we’re starting to see a lot more HTMEMO, and we’re starting to see this HTREPARTITION wait. SLEEP_TASK is still a big deal for both the row mode and the batch mode hash join spills. So SLEEP_TASK is still a big part of that, and it’s still not in the query plan XML for either one of those.

So, just something that you should be aware of there. The SLEEP_TASK wait type is still a factor.
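Since SLEEP_TASK doesn’t make it into the plan XML, one place you can still catch it is the per-session wait stats DMV (available since SQL Server 2016):

```sql
/* Waits accumulated by the current session since it last reset. */
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
ORDER BY wait_time_ms DESC;
```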

But now, let’s come back and look at this. And we can see that, uh, this query has been bailing out for quite a while now. So, uh, we hit some recursion and then we just started bailing and bailing and bailing and bailing and bailing.

This query will run for a very long time. This query will run for longer than I care to stand here. This query will run for longer than you would care to watch.

Um, it, it, it would, it would be awful. So, uh, we’re not going to watch this thing finish because it takes too long to finish. Um, there’s, there’s.

Probably a good joke in there that I’m not going to make. But, uh, if we look at all this data, we can see all of the bailing out happening across all of these threads over and over and over again. And, uh, we have just hit, we have hit a point where we, we no longer care to live.

So, I don’t know. That’s what happens during hash join spills. You know, the wait stats are the same as they are for hash aggregate spills.

And, yeah, they’re unpleasant. Batch mode, still not good. Row mode, no fun at all. And again, hash spills are probably way more worth paying attention to than sort spills.

Unless the sort spills are in batch mode. Next video, we’re going to look at exchange spills, which are when parallel exchange operators run out of memory buffer space.

And begin spilling all over the place. Like, often, awful drunken bar patrons who swore they just needed to use the payphone 57 beers ago. They haven’t left.

I don’t know. Anyway. Uh, maybe, maybe someday that’ll be me. If I ever, if I ever build a time machine and go back to, like, 1981. That’ll, that’ll be my plan.

Gosh, the car broke down. I’ve, I’ve got all these bills from the, from the year 2030. No?

Alright. Whatever. Okay. Yeah. Thank you for watching. I hope you learned something. If you invent a time machine, please take me with you. What was I gonna say?

Uh, hope you enjoyed yourselves. I would enjoy myself if you made a time machine and took me with you. Um, that was, that was, that’s really the crux of this whole thing. Um, uh, if you like this video, for some reason, if you like learning about how bad SQL Server can be at things, uh, feel free to give me a cordial thumb up.

Or a cordial comment. I like those. Feel good about those. They really make my day. They brighten my whole mood.

And if you enjoy this sort of SQL Server content, you should hit the subscribe button. Because we’re getting awfully close to 4,000 here. Which would, I think, break my tie —

or break my current standing with Amiga repair channels. So we’re gonna get up there. We’re gonna break through to a new level of SQL Server fandom.

You and me. All together, my, my data darlings up there in the world. So, uh, yes, you should, you should like, you should, you should subscribe. You should, you should cordial comment.

And, uh, you should see me in the next video where we talk about when, when parallelism gets real, real messed up. All right. Thank you for watching.


Happy Fourth Of July From Darling Data 🫡🇺🇸

Erik Is Not Here Today



Please enjoy this reasonable facsimile of what I’ll be hearing.

Thanks for sizzling!


A Little About Hash Aggregate Spills and SLEEP_TASK Waits In SQL Server




Thanks for watching!

Video Summary

In this video, I dive into the world of hash spills within SQL Server, specifically focusing on hash match aggregates and their behavior in row mode. With a bit of a reflective tone, I share my experiences as Erik Darling from Darling Data, discussing how these spills can significantly impact query performance during what should be a relaxing Friday afternoon. The video covers various scenarios, including the differences between batch and row modes, and delves into the disappointing lack of detailed wait statistics provided by SQL Server for these operations, highlighting the frustration with Microsoft’s decision-making in this area. Through multiple queries and detailed analysis, I explore how adding more columns or text data can exacerbate hash spills, leading to substantial increases in execution time and in page volumes spilled to disk. The goal is not only to understand these issues better, but also to advocate for clearer insights into query performance bottlenecks.

Full Transcript

Erik Darling here with Darling Data. Feeling real bubbly and effervescent on this joyous, I think probably the final Friday of June 2024. Where did this year go? What happened? This year disappeared on us. Feels like it slowly, slowly disappeared. Anyway, my eyes feel weird. My left eyelid won’t stop twitching. I think I have some form of irritation in there. But today, as you can see from the giant floating Zardoz head of green text, in today’s video we’re going to talk about hash spills.

Because what the hell else are we going to do on a beautiful summer Friday except talk about hash spills. Yesterday we talked about sort spills. At least I think it was yesterday. It was probably yesterday. If anyone can remind me. What we did yesterday. That’d be wonderful.

Alright, so before the Don Julio shows up and everything changes, we’re going to talk about hash spills. Now, this video is only about hash match aggregates. This is not about hash joins. Hash joins are going to be in the next video. Not today. Another time.

Because hash joins… What’s really interesting with hash joins is looking at extended events for the hash bailouts and recursion stuff. Because that’s where you can see the sort of spectacularly bad performance that can come out of hash joins when they spill. So, in general. And now, I’m just… Specifically for row mode.

For batch mode, hash and sort spills make me very nervous. Because if you remember the sort spill video from yesterday, the batch mode sort took forever. It was like five minutes. Whereas, like the equivalent query in row mode only spilled for like a few seconds or something.

So, both hash and sort spills in batch mode make me incredibly nervous when I see them. In row mode, hash spills tend to make me more nervous than sort spills. Because, as you’ll see in this video, hash spills hurt more when we cap memory at the same amount.

Sorts usually involve a lot more data — remember, you’re sorting all the columns you’re selecting by the columns you’re ordering by — but a sort is usually a much closer to size-of-data operation.

Whereas with hashes, when those spill, for some reason, whatever algorithm is responsible for handling hash spills tends to really beat performance up, even for similarly sized spill amounts.

So, like in pages, right? So, let’s look at this first query here. And we have query plans turned on.

Thank God, if I forgot that. If I forgot that and I had to restart recording this video, I don’t know what I would do. Alright, so, this query takes around 4 and a half seconds. And, you know, I’d really like Microsoft to recall the summer intern who did the operator times here.

Because these make no gosh darn sense. And I’m not quite entirely sure how to interpret this. Because in the row mode plan, the times are supposed to be cumulative.

So, the repartition streams should technically have run for at least as long as — should contain the time from — the clustered index scan.

So, it should be at least 1.626 plus whatever time gets spent in there. I don’t know where to put that 1 in the accounting. I don’t know where that 1 goes.

I don’t know if we add it to the 1.6. I don’t know if it’s part of the 1.6. I don’t know if the 1.6 is part of the 1. I don’t… I just don’t know what to do with it. Alright?

So, let’s just say that the hash match aggregate ran for 2 and a half seconds or something. Right? Or 2 seconds. I really… These numbers are too depressing for me to think about too much. How we ended up here, I don’t know.

I like the batch mode version better where each operator is its own time. Because then I don’t have to worry about whatever mess this is. Alright.

So, relatively simple. Relatively straightforward. Now, we’re going to look at these next two queries in other windows. Because these next two queries took more time than I want to fill up dead air for.

While I’m waiting for them to finish. Alright? So, what we’re going to do in each of these windows is run the query and then look at our session weight stats for what happened when these queries ran.

And there’s going to be some real disappointing stuff happening here. Alright? And it’s not just related to the timing here where once again this number is lower than this number for some reason.

And I don’t know how to munch those numbers together into some sense. So, let’s just say that the hash match aggregate took, I don’t know, about 25 seconds. With the spilling.

Right? So, in the query where it didn’t spill, it took like two, two and a half seconds. In the query where it did spill, it took a lot longer. Right?

About ten times as long. Alright? Sort spills don’t usually hit you for ten times as long. I guess hash spills are different in that way. This hash spill looks about like so. Spill level two!

We had to make two passes of the spills. All eight threads did that and about 290,000 pages spilled out to disk. So, I don’t know.

That seems pretty slow for 290,000 pages to be honest with you on that. I don’t really know what to say there. Whatever. Not having a good time internally that hash spill. But what’s really disappointing is something that we’ve seen before in here where our old friend Sam, someone at Microsoft, that glorious idiot, decided to hide information from us.

Because the top waits that you see for this query over here are CX import, CXPACKET, and SOS_SCHEDULER_YIELD. But they don’t really tell you where the time went in this query, do they? They don’t really account for the 30 seconds that we spent in here.

What comes a lot closer to accounting for the 30 seconds we spent in here are these things up top, like SLEEP_TASK and CXCONSUMER. That our good friend Sam said: it’s for your own good.

It’s in your own self-interest. All you wanted was a Pepsi, but you don’t get these wait stats. You can have the Pepsi. You’re not getting these wait stats in your query plan.

So the SLEEP_TASK and CXCONSUMER waits don’t show up in your query plan. Because Sam is an idiot. We don’t like Sam, do we?

Sam is not our friend. All right. So looking at this same query, essentially, but this one running in batch mode. So again, I’m using the query optimizer compatibility level to get batch mode on rowstore.
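One way to do that per query, without touching the database setting, is a USE HINT — a sketch; the query shape here is made up for illustration, and the _150 hint value needs SQL Server 2019 or later:

```sql
/* Ask the optimizer to behave as if the database were at
   compatibility level 150, which enables batch mode on rowstore;
   the tiny grant cap is there to force the spill, as in the demo. */
SELECT
    c.CreationDate,
    COUNT_BIG(*) AS records
FROM dbo.Comments AS c
GROUP BY c.CreationDate
OPTION
(
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'),
    MAX_GRANT_PERCENT = 0.1
);
```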

And I’m making this thing spill a whole bunch. All right. Spills are the name of the game.

They are the word of the day is spill. They’re also the number of the day. They’re also the special of the day. They are everything.

Everything for us. Now, in this one, let’s look at the wait stats here first. So now we have some kind of new waits, don’t we? We have some HT waits. All right.

These are related to batch mode. These are very batchy-mode-y waits. But we still have a bunch of SLEEP_TASK. And we still have a bunch of CXCONSUMER. Now let’s go look at our query plan and see what happened in here.

All right. This thing spilled for about 10 seconds. And this one isn’t too bad. Where they get really bad is with the bigger spills, which we’re going to take a look at in a minute.

But if we look at the waits over here. And we look at the wait stats. There we go.

That’s the button. We get HTBUILD and we get HTDELETE. Apparently those ones were considered important enough to show. But we still get no SLEEP_TASK, and we still get no CXCONSUMER, even though those are at the very top of our waiting-query game.

Right. So to recap, we get this and we get this. We do not get this or this. We are all quite sad by that.

Especially because, if you look at some of these max wait times, that’s almost 11 seconds on CXCONSUMER. Right. And if you look at these total wait times in here, that’s a lot of time to not account for in an executing query. Right.

Kind of. It’s not cool, Sam. It’s not cool at all. We don’t like being lied to, Sam the man. All right. So let’s move on a little bit.

And let’s see what happens when our hash spills involve more columns. Remember yesterday, when we had sort spills, the more data we added to them, the worse they got — from a time perspective, and from a wait perspective.

Like they just dragged on and on and on and on and on. These, well, at least for the non-spilling query, even here we add some time to it. Right.

The first one was four and a half seconds. We’re essentially doing the same thing. We’re just selecting more columns and this took about a second and a half longer. Right. Like everything in here, despite, like, holy cow.

It worked on this one. I don’t know what happened. I don’t know what magic happened, but look, the time is actually cumulative on this one. That wasn’t true for any of the other ones.

What happened in here? I don’t know. I don’t know. Sometimes it works. Sometimes it doesn’t. These racy conditions, I guess. What’s happening inside your head?

It’s like trying to figure out what a toddler is thinking. It’s amazing. But anyway, this one did take a little bit longer. Right.

This one did take a little bit longer than the one where we were just selecting one integer column. In this query, we’re selecting four integer columns and one datetime column. All right.
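As a sketch of that shape — column names assumed from the Stack Overflow Votes table, and the grant cap added to force the spill:

```sql
/* Grouping by several integer columns and a datetime column
   inflates the hash table, and the spill, accordingly. */
SELECT
    v.PostId,
    v.UserId,
    v.BountyAmount,
    v.VoteTypeId,
    v.CreationDate,
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY
    v.PostId,
    v.UserId,
    v.BountyAmount,
    v.VoteTypeId,
    v.CreationDate
OPTION (MAX_GRANT_PERCENT = 0.1); /* capped to force the spill */
```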

So now let’s go look at this one in another window. I’ve already pre-run this because, again, I don’t want anyone sitting around bored. And now look what happens here. That went from taking 30 seconds to taking one minute and 33 seconds.

This one took a full minute longer selecting more columns. All right. More stuff in here spilled because we are selecting more stuff.

This one got to spill level three. Right. This is a full level higher. A full spill level higher than the one before that. And the number of pages is also about tripled.

Right. This one went from like about 200 and something thousand to 832,000. Right. So a lot more stuff spilled, though, because we had a lot more in the hash to spill. So when it was just one column we were grouping by, we didn’t have a lot.

I mean, we still ended up messing things up pretty good. Good job, us. But we didn’t. But we didn’t. But this one, because we have more columns that we need to group by and all the other stuff, we end up doing way, way more work.

And in the results, we have way more SLEEP_TASK and way more CXCONSUMER than we did in the other query. Now, since this one is in row mode, we don’t have the HT waits, which is okay.

Like, we don’t need to know that. But again, these waits aren’t going to show up in the query plan wait stats, because Sam needs to get talked to by someone.

Sam needs a talking to. Sam, Sam, Sam. So now let’s look at what happens when we start messing with text columns.

All right, so in this query, we are going to, if I recall correctly, I don’t know, again, some of these queries were written yesterday. Just kidding.

They weren’t written yesterday. I just can’t keep everything in my head all the time. Now we’re selecting a text column, and we’re grouping by this text column — the text column in the comments table.

I don’t know if you remember the sort spill video. We looked a lot at the average length, and at how SQL Server estimates memory for these things — and especially at how even the text column stuff in row mode tended to make spills worse, because you’re dealing with larger data when it spills off to disk.

All right. So this query takes about 8.8 seconds. And this doesn’t spill. And somehow, miraculously, the repartition streams is working here.

Maybe it just takes more data for the repartition streams timing to work. Maybe something has to really slow down in order for the timing code around the repartition streams to work the way it should.

I don’t know. It’s really weird. But at least it’s cumulative. I don’t know if it’s right, but at least it tracks, right? At least it’s logically cumulative going from here to here to here. At least we have that going for us.

But we know that by the time we get past the hash, we are a few seconds beyond where we were when we were not messing with any text columns. Right?

So now we’re going to look at this thing running in batch mode and spilling. And, oh wait, this is the one I was supposed to close. Ah, nuts.

Here we go. So this is what happens when batch mode hash match spills a text column. Look at that.

2 minutes and 24 seconds. Ain’t that something? That’s crazy, right? That’s nuts. Like, like it’s, it’s right up there with how bad the batch mode sort was. 2 minutes and 24 seconds.

Can you imagine waiting 2 minutes and 24 seconds for this? Now, the spill level on this is back to spill level 3. But there’s a lot more pages in this, right?

Because we had that text column involved. And there are definitely some differences between the Votes table and the Comments table. The Votes table is about 53 million rows.

And the Comments table is about 25 million rows. So even though the Comments table is smaller, because we’re spilling that text column out, the data pages that we’re spilling out are way bigger.

I mean, not that each page is way bigger; there’s a way bigger number of them, because the text column takes up more space, right? So when we’re dealing with a whole bunch of narrow data types, even though we did spill a lot and it took a long time, it’s still not quite as disastrous as when we spill out anything that involves text data.

The level three, eight-spilled-thread hash join from the Votes table was 800-something thousand pages. This is 2.6 million, almost 2.7 million pages. So that text column adds a lot more page volume to the spill and really messes things up.

And of course we have our friends in here. We have SLEEP_TASK, HTBUILD and HTDELETE, and CXCONSUMER. And again, for this query, the HT waits will be available in the query plan XML, but not SLEEP_TASK and CXCONSUMER, because our enemy Sam at Microsoft doesn’t want us to see those waits.

They are not going to be in the query plan, and a lot of the time, those would account for what went wrong with this query. A lot of things that you would maybe see peripherally when you go and examine a query plan, that would help you determine stuff, are just not in there. So it’s always good to know where this stuff comes from.

Now, I do a lot of experimentation with running queries and seeing what their waits are, and that’s sort of how I figure this stuff out. And that’s why in sp_PressureDetector, the list of waits that I have in there, with a sort of description on them, will decode some of this stuff for you. So if you were looking at this query plan on your own: look, the spill is visible, there’s an exclamation point on it, the operator time is visible, you can see how long that thing spilled for.

When you look at the waits, you don’t get the full story of which waits show up when these things happen. And that’s what you kind of have to know, because when you’re looking at a server from the top down, if you just get on a server and use whatever script you want to look at wait stats, hopefully it doesn’t screen any of these out because someone at Microsoft is a jerk. But maybe it would show you these waits in total.
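If you wanted to eyeball those spill-adjacent waits server-wide yourself, a rough sketch against sys.dm_os_wait_stats might look like this. The wait type list here is my hand-picked sample of the ones discussed in this video, not an exhaustive set:

```sql
/* Sketch: spill-adjacent waits, server-wide.
   sys.dm_os_wait_stats is cumulative since startup (or the last clear). */
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    wait_time_seconds = dows.wait_time_ms / 1000.,
    avg_wait_ms = dows.wait_time_ms * 1. / NULLIF(dows.waiting_tasks_count, 0)
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type IN
      (N'IO_COMPLETION', N'SLEEP_TASK', N'CXCONSUMER',
       N'HTBUILD', N'HTDELETE', N'HTMEMO', N'HTREPARTITION')
ORDER BY dows.wait_time_ms DESC;
```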

And if you saw the HT waits, and the CXCONSUMER waits, and the SLEEP_TASK waits, and the IO_COMPLETION waits from yesterday’s video, it would give you a better indicator of where queries are struggling as a whole. Right? And maybe that gives you a place to focus.

Right? Maybe it helps you figure out where the stress and strain on the server is. Now, especially if you see a lot of these spilly type waits, like IO_COMPLETION, SLEEP_TASK, and the HT stuff if it’s batch mode, and you also see a lot of the PAGEIOLATCH_whatever waits, that’s a pretty good sign that there’s a constant battle going on between the buffer pool and query memory grants, and that the server probably doesn’t have enough memory in it in general.

You know, it might, you know, like, there might be all sorts of other ways you could go to try and get those numbers under control. But like, it just might be a sign that the server is completely underpowered. And that’s where you need to start.

Like, that’s where the quickest performance win is just like, just get some more memory in this thing if you can. Right? So, let’s go look at one last query in here. And we’re going to close this out.

And this one is particularly interesting to me because this one will get a hash match flow distinct. And that hash match flow distinct will spill. And we’re playing kind of a weird trick on SQL Server here with top.

And the bigger you set this number, the worse this spill is. I had to find something in the middle. And then we’re going to say OPTIMIZE FOR top equals one. And even with a recompile hint, SQL Server can’t figure that out.

Or rather, it is still under the spell of the OPTIMIZE FOR hint. And if you look at what happened in here, we of course have a number of things going on. Once again, a whole bunch of SLEEP_TASK up at the top.
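The general shape of that trick, sketched from memory rather than copied from the demo script, is a variable TOP lowballed with an OPTIMIZE FOR hint so the flow distinct gets a tiny memory grant and spills. Table, column, and variable names here are illustrative:

```sql
/* Illustrative shape of the flow distinct demo. The number is made up;
   the bigger it is, the worse the spill gets. */
DECLARE @top bigint = 1000000;

SELECT DISTINCT TOP (@top)
    c.UserId
FROM dbo.Comments AS c
OPTION (OPTIMIZE FOR (@top = 1), RECOMPILE);
```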

10,991 milliseconds. And I think the reason this one is helpful to look at is because this one is single threaded, and a lot of the other ones run in parallel. So it’s a little bit more clear where waits go in here.

And so, if you look at the results, where we have fully 10 seconds of SLEEP_TASK, we can probably figure out just how much time was spent actually spilling on that single thread in there. Right?

But, once again, if we look at what happened here, and we look at the wait stats for the query, the only thing we will see is 4 milliseconds of SOS_SCHEDULER_YIELD. All right?

There that is. There’s that 4 milliseconds of SOS_SCHEDULER_YIELD in a query that ran for 25 seconds. All right?

So, we took 6.7 seconds here, and 25 minus 6.7 is about 19. And so, we spent about 10 of those 19 seconds in this operator spilling to disk. Isn’t that exciting?

Isn’t that exciting to know about? And, of course, the spill level for this is spill level 5. One spilled thread. Anyway, that’s about enough about hash spills.

Now, again, this was purely about hash aggregate spills. Tomorrow’s video, or actually, no, tomorrow’s Saturday. So, probably not tomorrow’s video and probably not Sunday’s video.

Maybe Monday’s video will be about hash join spills. So, I hope that you’ll join me for that. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. If you ever meet someone at Microsoft, I hope you have a good talk with them about the wait stats that they’re including in these query plans. That’d be nice.

You know, we deserve better. Us people paying, well, I mean, I don’t pay per core, but you probably pay per core. So, you deserve better.

Apparently, I deserve whatever I get. That’s okay. Yeah, if you enjoyed this video, I do like thumbs ups and I like encouraging comments, up to and including “you go girl.”

If you enjoy this sort of SQL Server content, you can join... let’s see, let’s make sure we have this refreshed up to the absolute most current... you can join nearly 3,790 of my other data darlings by subscribing to this channel and getting a notification every time I publish one of these.

And I would just like to apologize to anyone not in an East Coast time zone who gets this notification late at night, like someone in Europe maybe. Or even further away than Europe. Past Europe.

I don’t even know what time it is in New Zealand right now. Australia? Who can tell? So, I’m not sure if anyone else subscribes to me from further away than that. Probably not.

Anyway. I’m gonna go start Friday-ing, because it is Friday and it is time to Friday. Thank you for watching and I will see you in the next video about hash join spills. It will be just as exciting and riveting.

I promise you. I would never lie to you. I’m not from Microsoft. Or the government.


A Little About Sort Spills And IO_COMPLETION waits In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the fascinating world of sort spills in SQL Server, explaining how they can impact query performance and offering insights on when to address them. Starting off with a bit of personal frustration over Microsoft support’s suggestion to shut down an Azure instance to save costs, I highlight why better support is available through services like mine at Darling Data. The video then focuses on demonstrating sort spills using two queries—both with hints to ensure one spills while the other doesn’t—and explores why the spilled query can sometimes run faster than its non-spilled counterpart due to factors such as local storage speed and the complexity of sorting multiple columns, especially those containing text data. Through detailed analysis and practical examples, I aim to provide viewers with a deeper understanding of sort spills and their implications for database performance optimization.

Full Transcript

Erik Darling here with, as you may have guessed, Darling Data. In this video we’re going to talk about sort spills. Now, if you’re wondering why I sound a little bit low right now, oh boy. I just got off a very depressing client call where Microsoft support suggested, out loud and with a serious face, that one way they could, say, save money on their Azure bill would be to turn off the managed instance that runs their e-commerce site overnight, so it didn’t accrue any spend. So yeah, we’ll just turn the website off at night. It’s like the early 90s, when you would leave the office and turn off the lights and it would also turn off all the servers or something. So yeah, anyway, that’s the thing. That stunk. No one was happy. So once again, if you would like better support than Microsoft is willing to offer you: my rates are reasonable and I am available for hire.

Erik Darling of Darling Data is in fact available for hire for these kinds of things. So you don’t have to be abused by Microsoft financially and mentally. So you might notice that something I’m doing up here is setting the compat level explicitly to 140, because that is the 2017 compat level. The reason I’m doing that is because if we use the 150 or 160 compat level, we will get sort spills that neither of us have the patience to sit and wait for during one of these videos. So here’s what I got from these two queries. It’s really the same query twice. But this is what happens when a batch mode sort spills. That is nearly five minutes. And remember, this is a fully batch mode plan.
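For reference, pinning the compat level looks something like this. The database name is an assumption on my part, based on the Votes and Comments tables used in the demos; swap in your own:

```sql
/* 140 = SQL Server 2017 compatibility level, where these spills
   finish in a watchable amount of time. */
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;
```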

So all of the operators in this plan are responsible only for the time that is spent in them. Down here in the row mode plan, things are a little bit different. Things do get weird, because you might see that this ran for almost 2.2 seconds. Repartition Streams says, no, I only ran for 1.3 seconds. And the sort says, I ran for 4 seconds. So that’s a little misleading. But anyway, the reason for this is all explained in great detail in this wonderful post by Mr. Paul White, a.k.a. Pablo Blanco. And it was based on this demo. And my name actually appears in the blog post, which is a magnificent thing.

And I never thought that I would see the day. And if you feel like Microsoft should probably work on this scalability issue, there’s also a feedback item that is actually under review, to Microsoft’s credit. I will click on this so you can see. So this feedback item, opened by yours truly nine months ago... oh, man, this one’s ready to pop.

This is under review. And I actually got a thank you from the company. That’s as good as a gold watch, isn’t it? All right. Getting a thank you. Comment thank you. Anyway, let’s get back to regular row mode sort spills. So I’m going to run these two queries. And they’ve got all sorts of hints and stuff on them to recompile and clear out the procedure cache.

And for this one up here, I’m using the MIN_GRANT_PERCENT hint to ensure that this thing gets the minimum amount of memory required to not spill. Because I want this one to not spill. And I’ve got this query down here using the MAX_GRANT_PERCENT hint, ensuring that it most definitely will spill.
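The two hints look roughly like this. The exact queries and percentages in the video will differ, so treat this pair as a sketch; you’d tune the numbers to your server’s maximum grant size:

```sql
/* Forced not to spill: demand a big enough grant up front. */
SELECT v.PostId
FROM dbo.Votes AS v
ORDER BY v.PostId DESC
OPTION (MIN_GRANT_PERCENT = 50, RECOMPILE);

/* Forced to spill: cap the grant at a sliver of workspace memory. */
SELECT v.PostId
FROM dbo.Votes AS v
ORDER BY v.PostId DESC
OPTION (MAX_GRANT_PERCENT = 1, RECOMPILE);
```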

It was Built to Spill, if you’re into that kind of music. We here at Darling Data are not. We like hard goth. Hard goth only. Anyway, just kidding. We like a wide variety of music.

Depends on what the mood is. When it’s a bad mood because Microsoft support is awful, we listen to the hard goth. So let’s look at the query plans for these.

Because that’s what we do, isn’t it? We’re the data darlings who stare at query plans. And let’s be moderately surprised when we look at these two query plans. And we see that the plan that didn’t spill.

Oh, that was terrible framing by me. We’re going to do some sit-ups after this one. And the plan that did spill, we see way over here, these sort operators. The sort operator way…

Oh, my hand. I look like I’m doing something awful to that sort operator. The sort operator way up top did not spill. And that took five seconds. The sort operator right here…

Ooh, that’s nice. That’s right there. Man, that’s good framing on my part. That took 3.4 seconds. But why? Why?

Why, my data darlings, did the query that spilled take less time than the query that didn’t spill? Now, this is something that I never used to really catch well in query tuning before Microsoft introduced operator times into query plans.

Because you would see a spill and all you would have to go on is, Well, crap. Spills are pretty slow, right? You should try to get rid of spills. If I fix the spill, maybe it’ll be faster.

It didn’t always turn out that way. Now, the reason why my spill is faster, I mean, first and foremost, is because I am on fast local storage, right? So this is Crystal Disk Mark hitting my fast local SSDs on this computer.

Again, these are SSDs plugged directly into all the same parts and components encased in a beautiful Lenovo laptop right next to where all the other hardware and stuff is.

Because, you know, you don’t have that probably, though. Because you work for knuckleheads. And you work for knuckleheads who dragged you kicking and screaming into the cloud where storage is awful.

Generally awful. And if it’s not the storage that’s awful, then it’s the path that the data has to take getting to the storage. It has to get way over here, right? Your data is nowhere near your SQL Server.

It’s miles away, probably. Miles of network cable away. So I get the benefit of fast local storage that I don’t have to go across miles of wires to get to. You probably don’t have that because you work for knuckleheads.

I work for one knucklehead. But the one knucklehead I work for bought one nice laptop to do demos on. So that’s why this sort is fast for me. It probably wouldn’t be fast for you.

I realize some cloud instances do have, like, a local storage with, like, you know, hyper drives on them. And you could get stuff fast there, too, probably. But most people don’t have that.

So they get really screwed up by this stuff. So you will probably want to fix sort spills. I probably don’t need to fix sort spills. But I’m going to show you in a minute how you know if you need to fix sort spills.

Aside from just having a lot of sort spilling going on. So, one thing that’s sort of interesting about sort spills, at least in parallel execution plans... and we’re going to hope that this query works correctly the first time, because this demo is a little weird.

Sometimes it’s, like, great the second I run it. Other times I have to tinker with the memory grant percents. And it’s not fun when I have to tinker with the memory grant percents.

And what do you know? I’m probably going to have to tinker with the memory grant percents. Let’s change this one to, like, 13. Because, you know, what’s funny is it worked three seconds ago when I ran this before recording the video.

Don’t take it out on me. I’m still better than Microsoft support. There we go.

That’s what I wanted to see. So if you look at this top query up here, right? This query, just to make sure everyone understands, is running at DOP 8. That’s this many fingers.

And this query spilled… Well, spilled level 1 and spilled 7 threads out to disk, right? That’s this many fingers.

8 threads is this many fingers. And so one of these threads is showing that it did something, right? So if we come over here and we look at the properties and we look at this, we will see one thread with 1,435 rows on it. It looks like it did some stuff.

But this is just a weird query plan timing issue, right? This is not actually what happened; it’s sort of what happened. Both the thread stuff and the operator time stuff, as we saw in the previous demo with the row mode thing, where the Repartition Streams was not in the realm of reality of what the other operators around it were doing.

The operator timing and the thread stuff can also not be anywhere near reality. It’s sort of like me after 8 p.m. Me and reality are not shaking hands anymore.

But so this query down here, which spilled to level 1 and spilled, oh, why did you disappear? You were right there. All you had to do was not leave like my dad.

So this is level 1 and spilled all 8 threads. So this sort, even though almost 53 million rows from both of these go into this sort, and both of these sorts sort 53 million rows, this one looks like it didn’t do anything. All right, it’s just, it’s all zeros in there, all right?

Like my report cards. So, again, the reason why, like, the sort spills are generally faster is because… I have nice local storage, which you don’t have, probably.

I hate whispering. Sorry. Sorry about that. Sorry about that. So sort spills can get worse as you have more columns to spill. So just, you’ll allow me to go back in time one moment.

If we look at this sort, we have one column in the output list, that is post ID, and one column in the order by, which is post ID descending. Okay? So if we run these two queries now, and again, I have my little hints here just to make sure everything happens the way I want it to.

That should be… Is that… Those are the right two queries?

I didn’t highlight the one above it, did I? That was rather foolish of me. Rather foolish. Oh, Erik. Where does your foolishness cease? Ever.

So we’re going to run these two, and this is important because you should understand this about sorting data in SQL Server. Right? And this one is still a little bit faster.

Not as crazy fast as the other one. Right? Six point… oh, man. ZoomIt is all over the place today. 6.2 seconds versus 5.7 seconds. But now the sort operators are going to look a little bit different than they did in the previous demo.

And they’re going to look different because we have, if the tooltip ever graces us with its presence, we have way more columns in the output list now. Right? We’re still only ordering by post ID.

But what SQL Server has to do is all the columns in the output list, those also have to be put in order. Right? Like, that’s what a sort does.

It sorts all the data that you’re outputting by the column that you’re ordering by. So, you know, again, I’ve probably gone over the Excel analogy a few times where when you’re using Excel and you click that button in the top left-hand corner and everything gets highlighted. And then you click sort and you choose a column and everything in the spreadsheet flips to match the sort order of that one column.

Or Excel kind of yells at you and is just like, are you sure you just want to sort this one column and not everything around it? Because you’d look kind of stupid if you did. So, that’s what SQL Server kind of has to do in memory too.

It has to flip all the output columns to the order of the order by column. So that’s why queries that select more columns, and need to order those columns, need more memory. Right?

So, that’s one thing to keep in mind there. And one thing that will exacerbate those issues is when you have text data; not just the text or ntext data types, I mean any string data, really. Unlike all the columns that we’ve been dealing with before, which are all integers or dates.

I guess they’re all integers and there’s one date time. So, these aren’t like, you know, big honking columns with like variable lengths and stuff where SQL Server has to guess how much data is in them. Right?

So, if we look at the Comments table and we run this query, what I want to show you is that a lot of columns in here don’t... so, just to make sure we understand what we’re talking about here: the Text column in the Comments table is an nvarchar(700). Right?

Nvarchar: double-byte encoded text. And so, that’s why I have DATALENGTH divided by two. Also, DATALENGTH tends to be a little bit faster than LEN when we do these things. So that’s why the divide by two is there.

So, if you look at the stuff in here, a lot of the comments, at least in this top section, don’t have very long byte lengths compared to the maximum byte length of the column. But what’s really interesting, and I’m going to talk a little bit about memory grant stuff here, is that when we run this, we have an average column length of 302 bytes. And this actually plays pretty well into how SQL Server does memory grants for string columns.
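The length check is something along these lines. DATALENGTH returns bytes, and nvarchar stores two bytes per character, hence the divide-by-two:

```sql
/* Average and maximum character length of the nvarchar Text column
   in the Comments table. */
SELECT
    avg_length = AVG(DATALENGTH(c.Text) / 2.),
    max_length = MAX(DATALENGTH(c.Text) / 2.)
FROM dbo.Comments AS c;
```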

Right? Because what it does is guess that, for a string column, every row that is produced and needs to be sorted will be half full. Right?

So, for an nvarchar(700), having 302 bytes in there is actually pretty close to half. Right? So the average comment length in here actually works pretty well for that algorithm. It might not work well for every data set.

You might have nvarchar(700) columns where, legit, everything really is only 50 bytes full or something, and you would just get really excessive memory grants for that stuff. So, if we run this query right here.

And this query does not select the text column. And if we look at what it does, this one still spills a little bit.

But it’s pretty quick, right? Two seconds. No one’s going to really gripe about two seconds. But now we’re going to run this query, which is capped at a MAX_GRANT_PERCENT of one, with the text column involved. Here’s what we’re going to do.

We’re going to come over here and run sp_PressureDetector with a sample of 12 seconds, which might help you understand exactly how long that query that I just highlighted is going to run for. So we’re going to kick that off.

And then we’re going to run this. And what we’re going to see at the end is that SQL Server spent way more time spilling when there was text data involved. Because we have way more pages to spill out.

We have a bigger data size to spill out because of that text column. Right? So this whole thing takes about 10 seconds. If you’re wondering why sp_PressureDetector was set to 12 seconds, it’s so I could run it and have a second or two of grace period to come over here and execute this one. Right?
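The sampling call is along these lines; I’m assuming the @sample_seconds parameter name here, so check the parameter list in your copy of sp_PressureDetector:

```sql
/* Kick this off first, then run the spilling query in another window.
   The extra couple of seconds is grace period to switch windows. */
EXEC dbo.sp_PressureDetector
    @sample_seconds = 12;
```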

So, if we look at what sp_PressureDetector tells us about the 12 seconds that this ran for, what you’re going to see way up at the top is this IO_COMPLETION wait. And notice the helpful description column that I’ve put into sp_PressureDetector, just for you, because I love you and I care about you way more than Microsoft support does. So this is the wait type.

This is the wait type that you will see from queries when they are spilling sorts a lot. There are different wait types, which we’re going to look at in future videos, that happen when hash operators spill, from both a hash aggregate and hash join perspective. But you can associate the IO_COMPLETION wait pretty decently with row mode sort spills.

If this were a batch mode query, you would see BPSORT as the batch mode sort wait that happens when things are spilling and getting awful. So if you’re looking at a server as a whole, or if you’re looking at wait stats for a query, and you’re wondering what IO_COMPLETION means: well, if you have a lot of slow queries that are doing a lot of sorting, and they’re doing that sorting in row mode, there’s a pretty good chance that they are spilling lots and lots of stuff out to disk. And since you work for knuckleheads who put you on the cloud, you’re probably going to have to fix those, because that can meaningfully slow a query down right there.

So, IO_COMPLETION waits: if you see those associated with running queries a lot, there’s some spilling going on. Whether that spill is the root cause of why the query is slow, you’re going to have to determine that, or hire me to do it, because I’m happy to tell you. Either way.

But that’s what you would have to do there. Anyway, my wife has been texting me for 20 minutes, so I should probably respond or something. But before I go, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something about sort spills. If you like this video for whatever reason you like it, it doesn’t have to be the content. It could be, it doesn’t have to be what you see in SSMS.

It could just be my bright, sunshiny presence here on the screen. I like the thumbs ups on the videos. And I like, you know, the you go girl comments in the videos.

You can even say “you go girl.” I won’t be offended. So there’s that. If you like this sort of SQL Server content in general, please subscribe to my channel and you can join... drum roll, let’s hit this refresh button.

Make sure we’re totally up to date. You can join nearly 3,779 other data darlings who get notified every time I publish one of these videos that mean so, so very much to me. So once again, from the bottom of my heart, thank you for watching.

Thank you.


Join Me In Seattle To Learn Why Everything You Know About Isolation Levels Is Wrong

Everything You Know About Isolation Levels Is Wrong


The PASS Data Summit session lineup has been announced! And, you know, since me and Kendra are double-teaming two days of precons Nov 4-8 in Seattle, I’ve got a regular session too!

Everything You Know About Isolation Levels Is Wrong

You’ve been told that NOLOCK hints are bad, so you look at all the queries your developers write and hang your head in shame.

A NOLOCK hint here, a NOLOCK there, a NOLOCK hint seemingly everywhere. Like termites, eating at the foundation of your well-being.

But in the real world, how are you supposed to remove all those yucky hints without blocking and deadlocking causing huge problems?

I’m Erik Darling, a world class NOLOCK hint removal expert, and in this demo-heavy session, I’ll change your mind about every isolation level. You’re going to learn why:

  • Read Committed is nearly as weak as Read Uncommitted
  • Optimistic isolation levels aren’t incorrect-result factories
  • Repeatable Read isn’t what it sounds like
  • Serializable isn’t the enemy of concurrency
  • You don’t need to worry about tempdb’s version store
  • No isolation level is perfect for every workload

At the end, you’ll have the confidence and knowledge to start turning on optimistic isolation levels and stop hanging NOLOCK hints all over your queries like Christmas tree ornaments.

Session Prerequisites: Basic understanding of locking and blocking problems, some familiarity with isolation levels.

Get A Deal On Ticket Prices


If you want to get a deal on registration — and you should hurry up and do that because birds of earliness prices expire on July 9th — head over here.

When you’re registering, use the discount code DARLINGE24 for $150 off the regular price for the three regular session days, Wednesday – Friday.

While you’re there, don’t forget to sign up for me and Kendra’s precon days.

Because I’m teaching on my birthday, and if you don’t come we are NOT FRIENDS ANYMORE!

Thanks for reading, and see you in Seattle!


A Little About String Splitting In SQL Server




To get the PASS Summit discount, head over here and use the discount code DARLINGE24 to get $150 off the regular price of the 3-day conference.

Thanks for watching!

Video Summary

In this video, I dive into the appropriate use of STRING_SPLIT in SQL Server queries, highlighting common pitfalls and best practices. Erik Darling from Darling Data shares insights on how using STRING_SPLIT directly within a query can lead to poor performance due to inaccurate cardinality estimation. He demonstrates this with a practical example, comparing it to his previous video on table variables, showing that both methods result in similarly suboptimal performance. To avoid these issues, he emphasizes the importance of dumping the results into a temporary table before using them in your query, which allows SQL Server to make better estimates and optimize the execution plan effectively. By following this advice, you can write more efficient queries, and even take advantage of his special discount code DARLINGY24 for PASS Summit, saving 150 bucks while enjoying some well-deserved drinks with Erik on his birthday!

Full Transcript

Erik Darling here with Darling Data. Gosh darn it. And in today’s video, we’re going to talk about two things. The main point of the video is the appropriate use of STRING_SPLIT in SQL Server queries. The first thing, probably the most interesting thing, is I have a discount code for you for PASS Data Community Summit 2024. That is, of course, coming up this November in Seattle. Remember, one of those days in there is my birthday where you are contractually obligated to buy me a drink. Everyone at the conference is contractually obligated to buy me one drink on my birthday. And I am contractually obligated to drink that drink on my birthday. I’m going to die. But if you are on the fence or if your employer is cheaping out on you about coming to PASS Data Community Summit this November 4th through 8th in Seattle, Washington, I have a discount code for you. So if you use, and this will be in the show notes for the Darling Data Radio podcast for Data Darlings, if you use the discount code, and I hate this, DARLINGE24, you can get $150 off the three-day conference ticket. That’s this thing. So if you’re bad at math like me, and you use SSMS as a calculator, the regular price is $1,895. With the discount, it’s $1,745. Now granted, that probably isn’t coming out of your pocket. Your boss or your company is probably paying for that. So if you want to save your boss or your company $150, they can… I don’t know what a company does with $150, to be honest with you. It’s always blown my mind that $150 would make a difference.

in a company budget. But anyway, if you want to get that discount, when you sign up to go to PASS Data Summit, you can plug DARLINGE24 in there and receive your $150 off. So that’s cool, I think. Anyway, let’s talk about the appropriate use of splitting strings in SQL Server queries. Now I’ve created a couple opportune indexes here on the Posts table and on the Users… Comments table, sorry. And what I’m going to do is I’m going to show you two queries that hit these tables. Now this demo might look very familiar to you. If you remember my smash… I’m going to actually… I’m going to start running this because…

it runs for a little bit. So if you remember my New York Times best-selling YouTube video on table variables, splitting strings and using table variables have two very big things in common. SQL Server has no idea what’s in them. SQL Server has no idea what’s in your table variable. It might know how many rows are in there, maybe, under the right circumstances, but it has no idea what data makes up those rows. And you would have seen that in my video called, for the last time it doesn’t matter how many rows you put in a table variable.

And the same thing goes for a string. So this demo might look very familiar to anyone who is… who watched that video. But here’s what I’m doing. Wait, let me… I’m sorry, I’ve messed things up a little bit. I did not frame things appropriately here. I want to make sure… ZoomIt is very laggy on me today, I’m not sure why. ZoomIt is once again rebelling against me. So right up here, I am declaring some IDs, or I’m declaring an ID variable, not a table variable.

Right? Just in the… For the naysayers out there, it’s only a varchar(8000), it’s not a varchar(max), not a varchar maximum data type. So there’s nothing to freak out about there, just a varchar(8000). Still a pretty long string, right? It’s a pretty long letter. Someone wrote 8000 bytes to me, I probably wouldn’t read half of it.

But then what I’m doing is… exactly like what I did in the table variable demo, to show you how… SQL Server cannot do good cardinality estimation with table variables. It can also not do good cardinality estimation with split strings.

So I am doing exactly the same thing, where I am inserting the ID for Mr. Jon Skeet. At least the last time I checked, he was still a mister. I don’t know if that may have changed. It’s difficult to keep up with Jon Skeet. And then I’m also sticking in the top 99 nobodies.

Sorry if you’re in this group. The top 99 nobodies in the Stack Overflow database. These are users with a reputation of one who have created their accounts most recently. And we do not have high expectations of these new recruits, these cadets, because they have not had enough time to make an impact yet.

Jon Skeet has been around from the beginning, making big impacts, tyrannosaurus foot impacts on things. And so, you know, we don’t expect them to have much data. And so what I see clients do a lot is something like this, where they will extract values from a list of things.

And then they will either do like, you know, with an in or exists or a join. I’ve chosen a join in this case because it was the easiest way to convert the old demo to this one. Pure laziness on my part. Nothing funny going on, I promise.

But this is what I do, where I extract the value from the string split. And then I join that to the post table. And I join the post table to the comments table. And then down here, I show you how wrong of an idea this is.

This is definitely not a good thing to do. This is the bad way of doing things. This is the inappropriate use of string split in a SQL Server query.
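To make that shape concrete, here’s a rough sketch of the anti-pattern being described, against the Stack Overflow database. The table and column names follow that schema, but the query text and the id list are my illustration, not the exact demo code.

```sql
/* The anti-pattern: joining directly to STRING_SPLIT in the query.
   SQL Server has no histogram for the split output, so it guesses.
   The id list is illustrative (22656 is Jon Skeet's user id). */
DECLARE @ids varchar(8000) = '22656,1,2,3';

SELECT
    records = COUNT_BIG(*)
FROM STRING_SPLIT(@ids, ',') AS ids
JOIN dbo.Posts AS p
    ON p.OwnerUserId = CONVERT(integer, ids.value)
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;
```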

It’s not how you want to do things, because you will end up with very bad performance. If I move my armpit a little bit, oh, that’s my hand too. You might see two minutes and 13 seconds under there.

And that two minutes and 13 seconds was largely eaten up by just this query here by itself. Right? Because this is the crappy way of doing things. If you do things in this way, my rates are reasonable.

You can hire me to fix these things. I will fix them all day long, happily. I’ll whistle. I’ll draw you doodles of things. I’ll bring flowers.

Maybe we’ll go on a picnic. I don’t know. We could do all sorts of stuff. But really what you want to be doing is something more like this, where you can split your strings into a temp table with the appropriate data type already assigned to it. One thing that it doesn’t…

So it can certainly make things messier, but it’s not like the biggest issue with the string split. And I’ll show you in a minute. But by default, the value column that comes out of STRING_SPLIT is like some big nvarchar column.

But SQL Server is able to like do a convert implicit on what comes out of the string split function if you’re using it like I am where you’re joining it off to something. And I’ll show you that in a minute in the query plan. But then if we put that into a temp table with the correct data type, SQL Server can do this a lot better.

Because two reasons. One, you know, again the correct data type, minor point there. But when you put that data into a temp table, SQL Server is then free to figure out what makes up that data.

You get a statistical histogram on the temp table, so SQL Server can do better cardinality estimation. This is almost the exact same thing that happened with the table variable. SQL Server doesn’t know what makes up the data in the table variable.

SQL Server also doesn’t know what values, what comma separated values make up the string that you’re splitting. It does not pause every time it extracts a value and say, oh, it’s this value, cardinality estimate. Or it doesn’t like extract, it doesn’t fully extract the values and say, oh, this is what I’ve got.

Let’s do cardinality estimation based on these values. It just says, uh, crappy guess. Got a crappy guess for you?

Want a crappy guess? Got a crappy guess. Got lots of crappy guesses. Handing them out, making it rain crappy guesses. So, if you’re gonna split on that string, what you want to do is not this, where you immediately join to it in your query. You want to do that and dump it into a temp table.
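A sketch of the temp table version, again assuming the Stack Overflow schema; the temp table name, column name, and id list are illustrative.

```sql
/* Split once into a temp table with the correct data type.
   The temp table gets a statistics histogram, so SQL Server
   can make real cardinality estimates when it joins to it. */
DECLARE @ids varchar(8000) = '22656,1,2,3';

CREATE TABLE #ids (id integer NOT NULL PRIMARY KEY);

INSERT INTO #ids (id)
SELECT CONVERT(integer, ids.value)
FROM STRING_SPLIT(@ids, ',') AS ids;

SELECT
    records = COUNT_BIG(*)
FROM #ids AS i
JOIN dbo.Posts AS p
    ON p.OwnerUserId = i.id
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;
```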

Right? And I think the proof here is in the query plans. Now, we return exactly the same result for both of these. That’s a big number and all, but if we look at what goes on here, right?

We have our initial string split, right? Or, sorry, this is our initial string assignment, right, up here. And this is quick, about 45 milliseconds in total.

And then down here, this is our first attempt at the count query. And this ends up performing just about as poorly as the table variable query for just about the same reasons. Now, I could have made this worse, but I chose to not make it worse.

I chose to make it a little bit easier because if I made it worse, it might have looked a little unrealistic. But, I mean, if you look what happens here. Oops, come on, zoom it.

This query takes 2 minutes and 12 seconds in total. And SQL Server guesses 50, but we get 100, right? So we have a 200% misguess here.

But then SQL Server’s estimates down here are also awful. We are off by 5, wow, that lined up really nicely. I didn’t do that on purpose.

If I tried to do that a million times, I would not be able to do that. That is a very nice… Look at this, look at the symmetry on that. God bless. Wow. Anyway, SQL Server makes a very bad guess here, right?

We get almost, we’re not even close to reality. We get roughly 5,804% wrong. But granted, since we have a good index, it happens pretty quick. The trouble is that, I mean, A, the guesses are off.

And then SQL Server has chosen this particular plan shape, where we just go from nested loop to nested loop. And you’ll notice that we spend quite a bit of time in here, where this estimate is off by some number that can only be conveyed with an ellipsis at the end. That is 81704…

I don’t know where that number ends. I do know, looking at this, that if we pay attention to these numbers right here, and again, I’m not good at math as a high school dropout. A public education received student.

But, I can tell you that 1,304,009,067, that is a 10 digit number, is a lot bigger than 15,960, a 5 digit number. How much bigger? You figure it out.

81704 something percent bigger. A lot bigger, right? Orders of magnitude. If anyone knows how to order a magnitude, please let me know. I’m willing to pay extra.

So, this thing, this whole thing takes 2 minutes and 12 seconds. Now, if we look at the rest of what happened in here, when we go, and we look at our ingenious method of, dumping data into a temp table from the string that we are splitting, right?

We dump this in here, and we look at the query plan down here. It’s going to look a lot like the query plan that we got when we used a temp table in the, for the last time, it doesn’t matter how many rows you put in a table variable video, where SQL Server makes slightly better guesses.

So, in here, SQL Server actually makes the right guess. It knows 100 rows are going to come out of here. It even goes to the trouble of sorting those 100 rows, so that we match the order of, if we had a clustered index on the IDs table, we wouldn’t need to do that, but that’s okay.

For this, I’ll forgive the zero second, the zero anythings that it takes to do that, right? And then we, granted, this guess is not fantastic, but you know, the default cardinality estimator is not fantastic, so we don’t get an impossibly good guess here, but we don’t necessarily, you know, the nested loops join thing is fine here, okay?

The big deal here is that SQL Server sticks a stream aggregate between one nested loop and the other, right? So we have this combination of query plan operators in this plan that we don’t have in this plan, right? Remember I said from this one, it just goes from nested loop to nested loop.

For this one, SQL Server says, you know what? I’ve thought about it, and I think we need to trim this down. And so SQL Server has to do far less work in here.

This goes from taking, let’s go look at that again. This goes, that took, ah, get out of the way tooltip. No one needs you.

We’re all hitting F4 and getting properties when we need to see that stuff. So this goes from taking one minute and 28 seconds to taking one millisecond. Now granted, this cardinality estimate is still off by a little bit, but this aggregate saves our buttocks in a very serious way.

And so this query working off the temp table only ends up taking 10 milliseconds. It’s a big difference, all right? So when you’re, I beg of you, if you are writing queries in SQL Server, and you are passing comma separated or anything separated lists, even if you’re passing XML or JSON or anything like that, and you need to relate that blob of string text to something in a query, in a join or where clause, or a subquery, or an exists, or really anything relational, anything with relational meaning, please, I beg of you, do not do the string splitting in the query.

Take the XML or the JSON or the comma separated, anything separated list of things you’re passing, dump them into a temp table.

You can use any string splitting facility you want for that. You can use the built-in one that Microsoft gave you in 2016. You can use the one that you copied and pasted that’s a multi-statement table valued function with a while loop in it. You can take that one that you copied and pasted from the internet in 2002 and use that, and it will not profoundly affect your query because you’re dumping it into a temp table.
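For reference, the built-in splitter he’s talking about is STRING_SPLIT, which shipped in SQL Server 2016 and requires database compatibility level 130 or higher:

```sql
/* Returns one row per delimited element, in a column named value. */
SELECT ss.value
FROM STRING_SPLIT('38,26656,22656', ',') AS ss;
```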

And then you’re going to use that temp table in your query instead of doing the string split in the query, and you’re going to look so smart in front of everybody. You’re going to look so smart that your boss will be like, Hey, you know what?

You get to go to PASS Summit this year, and you get to use Erik Darling’s discount code DARLINGE24 to do it, because we’re going to save 150 bucks, and I’m going to put 150 bucks cold hard cash in your hands to buy Erik Darling drinks for his birthday. That’s exactly how that’s going to go.

So, for the sake of all of us, so that you get to come to PASS and celebrate my birthday with me, and you get 150 bucks cold hard cash from your boss to buy me drinks, you’re going to not put string split where it would matter in your queries.

You’re going to dump your split string results into a temp table, and then you’re going to use that temp table in your query. All right? Cool.

So, with that out of the way, thank you for watching. I hope you enjoyed yourselves. I hope you learned something about string splitting, the appropriate uses for it in SQL Server queries.

If you like this video, you know what I like? Thumbs ups. I like nice comments.

I like motivational comments. Those are cool, too. If you like this sort of SQL Server content, you are free to subscribe to this channel and join, let’s see, let’s get the most current tally on this, nearly 3,775 other data darlings out there by subscribing to the channel and getting a delightful notification, very helpful notification every time I publish one of these videos.

And you know what that gets you? 150 bucks off pass. And the pleasure of my company.

And, I don’t know, cool way to kill 15 minutes? I don’t know. Maybe you’re like, you’re in the bathroom, you got nothing else to do. You finish playing Wordle.

You finish playing Words with Friends. And you’ve gone through your RSS feed. And you’re just like, what am I going to do for 20 minutes? Earbuds in.

Watch my video. Soothing. It’s meditative, practically. Reduce blood pressure. Give you a good laugh. I don’t know.

Whatever. It’s a cure for what ails you, basically. It’s like drinking a Guinness in the morning. Right? The smart thing to do. Anyway. That’s enough for me.

I’m going to go back to work. Maybe go to the gym. Give this Adidas t-shirt a workout. Right? Get sweaty a little bit.

And I don’t know. I’m going to think about what to record next. Because I’m always thinking for you. Anyway. Thank you for watching.


Some Notes On Query Tuning With Batch Mode In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into some interesting aspects of query tuning in SQL Server, particularly focusing on batch mode execution and parallelism. Starting off with a humorous anecdote about re-recording due to a weekend brain fog, I share the excitement of being voted “the classiest SQL Server consulting shop ever” by BeerGut Magazine—a moment that adds a light-hearted touch to what can sometimes be complex technical discussions. The video then explores how batch mode execution works and its limitations, especially in achieving full rowstore parallelism. Through detailed examples, I demonstrate techniques like using temporary objects and query hints to optimize performance, highlighting the nuances of SQL Server’s heuristics and the importance of understanding these mechanisms for effective query tuning.

Full Transcript

Erik Darling here with Darling Data, re-recording the video that I uploaded Friday where it sounded like my throat was full of weird bugs. So we’re going to try this again and hopefully I remember to say all the right stuff because it’s been a weekend and after a full weekend everything that was in my brain the week before disappears. So we’re celebrating a very very special day here at Darling Data. We were voted by BeerGut Magazine to be the classiest SQL Server consulting shop ever. We’ve got some drivers apparently Intel is telling me about. Thanks Intel. Get right on that. And yeah, we’re going to talk today about, I don’t know, some batch mode stuff, some parallelism stuff. You know, some general query tuning stuff. Nothing, there’s nothing overly specific about this video. It’s kind of a funny thing about it. It’s just, it’s just the stuff that I find interesting and you’re just going to have to deal with that today. You might find value in, you know, a few different places here, but there is no overriding theme or point to this video aside from the fact that you can see here.

I find it interesting. So let’s look at a query. Let’s do that. All right. We’re going to make sure we’re doing all the right stuff there. We have query plans turned on and now let’s run this. And what we’re going to do is in about eight seconds, we are going to see the most offensive query plan ever. Highly offensive. I might have to blur it out. That’s how bad it is. Because Tipper Gore is going to be all over this channel when you see this thing. Look at this. Behold. Behold the terror, the tragedy that is this query plan. Look at this. Look at this thing.

So we have two, it’s so awful. We have a parallel merge join, which of course should not exist. Right? That’s just ugly. And for some reason, SQL Server decided that it wanted to sort a 53 million row table to make that parallel merge join happen. And no, just no SQL Server. You’re bad. You’re awful. Why? Why? I’m downloading Postgres as we speak. Postgres would never do this to me. Actually, I’m lying. It probably would.

But any database engine can forsake you like this. None of them are, you know, none of them are beyond this level of offense. But holy cow. It’s a wild one, isn’t it? It’s a real wild one. Now, whenever I’m tuning store procedures or queries and I see stuff like this, one of the first things that I always try to do is just see, like, what will happen if I use batch mode. Because oftentimes, batch mode will help us avoid calamitous query plans like that. And, you know, for queries that do a lot of CPU work, have a lot of CPU effort, batch mode can be very useful.

Because we work on batches of rows. We don’t work on a single row at a time. Sorry, something is itching my forehead and I don’t know what it is. It’s a bit distracting. Anyway, so what I usually, you know, first try to do, and, you know, if I’m using a SQL Server version that has in-memory tempDB turned on, where I can’t use a temp table with a columnstore index because I’ll get an error, which you don’t have to worry about in managed instance because it’s, like, apparently 15-year-old software and that option isn’t available in it.

Don’t use managed instance. But this is one of the first things that I’ll try here because this will, you know, give me at least some hope of maybe getting a batch mode plan that’ll be helpful. Right? And so what I’ll usually do is something like this, where I’ll, you know, create a temp table.

There will be a clustered columnstore index on it. And then I’ll do sort of a funky, tricky, little lefty join-y thing like that just to, you know, let the optimizer get a little width of a columnstore index so that it’ll be a little bit more lenient about using some batch mode operators.
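The trick he’s describing might look something like this; the table name, the never-true join condition, and the query shape are my illustration of the technique, not his exact demo code.

```sql
/* An empty temp table with a clustered columnstore index.
   It holds no data; it only exists so a columnstore index
   shows up somewhere in the query. */
CREATE TABLE #batch_mode
(
    just_here_for_batch_mode bit NULL,
    INDEX ccsi CLUSTERED COLUMNSTORE
);

/* Left join on a condition that can never be true: no rows
   change, but the optimizer now considers batch mode operators. */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
LEFT JOIN #batch_mode AS bm
    ON 1 = 0;
```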

Now, one thing that I’ve been sort of lazy about showing in demos when I do this is the column naming. In whatever object I use, you know, either the temp table, the columnstore helper object, or, if I can’t use a temporary object because of in-memory tempDB and I have to create like a dbo.t or a more descriptive name, like a columnstore helper table, I’ll always give this column kind of a weird name. Because if we were just, if we were being real lazy, and let’s say we were doing select star here, or we had a CTE and we were doing select into, we would get potentially duplicate column name errors.

Or if, like, we were working on just a real shamble of a query where no one was aliasing stuff, we could get, like, weird, ambiguous column name errors. If, like, this column was just, like, called ID and we were selecting just, like, ID up here, right?

Or we were joining to just ID down here without an alias. So it just, it kind of helps you to avoid weird stuff if you give that column a weird name. So there’s that.

Again, another weird little point along the way. Something for you to marinate on later. So let’s see what we get now because we’ve got this all set up. At least I think I created this temp table.

We’re going to find out. If this throws an error, it’s fine. If it doesn’t, then it would have thrown an error when I ran this query. So either way, you were going to see some, you were probably bound to see some red text if I didn’t do that, right? If I didn’t preemptively do that.

And the problem we run into here is that this trick doesn’t work here. SQL Server still chooses this godawful plan where, again, we have a parallel merge join, which is just grotesque beyond compare.

And SQL Server has once again decided to sort 53 million rows to get us, to allow for this, right? And that is, this is just no good. All of this, bad.

If I were a better artist, I would draw like a thumbs down thing here. But it’s terribly difficult to draw with Zoomit. And, you know, I think this is bordering on obscene. So I’m going to stop that.

But anyway, so this is one of those kind of funny things that happens is we’ve actually costed ourselves out of a better execution plan. So let’s run these two back to back, right? We’re going to have our eight-second monstrosity.

And then we’re going to have a slightly faster plan below it. And the reason we’re going to get a slightly better plan below it is because we have told SQL Server, please, no more any kind of join other than hash join.

And when you force SQL Server to use the hash join plan, it is also far more likely to engage batch mode because batch mode and hash joins, well, they go together like Forrest and Jenny, right? They make quirky little babies.
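The hint he’s talking about is OPTION (HASH JOIN), which makes the optimizer use hash joins for every join in the query; the query here is a stand-in for the demo query.

```sql
/* Force every join in this query to be a hash join.
   With hash joins in play, batch mode is more likely to kick in. */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (HASH JOIN);
```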

And we’re going to, okay, so now let’s look at the query plans. All right. So we have our same monstrosity up here. Il monstro, we hate you.

You are awful. If I saw you in real life, I would tell whoever was running SQL Server to get a refund. And down here, we have a slightly better version of the query plan where we have these two operators, which I’ll show you in a minute.

But these two operators, we got our hash join and we have a sort here. But the important thing here is that we didn’t get a parallel merge join. And the reason we were getting parallel merge joins before is because costing is stupid and broken.

And if we look at the estimated subtree cost of this query at $3,945 query bucks, and we look at the estimated cost of this query at $4,512.88 query bucks, we can at least understand why SQL Server chose this awful monstrosity of a plan.

It thought that the cost would be much lower. But it wasn’t. Well, I mean, it was.

I mean, the estimated cost was much lower. But it was still wrong. So, like, one thing that I’ve said in a billion videos at this point is, you know, you can have queries with a very high estimated cost that are fast, and you can have queries with a very low estimated cost that are slow.

So, cost really has nothing to do with how fast or slow a query is. Right? Cost is just a bunch of estimates that the optimizer uses to assemble a query plan and hope it’s the best one.

The problem is that all those estimated costs are a lot like lottery numbers. If you knew what the actual numbers were going to be ahead of time, the results would have been much better. Like, I think that if SQL Server understood what it would take to run this query, the cost would not have been so prohibitive that it chooses this awful monstrosity up here.

So, that’s that. Now, the thing that I want to show you in this bottom query plan, actually, is that we have a sort that runs in batch mode.

And, like I said, we have the hash join that runs in batch mode. But that’s it. Right? This clustered index scan on the post table, I believe. Yeah, post.

And this clustered index scan on the go away SQL prompt on the votes table, these both run in row mode, which isn’t how it has to be. Now, if you’re…

Okay, so here’s, like, the list of, like, things that have to align perfectly for you to get batch mode on rowstore. You have to be on SQL Server 2019 and up.

You have to be in compat level 2019 or better. And you have to be on Enterprise Edition or an equivalent SKU, which could be, you know, developer edition, which I wouldn’t blame you if you wanted to use that.
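A quick way to check those prerequisites, using standard system functions and views:

```sql
/* Version and edition of the instance, plus the current database's
   compatibility level. Batch mode on rowstore wants SQL Server 2019
   or newer, compat level 150 or higher, and an Enterprise-equivalent
   SKU. */
SELECT
    product_version = SERVERPROPERTY('ProductVersion'),
    edition = SERVERPROPERTY('Edition'),
    compatibility_level =
    (
        SELECT d.compatibility_level
        FROM sys.databases AS d
        WHERE d.name = DB_NAME()
    );
```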

Because that’s how Microsoft screws you into buying Enterprise Edition in production. When, like, you’re developing stuff on developer edition and you’re like, wow, all this stuff works great and is fast.

And look at this nice thing that SQL Server does. And then you, like, release it on standard edition and everything kind of falls apart. And you’re like, wait a minute, what happened? That’s what happened.

So, you know, I sort of sympathize with people who are like, oh, well, we’re just going to use developer edition. You know? Who am I to argue?

Screw job like that? Get your revenge. So, anyway. So, like, it doesn’t have to be that bad. So, what you can do on newer versions of SQL Server is, let’s say you, like, that turned out pretty all right, right?

Where you have 3.6 seconds, like, down from 8 point, I mean, at worst, this thing was like 9 point something seconds. So, we got it down to 3.6 seconds just by getting a couple things in batch mode and using a hash join instead of a merge join and not sorting 53 million rows and spilling and, God almighty, what’s wrong with you?

But the thing that’s interesting to me is that batch mode tricks do not get you the full batch mode on rowstore experience that you could get, right?

So, if we, what you can do is you can use these compatibility level query hints to get you higher compat levels than what your database is set to, right? So, this is, like, a good way to sort of override some things and be like, hey, I got a better idea.
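The compatibility level hints he’s referring to are the QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_n family of USE HINT arguments, which override the database’s compat level for a single query. The query shape here is a stand-in for the demo query.

```sql
/* Run this one query as if the database were at compat level 160,
   while still forcing hash joins. */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION
(
    HASH JOIN,
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160')
);
```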

So, let’s run this query now. And, unfortunately, we still need the hash join hint there because if we don’t put that there, SQL Server will still cost itself into that crap hound of a merge join plan.

So, batch mode on rowstore has problems because it is all based on heuristics. And those heuristics might kick in, they might not. And even with me doing my batch mode trick there in the higher compat level, I still don’t get batch mode where I should.

I still get costed into that awful merge join plan. So, now if I look at this execution plan, this one’s a full second faster. And where it gets faster is on this clustered index scan on the votes table.

For some reason on the post table, SQL Server, oh, come on, give me the tooltip. I don’t need a missing index in. So, for some reason, SQL Server on the post table is like, no, we’re going to scan this in row mode still. That’s whatever.

And down here, come on, go away. Zoom it, catch up. Do try to keep up, zoom it. This actually does get scanned in batch mode. So, this ends up faster. But there’s something interesting going on in this query plan.

And then, so if I go and look at the properties here, this is something that we have looked at before, you and I, all of us lovely folkses together.

And this is something kind of interesting, right? Another, again, something interesting along the way. Hmm. Neat stuff.

So, we have eight threads that cooperatively scanned the post table, right? All these numbers, I promise you, will add up to this number. But only one thread, oops, I hit the wrong button there. Only one thread produced any rows.

Produced 47,575 rows. So, only thread two. This lucky devil right here did any real work. All right? Well, it produced any real work. All right?

So, you know, this thing goes into the hash join. And then, I think down here, it’s sorted out pretty well. I kind of forget, though. Oh, so only, yeah. So, down here, two threads end up doing some work. So, this is, again, some sort of a bit of lopsided parallelism.

Unfortunately, batch mode stuff does not fix this sort of parallel skew issues that can come up. Now, one thing that you can do to get around that is you can use a temporary object, right? Because sometimes, if you can’t, you can’t always rewrite a query to get it to do exactly what you want and fix every single problem.

So, sometimes using a temporary object is the right choice. So, let’s dump our post table stuff into a temp table. And notice, this is a whole heck of a lot faster now, right?

Remember, there was like 1.7-something seconds. Now, that’s only 500 milliseconds. So, we save almost a second and a half just by doing this. And what’s nice about this now is that rather than just ending up with all our rows on one single thread, we actually spread things out quite a bit.

So, we don’t have that same parallel skew problem just dumping that into a temp table. So, we’re doing well here, right? I think we’re doing better, right?

We’ve made this a little bit faster. But now, watch this, right? This is, you’re going to get a kick out of this. I know you’re going to get a kick out of this because you watch my videos. And if I get a kick out of something, you’ll probably get a kick out of it too.

Now, watch what happens here. Now, we get a batchy, modey plan, right? I think, yeah, batch mode, batch mode. Okay, so that turned out pretty good.

You know, our left join to the little temp table over there did us some good. And now, we’re working up this temp table. But watch what happens down here now, right? This one’s okay, right?

This one’s kind of okay. This number sticks out a little bit further than the rest. But now, look at, watch, ready for this? Ready for the filter? Actually, let me select this so this stays.

You ready for this filter operator? Bam! We just shifted the parallel skew problem over here. All right, that’s interesting.

Okay. So this thing, all in all, this takes 1.8 seconds. 1.889 seconds in full right there. All right.

All right. So, batch mode, for all the great good that it does to queries in SQL Server, sometimes doesn’t do as much as it should when you’re only using the batch mode tricks. Right?

Is there a reason why? If there is, it’s because we’re using the older compat level, where batch mode on rowstore isn’t available. Right?

So if we were in compat level 160 or better, we would get a better batch mode experience, but only if batch mode on rowstore chooses us. Just using a batch mode trick doesn’t necessarily guarantee you batch mode on rowstore. Now, let’s race the batch mode version of this query against the row mode version of this query.

Right? Because the only two operators in here that get batch mode are these two. All the rest of this stuff still happens in row mode.

And, of course, the parallel exchange operators don’t have any batch mode support, so they’re row mode as hell anyway. Now, look down here, though.

That’s 1.9 seconds now. This one is just one second. Our fully row mode hash join plan is only one second. Pretty wild, isn’t it?

Now, it’s not because one of them gets a bitmap and the other one doesn’t. This filter operator is an optimized bitmap. There we are. Optimus bitmap-ius. And so we have a bitmap down here.

Remember, in these query plans, bitmaps get made up here, but then they get used down here. Right? So they go this way and get used to filter out rows here, usually. Sometimes they’ll get stuck at the repartition streams operator.

Sometimes they’ll get stuck somewhere else. But most of the time, when you have a nice, effective bitmap, they’ll get pushed down from here to there. Thereabouts. You know, you can figure it out when you look at your query plan.

But this is a lot faster. And we don’t have the same sort of weird skew issue. At least, I don’t remember having the same weird skew issue on this one. This one spreads out nicely.

So the parallelism works better in the row mode version of this plan. That might not always be the case, but it’s certainly the case here. Now, if we run this query and ask for it to happen in compat level 160, we should get a more batch mode-y version of this plan, where we get actual batch mode on rowstore. So now we’re actually able to read from the Votes table in batch mode once again.

And this gets us down to under a second. So, to recap a little: the query with just the batch mode tricks is about 1.8 to 2 seconds. The row mode version of that query is 1 second.

But the fully batch mode on rowstore version of that query is about 800 milliseconds. So that turns out a lot better. Now, one of the reasons why Enterprise Edition is so important for batch mode workloads is that Microsoft (once again, I wouldn’t blame you if you use Developer Edition) hobbles the hell out of Standard Edition, where batch mode is limited to a DOP of 2 in a parallel plan.

So a lot of times when I’m tuning stuff on Standard Edition, if you have a pretty big aggregation query running at DOP 8 and you think, oh, we can try batch mode, you’ll be pretty lucky if it ends up running for an equivalent amount of time, because you lose six DOPs if you were at DOP 8. So there’s a pretty big chance of a downgrade.

If you’re only running at DOP 4 and you get downgraded to DOP 2, maybe it matters less. But if you’re running at DOP 8 or higher, the DOP downgrade on Standard Edition with batch mode can be pretty painful. That’s why, when I’m tuning Standard Edition stuff that has to do big aggregations,

I’m far more likely to lean towards indexed views and things like that, because you don’t have those DOP limitations on indexed views. Granted, then you have an indexed view, so you have a different set of problems. But an indexed view that has good indexes to help it get maintained is, in the words of my wine distributor in New Zealand, no more overhead than another nonclustered index.
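An indexed view for that kind of big aggregation might look something like this. This is a minimal sketch, with assumed table and column names from the Stack Overflow database:

```sql
/* Indexed views require SCHEMABINDING, and an aggregate view
   must include COUNT_BIG(*) alongside its GROUP BY. */
CREATE OR ALTER VIEW dbo.VotesByPost
WITH SCHEMABINDING
AS
SELECT
    v.PostId,
    VoteCount = COUNT_BIG(*)
FROM dbo.Votes AS v
GROUP BY
    v.PostId;
GO

/* The unique clustered index is what materializes the view. */
CREATE UNIQUE CLUSTERED INDEX v
    ON dbo.VotesByPost (PostId);
GO

/* On Standard Edition, the NOEXPAND hint is needed for the
   view's index to actually get used; Enterprise Edition can
   match the view automatically. */
SELECT
    vbp.PostId,
    vbp.VoteCount
FROM dbo.VotesByPost AS vbp WITH (NOEXPAND)
WHERE vbp.PostId = 11227809; /* an arbitrary example id */
```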

So, you know, hopefully that works out. Now, I’d love to say there are all sorts of things you can do here, but there are really only two things you can do to trick SQL Server into using batch mode.

One of them is what we did up above, where we had a temp table with a clustered columnstore index on it, and we did a left join on 1 equals 0 to that temp table. SQL Server smelled a columnstore index

and was like, ah, maybe batch mode. The other thing you can do, if you’re not free to rewrite queries that way, is to create an index like this: a nonclustered columnstore index that’s filtered.

If you filter it on something that cannot possibly exist, the index won’t actually contain any data, but it will exist on the table as a columnstore object and give SQL Server some impetus to use batchy, modey stuff.
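Both tricks sketched out, with assumed object names. Neither the temp table nor the index ever holds any rows; they only exist to put a columnstore object in front of the optimizer:

```sql
/* Trick one: an empty temp table with a clustered columnstore
   index, left joined on a condition that can never be true. */
CREATE TABLE #t
(
    id int NOT NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

SELECT
    records = COUNT_BIG(*)
FROM dbo.Votes AS v
LEFT JOIN #t AS t
    ON 1 = 0; /* never matches, but the optimizer smells columnstore */

/* Trick two: a filtered nonclustered columnstore index with an
   impossible predicate, so it can never contain any data. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncc
    ON dbo.Votes (Id)
WHERE Id = -1
AND   Id = -2;
```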

You know, if you’re tuning regular sorts of things, supporting indexes are good things. They’re there for a reason.

They can help you with your joins, they can help you with your filtering, and other useful stuff like that. Sometimes creating indexes might be the final answer, but the batch mode stuff is usually pretty useful as well.

So anyway, just a few notes on using batch mode and testing different things out when you’re trying to tune bigger, crazier queries and query plans. Granted, this one wasn’t big and crazy, but that parallel merge join plan, man, that just set me off.

And I had to do the rest of this stuff because I feel like if I don’t talk about how awful parallel merge joins are, no one will. They’re truly despicable creatures.

I wish that they would be removed from query optimization, just to be honest with you there. So anyway, a little bit of a hodgepodge there, but again, stuff that interests me. Hopefully it interests you.

If it doesn’t, you don’t need to watch this. You can go do something else with 20 minutes of your life. It’s okay. Either way, my feelings won’t be hurt. So anyway, I hope you enjoyed yourselves.

I hope you did find at least some parts of this interesting. I hope you learned something, maybe a few things, because this was, again, a potpourri of knowledge. Yeah, what was I going to say?

Anyway, if you like this video, thumbs ups are good, as well as comments that don’t make me cry. I mean, tears of joy are fine, but not tears of sadness. We don’t want sorrow, Erik, on these videos.

If you like this sort of SQL Server content, you can hit the subscribe button. It’s apparently a bell-shaped object. And you can join nearly 3,762 other data darlings.

Never going to be able to say that with a straight face. They’ve subscribed to this channel and apparently find me interesting enough to keep getting notified when I talk about SQL Server. So anyway, thank you for watching.

I do apologize for the audio issue on Friday’s video. There was no fixing it. Like, once it’s done, it’s done. So I just had to re-record the whole thing. I had lots of funny jokes about Justin Timberlake getting arrested in a tuxedo.

And you should have been there. It was great. Anyway, thank you for watching.


Hot Tips For Safer Dynamic SQL In SQL Server




Thanks for reading!

Video Summary

In this video, I delve into the world of dynamic SQL security and how to keep your data safe from potential threats like SQL injection. I walk through a practical example where we create a table called `DropMe`, insert a row, and then execute a query that inadvertently drops the table due to SQL injection. This serves as a stark reminder of why it’s crucial to parameterize dynamic SQL properly. I also discuss how to safely handle cases where you need to pass in column names or other schema-related elements using SQL Server’s system views like `sys.columns`, `sys.tables`, and `sys.schemas`. By leveraging these views, we can ensure that only valid objects are used within our dynamic SQL queries, significantly reducing the risk of accidental or malicious commands being executed. Additionally, I touch on some practical tips for maintaining security, such as using string functions like `STRING_SPLIT` (available in SQL Server 2016 and later) to safely parse input parameters and logging invalid inputs for further review if needed. Overall, this video provides a comprehensive guide on how to secure your dynamic SQL queries against potential threats while ensuring they remain performant and efficient.

Full Transcript

Erik Darling here with Darling Data, and my microphone is on, but the sound bar just looks funny. Okay, well, we’ll deal with that, hopefully. Anyway, today was my oldest daughter’s fifth grade graduation, which I know sounds a little weird. A lot of people just have a kindergarten one and then an eighth grade one, but in New York they shuffle you from a K-5 to a 6-8 school, so fifth grade is apparently a big deal celebration here. And, of course, being here in America, they did the Pledge of Allegiance and sang the national anthem, the Star-Spangled Banner, and it kind of struck me, sitting there in the audience, that there are a few words in the Star-Spangled Banner that aren’t common words anymore, and people might not know the meaning of them. Gallantly, or rampart, or even spangled might send some people running for the dictionary. Who knows? Might need to throw some TikTok lingo in there if we want the Star-Spangled Banner to remain a true top 40 hit. So, anyway, in this video from Erik Darling of Darling Data, we’re going to talk about how you can keep your darling data safer in the face of dynamic SQL. And I’m going to show you a few things that I will sometimes do when I’m working with dynamic SQL to make sure there’s no hijinks, no shenanigans, no tomfoolery going on when queries get executed. So, the first thing we’re going to do is a small primer on the potential perils of unsafe dynamic SQL.

So, we’re going to create a table called DropMe, and insert a row into DropMe. And then we’re just going to run a very simple query where someone has SQL injected a drop table command onto an input, and we’re going to execute that. We’re going to see that, at the very end of this, we no longer have our table called DropMe. In fact, we have an error message saying invalid object name DropMe. And the reason we have that error message is because this is what our executed SQL string looked like: we had a totally valid select query, which executed, and I’ll show you that in a second.

And then after that ran, we had a drop table command, which also ran. That’s unfortunate. If we go over to the results, though, we will see, well, there’s the result of our first select from DropMe up there. And then here’s the result of our, oh, that jumped around a little bit, you dickens. Zoom it. There’s the result of our first query, and there’s the result of our database name search. And you can see all the Stack Overflow databases that I have on my server, right? Obviously not good.
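A sketch of the unsafe pattern being demonstrated. The injected string and query shape are assumptions, since the exact demo script isn’t reproduced in the transcript:

```sql
CREATE TABLE dbo.DropMe (id int NOT NULL);
INSERT dbo.DropMe (id) VALUES (1);

DECLARE
    @database_name sysname =
        N'S%''; DROP TABLE dbo.DropMe; --',
    @sql nvarchar(max) = N'';

/* Unsafe: the input is concatenated straight into the string. */
SELECT @sql = N'
SELECT d.name
FROM sys.databases AS d
WHERE d.name LIKE ''' + @database_name + N''';';

EXECUTE (@sql);
/* The executed string now holds a perfectly valid SELECT,
   followed by the injected DROP TABLE command. */
```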

And obviously the solution is to parameterize the dynamic SQL. So if we rerun that portion of the script again, with our dynamic SQL rewritten in a safe way, we will get no error messages, but more importantly, we don’t get any search results, which is a good thing, right? Why is it a good thing? Because we didn’t execute an unsafe SQL command. I mean, I don’t think it’s a very prodigious way of working through a SQL Server

if you’re okay with user searches running, returning results, and also executing out-of-bounds commands. So if we look over in the messages tab now, there’s a printout of what we put into our database name parameter, right? It’s just an S with a percent wildcard, then any old thing that follows after that, and a drop table command.

But in here, this looks different now, doesn’t it? We just have the parameter name. We don’t have that string sitting in there. And that’s great for things that you can actually parameterize.
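The parameterized version of that same search might look like this; again a sketch with assumed names:

```sql
DECLARE
    @database_name sysname =
        N'S%''; DROP TABLE dbo.DropMe; --',
    @sql nvarchar(max) = N'
SELECT d.name
FROM sys.databases AS d
WHERE d.name LIKE @database_name;';

/* Safe: the value travels as a typed parameter,
   so it can only ever be data, never a command. */
EXECUTE sys.sp_executesql
    @sql,
    N'@database_name sysname',
    @database_name;
```

The injected text ends up compared against database names as a literal value, matches nothing, and the DROP never executes.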

There are lots of things in SQL Server that you can’t parameterize, at least not in a meaningful way, that you might need to pass into dynamic SQL.

You might need to pass in column names, or a schema name, or a table name, or even a database name. And none of those things can really be supplied as parameters to dynamic SQL. It’s not allowed.

So, here’s some stuff that I do to stay safe in cases where that does have to happen. SQL Server has all sorts of views in it that enumerate different objects. We can enumerate databases from sys.databases and half a dozen other views.

We can enumerate schemas from sys.schemas. We can enumerate tables from sys.tables. And we can enumerate columns from sys.columns.

So, if I’m going to allow someone to pass in a schema and a table, or a list of columns that they want returned from a user query, I need to make sure that those are all safe as well. What I’ll do in those cases is declare some placeholders in here.

And then I’ll do stuff like look up the schema name and table name that got passed in. Since this part isn’t dynamic SQL, we don’t have to worry about any filthy, dirty inputs here. And then I’ll set the safe schema and table names with QUOTENAME, because QUOTENAME is very important.
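That lookup-and-QUOTENAME step might look something like this; a sketch, with placeholder variable names of my own choosing:

```sql
DECLARE
    @schema_name sysname = N'dbo',
    @table_name sysname = N'Users',
    @safe_schema nvarchar(258) = NULL,
    @safe_table nvarchar(258) = NULL;

/* Look the names up in system views; anything that
   isn't a real object simply won't be found. */
SELECT
    @safe_schema = QUOTENAME(s.name),
    @safe_table  = QUOTENAME(t.name)
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE s.name = @schema_name
AND   t.name = @table_name;

/* If either lookup came back empty, bail out. */
IF @safe_schema IS NULL
OR @safe_table  IS NULL
BEGIN
    RAISERROR(N'Invalid schema or table name passed in.', 11, 1);
    RETURN;
END;
```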

I was working with a database name that had a space and a dot in it earlier this week, and that was a real adventure in a number of ways. Found myself using a lot of square brackets for that one.

QUOTENAME will make that safe. And what’s nice is, if you’re going to use something like this over and over again, rather than constructing it constantly, you can just have one thing that concatenates the schema name and the dot.

And, you know, one of my big gripes about dynamic SQL in SQL Server is that there’s not enough typing for it. Not keyboard typing, I mean; there are no types built into SQL Server to handle stuff like this.

It would be cool if there was an object type or something, where if you passed in a table or a schema or a database or whatever else, you would get back a constructed object with the dots in the right places. That’d be cool. It’d be nice if there was some constructor for that.

Like QUOTENAME: you pass something into it and it puts quotes around it. It’d be nice if there was something to put dots in the right places, too, so that you didn’t have to do foolish things like this, and then realize that you forgot a dot somewhere, or left a dot in that you didn’t mean to, and it becomes quite a hassle.

There’s a reason why dynamic SQL is practiced as a true art form by few. A lot of hacks out there, a lot of hackers out there, but not a lot of true dynamic SQL artisans. Now, because I’m living in the future here on SQL Server 2022, we have this STRING_SPLIT function, which came around in SQL Server 2016.

And we have this STRING_AGG function, which came around in SQL Server 2017. The problem is, and this is where Microsoft really screws up again, that you have to be in the database compatibility level the function was released in to use it. So to use STRING_SPLIT, you have to be in compat level 130 or better.

To use STRING_AGG, you have to be in 140 or better. Now, Microsoft was cool from a query optimization standpoint, giving us all these USE HINTs to specify which compatibility level you want a query to be optimized under in a database, which allows some neat stuff to happen.

Like in 2016, you can do a parallel insert select into a temp table, or any other table, but it’s particularly helpful for temp tables. And in 140, 150 and up, you can get all sorts of batch mode on rowstore type stuff and other neat things, like batch mode memory grant feedback and other intelligent query processing features, which are lovely.

But not these functions, for some reason. For some reason, changing the compatibility level of the query doesn’t unleash them. So if you are using a really old version of SQL Server, or a really old compatibility level, you are going to have to go the old school route: rather than using STRING_AGG, you’re going to have to use some XML hijinks to construct whatever comma separated list of things you want.

And if you are not using the 2016 or better compatibility level, you will have to work out your own string splitting scheme. Now, normally I would be pretty picky about this, right? Because string splitting is not a SARGable thing.

I see a lot of queries with STRING_SPLIT in them, or some string splitter, not necessarily the built-in one, where it’s a meaningful query, not just hitting some dynamic management views, and the split shows up in a WHERE clause or a join or something else.

And that’s not a good idea for performance, because SQL Server has no idea what’s in there, and it makes no attempt at creating a statistics histogram for the results of a string split. So you can get some really bad cardinality estimates when you start getting that stuff involved in a WHERE clause.

But since this is just a very simple query against the dynamic management view, um, I’m not gonna get picky here. Right? This is, like, I can deal with this.

Right? We’re not performance tuning in this one. We are, we are trying to keep our query safe from the hackers. We’re concerned about the security of our stuff. Right?

Not performance tuning in this one. And then, um, if any of these come back as null, someone clearly did something either devious or mischievous or maybe just wrong. Uh, there’s a quote about, like, malice and ignorance and all that.

I suppose that could, I suppose that could, that could apply here. I’m not, I’m not sure how I feel about it. But, you know, just some, some general handling.
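Putting those pieces together for a column list, here is a hedged sketch. It assumes database compat level 140 or higher (STRING_SPLIT needs 130, STRING_AGG needs 140) and the Stack Overflow Users table:

```sql
DECLARE
    @column_names nvarchar(max) =
        N'Id, DisplayName, Reputation; DROP TABLE dbo.DropMe;',
    @safe_columns nvarchar(max) = NULL;

/* Only names that exist as columns on dbo.Users survive the
   join; injected junk matches nothing and disappears. */
SELECT
    @safe_columns = STRING_AGG(QUOTENAME(c.name), N', ')
FROM sys.columns AS c
JOIN STRING_SPLIT(@column_names, N',') AS ss
    ON c.name = LTRIM(RTRIM(ss.value))
WHERE c.object_id = OBJECT_ID(N'dbo.Users');

/* General handling: nothing valid means someone did something
   devious, mischievous, or just wrong. */
IF @safe_columns IS NULL
BEGIN
    RAISERROR(N'No valid column names passed in.', 11, 1);
    RETURN;
END;

SELECT @safe_columns AS safe_column_list;
```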

And if you wanted to take this further, especially for the column list, you could even have a table variable or some other logging thing happen, where you log the list of things that someone passed in that were incorrect. Right?

If you really cared about it, you could totally do that. I don’t care enough to do it here; it’s just going to make the code longer. But you could absolutely stick a list of stuff from the column names parameter that were invalid column names somewhere. Right?

It’s a totally doable thing. And then down here, this is what I meant by some things can’t be passed in as parameters. You couldn’t directly pass this to dynamic SQL as a parameter; SQL Server would say, I don’t know what you’re talking about.

Likewise, you can’t pass in database names, schema names, table names, stuff like that, without SQL Server saying no. Can’t do it, not happening. And what this allows for is, if someone wants to select the top 100 rows from the Users table, they can do that.

And the query will look like this. One thing that is very nice to do in all your dynamic SQL is to put a comment with the stored procedure name in there, so people know where the hell these things come from. It’s not just some buck-naked query running around in your plan cache or Query Store that no one can identify.

But what’s cool about this is that if we were to run our stored procedure with something devious, mischievous, and malicious in it like that, it would not show up in there. Right? Our messages tab would still just show the three columns that were actually valid for the Users table, and the comment.

So, good tips on keeping your dynamic SQL extra safe. Right? And if we look at our DropMe table, it still exists, despite someone’s best efforts to drop our poor DropMe table.

Right? Now, this is just one kind of example, with object names: schema, table, database, column names, stuff like that. You might have other instances where, and I grant you every grudge you have against this stored procedure for not actually needing dynamic SQL,

you have a more involved case. Without doing something too complicated, this is just a good general example of an alternate approach that might make life better, might actually perform better, might cut some crap out of your query plan, stuff like that. Let’s say we have a stored procedure where we want to search display names in the Users table of the Stack Overflow data. You probably won’t be working on this particular stored procedure, but you might let people search like this.

And you might have dynamic SQL, well, unsanitary dynamic SQL, that looks a little something like this, where you’re just passing some bare-naked parameters into a thing like this, which is obviously a horrible idea. You shouldn’t trust anyone this much.

I wouldn’t trust us, I wouldn’t trust anybody with something like this. But one thing you might find useful, and it might perform better and keep you safer, rather than having all this stuff in your dynamic SQL, is to do something like this: you create a temp table, and then you enumerate just the IDs of the things you care about. You stick those into the temp table, and again, this part doesn’t matter much, right?

If you want to throw a recompile hint on here, whatever you want to do, you can do that to tune up this query. But for the context of what we’re doing, this is pretty quick. Just dump the contents into a temp table, right? Nice little primary key on there, too, right?

Look at our primary key. Look at our nice clustered primary key on this table. Isn’t it gorgeous? Most beautiful thing you’ve ever seen? I think so.

And then we can use that temp table within the dynamic SQL block, because the temp table still exists in this context, and we can execute dynamic SQL referencing a temp table that was created outside of it. Now, there are funny things about creating temp tables inside of dynamic SQL. The first is that this is not reciprocal: if you create a temp table inside of dynamic SQL, you can’t reference it outside of the dynamic SQL.

But if you create a temp table outside of the dynamic SQL, you can reference it inside of dynamic SQL, which is pretty cool. I like it. I’m into it.
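The scoping rule sketched out; the search predicate here is an assumption:

```sql
/* Works: the temp table is created in the outer scope,
   so dynamic SQL executed from here can see it. */
CREATE TABLE #ids (id int NOT NULL PRIMARY KEY);

INSERT #ids (id)
SELECT u.Id
FROM dbo.Users AS u
WHERE u.DisplayName LIKE N'Erik%'; /* assumed search */

EXECUTE sys.sp_executesql
    N'SELECT records = COUNT_BIG(*) FROM #ids AS i;';

/* Doesn't work the other way: a temp table created inside
   dynamic SQL vanishes when that inner scope ends. */
EXECUTE sys.sp_executesql
    N'CREATE TABLE #inner (id int NOT NULL);';

-- SELECT * FROM #inner; /* would fail: invalid object name */
```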

I dig it. I can get on board with that. The other funny thing about creating temp tables inside of dynamic SQL is that it makes them not cacheable, which might be good or bad. Paul White has three magnificent, wonderful blog posts about temp tables and temp table caching in stored procedures.

I end up sending people to those quite a bit because, as Paul is wont to do, he has written wonderful, detailed, comprehensive posts about those topics. So you might have an issue where you create a temp table and SQL Server caches that temp table, and part of what it caches is the statistics histogram. Then the next time you run that stored procedure, SQL Server reuses the histogram for that temp table, which is a wild thing.

Absolutely wild. But if you create a temp table inside of dynamic SQL like this, it is not cacheable, and neither the histogram nor anything else will be cached along with it. Now, that, of course, has upsides and downsides, right?

If you’re running a stored procedure a ton, that temp table caching might be a good thing. You might find a lot less contention in tempdb if you’re reusing cached temp tables. You might even find that performance isn’t that big a deal, because you meet the thresholds to uncache, or whatever you want to call it, and update the statistics on that temp table.

But, in general, this is kind of a stranger angle on performance tuning an issue with temp table caching. Fun nonetheless, because sometimes you get to pull these weird tricks out and, boy, they work like gangbusters. I don’t really like that phrase that much.

I don’t know. It’s kind of, again, it’s kind of old-timey. It’s like a, it’s like a gallant rampart. If my gangbusters on this gallant rampart are all spangled.

Anyway, that’s about all I had for this one. I think those were some useful tips on making dynamic SQL safer in your database. Again, if you’re letting people pass in database, schema, table, or column names, you’re much safer if you look those up in system views based on what people pass in, and either give them the yes or no based on that.

If you’re using other stuff, right, if you’re not using object names and you’re still writing dynamic SQL like this, you are doing yourselves a bad, you are doing your company a bad, you are doing your data a bad. You should really start writing parameterized dynamic SQL instead. You can follow along with the examples up above if you need to.

And if you really want to start performance tuning stuff in dynamic SQL, you might even find that simplifying the amount of work done in the dynamic SQL, taking larger, more complex parts out, and dumping just simple lists of keys into a temp table to use in the dynamic SQL instead, gets you far better performance. So anyway, thank you for watching. Hope you enjoyed yourselves.

Hope you learned something. If you like this video, thumbs upsies and, you know, helpful praising comments are always welcome. If you think I messed something up, or that I’m wrong about something, you can tell me that too.

But I didn’t mess anything up, and I’m not wrong about anything, so you might be unsatisfied with the response there. If you like this sort of SQL Server content, usually it’s performance tuning, but this is sort of a hot dynamic SQL tips video, because here at Darling Data, we believe in hot SQL action.

So, you know, I’ve got all sorts of useful stuff up my sleeve. Don’t make a judgment on account of my sleeves being short. I can stuff a lot of useful stuff in here.

But if you like this sort of SQL Server content, you can join nearly 3,700 and, hold on, wait for it, 29 other dedicated, observant, faithful SQL Server professionals, or whatever you are, by subscribing. And I like subscribers, because then I get to keep saying bigger numbers every time I do one of these. So that’s nice, that’s nice for me.

It’s a good pat on the back for ol’ E-Darling of Darling Data. Anyway, it’s starting to get hot in here, and I want to open the door and let the air conditioning in. So I’m going to can this one, upload it, and hope and pray that you watch it, that you follow my instructions here, and that you don’t end up in the newspapers for bad reasons.

So anyway, thank you for watching.


A Little About Adaptive Joins In SQL Server




Thanks for watching!

Video Summary

In this video, I dive into a scenario where my client upgraded from SQL Server 2014 to 2022, taking advantage of the new adaptive join features. I explain how these intelligent query processing capabilities were previously hindered due to insufficiently wide indexes and the additional overhead of potential lookups or sorts. By creating more comprehensive indexes that cover all necessary columns, we successfully enabled adaptive joins for certain queries, demonstrating their importance in optimizing performance. This video is packed with practical examples and insights into what’s required for SQL Server 2022 to leverage these advanced features effectively.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk about a situation where a client of mine, you know, the nice people who pay me, and you could be one of those too if you paid me, upgraded to SQL Server 2022. They went from 2014 to 2022, so it was a pretty big jump for them. Part of the reason why they wanted to go there was because they wanted all of these crazy, fancy, new intelligent query processing features. One of them was adaptive joins. And the reason for that is because there were a number of queries we found where adding a join hint, either hash join or loop join, or just excluding merge joins from the picture, because screw a merge join, was helping query performance. Once we got to SQL Server 2022 and upped the compatibility level, and this is Enterprise Edition, of course, because Microsoft only gives the good stuff to people who spend money, we found a problem. I give the good stuff to everyone for free, even though I was recently accused of having useless drivel in my videos. The thing is, the nice folks at Beer Gut Magazine pay me by the minute to do these, so I do try to stretch them out a little bit. Maybe there is some useless drivel in here, I don’t know.

If you want some real useless drivel, I’ll tell you that this spot on my shirt is from my morning protein shake, because I can’t figure out how to pop the lid and shake up the thing without getting protein stuff on the flappy lid, and then whether you drink from the flappy lid or from the shaker, the flappy lid drips on you. So this is my fitness credibility badge right here, this little protein shake spot on my shirt. All right, so that’s all the useless drivel in this video. The useless drivel about protein drivel, right there.

So one of the things that we found was that SQL Server was not choosing adaptive join plans, and one of the reasons why is that we did not have sufficiently wide indexes. To give you a little background on adaptive joins: they were introduced in SQL Server 2017, and they require some sort of batchy mode-y thing to happen, either because you have a columnstore index on your table, or because you’re using the fancy pants Enterprise Edition of SQL Server and you’re getting batch mode on rowstore. Or maybe you create a temporary table, or just a little helper table in your database that has no rows in it but has a clustered columnstore index on it, and you do a fake left join on 1 equals 0 to that thing to get some of the batchy mode-y intelligent query processing features to kick in for you.
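If you’ve never seen that trick, here’s a minimal sketch of it, assuming the Stack Overflow demo database; the helper table name and the filter are made up for illustration.

```sql
/*
An empty table with a clustered columnstore index. The left join
below can never match (1 = 0), but its presence can make batch mode
execution available to the optimizer. Table, column, and filter
choices here are hypothetical.
*/
CREATE TABLE dbo.columnstore_helper
(
    id int NOT NULL,
    INDEX ccs CLUSTERED COLUMNSTORE
);

SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
LEFT JOIN dbo.columnstore_helper AS ch
    ON 1 = 0 /* never true, so no rows change */
WHERE u.Reputation > 100000;
```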

So, that wasn’t happening. There’s also an extra level of costing that makes a join ineligible for an adaptive join: SQL Server doesn’t want to choose between a hash join and a nested loops join where, on the inner side of the join, you might have to do a lookup to get rows out.

At least, I’ve never seen it choose an adaptive join where there was a lookup involved. So, you know, maybe you can find an example of it and prove me entirely wrong, but, you know, I kind of doubt it. Anyway, let’s look at a couple of examples of this, right?

All batch mode adaptive joins start out as hash joins. And there’s this, like, threshold for the join, where if it passes that threshold, it will remain a hash join. But if it doesn’t pass that row threshold on the outer part of the join, then it will switch to nested loops.

All right? So, we have, I think I already created this index. Let’s make sure.

I’m not a fool. And let’s, actually, you know what? We’re going to do a little prep work, because there’s another index down here. I’m just going to make sure that the second index definitely isn’t there, and that the first index definitely is there.

And that will make the demo go a lot smoother. All right? That will at least prolong my SQL Server career.

That will keep me out of the fitness industry for, like, another week or so, I think. All right? So, we’ve got this index.
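The prep isn’t shown in the transcript, but it would look something like this; the index and table names here are assumptions.

```sql
/*
Hypothetical demo prep: make sure the wider covering index from
later in the demo is gone, and that the narrow single key column
index exists. Names are made up for illustration.
*/
DROP INDEX IF EXISTS p_covering ON dbo.Posts;

IF NOT EXISTS
(
    SELECT
        1/0
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.Posts')
    AND   i.name = N'p_narrow'
)
BEGIN
    CREATE INDEX
        p_narrow
    ON dbo.Posts
        (OwnerUserId);
END;
```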

Now, this is a simple example with just a single key column index. In real life, I know your indexes are probably a bit wider, and your queries are probably even wider than that. So, this is kind of a common thing that can happen.

So, what we got here is a query that does a couple left joins, and we’re going to force optimizer compatibility level 160 right here so that we are totally and completely eligible for batch mode on rowstore. And if we run this query with query plans turned on, it’s not terribly slow, but the point of this demo isn’t to show you a slow query and then a dramatic improvement with adaptive joins.
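The demo query itself isn’t shown in the transcript, but the shape is roughly this, assuming the Stack Overflow demo database; the column and filter choices are guesses, while the hint pinning the optimizer compatibility level is the real syntax.

```sql
/*
A couple of left joins, with the query optimizer compatibility level
hinted to 160 so batch mode on rowstore is in play. Column and
filter choices are assumptions.
*/
SELECT
    u.Id,
    p.Score,
    c.Score
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
LEFT JOIN dbo.Comments AS c
    ON c.UserId = u.Id
WHERE u.Reputation > 100000
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));
```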

The point of this demo is to show you what your sort of requirements are for adaptive joins kicking in. All right? So, in this query plan, we get a nested loops join, which was not inappropriate, but this plan also features a key lookup.

And so, the adaptive join is not a thing here. All right? So, we do get a batch mode hash join later on, but SQL Server does not make an adaptive join choice here.

Okay? And that’s all because SQL Server doesn’t want to have to make that choice with the additional overhead of a potential lookup involved. So, let’s create an index that covers everything in the query.

Remember, we do a lookup down here, and what we’re doing in the lookup is we have a predicate on the Score column, and we output the Score column. So, if we get the Score column into the index, we fix that. We’re going to create a brand new index here that has OwnerUserId and Score in it. We’re going to make a brilliant indexing choice, maybe, and rerun this query.
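A guess at what that wider index looks like; the index name is hypothetical, the columns follow what the transcript describes.

```sql
/*
The key on OwnerUserId plus the Score column that the lookup was
fetching, so the lookup is no longer needed. Name is made up.
*/
CREATE INDEX
    p_covering
ON dbo.Posts
    (OwnerUserId, Score);
```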

Now, we get an adaptive join between the Users and the Posts tables. We still don’t get one for the Comments table. You can totally have more than one adaptive join in a single execution plan; SQL Server just doesn’t think it’s worthwhile to invest in one here.

But now we get the adaptive join between Users and Posts that we didn’t get before, because the Posts table did not have an index that adequately covered all the columns we needed from it.

Now, I did tell you earlier that merge joins were not part of this consideration. And the reason why merge joins aren’t part of the consideration is because merge joins expect sorted input. And when we need to sort input, like, say, for this query, or rather for this join, SQL Server doesn’t want to think about, oh, we needed to sort that.

In the same way that having to do a lookup adds to the costing for the query, having to sort data to employ a merge join is also an additional costing consideration. So merge joins are not part of the picture.

And, of course, because I hinted it: even though we don’t need a sort for the part between Users and Posts, I put a merge join hint on the whole thing, so we get a merge join plan here. I just wanted to show you the merge join plan so you can see why merge joins aren’t a consideration for the adaptive join.
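For reference, the hint in play looks something like this; the query shape is assumed, and the OPTION clause is the real syntax.

```sql
/*
OPTION (MERGE JOIN) forces every join in the query to be a merge
join, so SQL Server adds sorts wherever an input isn't already
ordered to match the join keys. Query shape is hypothetical.
*/
SELECT
    u.Id,
    p.Score
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation > 100000
OPTION (MERGE JOIN);
```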

Now, what’s interesting is that neither merge joins nor nested loops joins can operate in batch mode. They can only operate in row mode.

So if your adaptive join ends up as a hash join, that will be in batch mode. But if your adaptive join ends up as nested loops, that’ll be in row mode, right? Okay, so that’s all good there.

And I do want to show you one kind of funny thing about this query. Let’s see if I can get a somewhat better execution plan for it. It looks like last night it was a little bit better.

But this query plan features some very odd operator timings. Very odd indeed. And if we look at what happens in here, right? Where’s the funny part?

Where’s the funny bones? It’s kind of right at the end here. If we look at the very end of this query, we get 565 milliseconds and then 1.1 seconds here. But then if we go and look at the query time stats, the CPU time and the elapsed time agree with the operator before the gather streams.

So that’s amusing to me. Anyway, apparently the operator time code could use some work. I would imagine with the influx of summer interns, now that we’re entering June, July, and August, Microsoft will have some top interns on top of this operator timing code.

Any day now, it’ll be all fixed. Anyway, that’s all I had to say for this one. Thank you for watching.

I hope you enjoyed my useless drivel. I hope you learned something, even from the useless drivel. I hope you enjoyed yourselves. If you liked this video, as usual, thumbs up and nice comments. Even comments supportive of my useless drivel are always appreciated.

And if you like this sort of SQL Server content, you can join the… Hold on. I want a freshly up-to-date… I want a read-committed up-to-date number here.

You can join nearly 3,719 other SQL Server professionals or some sort of… I don’t know. I actually don’t know the makeup of the audience.

I say SQL Server professionals. It could be just derelicts off the street who like to watch my videos. Maybe they’re like, I’m going to get my hands on his Adidas t-shirt someday. I don’t know.

But yeah, you could join nearly 3,700 and… Yeah, it hasn’t changed. 19 people who subscribe to this channel and get helpful little bonks on the head. Every time I publish one of these…

Well, I guess you could interpret them as either… Beautiful Gems of Wisdom. Or… Beautiful Gems of Wisdom Drizzled in Useless Drivel. Or…

Protein Shake. I don’t know. However you want to call it. Anyway, thank you very much for watching.
