The State Of Table Variables In 2025



Video Summary

In this video, I delve into the state of table variables in SQL Server as we approach their 25th anniversary since their introduction in SQL Server 2000. With Microsoft’s recent announcements about the “try us out” program and potential new features for SQL Server 2025, it’s an interesting time to revisit how these variables have evolved. I explore the nuances of table variable cardinality estimation, particularly focusing on SQL Server 2017’s interleaved execution and 2019’s deferred compilation, which offer some improvements but still fall short in providing statistical information about the data within the table variable. By walking through practical examples using SSMS, I demonstrate how these variables behave differently outside of stored procedures versus inside them, highlighting the challenges and potential pitfalls for query optimization.

Full Transcript

Erik Darling here with Darling Data. And again, looking like it’s going to stay that way for a while. My repeated attempts to get acquired for a Stack Overflow amount of money have gone unanswered, so unfortunately, we’re just going to have to keep doing these videos for a while, I guess. In today’s video, we are going to talk about the state of the table variable. It is the year 2025, and SQL Server 2025 is, I guess, sort of around the corner. Usually these things happen around the end of the year, but since Microsoft has announced the “try us out” program, we can safely assume that something is in the works. And this is actually going to be interesting, because nobody else is going to talk about it: November 30th of this year will be the 25th anniversary of SQL Server 2000. And in SQL Server 2000, table variables were introduced. So this November will be the 25th anniversary of people being absolutely wrong about table variables. Now, a little bit has changed, right? Starting back in SQL Server 2017, I believe.

It was 2017. I forget how far it got backported, but I think it was one of the intelligent query processing features from 2017. We got something called interleaved execution for multi-statement table valued functions, the ones that return a table variable. SQL Server would no longer just guess one row for them; it pauses optimization, materializes the function’s result, and uses the actual row count instead. And then SQL Server 2019 introduced another cardinality estimation help, or maybe it’s a hack, another part of the intelligent query processing family called table variable deferred compilation. That means in a normal query, like one you’re just running in SSMS outside of a module, outside of a stored procedure, say, SQL Server will just figure out how many rows are in the table variable. If you put a thousand rows in there, SQL Server will be like, oh, there are a thousand rows in here.

There are still, however, no statistics. Even with an index on your table variable, there is still no statistical information about the distribution of data within any column of a table variable. So you’re still missing that. But in the context of a stored procedure, things changed a little bit more. Because now, instead of always getting that one row estimate, unless you throw a recompile hint on there or something, SQL Server does something different.

What SQL Server will do is sort of parameter sniff your table variable: not any statistical information within the columns, just the number of rows in there. So if you put one row in a table variable the first time a stored procedure is executed, it’ll keep guessing one row. But if on that first compilation you put a thousand rows in there, SQL Server will keep guessing a thousand rows.
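A minimal sketch of that behavior, assuming the Stack Overflow sample database; the procedure name and the shape of the demo are illustrative, not taken from the video:

```sql
/* Illustrative sketch. Under table variable deferred compilation
   (compatibility level 150+), the number of rows in @t at first
   compilation is baked into the cached plan. */
CREATE OR ALTER PROCEDURE dbo.TableVariableRowCountSniffing
    @rows bigint
AS
BEGIN
    DECLARE @t TABLE (Id int PRIMARY KEY);

    INSERT @t (Id)
    SELECT TOP (@rows) p.Id
    FROM dbo.Posts AS p
    ORDER BY p.Id;

    /* The estimate for @t here reflects the row count from the
       execution that compiled the plan, not necessarily this one,
       unless OPTION (RECOMPILE) is added. */
    SELECT COUNT_BIG(*) AS records
    FROM @t AS t
    JOIN dbo.Posts AS p
        ON p.ParentId = t.Id;
END;
GO

EXEC dbo.TableVariableRowCountSniffing @rows = 1;      /* compiles with 1 row     */
EXEC dbo.TableVariableRowCountSniffing @rows = 100000; /* reuses the 1-row guess  */
```

If the second execution were the one to compile the plan, the 100,000-row count would stick instead, which is exactly the parameter sniffing flavor of problem described next.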

That could be good or bad, depending on whether the number of rows in that table variable has a wild swing to it. If on one execution it’s a thousand rows and on the next execution it’s a hundred thousand rows, obviously that’s just another parameter sniffing problem for you to deal with.

So maybe you need that recompile hint anyway. I don’t know. But anyway, we’re going to talk a little bit about some of that stuff in today’s video. So as always, if you would like to support my grand endeavors and designs in producing this content, you can go to the video description and you can sign up for a membership for as low as $4 a month.

If you don’t have four bucks a month, maybe your mom cut your allowance off, maybe dad lost his job at the mill, that’s fine too. And if you want to ask me questions, go to that link; there will be some information about question asking there. Good things to know.

If you need help with your SQL Server, if you’re watching these videos and thinking, man, Erik maybe seems to know what he’s talking about a little bit there, you would be correct, because as Beer Gut Magazine has established many years in a row now, I am the best SQL Server consultant in the world outside of New Zealand. If you need any help with any of this stuff, I am available and my rates are reasonable.

If you would like some very reasonably priced SQL Server training (apparently speaking clearly is not a reasonable offering for me today), you can, of course, get all of mine. Again, the video description has everything you need, for about $150 USD when you apply that magnificent discount code right there.

Upcoming events, we, of course, still have SQL Saturday, New York City 2025 coming up on May the 10th at the Microsoft offices in Times Square. So be there or I guess you’re going to be a square either way, aren’t you? All right. Anyway, let’s move on with the show.

We have SSMS open. Let’s make sure our database is at the correct compatibility level to get table variable deferred compilation happening. And we can also see, as far as database scoped configurations go, this DEFERRED_COMPILATION_TV setting.

Do not adjust your television set. They couldn’t be bothered to write out “table variable” or call this something a little more human friendly. But it is on.

By default, it is on. You can see that here. One thing that I am sort of excited about is Management Studio version 21. This is SSMS 20, but Management Studio 21 has a GUI page for all of the database scoped configurations.
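Checking and toggling that setting yourself looks something like this; the database name is a placeholder for whichever copy of the Stack Overflow database you’re using:

```sql
/* Deferred compilation needs compatibility level 150 (SQL Server 2019) or higher. */
ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 150;

/* The database scoped configuration is on by default. */
SELECT dsc.name, dsc.value
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'DEFERRED_COMPILATION_TV';

/* And it can be toggled per database if needed. */
ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = ON;
```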

So, you know, I guess it is about time there. So let’s make sure that we have the right index in place. And while we do that, I’m going to walk through a little bit of what’s going on here.

Now, I’ve got this query with a little bit of superfluous stuff in it that I use to make my life easier down below, which I’ll show you in a moment. But these are the top six parent IDs in the Posts table, right? So you can see parent ID zero.

Now, again, this is a normalization problem in Stack Overflow. They don’t have separate tables for questions and answers. They only have Posts.

And Posts is all the questions and all the answers. And you have this parent ID column that tracks which question answers belong to. So if you have a parent ID of zero, you are a question.

If you have a non-zero parent ID, you are an answer, or some other type of thing that’s tied to a question. And there’s a pretty steep falloff here, right? There are six million questions, well, 6,050,820 questions, in the Posts table.

And then a few of the posts have a sort of high number of answers. What kind of post would generate 518 answers? I don’t know.
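The query behind those numbers is, roughly, a grouped count over the parent ID column; the exact query in the video carries some extra convenience columns, so treat this as the minimal version:

```sql
/* Top six parent IDs by answer count in the Stack Overflow Posts table. */
SELECT TOP (6)
    p.ParentId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.ParentId
ORDER BY records DESC;
/* ParentId = 0 rows are questions, so that bucket dominates the count;
   everything else is answers tied to a single question. */
```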

I mean, on the Database Administrators Stack Exchange, you’re lucky to get one answer that’s an answer. Well, I guess not lucky, but getting two good answers is extremely rare. But on Stack Overflow, well, I’m not saying that any of these answers beyond one of them is good.

I’m not even saying that the answer currently marked as the answer is still correct. Who knows? This might be almost 20 years old. Well, I guess a 2008 post can be almost 20 years old.

And it could be woefully out of date at this point. It could be all sorts of wrong. But this is what we’ve got, right?

None of that really makes a difference here. All we care about is that there are some numbers. So, outside of a module, like a stored procedure, we’re going to look at what table variables do.

And I have a recompile hint here because I want you to see that the recompile hint doesn’t help us statistically in any way. But I’ve got a table variable up here that has a clustered primary key on it. So I’m creating an index on the table variable, because I want you to understand that SQL Server does not have any statistical information about what’s in the table variable, no matter what.

But what I thought was sort of interesting is what happens if we run this query passing in just one parent ID. And I’m going to save the zero for later, because that’s when things get sort of interesting. This is a good enough query and query plan.

SQL Server estimates one row coming out of here. Yes, that’s actually correct; we did only put one row in the table variable. But then SQL Server guesses three rows are going to come out of this join to Posts.

In reality, we get 518 rows. If we were using a temp table, a pound sign (#) temp table, then SQL Server would be able to generate statistics and do better cardinality estimation for the join.
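A rough reconstruction of that setup; the parent ID value is a stand-in for one of the high-answer questions shown in the video:

```sql
DECLARE @parents TABLE (ParentId int PRIMARY KEY CLUSTERED);

/* One row, one parent ID. The literal is a placeholder, not a real Id. */
INSERT @parents (ParentId)
VALUES (999999);

SELECT p.Id, p.Score
FROM @parents AS ps
JOIN dbo.Posts AS p
    ON p.ParentId = ps.ParentId
OPTION (RECOMPILE);
/* Deferred compilation (or the recompile) gets the table cardinality
   right (1 row), but the join estimate is still a small fixed guess:
   there is no histogram on @parents.ParentId to compare against the
   statistics on Posts. */
```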

But what caught my eye a little bit was that every time you add a row to the table variable, the join estimate goes up. I want to say this is only true for the newer cardinality estimator, the default one, though I hate calling that thing the default, with the way it models join containment. It could be true for the legacy one, too. I didn’t get around to testing that.

For every row you add in there, SQL Server basically guesses that three more rows are going to match here. So now, instead of one row, we get three rows in the table variable, and instead of guessing three rows as a result of the join, SQL Server now guesses nine. And if we put in the last two rows for this set that we might care about, you’ll see that SQL Server is going to guess 15 rows.

Right. So when we put five rows in there, SQL Server is like, oh, well, I think 15 are going to come out. So we get that number down here. This still isn’t a very good guess.

Right. Granted, this is a pretty small number of matches. So even though the estimate is something like 11,700 percent off, the index is good enough and the query plan is good enough that it doesn’t make a big difference here. Where that gets real messed up, though, is if we change this, of course.

Because even though these questions have the most answers in the current state of the 2013 Stack Overflow database, that’s still not that many. We still only get 1,755 rows for all of them, which isn’t a ton. So guessing 15 and getting 1,755 is far from the end of the world.

You’re not going to get the worst query plan in the world for that. So let’s look at this now with one row, but that one row is zero. This is where things do start to make a difference, because now we put one row in the table variable, but SQL Server doesn’t know what that one row is.

So SQL Server is going to guess three rows for the join again. But that’s a way-off guess. SQL Server is like, oh, well, I think one row is going to come out of the table variable, which it’s right about.

It just doesn’t know what the value of that one row is. So it can’t look at the statistics that are available on the Posts table and say, hey, how is the value coming out of this thing going to match up with the values down in this thing? SQL Server knows one row is coming out of here.

The one row estimate is correct, but it doesn’t know what the value of that one row is. So when SQL Server is doing its initial compilation of the query and asking how many rows are going to match in here, it has no way of looking at the histogram and saying, okay, I have a histogram step for zero, or one that has zero in its range.

With that, it could make a better guess about how many rows are going to come out of this. Instead, it chooses the same query plan as it did for the smaller number of rows, and things just get all mucked up in here. This is a terrible execution plan for the value of zero.

But because SQL Server doesn’t have anything statistical to line this up with, there’s really no better way of modeling this sort of thing. There’s no better assumption it could make.

So that’s it. Keep in mind, this is with a recompile hint and with the clustered primary key on this thing. SQL Server can’t make a very good guess here at all. This plan is lost in the woods.

Now, like I said, that does change a little bit in the context of stored procedures. And I’m taking the recompile hint out of the equation here, because I want to show you that the number of rows a stored procedure compiles with in a table variable persists from one execution to the next.

So what I’ve done is create a table type called parent IDs so that I can pass in a table valued parameter here. Now, table valued parameters do behave somewhat differently than table variables. They actually had this behavior before.

So even if I declared a table variable inside of the stored procedure and dumped this table valued parameter into that table variable, the same thing would happen. I just want you to understand that a little bit.

But I think I just did this correctly. We’ll see; that table type is already created. I’ve already run through and tested this. So let’s do this once.

We have to declare our table variable, and we have to map it to our table type. That’s what we’re doing up here. Then we insert our five rows of matches into the table variable. And then we execute the stored procedure, passing the table valued parameter to it.
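Sketched out, that setup looks something like this; the type and procedure names are approximations of what’s on screen, and the inserted values are placeholders for the five high-answer question IDs:

```sql
CREATE TYPE dbo.ParentIds AS TABLE (ParentId int PRIMARY KEY CLUSTERED);
GO

CREATE OR ALTER PROCEDURE dbo.TableValuedParameterTest
    @ParentIds dbo.ParentIds READONLY
AS
SELECT COUNT_BIG(*) AS records
FROM @ParentIds AS pids
JOIN dbo.Posts AS p
    ON p.ParentId = pids.ParentId;
GO

DECLARE @p dbo.ParentIds;

/* Placeholder values, not real Stack Overflow question Ids. */
INSERT @p (ParentId)
VALUES (1), (2), (3), (4), (5);

/* The first execution compiles with 5 rows in the TVP; later
   executions reuse that row count regardless of what they pass in. */
EXEC dbo.TableValuedParameterTest @ParentIds = @p;
```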

When we do this, SQL Server in the execution plan here is like, ah, there are five rows in you. Wonderful. You’ve got five rows.

We’ve got that all figured out. Cool. But now, when we run this for zero, we’re going to get the bad query plan again, because even in a stored procedure, where SQL Server uses the table valued parameter, which is backed by a table variable, we are unable to get good cardinality estimation.

Now, I think this is especially important, so let’s just look at the query plan real quick. We get that same thing. This all takes about 10 seconds. SQL Server, again, cached the query plan from when we had five rows initially.

But now we only have the one row in there, and SQL Server still doesn’t know what that one row is. So even in the context of a stored procedure, table variables are still pretty messed up. Now, where this gets important, I think, is that sometimes people assume the table valued parameter situation is different because they’re backed by a table type.

Notice when I created this type here, I created a type called parent IDs AS TABLE. People sometimes get confused by this and think it’s creating a real backing table for the table valued parameter. But it’s not.

So even though we have a clustered primary key on this thing, SQL Server is not generating statistics on it either. There’s no statistical information here, because we’re using a table variable to hold this, or to be assigned that type. The table variable is really where you run into the issue, because SQL Server is still not giving any love to table variables, histogramically, statistically, whatever you want to call it.

So even though table valued parameters are backed by a table type, they still don’t get statistics. SQL Server 2019 table variable deferred compilation only gives you the table cardinality, the number of rows in the table, not the contents of the columns. Still no histogram, still no column-level distribution stats.

And you can still run into all of the same problems with table variables if you have skewed data, particularly when you join to that table variable in some way. It could be a join, an exists, a subquery, however you want to correlate to the table variable. You still run into the same problem: SQL Server is unaware of the contents of the table variable.

And the cardinality estimates you get are not going to reflect the reality of what you’ve put in there. So anyway, that’s my spiel. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I do hope that you will stick around and try to learn some more SQL Server stuff from me. I do enjoy talking about it, don’t I? You can tell, because I just went on for about 18 minutes about table variables again.

But anyway, what can you do? Some horses deserve to be fed. And because I still have to help clients with this table variable stuff, I feel like I need to keep talking to the greater world about it, because God almighty, no one seems to be learning their lesson.

So anyway, I’m out of here. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Fun Plan Shape With Aggregates In SQL Server



Video Summary

In this video, I delve into a fascinating query optimization scenario that caught me off guard due to its clever and efficient plan shape. While exploring a query on the Stack Overflow database inspired by an advanced databases course taught by Andy Pavlo at Carnegie Mellon University, I stumbled upon this intriguing SQL Server query transformation. The optimizer’s ability to unnest subqueries in a way that avoids redundant table scans is particularly noteworthy. This optimization not only demonstrates the power of SQL Server’s query optimizer but also piques my curiosity about how other database systems handle similar scenarios. I discuss the specific trace flags and hints involved, as well as the unique plan shape that results from these optimizations. By sharing this insight, I hope to inspire further exploration into query optimization techniques across different databases.

Full Transcript

Erik Darling here with Darling Data, and we’re going to be looking at what I think is a really fun plan shape in today’s video. I came across this because I was watching a video series. A little background: a really smart fellow named Andy Pavlo works for Carnegie Mellon University. He is the professor of databaseology there. Every semester, he does an intro to databases course and an advanced databases course. He’s running one right now on query optimization, and they’re all available for free on YouTube. If you look up CMU Database Group, you’re bound to come across that stuff on YouTube. There was a video where he was talking about different query optimizers, stuff that they’re good at and stuff that they’re maybe not good at. And one of the things he said is that Postgres and Oracle, or some collection of databases, are better than SQL Server at this one specific type of query, an unnesting-subqueries thing. And I was like, whoa, I’m the one who gets to say Microsoft’s not good at stuff here. Not you. He can say whatever he wants, but yeah. So I was like, hmm, well, this I gotta see. So I went and wrote what I thought was a pretty reasonable approximation of the query using the Stack Overflow database.

Now, I was quite surprised by the plan shape, because it wasn’t bad. I think it actually has kind of a cool optimization in it. So we’re going to talk about that. Before we get into all that, though, you may have noticed this slide has finally changed for the first time in months. The change is that I will be accepting office hours style questions anonymously. If you go to that link right there, well, this will be in the video description.

Like everything else: when in doubt, refer to the video description. So if you would like to ask me anonymous questions that you would like answered on YouTube, keep in mind this is not private consulting help. If you need that, you have to hire me and pay me. But if you want to ask an anonymous question, I will do my very best to answer it here on YouTube. Aside from that, all the same stuff applies. If you would like to support my endeavors with this channel, you can sign up for a membership. Again, the link for that is in the video description.

If you do not have the four bucks a month that it costs to show me a little bit of cup-of-coffee love, you can like, you can comment, you can subscribe. And now you can take up more of my time by asking me questions too. So lucky you, it’s your big day. It’s your turn to shine, you absolute champion. If you need help with your SQL Server, I am the best SQL Server consultant in the world. If anyone tells you different, they are liars, they are charlatans, they are frauds. Do not buy their line of nonsense.

So if you need help, these are my services, and my rates are reasonable. If you would like some other reasonable stuff from me, like, say, SQL Server training, you can get all of mine. It’s about 24 hours of content. Again, refer to the video description for more links. You can get all 24 hours for about 150 US dollars. It is the best deal on the internet, as far as I’m concerned. Upcoming events: I’ll have more of these soon.

I’m waiting on dates to get locked down for some stuff, but right now, SQL Saturday New York City is happening on May 10th, 2025 at the Microsoft offices in Times Square. It’s going to be a lot of fun. I’d like to say something funny, like I’m signing autographs, but A, no one wants my autograph, and B, I’m not signing autographs. So we’re just going to have to deal with that. Anyway, let’s get on and look at this fun plan shape.

Now, I’ve got some stuff down at the bottom, and if we talk about it from the bottom up, we’ve got a couple of trace flags there. One of them, 8619, will output the transformation rules applied to the query. And we’ve got 3604, which routes that output to the client, the messages tab.

I’ve got a recompile hint, so SQL Server has to apply the rules every single time. And I’ve got the query compat level set to 160, not because you need 160 or the new cardinality estimator for this plan shape to happen, but only because when I use 160, it runs in batch mode, and it runs a little bit faster than it does in row mode.

And sometimes time is of the essence, isn’t it? The plan shape that I want to show you today is from this section right in here. You have this section of the query up here, and this is just a simple join from the Posts table to the Votes table.

And then down here, you have this thing saying where the score in the Posts table is equal to the max score in the Posts table. So this is another reference to the Posts table; we have two references to the Posts table here. Now, what I think is really interesting about this query plan optimization is that when you look at it, you might expect two references to the Posts table in the query plan, right?

Because, after all, we’re selecting from the Posts table up here, and we are saying where the score in the Posts table is equal to the max score in the Posts table. So you would expect to have to touch the Posts table once for the outer from, and then again for the subquery. But you actually don’t.
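My best guess at the shape of the query under discussion, with the diagnostic options from earlier; this is a hedged approximation reconstructed from the description, not the exact query from the video:

```sql
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
WHERE p.Score =
(
    /* Second logical reference to Posts; the optimizer
       collapses it into the same access as the outer one. */
    SELECT MAX(p2.Score)
    FROM dbo.Posts AS p2
)
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 1
OPTION
(
    RECOMPILE,
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'),
    QUERYTRACEON 3604, /* route trace output to the messages tab */
    QUERYTRACEON 8619  /* print applied transformation rules     */
);
```

The QUERYTRACEON flags here are undocumented diagnostics and need elevated permissions; they’re only there to surface the transformation rules, not to change the plan.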

If you go and look at the execution plan, you have this: you touch the Posts table once here, you touch the Votes table once here, and there is no other reference to the Posts table anywhere.

SQL Server applies kind of a neat transformation, where it grabs the top one with ties in here, and it sorts the data for that by score descending. So we pull out all of the score data here, and that’s actually all that we need and all that we use in the plan.

We don’t have to actually do anything else. SQL Server takes this query and transforms it into just a couple of aggregates. And then way over here, we have a filter, and you might think this filter is where the score comparison gets applied, but it’s not.

The thing we’re filtering on is the count. Part of the query down here at the bottom is where count big is greater than one, and that’s what this filter is applying.

We are not applying any filter comparing the score to the max score. We don’t actually need a separate access of the Posts table to do that.

The reason this plan shape caught me off guard is that it’s pretty good. It’s a pretty smart one. And this is not new; Paul White blogged about something similar 15 years ago.

This is not a new addition to SQL Server. This is not a new thing. But you can see the rules that got applied over here, and I believe it’s this one that’s responsible for the transformation.

So we actually just aggregate this data once and then get the min from that as well. SQL Server is able to do something really smart here and not have to touch the Posts table twice.

And this got me thinking: what are these other query engines doing that’s better than this? Now I’m just absolutely, furiously curious about what these other things do. So I don’t know.

Maybe I have to go figure that out, because I’m just very interested now. Now, there are a couple of other things I want to show you here. One of them is obvious.

This would obviously change the meaning of the query, because this is correlating where the owner user ID out here equals the owner user ID up here. This will change the query results.

I’m actually not sure I want to run this to get results; it might be smarter to just get the estimated plan. But notice that even with that additional complication added in, SQL Server still doesn’t need to touch the Posts table twice. What does get added is this segment operator.

It’s sort of like when you have a row number with a partition by, where SQL Server adds a segment operator to mark the partitioning groups for row number, dense rank, or whatever. SQL Server just adds a segment operator to this. You still have the exact same plan shape, just with this one additional operator added in.
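The correlated variant, again as a hedged sketch rather than the exact on-screen query: the subquery now depends on the outer row’s OwnerUserId, which changes the results, but the plan still touches Posts only once, with a Segment operator added:

```sql
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
WHERE p.Score =
(
    /* Correlated: per-owner max score instead of the global max. */
    SELECT MAX(p2.Score)
    FROM dbo.Posts AS p2
    WHERE p2.OwnerUserId = p.OwnerUserId
)
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 1;
```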

Now, this is only available for direct equality predicates, like where score equals something. If you were to change this to greater than or equal to, less than or equal to, not equal to, or any other non-equality predicate, then SQL Server would no longer have that transformation available to it.

Now we have two accesses of the Posts table, where SQL Server aggregates to get one part of it here, and then does this whole part where it aggregates to get the next part of it. So it can’t be applied to every single query form.

It’s really only equality predicates. But usually, this is the way most of these queries get written anyway, unless you’re dealing with averages and you want to find where something’s greater than the average. Finding where something is greater than the max is a pretty stupid query, actually.

Finding anything greater than the minimum is maybe a little more realistic, but usually when you see stuff like this, it’s using averages. And usually you don’t want to find things that are exactly average; you want to find things that are either above or below average by some amount.

So anyway, I thought this was a cool optimization. It was not the worst thing I’ve ever seen SQL Server’s query optimizer do. And honestly, again, I was just a little bit puzzled as to what these other query engines could be doing.

What could be so much better, I mean. Especially because in the video there was this little teaser moment where he’s like, but Microsoft’s about to get better at this. And now I’m like, well, it would be hard to get much better than this.

I think, I think this is a pretty ideal situation for the optimizer. So we’re going to give Microsoft a thumbs up for this query optimization. We’re going to say, good job, optimizer team.

You did good. We’re proud of you. Keep up the good work. Stuff like this is great to see. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video, which will be about something else entirely. Because there’s only so often you can discover, or be introduced to, new plan shapes, I guess.

At this point in my life, anyway. So, yep. All right.

Well, we’re out of here. Thank you for watching.


When Subquery Query Plans Are Suboptimal In SQL Server (And How To Fix Them)



Video Summary

In this video, I return after a three-week hiatus to Paris, sharing my experiences with you as I recorded content before embarking on my trip. The video delves into the nuances of suboptimal subquery plans in SQL Server, explaining how these plans can lead to excessive row returns and poor query performance. I walk through an example where a query initially returns 4.8 million rows, far more than necessary for practical use, highlighting the issues with row goals and estimated versus actual row counts. By tweaking the query and experimenting with different hints and index usage, we explore how to optimize these plans for better performance, demonstrating that sometimes subqueries can be effectively tuned rather than outright avoided.

Full Transcript

Erik Darling here with Darling Data, and this is my first video back after a three-week hiatus in Paris. And I know that it was seamless to you because I recorded three weeks’ worth of content before going away. Intel is back up to its old tricks. Not being able to find drivers. Some things never change, I guess. But if I look any different or I sound any different, if I have perhaps developed a slight French accent, that’s why. So anyway, today’s video, we’re going to get gently back into things and talk about when sub-query plans are suboptimal. I know, happens to the best of us, especially when you have to deal with SQL Server being its silly self. But anyway, let’s carry on with a little bit about how you and I can cooperate together. If you would like to become a member of this channel, a paid member, you can join like 50 other people who have been so kind as to donate $4 a month to the content that I produce here. There’s a link right down in the video description where you can do that. Very easy, very clickable, like me. If $4 a month is just too rich for your blood, wherever your blood may originate from, you can like, you can comment, you can subscribe, and you can make different numbers go up in different ways.

that make me a happy, healthy human being. If you need help with SQL Server, if the things that I talk about during these videos ring a bell, and you're like, wow, my SQL Server could use some help in that area, I do all of these things and more. And as always, my rates are reasonable. If you would like some more reasonable stuff from me, you can get all of my training content, performance tuning training content, I don't have anything else, for about $150 USD. It's all available at that URL up there, with that discount code down there. And of course, as usual, there's a very clickable link down in the video description for you to do all that. Upcoming events: there is still SQL Saturday New York City 2025, taking place May 10th at the Microsoft offices in Times Square. So if you haven't been robbed by someone dressed as a Sesame Street character in a while, or someone dressed as Spider-Man in a while, come to Times Square and get your money's worth, or lose your money's worth, I don't know, whatever. There's probably a joke about congestion pricing in there somewhere. I just don't know what it is yet.

But anyway, let's talk about these suboptimal subquery plans. Now, I'm going to show you the starting version of this query, because there are some important things in there for you to understand about it. But this is not going to be the final query that we end up with. Now, this query runs for about 12 seconds, and it returns 4.8 million rows, which is far more rows than you will ever need to return to an end user for them to do anything reasonable with. Way too many, right? It's an insane number of rows. Why would anyone ever need that many rows for anything? Unless you're going to put it in an Excel file and do something else with it, right? It's just an absurd number of rows.

So this is obviously not a very good example of when subqueries have problems, because we return a lot of rows, and stuff like this part of the plan are completely sensible. So these two index seeks into the votes table, which have absolutely correct estimates, right? Absolutely. We had 100% on both of those, come from this part of the query, right? We have an upvote count and a downvote count. And of course, upvotes are signified by a vote type of 2, and downvotes are signified by a vote type of 3.
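The query itself isn't visible in this transcript, so here's a rough sketch of the shape being described, using the Stack Overflow demo database names that the video's demos are based on. Treat the column list and join as assumptions; the point is the two correlated subqueries against the votes table:

```sql
/* A sketch of the query shape described above. Table and column names
   follow the Stack Overflow demo database; the full on-screen query
   also hits Badges and Comments with similar subqueries. */
SELECT
    u.DisplayName,
    UpVotes =
    (
        SELECT COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 2 /* 2 = upvote */
    ),
    DownVotes =
    (
        SELECT COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 3 /* 3 = downvote */
    )
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```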

So, like, this is the part of the plan that we’re going to concentrate on for now, or rather for most of the video. But then I just want you to understand down in here is where the rest of the stuff happens. So, like, in the rest of the query, we have another subquery that goes to the badges table, and another subquery that goes to the comments table.

And down here in the query, we join users to posts, right? So there’s some stuff going on in here. But all of that, like, the two things that hit the votes table are up here, right? So, like, the order that we wrote the query in is not the order that SQL Server arranged the query plan in, which is going to come in handy for us to understand later.

And then down in this part of the query, this is where all the rest of the stuff happens. We hit badges, we hit users, we hit posts, and we hit comments, right? And SQL Server, for better or worse, has chosen parallel merge joins for all of these things.

But we don’t necessarily care about this part. Most queries that you’re going to have to tune are going to limit rows in some way, whether it’s by top or offset fetch. And for the sake of typing, we’re going to change that up here to top 5,000.

Oh, not 54,000, top 5,000. And that’s going to be nice and sensible. The thing is, when we change the query to the top 5,000 version, things don’t necessarily get all that much better.

So let's run this, and let's wait very patiently for SQL Server's suboptimal optimizer to come up with a query plan for us and return some results. So the query that I ran that returned 4.8 million rows had an execution time of about 12 seconds. And we had a big parallel plan with lots of joins and stuff.

In this plan, SQL Server has changed its mind about parallelism. It has not changed its mind about this part of the plan, though. Notice we still have a merge join.

The two subqueries that hit votes are still up here. The other funny thing that has now happened to our query is we have very, very bad estimates in here. But these are not bad estimates in the typical sense.

They’re not like SQL Server had bad statistics or out-of-date statistics. Of course, I just created these indexes, so the statistics are full scan up to date. The poor estimates here come from row goals.

Now, whenever you do something like top or offset fetch, sometimes exists or not exists. There are all sorts of things in SQL linguistically that add something called a row goal to your query plan. And if we right-click on these operators, what you’re going to see, not in the thing up here.

This is where it doesn’t show you anything useful. But down in here, we have an estimated rows without row goal. And this estimate is correct, right?

See, this is the right number of rows. What SQL Server made a bad estimate on is how many rows it would have to read out of here in order to meet the row goal that we set by saying top 5,000. And it did that for both of these, right?

So, if we look at this one, this one will also have an estimated rows without row goal, and that estimate is absolutely correct. But where things fell off was, of course, between the estimates and the actuals.

That’s a little bit easier to see in here, where you have the number of rows hiding somewhere. And then the estimated number of rows. We have the actual number of rows read up here.

Sorry about that. Got a little bit lost in my own thoughts. And then we have the estimated number of rows to be read, which is, of course, without the row goal, even though it doesn't say so. The estimated number of rows is 3,300.

But the actual number of rows is 2,141,819. That is a seven-digit number of rows. So, we read a lot more rows than we thought we would. The row goal is kind of messed up in that regard.

But none of the other index accesses in here, if we look at the properties, at least, yeah. So, this one does. This one does not.

This one does not. And this one should not either. So, none of these have the estimated rows without row goal property. The posts one does. So, where things get interesting is that we got a serial execution plan there, probably when we shouldn't have.

Now, I’ve created an index on the post table, which it should have used. Because this index fully covers everything that we want to do. But SQL Server costed this index out of existence. Now, let’s look at the estimated cost for this plan.

Right? SQL Server thinks it would cost 48 query bucks. So, my cost threshold for parallelism is 50. So, this costs just under the cost threshold for parallelism.

So, SQL Server was like, ah, not quite parallel plan territory. Also, it didn’t use my nice index on the post table. On the post table, it uses the clustered primary key.

Right? So, for some strange reason. And that strange reason can be revealed by adding a hint to the post table. Now, the index that I created up here is called P.

So, we’re going to, down here, we’re going to say with index equals P. And then, we’re going to rerun this. And I’m going to show you something kind of interesting.
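For reference, the index hint syntax being described looks something like this. The index name p comes from the video; the column list and date range are stand-ins, since the full query isn't visible in the transcript:

```sql
/* Forcing the optimizer to use the nonclustered index named p on the
   Posts table. Columns and dates here are illustrative placeholders. */
SELECT TOP (5000)
    ps.Id,
    ps.Score
FROM dbo.Posts AS ps WITH (INDEX (p))
WHERE ps.CreationDate >= '20131201'
AND   ps.CreationDate <  '20140101';
```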

Because sometimes these are the wrestling matches that you have to have with the optimizer in order to get queries to perform well. Now, this query plan is parallel. And this takes, well, I don’t know.

Let’s just say about half as long. The last one was like 9 point something seconds. This is 5 and a half seconds. So, this one’s about twice as fast. Not quite. I realize my percentage math is not great, especially on the spot.

But if we look over here, the part that made this query expensive enough to be eligible for a parallel plan was using the nonclustered index, and SQL Server deciding that it needed to re-sort the data that came out of that index to make this join more efficient.

So, it put the data that came out of our nonclustered index in order by the ID column in the post table. So, it reordered the data.

And now this query plan, actually, I forgot to show you this, has a cost of 596 query bucks. So, this is why SQL Server didn't choose the nonclustered index in the first place. SQL Server thought that choosing the nonclustered index and sorting that data would make things too expensive, way more expensive than just using the clustered index.

Of course, we can see a big difference in execution time here, right? So, again, about twice as fast when we get a parallel plan. We still have problems up here, though.

The problems that we have up here are that even though we’re doing seeks into these, right, we’re still doing seeks into our index, the row goal is still making a bad guess at the number of rows it’s going to take to satisfy the row goal set by the top 5,000, okay?
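As an aside that isn't in the video: if you suspect row goals are behind a bad estimate, SQL Server has a documented hint that compiles the query as if no row goal existed, which is a handy diagnostic even if you never ship it. A minimal sketch, with a stand-in query:

```sql
/* Not shown in the video: compiling without row goals is a quick way to
   confirm the row goal is where the bad estimates come from. The query
   here is just a placeholder. */
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));
```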

So, one thing that you can do if you want to affect how SQL Server is doing things and you’re very confident not only in your indexes but also in the plan shape that you’re after generally is you could say option loop join.
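The hint goes at the end of the query, and it applies to every join in the plan. A sketch, with placeholder tables standing in for the demo query:

```sql
/* OPTION (LOOP JOIN) forces nested loops for every join in the query,
   so only reach for it when you're confident in the plan shape. */
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION (LOOP JOIN);
```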

And what SQL Server will do is give us the plan shape that we would really want from queries that perform sub-queries. In other words, we don’t want SQL Server doing this part of the plan before we’ve done the part of the plan that actually does all our filtering, right?

We don’t want SQL Server doing this stuff. So, if we run this query again with the option loop join hint, we’re going to get the plan shape that we’re after. The plan shape that we’re after comes up here where all the stuff that we’re like doing our initial filtering on, right?

Like this is really the crux of our query. Selecting from users, joining the posts, and then filtering the post table on these creation date columns. So, this is really where we want SQL Server to start and narrow down the rows as much as possible.

And this is the plan shape that we’re after where taking that result, we now do loop joins across all of this stuff. And this query takes about one second, right?

That’s much better than the five seconds that it took when SQL Server chose a bad plan shape. And that’s much faster than the, well, I guess it was the same plan shape. So, we’re going to say this is a much better plan shape generally for our query, right?

And I forget, actually, you know what? We’re going to do a little experiment here. We’re just going to see what happens when I take that index hint out of the mix.

Let’s just do a little experiment because I actually forget what happens. So, this one takes actually 1.6 seconds. So, this one’s slowed down by about half a second.

And SQL Server goes back to scanning the clustered index on the post table. But we still get a parallel plan, right? So, at least as far as query tuning goes, this is still better than what we were getting before. I'm going to leave this index hint in because I generally like the speed that I get with it.

If you are not confident that the query plan shape should only ever use loop joins and you’re thinking to yourself, well, I want the optimizer to be free to choose other types of joins. One thing that you can do is set more row goals.

Now, let’s think logically for a minute about how many rows a subquery can return. One, especially when we’re doing an aggregate like count or sum or average or min or max or something.

Those are all things that really just return one row. If a scalar subquery ever returned more than one row, well, guess what? You'd get an error. So, what we can do, rather than saying SQL Server, you always have to use loop joins, is say, SQL Server, we want more row goals.

Give us our row goals. And if we put a top one in both of these, SQL Server will finally see the light and give us the query plan shape that we want without having to say you always have to use a loop join.
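The TOP (1) version of the subqueries looks something like this, again sketched against the Stack Overflow demo schema. The TOP (1) is logically a no-op, since a scalar aggregate returns exactly one row anyway, but it plants a row goal where we want one:

```sql
/* Adding TOP (1) inside each scalar-aggregate subquery. Names are
   assumptions about the demo query, not a transcript of it. */
SELECT TOP (5000)
    u.DisplayName,
    UpVotes =
    (
        SELECT TOP (1) COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 2
    ),
    DownVotes =
    (
        SELECT TOP (1) COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 3
    )
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```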

Now, SQL Server does naturally choose loop joins here. But now, if we look down to where we access the votes table, we do have a top for both of these, right? It’s a tiny little top here and a tiny little top here.

And we have the same index seeks that we got before. Now, granted, the estimates in this query are still off a little bit, right? Like, I mean, we're under by 10% here. We're over by 43% here.

We’re over by, wow, like 1,700% here. We were, you know, over by 5,500% here. We’re over by 24% here.

Math, math, math, math, math. But we get a fast enough query with a good enough plan shape that we want. So whenever you’re tuning queries or whenever you’re, you know, having to deal with someone who’s like, wow, subqueries always suck.

Never use subqueries. Sometimes they just don't know enough about databases to be able to say that, and they don't know enough about query tuning to be able to make that judgment across the board about every subquery ever.

Sometimes you just need to give subqueries a little bit of help. So in the future, whenever you're looking at LinkedIn or whatever other social media you like, and some data engineer starts posting one of those LLM lists of top 10 things not to do with a query.

And you see things like avoid distinct, and avoid select star, and avoid subqueries, use CTEs, the same list of things every time. You can just firmly know in your head that they are complete ding dong idiots, regardless of which giant company they work for.

Specifically, probably meta, which has like the highest quantity of idiot data engineers on the planet. You can say to yourself, hey, you’re wrong about all that stuff. Because Erik Darling taught me that you’re all wrong duty heads.

So you have that to look forward to. And you can, you can point them to this video where I will, I will gladly, I will gladly fight with them. Anyway, sometimes some queries just need a little bit of help.

Sometimes you have to grease the wheels a little bit, but you can always make a query better and faster. And it does not always involve taking subqueries out or adding in CTE or whatever other nonsense people tell you about on the internet.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will pick fights with data engineers online because it’s fun to do because they’re all cock-a-doodie heads.

Anyway, cool. All right. Goodbye.


Better Filtering With YEAR and MONTH Functions In SQL Server




This week, I’ve got a coupon that offers an even steeper discount on my training than the one at the bottom of the post.

It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even. Good luck finding that price on those other Black Friday sales.

It’s good from December 23rd-27th, or until I remember to turn it off. Hurry along now.

Thanks for watching!

Video Summary

In this video, I delve into the world of better filtering techniques in SQL Server queries, specifically focusing on how to handle year and month functions more effectively. Often, I encounter queries that dissect dates into their constituent parts (year and month) for comparison, which can lead to performance issues and slower execution times. By demonstrating a cleaner approach using the `DATEFROMPARTS` function, I show how to avoid these problems and improve query sargability. This method not only makes your code more readable but also ensures that SQL Server can leverage existing indexes efficiently, leading to faster execution.

Full Transcript

Erik Darling here. Guess who? Darling Data. That's what science says. In today's video, we're going to talk about better filtering with year and month functions. Now, I run into a lot of queries that do something like take a year and a month, very rarely a day, but usually a year and a month, as either parameters, or sometimes they even go so far as to dissect a date and pull the year and month out. I don't know why. It's blasphemous, but they do it. And then they take some column in their table, like a date column or a datetime column, and they say where the month of that column equals the month thing and where the year of that column equals the year thing. And without fail, those queries suck and are slow and have problems and I have to fix them. So I'm hoping that by recording this video, I can avoid having to fix those problems in the future. Great. How much would you pay for that? How about four bucks a month? You click the link in the video description. You can sign up for a membership and get all of this for four bucks a month, or more if you choose to be more generous, but four bucks a month is the minimum. Anything less than that, I don't know. What do you do with it?

I live in New York. Four bucks technically just allows you to take a breath. If you are just a fan of this content, you can like, you can comment, you can subscribe.

I’ll thank you in some other way. If you are looking for help with your SQL Server, perhaps with some performance tuning issues like with wrapping year and month in columns, I am best in the world at all of these things.

You can hire me instead of some idiot who doesn’t know what to do and who gives you bad advice. So that’ll be cool, right? Imagine that you’ll actually get stuff done. If you would like some very high quality, very low cost SQL Server training that teaches you things like I’m about to teach you now and more.

You can get all of my training for about 150 USD with that discount code. And would you believe that a link directly to go to the training site and apply that very discount code is in the video description.

So, again, it’s the end of the year. I’m not doing anything. Talk to me in 2025. With all that out of the way, let us put on our party hats and commence the partying because that is what we are here to do.

So, this is the type of query that I see people writing quite a bit. Oh, is ZoomIt? No, ZoomIt is just a little bit delayed. This is what I see people doing quite a bit.

Now, I think that part of why they don’t do things correctly is because sometimes figuring out date things is hard. And I admit I also didn’t want to figure out date things.

So, what I did is if ZoomIt will finally just go away, is I looked at Stack Overflow. And Stack Overflow had a pretty good answer for how to figure out if you’re in a leap year. So, here’s what I’m doing.

I have an index on the votes table on creation date. I just didn’t want that to accidentally run again. And then up here, I’m assigning a year and a month. Let’s just pretend these could be parameters.

They could be things that you dissect from a date that gets passed in. Whatever you want to do with it. It doesn’t matter to me. And then in here, I’m figuring out if we are in a leap year. So, if we’re in February, then we start with, to find the last day of the month, we start with 28.

And then if these things are true, we add 1 to the 28 or else we add 0 to the 28, which makes sense. Because you don’t want to add anything if it’s not in a leap year. And then there’s a little bit of figuring out down here to say, if your month is 4, 6, 9 or 11, then you have 30 days in your month.

If you are not month 4, 6, 9 or 11, then you have 31 days. All right. So, we’ve got that part figured out.
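The last-day-of-month logic described above can be sketched like this. The variable names are assumptions; the leap-year test and the 30-versus-31-day month lists match what's described in the video:

```sql
/* A sketch of the last-day-of-month figuring described above.
   Leap years: divisible by 4 and not by 100, or divisible by 400. */
DECLARE @Year  integer = 2013,
        @Month integer = 3;

DECLARE @Days integer =
    CASE
        WHEN @Month = 2
        THEN 28 +
             CASE WHEN (@Year % 4 = 0 AND @Year % 100 <> 0)
                    OR @Year % 400 = 0
                  THEN 1 ELSE 0 END
        WHEN @Month IN (4, 6, 9, 11) THEN 30 /* 30-day months */
        ELSE 31
    END;
```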

Great. Great. Wonderful. So, this is really a much better, oh, why are you moved over? How did that happen?

Oh, that just looks silly, unprofessional, didn’t it? That was terrible. Just fire whoever wrote these scripts. Really, a much better way, a much cleaner way of doing this. And again, this comes down to the concept of sargability, of being able to take whatever your search arguments are and apply them easily to whatever index you have in place.

Of course, for us, we have an index on the creation date column. If we do stuff like this, SQL Server has to run these functions for every row: pull out the month, pull out the year, and figure out if that month and year are equal to the month and year that we passed into the query. If we do this instead, using the wonderful, magnificent DATEFROMPARTS function, we can assemble a date out of the year and the month.

And to find the start of the month, we just put in 01. And to find the end of the month, we just use that day variable that we computed up there. Now, because I am using local variables, and local variables have lots of performance side effects, usually because of poor cardinality estimation, in order to shortcut that issue, in order to force the parameter embedding optimization that SQL Server has in its wonderful engine, I am using the option recompile hint here.
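Side by side, the two approaches look roughly like this. The table and column names come from the demo (votes table, creation date column); the rest is a sketch:

```sql
DECLARE @Year  integer = 2013,
        @Month integer = 3,
        @Days  integer = 31; /* last day of the month, computed earlier */

/* Non-sargable: the functions run against the column for every row,
   so the index on CreationDate can only be scanned. */
SELECT c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE YEAR(v.CreationDate)  = @Year
AND   MONTH(v.CreationDate) = @Month;

/* Sargable: build the range endpoints once, then seek the index. */
SELECT c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@Year, @Month, 1)
AND   v.CreationDate <= DATEFROMPARTS(@Year, @Month, @Days)
OPTION (RECOMPILE);
```

One caveat worth knowing: if the column is a datetime rather than a date, an inclusive upper bound on the last day misses rows with a time-of-day component, so an exclusive `<` on the first day of the next month is the safer endpoint.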

If these are big queries that run occasionally, and you are okay with them recompiling every time, please use a recompile hint. There is nothing wrong with it. If this is code that you want to be production-ready, awesome code, you may want to explore two other scenarios. One might be to have a stored procedure that accepts the local variables that you declare and assign values to as parameters, because then they are treated like parameters, because they are passed in to that stored procedure.

Or, you could use parameterized dynamic SQL, which is the only safe kind of dynamic SQL, and pass in the local variables that you declare and set values to as parameters to that dynamic SQL, thereby transmuting them through the alchemy of parameterization into parameters. Wonderful. Things that you would just never think would work, but they work.
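The parameterized dynamic SQL option can be sketched like so; the variable names are illustrative, and the inner query mirrors the DATEFROMPARTS approach:

```sql
/* Passing local variables into parameterized dynamic SQL so the
   optimizer sees them as parameters. Names are illustrative. */
DECLARE @Year  integer = 2013,
        @Month integer = 3;

DECLARE @sql nvarchar(MAX) = N'
SELECT c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@Year, @Month, 1)
AND   v.CreationDate <  DATEADD(MONTH, 1, DATEFROMPARTS(@Year, @Month, 1));';

EXECUTE sys.sp_executesql
    @sql,
    N'@Year integer, @Month integer',
    @Year,
    @Month;
```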

They work well. And of course, if we run these two queries, I am just going to hit execute on this. These both come back relatively quickly, but the first one comes back relatively slower.

This one takes a full second to run, has to scan the entire index that we created on the votes table. That is this thing right here. And if we hover over the operator here and we look at the tooltip, we actually have something kind of funny happen.

Year and month are really just sort of shortcut synonyms for the date part function. Right? So, you can see that we have evaluated…

Oh, gosh. Where did my cursor go? There it is. The date part month right here and the date part year right here. And we have said where the date part month equals three, just like we assigned to that value.

And the creation date and the year 2013. Oh, that’s nice. We still had to scan that index though.

We needed a parallel query for it to be fast. If we wanted to put these on sort of even footing, and I'll do that in a second, we could run this query at MAXDOP 1 to see how long it takes. Because down here, we do have a serial query that finishes in 178 milliseconds.

Right? And granted, the plan is a little bit more complex with all this stuff in it. But in this case, it’s okay because each one of these constant scans is really just creating a row for the date from parts function that we’re feeding in here.

And if we look at this, we just have a range seek to expression 1008, which is the DATEFROMPARTS value that we created from year, month, 01. And then down in here, we have the end of that range in expression 1009, which is the DATEFROMPARTS value that we assembled from year, month, day. Right?

So when we found the end day of the month, that’s where that was. So let’s actually go back and let’s redo that. Let’s say that we want to put these on equal footing. Right?

We want to make sure there’s no cheating. Parallelism clearly made things faster and better for that first query. Right? So let’s look at just how long, like single threaded each of these takes. Right?

Let’s look at these execution plans. This takes a full five and a half seconds now. So without the benefit of parallelism, this takes 5.6 seconds. Right?

We still have to scan that entire index. It’s still doing the same thing. And this query down here is still able to, in just under 200 milliseconds, do that same work. So if you’re out there tuning queries, and you see something like this, well, stop doing it. One.

The first step, stop. Second step, start. This is a much better way of approaching this. Now, this sort of thing won’t work if you have to do something real goofy, like the year and month from the date value in one column equals the year and month from a date value in another column. Maybe I’ll do a video about that someday, but no promises right now because my time is dedicated to other things.

You might notice that I have about 11 tabs open up there. I'm going to be recording some videos about stuff that I find interesting, about how to write T-SQL queries correctly, how to use different things in T-SQL to write queries in the best way possible. So I'm going to be recording those.

Maybe I’ll get around to that one as soon as I can, but I got a lot of stuff to talk about before I get around to that. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I hope that this finds you in good health because nice, happy, healthy people, they want to buy training and spend time with young, handsome consultants and gosh darn it, that’s me. All right. Cool.

Thank you for watching.


The Great Integer To Bigint Compression Swindle




Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked complexities of converting an integer column to a big int in SQL Server without downtime, debunking some common misconceptions. I explain why many blog posts and articles might lead you to believe that simply adding compression is enough for such a change, but reality paints a much more intricate picture. I share real-world scenarios where this conversion has caused significant headaches due to dependencies and other constraints that aren’t always immediately apparent. To help viewers navigate these challenges, I also highlight Andy Malin’s detailed blog post on the subject, offering a practical approach using shadow tables and triggers for minimal disruption.

Full Transcript

Alright, I lied a little bit. I had one video left to record that I forgot about. And I'm going to do that real quick because I just want it off the board. This video is about the great integer to bigint compression lie. There are a lot of blog posts out there that talk about this neat trick if you want to convert a column from an integer to a big integer with no downtime and all that. You know, oh, it just happens instantly, blah, blah, blah. There are a lot of caveats to that that don't get talked about in these blog posts. A lot of people have fooled you into thinking that it's as simple as, oh, just make an index with compression and all of a sudden it's done. Not quite true. There are a lot of other things that can get in the way of what seems like a very simple thing in all these other blog posts. But before we do that, I need to tell you the truth about a few things. Like the fact that you can become a member of this channel for four bucks a month and support the fantastic, unparalleled work that I put forward on YouTube to help educate you about SQL Server stuff.

There's a link right down there. If you don't have the four bucks, or more, depending on your feelings of generosity, you can like, you can comment, you can subscribe, and the channel will grow in other important ways. You know, it's not just money. What do you call it? Influence. My sphere of influence will grow. If you need help with SQL Server stuff, including making columns go from integer to big integer, you can hire me to do all of these things at a very reasonable rate, and I'll be very happy to help you. So we'll both have these things to look forward to. If you would like my training, you can get all of it for about 150 US dollars at that site up there using that discount code.

And again, there is a link for that right in this area somewhere, where you can just click on a fully assembled thing and just show up and start using it and watching it and getting smarter and being a better human being and all that other good stuff. SQL Saturday New York City is coming up on May 10th at the Microsoft offices in Times Square. If you would like to come see me and a whole bunch of other smart folks talk about SQL Server and data stuff in general, then you can buy a ticket. It's wonderful. All you have to do is a little search for SQL Saturday NYC 2025. Guarantee you'll find it. But with that out of the way, let's talk a little bit about how these people hornswoggle you into thinking that compression makes this change easy. So I'm going to create a table here.

And the column that we’re going to care about is this integer column, right? ID. And we’re going to have a clustered primary key on that column. And then we’re going to have some other indexes that reference that column, along with one index that does not reference that column at all. All right. So let’s create this table, and we’re not even going to put any data in this table, because I just want to sort of show you the complexity of what we’re dealing with.

Now, all of the indexes on this table, the clustered primary key and all of the nonclustered indexes, have compression on them. So this should be immediate and instant. But if we run this ALTER statement and try to make that change: oops. Whoopsie daisy. You’re going to see this error here: this index and this index and this index and this index are all dependent on this column.

So we can’t just change this column. And these are just the indexes on the table that we care about. Imagine if we have other stuff that references this column: foreign keys, indexed views, anything like that. If you have objects created with schema binding, like functions or views, then making this change to this column is far more than a case of just saying, oh, I added compression and now I can do it.

No, it’s not. It’s a lot more involved. If you’ve got a big table and you need to get rid of indexes, or you need to remove a clustered primary key from that table, whoo boy, you’re not going to have fun. So we would need to drop the constraint, right? The primary key constraint, which on a big table is not a good time, converting that table to a heap. And then drop the one, two, three nonclustered indexes that we care about here, the ones we got the error for. And then, finally, we could make this change. But again, this doesn’t take into account any other relational dependencies on this column that might exist in your database. We’d have a lot more going on if this weren’t just a single table that I’m creating for this demo in my useless crap database.
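As a sketch, with table and index names invented here and the error text paraphrased, the whole dance looks something like this:

```sql
/* A table where everything is compressed, so the "instant" trick
   should supposedly apply. */
CREATE TABLE dbo.comp
(
    id integer NOT NULL,
    other_id integer NOT NULL,
    stuff varchar(40) NULL,
    CONSTRAINT pk_comp PRIMARY KEY CLUSTERED (id)
        WITH (DATA_COMPRESSION = PAGE)
);

CREATE INDEX ix_one ON dbo.comp (other_id, id)
    WITH (DATA_COMPRESSION = PAGE);
CREATE INDEX ix_two ON dbo.comp (id, other_id)
    WITH (DATA_COMPRESSION = PAGE);

/* Fails: the primary key and the nonclustered indexes
   all depend on the column. */
ALTER TABLE dbo.comp ALTER COLUMN id bigint NOT NULL;

/* What actually has to happen first: */
ALTER TABLE dbo.comp DROP CONSTRAINT pk_comp;  /* table becomes a heap */
DROP INDEX ix_one ON dbo.comp;
DROP INDEX ix_two ON dbo.comp;

ALTER TABLE dbo.comp ALTER COLUMN id bigint NOT NULL;  /* now it works */

/* Then everything goes back on, fully logged. */
ALTER TABLE dbo.comp ADD CONSTRAINT pk_comp PRIMARY KEY CLUSTERED (id)
    WITH (DATA_COMPRESSION = PAGE);
CREATE INDEX ix_one ON dbo.comp (other_id, id)
    WITH (DATA_COMPRESSION = PAGE);
CREATE INDEX ix_two ON dbo.comp (id, other_id)
    WITH (DATA_COMPRESSION = PAGE);
```

And that still ignores foreign keys, indexed views, and schema-bound objects, which would each need their own drop-and-recreate round trip.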

We would have to do a lot more work to make this happen instantly. Then, at the end of that, we would have to go and add everything back. This whole thing right here would be fully logged, because you would have to get rid of the clustered primary key. And then adding it back, and adding these three indexes back, those would also all be fully logged. So is it really worth going through all the trouble with compression to change a column instantly?

It doesn’t feel very instant to me. And this is something that I’ve had to actually help people with, not just write a blog post about that was like, oh yeah, look, it can do it. No. Doing this in real life is a lot harder and a lot more complicated. If you want to see a good blog post about how to do this with very little downtime (don’t judge me, I don’t use Bing, I don’t use Edge very much on this thing), if you search for Andy Mallon alter online, Andy has one, and I’m going to put this link in the video description.

Andy Mallon has a blog post that, I think, how long ago was this? 2017. And he talks about, wait, is this the right one? No, this is rebuild a very large primary key index. Man, Bing sucks. Jesus. All right, there it is: Changing a column from int to bigint without downtime, from 2019. Thanks, Andy. So I’m going to put the link to this correct blog post, which for some reason Bing thought should be second in the results.

Oh, man. Microsoft keeps thinking it can compete with these things. Like, it thinks Bing can compete with Google. It thinks Fabric can compete with Databricks. It thinks Azure can compete with AWS.

And, oh, boy, you just waste everyone’s time and money with all this nonsense. So Andy has a great post where he walks through creating a shadow table, creating a trigger to move stuff over, and then how to do the swap and migrate data and everything. So I’m going to put this in the blog post because this is a far more reasonable approach to making the change.

I’m actually going to copy it right now in front of you because I don’t want to forget. So this is a much more reasonable change that you could make. Rather, a much more reasonable set of steps to take to make the change than just saying, oh, I added the compression.
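The shape of that approach, heavily condensed and with names invented here (Andy’s post has the real, production-ready version with batching, error handling, and all the edge cases), is roughly:

```sql
/* 1. Shadow table, identical except the column is already a bigint. */
CREATE TABLE dbo.comp_new
(
    id bigint NOT NULL,
    stuff varchar(40) NULL,
    CONSTRAINT pk_comp_new PRIMARY KEY CLUSTERED (id)
);
GO

/* 2. A trigger keeps the shadow table in sync while the backfill runs. */
CREATE TRIGGER dbo.comp_sync ON dbo.comp
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DELETE cn
    FROM dbo.comp_new AS cn
    JOIN deleted AS d
        ON d.id = cn.id;

    INSERT dbo.comp_new (id, stuff)
    SELECT i.id, i.stuff
    FROM inserted AS i;
END;
GO

/* 3. Backfill existing rows, ideally in batches rather than one shot. */
INSERT dbo.comp_new (id, stuff)
SELECT c.id, c.stuff
FROM dbo.comp AS c
WHERE NOT EXISTS
    (SELECT 1/0 FROM dbo.comp_new AS cn WHERE cn.id = c.id);

/* 4. The swap: a brief rename, instead of a long fully logged rebuild.
   (Then drop the trigger and, eventually, the old table.) */
BEGIN TRANSACTION;
    EXEC sys.sp_rename @objname = N'dbo.comp', @newname = N'comp_old';
    EXEC sys.sp_rename @objname = N'dbo.comp_new', @newname = N'comp';
COMMIT TRANSACTION;
```

The point being: the downtime window shrinks to the rename, and none of the index-dependency problems above ever come into play.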

No, it’s done. Because, man, I’ll tell you. The number of people who have needed to fix that and tried to follow the compression thing and just ended up with a sea of red text errors is huge.

I’ve had to get involved with these situations a bunch of times. No one’s very happy about it because they’re like, I read these blog posts. All these smart people told me all I needed was compression.

But no, no, no, no, no. You need a lot more than that. You need to ditch all those dependencies before compression kicks in to the point where it can help. Now, not even to be fair to the other blog posts, but to be fair to people who want to make this change: Microsoft, this should just be a metadata-only change anyway.

Like, all you have to do is start accounting for new stuff in the metadata as a bigint, allocate bytes based on that, and then just say the old stuff is now a bigint. Even though you know the old stuff is an int that only takes up four bytes, it doesn’t matter until you rebuild, or maybe even reorg, the index or something like that.

Like, there’s just no reason for this to be as painful as it is. It’s pretty stupid, actually, that we have to go through this much trouble and the compression would even, like, come into the picture with all the dependencies and stuff that you still need to get rid of. So, I don’t know.

Maybe eventually someday we’ll get some pain relief in this area because, you know, integers aren’t getting any smaller. People, data isn’t getting any smaller. People keep adding new stuff in.

And, boy, this is just going to get worse and worse. So, hopefully, Microsoft addresses this major scalability flaw and stops wasting money on things like video games, search engines, crappy cloud services, Fabric, and lousy AI gambles. I don’t know.

You know, maybe put some money into your core products, and stop treating people who have already bought it like, screw you, you’re already stuck with us. You know, help people who have had your product and been paying for it for a while out a little bit. It’s a nice thing to do.

At some point, you’ve got to dance with who brung you. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. Now I’m really taking a break and shutting everything down because I have completed my ring cycle. So, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Saturday New York City 2025 Is Coming Your Way In May!

You Know You Wanna


After a brief hiatus due to some paperwork difficulties, SQL Saturday NYC is back in action this May 10th!

The event will be taking place at the Microsoft offices located at 11 Times Square, New York, NY 10036, with a full day precon session on May 9th.

See you there!


Indexing SQL Server Queries For Performance: Computed Columns




Thanks for watching!

Video Summary

In this video, I delve into the world of computed columns in SQL Server and how they can be used to enhance query performance through indexing. After recently completing a series on writing efficient SQL queries and another on indexing for performance, I decided to bring these concepts together to explore computed columns in detail. I discuss common pitfalls when using computed columns, such as concatenating year and month fields without proper data type handling, and how converting the result to an integer can provide more predictable behavior. Additionally, I explain the importance of making computed columns deterministic for indexing purposes and demonstrate how to create indexes on these columns effectively. By sharing this knowledge, I hope you gain insights into optimizing your SQL Server queries and improving their performance through strategic use of computed columns.

Full Transcript

Erik Darling here, Darling Data, and you may notice that I look a little bit different than I did in my last video, because I got a haircut. I don’t know exactly how I feel about the haircut at the moment. It’s not precisely what I requested, but we’re going to see what happens when this thing grows in. I think I need to sharpen myself up a little bit. Oh, that’s blurrier. There we go. I think that’s maybe a little bit better there. I look like less of a weird roll-on deodorant person with that properly adjusted there. So in this video, we are going to talk about sort of an intersection of concepts. I have most recently been recording a series of videos about indexing SQL Server queries for performance. And before this, I did a series on how to write SQL Server queries correctly. And I have another sort of ongoing, ever-developing playlist called A Little About, in which I go into a little detail about various concepts in SQL Server. And this video kind of makes all of these things meet.

It’s a much happier throuple than the one that I described between indexed views, computed columns, and filtered indexes. This one is just about computed columns. So I’m going to call this video Indexing For Performance And Writing Computed Column Queries Correctly. And then, in parentheses, because it wouldn’t fit on the slide, A Little About.

All right. So before we talk about this subject over here behind me, let’s talk a little bit about how you can support me for $4 a month by becoming a member, a monetarily contributing member of the Darling Data family. Become one of my data darlings by clicking on the link in the video description right there. And that will take you to the Become a Member page, where you can plug in whatever payment form you’ve currently, I don’t know, whatever you stole out of your mom’s wallet or something.

If you don’t have a mom or a wallet, you can like, you can comment, you can subscribe, and you can help me grow this channel in other ways, big and small, which I appreciate. I appreciate the big and the small. If you need help with SQL Server because you are running into problems with the type of stuff that I talk about in these videos, well, golly and gosh, are you in luck.

Look, I’m a consultant, and in exchange for money, I will solve performance problems on your SQL Server. Crazy, isn’t it? Any one of these things and more.

And as always, my rates, they are reasonable. And again, just because I just exude reasonableness from my being, from all of my pores, through all of my glands, just out every single hole, you can get all of my training for a very reasonable price, about $150. If you go to that URL up yonder and enter in that discount code right next to me there, you can get money off of that, and then you can just have it forever.

And, you know, unlike a haircut, my training will never go out of style, not even moments after you walk out of a barbershop. Upcoming events, of course, we have the one, oh, man, my favorite, my favorite, because I don’t have to go very far, SQL Saturday, New York City, 2025, coming May 10th. Boy, it’s going to knock your socks off.

Might even actually have to give you a pair of socks to knock off, because we’ll knock off both pairs of socks. This one, I have a feeling. But with all that out of the way, let’s party, I’m not sure I sold that one.

But anyway, let’s talk about computed columns here. Now, this does stem a little bit from a video that I made recently about how to search better for, like, dates and years in queries, where, you know, like, I talked about how you can use creation date. And the original query that I had written, because, like, I’ll be honest with you, that’s code that I had sort of, like, had sitting around sort of adopted from, like, something that I worked on a long time ago, where, you know, SQL Server 2012 wasn’t quite popular then.

And so, like, the end of month function wasn’t, like, readily available across every place where it would have been used. So, the end of month function, honestly, it just skipped my mind. So, thank you to COBOL7 for reminding me about the EOMonth function.

So, originally, in my query, I had a bunch of complicated crap to figure out when the, like, leap years and the end of the month and stuff. You can skip over all that if you just use the EOMonth function, like COBOL7 suggested there. Now, the original query that we started with was this one, where year column equals 2013 and month column equals 1.

And absolutely, sure, if you wanted to create computed columns and maintain an additional index on those computed columns, you could do something like this. And adding these computed columns in without the persisted keyword would add them in instantly. There would be no locking, no nothing on there.

They would just act as virtual columns in the table. And then, if you wanted to create a nonclustered index on both of those columns, even not persisted, you could do that very easily with an index that looks like this. Cool. Great.
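In code, that looks roughly like this (the column and index names here are mine, invented for illustration):

```sql
/* Non-persisted computed columns are a metadata-only change to add:
   no locking, no data movement. They act as virtual columns. */
ALTER TABLE dbo.Votes ADD YearPart  AS YEAR(CreationDate);
ALTER TABLE dbo.Votes ADD MonthPart AS MONTH(CreationDate);

/* They can be indexed even without the PERSISTED keyword. */
CREATE INDEX ix_year_month
    ON dbo.Votes (YearPart, MonthPart);
```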

What a lot of people end up trying to do, though, is trying to make sort of a calculated field from both of these things. This is where you can run into trouble. This is going to be the first bit of trouble that I’m going to show you.

This is where you can run into a column where if you were to try to directly concatenate things together. So, I have a few columns in the select list here. One where I’m just showing you the year and month that comes out of the creation date.

One where I’m showing you what happens when you just try to concatenate those things together. That’s a terrible arrow, but we’re just going to deal with it. And then one that kind of has an unexpected side effect when just left to its own devices.

And then one that is probably what you’re expecting, but takes a little bit more work by converting the result to an integer. So, if we look at the results of this thing, you’ll notice that I have the year and the month, which comes back as 2008 and 7. But if I try to just concatenate those together, like with this line here, the result is just going to say 2015.

Because what comes out of year and what comes out of month are both integers. And so, SQL Server says, oh, you just wanted to add those together. 2015.

Congratulations, dummy. Okay. Fair enough. Integers, you know, that’s what you do with the plus sign. You add one to the other. So, this isn’t what we want here at all. We could just do this.

Oops. We could just do this. And let me scroll down a little bit in here, actually. We could just do this and say RTRIM(year) + RIGHT('00' + RTRIM(month), 2), so that we get the year portion and then a zero-padded month portion here. And that could be fine.

You know, I’m okay with that. As long as, you know, the data type that comes out of that is what you’re okay with. And any queries that search on that use the correct data type thing.

But we can also convert that to an integer. And really, the only limitation on this is that the leading integer can’t be zero padded, right? So, like, if you’ve ever worked with zip codes, you’ll know that storing zip codes is an integer.

In America, where they’re all numbers, is a mistake because some of them start with zero. Yeah. And you can’t have an integer start with zero.

So, this, I think, would be pretty safe, because we’re a long way off from any years starting with zero. I can’t imagine when we would have another year that starts with zero again. But, anyway, the 07 would be safe because it’s not at the front of the number.

And really, whatever search criteria works best for you, just go with it. Most people expect the zero-padded month, but if you don’t, whatever. I’m not going to sit here and argue with you.
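Putting the three behaviors side by side in a sketch (the aliases and expected values in comments are mine):

```sql
SELECT
    YEAR(v.CreationDate)  AS year_part,   /* e.g. 2008 */
    MONTH(v.CreationDate) AS month_part,  /* e.g. 7    */
    /* int + int is addition, not concatenation: 2008 + 7 = 2015 */
    YEAR(v.CreationDate) + MONTH(v.CreationDate) AS oops,
    /* RTRIM implicitly converts the ints to varchar;
       RIGHT zero-pads the month */
    RTRIM(YEAR(v.CreationDate)) +
        RIGHT('00' + RTRIM(MONTH(v.CreationDate)), 2)
        AS as_string,                      /* '200807', a varchar */
    /* and the whole thing can go back to an integer */
    CONVERT(integer,
        RTRIM(YEAR(v.CreationDate)) +
        RIGHT('00' + RTRIM(MONTH(v.CreationDate)), 2))
        AS as_integer                      /* 200807 */
FROM dbo.Votes AS v;
```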

Where things get a little surprising for some is if we run this stored procedure, sp_describe_first_result_set, which is very handy for figuring out what data types are coming out of queries that you write. So you can figure out, like, oh, do I need to fix this somewhere?

The one where we just have this as a string comes out as a varchar(14). So we do have to be a little bit careful there. SQL Server does not change that back to an integer or anything else unless we explicitly tell it to.

So, you know, we could do that. But what makes it tricky with the computed column stuff is that we would have to add the computed column like this. Right?

CONVERT to integer, whatever. So let’s do that. And let’s create two indexes on the Votes table. One, we’re going to create on the YearMonth column that we just added. And the other one, we’re going to create straight on the CreationDate column.

The thing with SQL Server, and this goes for standard edition and enterprise edition, is that unless you write your query to explicitly reference the computed column that you created, you might be relying on something called expression matching for the optimizer to choose your computed column to use with the query that you write.

So, the sort of unfortunate thing is that if we were to just write our query like this, and we were to say, hey, give me the year and give me the month, like 2013, I mean, the old one is sort of unnecessary, but if we were to just write this query, the best that we would get from this currently is just a scan of the index we created on creation date.

This is not the index. This is V1. This is the index on creation date right here. This is not on the computed column that we created, or rather on the index that we created on the computed column.

We would have to write our query to very explicitly match the definition of the computed column, and have, say, where that equals 2013-01, in order for SQL Server to pick up on the computed column and use it automatically, right?

Which, and I say automatically, but this is far from automatic. This is you taking a pretty big step: creating the computed column, and rewriting queries to use it.

But at this point, if your query is already doing all this stuff, fine. If you have to rewrite it to do this, though, you’re better off just rewriting it to say where YearMonth equals something.

Like, don’t go through all this trouble. Now, one place where adding computed columns can get tricky is, let’s say that we wanted to do this.

And I’m going to change this a little bit first, because I just, I want you to see where this can go wrong. So let’s put this on a new line.

And let’s put this on a new line. And let’s put this on a new line. So what I’m going to do is take out this and this to begin with.

So now we’re just going to add to the Posts table: last edit date as ISNULL(LastEditDate, '19000101'), whatever. And if we do this and we try to create an index on the column like this, SQL Server is going to say it cannot be used in an index or statistics because it is non-deterministic.

And the reason why it’s non-deterministic is because SQL Server isn’t quite sure what to do with this. We’re not telling it what to do, so it doesn’t really know.

I’m going to drop those computed columns. Actually, I’m not going to be able to drop the one on Votes, am I? Because that index depends on it. So we’ll just do that later. I’ll fix it in post. But I got rid of that one.

Now let’s put these back in. All right. Let’s say CONVERT(datetime, '19000101'), and we’re going to use the 121 style with the convert there. And if we add that in the right place, now we’ll be able to create an index on that computed column.

And the nice thing is that we don’t have to put the convert part of the computed column into any queries that we expect to use it. If we just say, hey, select a count from Posts where ISNULL(LastEditDate, whatever) is greater than this, then SQL Server will have no problem using and seeking into the computed column, plus the index that we created on it, to support this query.

So there is some finagling and work that you have to do, especially whenever you’re doing anything that involves an expression like this. You do need to really help the optimizer understand what to expect out of that column, and make sure that things are being converted to the proper date and all that other good stuff.

So it is possible to do, and it is possible to get expression matching to happen without adding any additional logic into the queries that you expect to use it. But you do need to make computed columns deterministic in order to index them. And if you don’t do that, SQL Server is going to say: no index for you.
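A sketch of that determinism fix, assuming the Posts table from the Stack Overflow database (the computed column and index names are invented):

```sql
/* Non-deterministic: the implicit string-to-datetime conversion
   depends on session settings, so creating the index fails with a
   "cannot be used in an index or statistics because it is
   non-deterministic" error. */
ALTER TABLE dbo.Posts ADD LastEditDateFixed AS
    ISNULL(LastEditDate, '19000101');
CREATE INDEX ix_lede ON dbo.Posts (LastEditDateFixed); /* error */

ALTER TABLE dbo.Posts DROP COLUMN LastEditDateFixed;

/* Deterministic: CONVERT with an explicit style (121 here) pins the
   conversion down, so the index create succeeds. */
ALTER TABLE dbo.Posts ADD LastEditDateFixed AS
    ISNULL(LastEditDate, CONVERT(datetime, '19000101', 121));
CREATE INDEX ix_lede ON dbo.Posts (LastEditDateFixed);

/* And queries don't have to repeat the CONVERT for expression
   matching to kick in: */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE ISNULL(p.LastEditDate, '19000101') > '20130101';
```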

So there’s that. Anyway, just a little bit about indexing computed columns for performance and writing computed column queries correctly. I hope you enjoyed yourselves.

I hope you learned something and all that other good stuff. This is going to be my last recorded video. I mean, you’re not going to be able to tell, because I have videos scheduled out for about three or so weeks at this point. But this is, for me, exciting, because this is the last video I’m recording for a bit.

I’m going to take some recording time off for the upcoming Christmas and New Year’s holidays. So I’m going to just shut this whole mess down and think about other things for a little bit. But when I come back from break, I have a whole slate of things to talk about.

And hopefully there will be some news on SQL Server vNext that I’ll be able to talk about as well. So that’s what’s upcoming. But in the meantime, I do wish everyone happy holidays, whatever you’re out there celebrating.

You know, we’re a Christmas bunch. So Merry Christmas, if you’re feeling Christmassy too. Happy New Year.

Even if the new year you’re celebrating is a little bit after the new year that I’m celebrating. Again, it’s cool with me. Celebrate your new years whenever it makes sense to you. You know, it’s fine.

It’s just more days on the calendar where it’s okay to drink, which means it’s okay with me too. But anyway, I will see you when I return from my bon voyage. And until then, keep indexing SQL Server queries for performance and writing queries the correct way in SQL Server.

It’s still, you know, it’s a nice thing to do. Anyway, goodbye.


Indexing SQL Server Queries For Performance: Windowing Functions




Thanks for watching!

Video Summary

In this video, I delve into improving the performance of windowing functions in SQL Server, focusing primarily on how effective indexing can be. I start by discussing my own experience with YouTube and how you can support my channel through a membership, emphasizing that all content is free but your engagement helps make it better. The bulk of the video then explores specific techniques for optimizing queries using windowing functions, particularly cross-apply and drive join methods. I explain in detail why inducing batch mode can significantly improve performance, especially on standard edition SQL Server, where DOP limitations can be a challenge. Throughout the discussion, I highlight common pitfalls such as mismatched indexes and the importance of attention to detail in index creation. By sharing these insights, my goal is to help you understand how to better leverage windowing functions for faster query execution.

Full Transcript

Erik Darling here with Erik Darling Data, just Darling Data. I think I took the Erik out for a reason. My name has naturally, since birth (this is not an affectation), been spelled with a K, and asking people to remember that is apparently difficult, even when my name is right above them in an email. It’s always like, hey Eric, with a C, and I’m like, not a lot of attention to detail on your end, is there? Not a lot of giving a crap. So, I took the Erik out and kept the Darling Data. Darling is at least reasonably easy to spell, as long as you’re not some AI-powered transcription tool, which really loves calling my company Darlene Data. Like Darlene, the rebellious daughter from Roseanne. Not a lot of attention to detail in the AI world, is there? Not a whole lot of that. Anyway, in today’s video, we’re going to talk about improving the performance of windowing functions in SQL Server, mostly from the context of indexing. We’re going to look at a couple different query forms, though: one using cross apply, the other using a drive join. So there’s going to be some interesting stuff in there. But before we talk about that, we need to talk about how Erik Darling derives value from YouTube, or from you. If you would like to support my efforts to bring you all of this wonderful SQL Server content, you can click the link down in the video description to sign up for a channel membership. We’re up to almost 40 people, or, I think, maybe 40 even.

I forget what the last number was, but 40. Wow, 4-0. Just like my last birthday. And you can support this channel for as little as $4 a month. All of this content is otherwise free, and if you would like to help me make this channel the wonderful, gigantic, thriving community that it should be, liking, commenting, and subscribing are wonderful ways to also help me do that. If you, of course, see the things that I talk about in these videos, and think that some of them might just apply to your slow-ass SQL Server, you can hire me to do all sorts of consulting-y things. I will make your SQL Server faster in exchange for money. And, as always and forever, my rates are reasonable. In the same reasonable vein as my consulting rates, my training rates are also equally reasonable. If you would like to get all of my training for about 150 USD for the remainder of your life, may you live forever, you can do that at that URL up there, using that discount code over there.

There is also a fully formed URL down in the video description, if you are more keen on just clicking on one thing than typing a thing and typing another thing. I know how the attention to detail goes. SQL Saturday NYC 2025 is coming your way May 10th; it will be at the Microsoft offices around Times Square.

So, not only can you get a slice of Sbarro pizza, you can also get your picture taken with Elmo wearing a dirty backpack or something. And hopefully not offend dirty backpack Elmo or dirty backpack Spider-Man or any of the dirty backpack Sesame Street characters that roam around and will, I mean, for a reasonable rate, take a photo with you. So, yeah, you should do that.

I would encourage you to do it. It’s a fun event. We don’t invite the dirty backpack Sesame Street characters into the event, so it’s usually pretty safe in there. But with that out of the way, let’s talk about windowing functions in SQL Server.

Now, I am not going to sugarcoat this one bit, as soon as ZoomIt decides to become a responsive and responsible human being. Most of the time, when I’m working on a query that has a windowing function in it and it is having performance problems, the first thing I go to is to try to induce batch mode in some way. Either by using some sort of columnstore object in the database.

It doesn’t even have to have any rows in it. It can be a temp table, if you’re into that sort of thing. It can be encouraging a higher database compatibility level for the query, while maintaining the legacy cardinality estimator for the query.

But most of the time, windowing functions just work better under batch mode. They just do. Um, the window aggregate operator is highly optimized or something.

And you just take all the work out of it. In most circumstances, you don’t even have to change the indexing, because you’ve already got good enough performance just by getting batch mode involved in the query and getting row mode out of the way.

Sorry, ZoomIt is being a real weirdo today. But if you’re on Standard Edition, and Microsoft hates you because you did not pay the friendship tax, the enterprise friendship tax, then batch mode is not going to be as favorable of a proposition, because you are limited to a DOP of two for any parallel queries that involve batch mode on Standard Edition. It’s a real unfortunate situation.

I don’t quite understand why Microsoft needs to stick a thumb in the eye of Standard Edition users so badly. It’s almost like they’re daring you to go use Postgres. It’s a real stinking shame how much accountants can ruin perfectly good software.

But there are lots of windowing functions out there in the world. They generally fall into three categories.

There are ranking windowing functions like ROW_NUMBER, RANK, DENSE_RANK, and NTILE. There are aggregate windowing functions like SUM, AVG, COUNT, COUNT_BIG, MIN, and MAX. And then there are all sorts of analytic windowing functions like FIRST_VALUE, LAST_VALUE, LEAD, and LAG.
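One of each category, sketched against the Votes table (the aliases are mine):

```sql
SELECT
    v.PostId,
    /* ranking */
    ROW_NUMBER() OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate) AS vote_number,
    /* aggregate */
    COUNT_BIG(*) OVER
        (PARTITION BY v.PostId) AS votes_per_post,
    /* analytic */
    LAG(v.CreationDate) OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate) AS previous_vote_date
FROM dbo.Votes AS v;
```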

There are a whole bunch more in each bucket, but I didn’t feel like listing them all out. So, anyway, let’s pretend that I have a couple of indexes on a couple of tables in the Stack Overflow database, namely the Posts table and the Votes table.
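One example from each of those buckets, sketched against the Votes table (column names as they appear in the Stack Overflow database):

```sql
SELECT
    v.PostId,
    /* ranking */
    ROW_NUMBER() OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate DESC) AS recency_rank,
    /* aggregate */
    COUNT_BIG(*) OVER
        (PARTITION BY v.PostId) AS votes_per_post,
    /* analytic */
    LAG(v.CreationDate) OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate) AS previous_vote_date
FROM dbo.Votes AS v;
```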

And I’m using the bigger tables because I really want to drive performance crazy. Now, uh, we’re going to start with this query and those indexes. And this is the execution plan for that.

Um, it takes about 20 seconds, uh, and we use a loop join here. And on this side, the inner side of the loop join, we already have a good index on the Votes table to seek into. We can already get that done pretty quickly.

But what we don’t have is a good index to help with our windowing function otherwise. Now, inside here on the Votes table, what we’re cross-applying to is all this wonderful stuff, uh, where the, uh, Post ID column in the Votes table equals the ID column from the Post table, where VoteType ID is in these VoteTypes and where the creation date of the votes is greater than 2008.

So, if we go back to the indexes a little bit on the Votes table, we do have this seekable index on PostId, but none of the columns, either from the other part of our where clause or from the specification of our windowing function, are in the key of the index.

Now, if you’ve been paying attention to my other videos, even specifically the last video I recorded where I was like, hey, indexes put data in order, uh, you might have come across some stuff in your head like, oh, windowing functions put data in order too. Indexes can help windowing functions.

So, uh, let’s go and change the index that we have on the Votes table. But what I want you to pay attention to here is that even though we’ve put Post ID, VoteType ID, and CreationDate into the key of the index now, Post ID was there before, we added VoteType ID and CreationDate, and that should technically help with this.

It doesn’t go as far as we want, because we messed up this part. Again, the whole attention to detail thing. This is the Eric-with-a-C of index creation, and you’ll notice in here that even though we can seek into everything we care about, we still have to sort that data. If we hover over the sort, we will see that we have an order by of VoteTypeId ascending and then CreationDate descending. So, our index really has to closely match the specification of our windowing function for it to work out. If we change our index definition, like we have in this window, to be on PostId, VoteTypeId, and then CreationDate descending, things will work out a lot better for this query. Right? Now, we still have our plenty-seekable index over here, and we no longer have a big, time-consuming, spilling sort on the inner side of this nested loops join. So, we’ve improved this query in a couple of ways. Not necessarily from a timing perspective so much. We’re about the same speed as this query, but this query over here will have asked for a memory grant of 4,300 KB, and this query over here will have asked for a memory grant of 136 KB.
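Since the demo script isn’t reproduced here, this is a sketch of the index change being described; the index names are made up, and the point is that the key order and sort direction mirror the windowing function’s specification:

```sql
/* Before: seekable on PostId, but the window function wants
   VoteTypeId ascending, CreationDate descending, which isn't
   the stored order, so the plan still sorts. */
CREATE INDEX v_attention_to_detail
    ON dbo.Votes (PostId, VoteTypeId, CreationDate);

/* After: CreationDate is stored descending, matching the
   windowing function, so the sort operator disappears. */
CREATE INDEX v_erik_with_a_k
    ON dbo.Votes (PostId, VoteTypeId, CreationDate DESC);
```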

So, we did not improve the speed of this query so much, but we did improve another metric, specifically the memory grant. Now, one thing that sometimes works, like trying to get batch mode involved, is to do the windowing function thing in an inner context, like in here, then select from that, generate the whole windowing function, and then apply some predicates outside of where you generate the windowing function. But in this case, this doesn’t get us anywhere. It doesn’t actually fix the query in any meaningful way.

So, one thing that you might try, or rather, the way queries like this tend to naturally get written, is not with cross apply, but with a join, right? And you can totally do that. You can totally put a derived table into your query, generate the whole row number in there, and join to your derived table on the outside of that. Where that differs from the cross apply thing is that with cross apply, you can push the correlation inside of the cross apply.

So, you don’t correlate the V thing to the Posts thing out here; you correlate the V thing to the Posts thing inside of the cross apply. So, in this query, what I was trying to change a little bit was generating the row number in here, and then sort of applying the filter after that.

This didn’t quite work out for me. But the way most people write queries like this is just with a straight join, and of course you can’t put the correlation in here, right? You can’t do that. That doesn’t work with just a straight join. You have to do the join and then put the correlation in the on clause out here.

So, in this case, we will have generated this whole result set, then we’ll join the results of this thing to the Posts table outside of generating this result set, and then apply some filters to the stuff out here.
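The join version reads roughly like this; it’s a sketch with assumed predicates, since the demo query isn’t reproduced here. The row number is generated inside a derived table, and the correlation lives in the on clause outside, instead of being pushed into a cross apply:

```sql
SELECT
    p.Id,
    v.CreationDate
FROM dbo.Posts AS p
JOIN
(
    SELECT
        v.PostId,
        v.CreationDate,
        ROW_NUMBER() OVER
        (
            PARTITION BY v.PostId
            ORDER BY v.CreationDate DESC
        ) AS n
    FROM dbo.Votes AS v
    WHERE v.CreationDate >= '20090101' /* assumed filter */
) AS v
    ON  v.PostId = p.Id /* correlation out here, in the on clause */
    AND v.n = 1;
```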

This does okay, but we’re back to a memory grant and we’re back to a spilling sort. Now, if we wanted to get this to be better, we would have to change the index that we have on the Votes table to better suit writing this as a join rather than an apply. So, we would need something that helps the windowing function first, and then helps us join after we have developed that result for the windowing function. If we do this, it actually brings our query down to about 16 seconds. There are two big things with this. One is that the type of join changes, right? If we look at all of the windowing function versions of this, oops, that’s not the right one. We did change the join type for this one where we did this, but we did not change much else about it. But for all of the cross applies, SQL Server has optimized those cross applies as nested loops joins. This can be good or bad. It’s not necessarily a death knell, but this is a significant number of rows coming out of here. Generally, with apply, you want a small result driving into a big result in here. So, if we had fewer rows coming out of the Posts table, this nested loops join may have been a bit less painful. But the bigger deal with these queries is that writing this as a straight join opens SQL Server up a little bit to other join types, which can be useful. So, in the last couple of queries, we used a hash join. And the hash join is just kind of one big seek. For this particular query form, we’ve redone our index for VoteTypeId, CreationDate descending, and then PostId last. So, we can generate our whole row number and do all this stuff in here, and then join to the result of that out here.
This helps: SQL Server does just one big seek, generates one big row number, and then filters that row number down in here. Now, part of the way that I’ve engineered these queries is to fully do all the work inside of this, rather than filtering anything out before we do the work. This is to show you that the inner part of the query, where we do all of this stuff, is where we need the most help. Now, you don’t necessarily have to use ROW_NUMBER or a windowing function in order to find the highest-numbered thing, or the most recent thing, or the least recent thing, or any of that stuff. You know, something that I try to get into everyone’s head when they’re writing queries is that there are many, many ways to write queries, and you don’t necessarily have to do the same thing over and over again. You should test different forms of different queries in order to make sure that you are using the fastest one available. For situations like this, your options are, of course, windowing functions; a top one with an order by; and aggregate functions like MIN and MAX and whatever else. So, you don’t necessarily have to do windowing functions for all this. You should test all of those things and see which one gives you the best performance. There might even be performance differences in different situations based on the indexing you have available for all these different query forms. So, let’s just take a look at this one real quick, where rather than generating a whole row number and doing a whole bunch of filtering on row numbers, we just say: give me the max creation date for all this stuff, join to that out here, and now we’re going to make sure that batch mode stays involved by using this columnstore helper table.
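The MAX rewrite being described looks roughly like this; the helper table and filters are assumptions rather than the exact demo code:

```sql
SELECT
    p.Id,
    v.latest_vote
FROM dbo.Posts AS p
JOIN
(
    /* crunch the data down with an aggregate instead of
       generating and filtering a row number */
    SELECT
        v.PostId,
        MAX(v.CreationDate) AS latest_vote
    FROM dbo.Votes AS v
    GROUP BY v.PostId
) AS v
    ON v.PostId = p.Id
LEFT JOIN #batch_mode_helper AS bmh /* empty columnstore table */
    ON 1 = 0; /* keeps batch mode involved without changing results */
```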
And now if you look at this query, it finishes in about three seconds. So, I’m all for windowing functions. I’m all for using them when they are useful and appropriate and when they, you know, help you write a query that gets you the right results.

But as soon as you start performance tuning things, you do need to keep in mind that there are other query forms that might be faster. Sometimes it might be the windowing function. Sometimes it might be top one with an order by, or top whatever-you-need with an order by. Sometimes it might just be using a straight-up MIN or MAX aggregate function, crunching your data down, and then using it from there. So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

I hope that you will take my advice about trying many different query forms seriously, because most of the time when I’m tuning queries, I am definitely looking at indexes, but I am also looking at how the query is written. If there are alternative syntaxes available that might suit the indexes I already have better, I want to try those before I go messing around with stuff in the database. A lot of the databases that I work with are really big, and creating indexes can be a pretty painful experience. So, you know, if I can avoid that, I will, by getting other things involved, like different query forms, batch mode, blah, blah, blah. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I will see you in the next video about computed columns.

So, cool. Thank you. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Index Sorting



Thanks for watching!

Video Summary

In this video, I dive into the fascinating world of how indexes store and sort data in SQL Server, particularly focusing on multi-key indexes. We explore why understanding this is crucial for optimizing queries and improving performance. Whether you’re a seasoned SQL professional or just starting out, grasping these concepts can significantly enhance your ability to write efficient T-SQL code. I also share some practical tips on how to leverage indexes effectively, making sure that data retrieval processes are as swift and smooth as possible. Along the way, we take a fun detour through my database of DMX tracks, using it to illustrate key points about index ordering and query optimization in an engaging and relatable manner.

Moreover, I delve into the nuances of equality versus inequality predicates in queries, explaining how they affect the need for sorting or reordering data. By walking you through detailed execution plans and practical examples, I aim to demystify these concepts and equip you with the knowledge needed to make informed decisions when designing your database schema and writing SQL queries. Whether you’re looking to fine-tune existing queries or plan new ones from scratch, this video provides valuable insights that can help streamline your workflow and improve overall performance.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to get on with our indexing series and talk a little bit about how indexes store and sort data in SQL Server and how you can use this knowledge to your advantage to make queries faster and more productive and, I don’t know, I’m sure it will meet some other KPIs along the way. But before we do that, let’s talk about how Erik Darling stores and sorts money. If you would like to support my efforts bringing you this ultra-high quality SQL Server training content, you can sign up for a membership to my channel down in the video description for around $4 a month. Of course, I don’t see the full $4 a month. The IRS has something to say about that. So you can do that. If you, you know, did I use the joke about the bully stealing your lunch money already? Well, I mean, I guess not everyone watches every video, so screw it. The schoolyard bully took your lunch money and you can do that. If you don’t have, you don’t have four bucks, you can like, you can comment, you can subscribe and help me grow this magnificent channel in all sorts of other ways. If you, you know, watch these videos and you’re like, holy cow, that Erik Darling sure does know a lot about SQL Server.

You can hire the best SQL Server consultant in the world, me, to do all this stuff for you. And I would be happy to do that in exchange for money. And as always, my rates are reasonable. Speaking of reasonable, segue right into this beautiful bit. If you would like all of my SQL Server performance tuning training at 75% off, that brings the price down to about $150 for the whole enchilada. You can, you can go to that URL and punch in that coupon code and by the grace of God, check out with a major credit card and we’re all happy people after that.

There is finally an upcoming event that has been announced. SQL Saturday, New York City 2025 will be taking place on May the 10th. What a wonderful day that is.

May in New York City is just a beautiful time of year. You know, it’s not, you have left the freezing grasp of winter and you have not yet entered the sweaty armpit of summer. It’s just a perfect time of year.

So I would encourage anyone who is feeling like having a nice slice of Sbarro pizza to come on down to New York City for the SQL Saturday event. I might even be there, who knows. And with that out of the way, let’s, let’s go have ourselves a little party with SQL Server and I should probably click on the right thing, shouldn’t I?

So, ordered data is great for all sorts of things in databases. Searching, right? If you want to search through data, it’s a lot easier if it’s in an order that allows you to efficiently search through it.

Uh, if you want to say group by or order by or, uh, you know, have a windowing function do some stuff, which we’re going to talk about in the next video, if you couldn’t see that in the tab name right next to this one. Uh, there are all sorts of things that having indexes in place that pre-sort your data in a useful way can help with.

But we need to talk a little bit about how indexes store data, especially how multi-key indexes store data. For a single-key-column index, it’s probably pretty obvious: the index is sorted by that column. But I still get questions from people like, uh, I have this index on column one.

Is that the same thing as this index on column three, two, one? And I’m like, no, column one is over here. And this one, column one is the only key column in this one.

How, how would that work? It’s kind of a thing that I still get. So, uh, those, those are, those are the fun questions that Erik Darling answers, uh, off the cuff.

So, in my database called CRAP, I have created a table called albums. And this table actually just has two indexes on it. We have a clustered primary key on a column helpfully called surrogate ID, just so you understand.

It has no meaning outside of being the clustered primary key and getting us all the things that we desire from clustered indexes. You know, like being narrow. That’s a good one, right?

Bigints: narrow. Being unique, right? It’s an identity, so every value in there, unless we do something silly, is going to be a unique entry. It is ever-increasing, meaning it’s an ever-growing value, unless we do something goofy and we need to reseed it, and then it’ll be a negative growing value or something.

And it is static, meaning that we are not going to be modifying the values in this thing. We are not allowed to directly update an identity column. We can, you know, do stuff with identity insert, but we can’t update the column.

So we have this thing, which has all sorts of great clustered index attributes associated with it. And then we have this index down here, which is a nonclustered index. And this, uh, index is on artist, release year, album title, and track ID.

I guess we could have swapped release year and album title. It wouldn’t really have made that much of a difference in the table or in the index. But for the purpose of this demo, let’s just pretend this is exactly the way we want things.

Now, what I did was put a bunch of rows into this table. And this values clause is a little misleading, because what we end up with in the table actually looks like this: we have the surrogate ID, the artist, the release year, the album title, and then a track ID for each of the tracks that are in the album that we care about, right?
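A sketch of what the demo objects probably look like; the actual script isn’t shown, so the data types, index name, and sample rows are assumptions:

```sql
CREATE TABLE dbo.albums
(
    surrogate_id bigint IDENTITY PRIMARY KEY CLUSTERED,
    artist nvarchar(100) NOT NULL,
    release_year integer NOT NULL,
    album_title nvarchar(200) NOT NULL,
    track_id integer NOT NULL,
    INDEX a (artist, release_year, album_title, track_id)
);

/* One row per track, so each album shows up once per track. */
INSERT dbo.albums
    (artist, release_year, album_title, track_id)
VALUES
    (N'DMX', 1998, N'It''s Dark and Hell Is Hot', 1),
    (N'DMX', 1998, N'It''s Dark and Hell Is Hot', 2),
    (N'DMX', 1998, N'Flesh of My Flesh, Blood of My Blood', 1);
```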

So we actually have multiple entries for each album. Now, um, in rowstore indexes, key columns are stored in sorted order in order to make it easy to navigate the, uh, the B tree that they’re stored in efficiently to locate rows.

They are not stored individually like in columnstore indexes. In columnstore indexes, you really get, like, an index on each column that is in the index, a series of little indexes on those things, row groups and whatnot.

But, you know, way back in the bad old days, a lot of people would use the phone book analogy to talk about indexes. I’m going to update that for the modern generation. I don’t even know what generation is the most current one. Whatever.

Everyone makes playlists. Everyone loves a playlist. I have YouTube playlists. I have Spotify playlists. I have playlists all over the place. I assume that other people who are far more organized than I also have playlists in their lives. So let’s, uh, let’s go back to what we have on this table.

Oops. Index wise. I didn’t frame that very well. We have a clustered index, uh, a clustered primary key on surrogate ID, and we have a nonclustered index on artist, release year, album title, and track ID. Now, if I want to sort, if I want to say select star from albums, order by surrogate ID, of course, SQL Server is able to take advantage of our clustered primary key.

Uh, it’s only 200 rows. It’s not a big deal here. Performance wise.

Oops. I hit control, but snap, snap, snap, it didn’t listen. Uh, we scan the clustered index, but we don’t have a sort operator in our query plan. We have asked for this in the order of surrogate ID, but since the clustered index put surrogate ID in the order that we wanted it in, SQL Server does not have to physically implement a sort of this data. The same thing goes for if we, um, say select star from albums, order by artist.

If we look at the execution plan, we have an index scan, a nonclustered index scan this time. You can see the nonclustered prefix here. It would be nice if SSMS had the ability to show us full object names the way Plan Explorer does, which hasn’t seen any development in who knows how long.

Uh, it would be nice if SSMS had this ability, but, uh, for now we just, we’re just left with these ellipsis cliffhangers to figure out what happened. But notice there is no physical sort operator in this plan. Uh, we can, let’s just contrast these two plans really quickly.

Where if we were to ask SQL Server to sort our data by release year, and SSMS is being real picky about that right there. But now we do have to physically sort that data when we say order by release year, because release year is not the leading key of any index on the table. The clustered primary key is surrogate ID, and the nonclustered index that we have is on artist.

It leads with artist. So asking for this data to be ordered by release year means that we have to re-sort the data from the order that the index initially had it sorted in. The same thing would go if we used any other trailing key, like album title or track ID.
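To summarize the pattern so far with sketched queries (table and column names assumed from the description above):

```sql
/* No sort: surrogate_id is the clustered index key. */
SELECT * FROM dbo.albums ORDER BY surrogate_id;

/* No sort: artist is the leading key of the nonclustered index. */
SELECT * FROM dbo.albums ORDER BY artist;

/* Sort: release_year is a trailing key, so the data has to be
   reordered from how either index stores it. */
SELECT * FROM dbo.albums ORDER BY release_year;
```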

If we look at the execution plans for these, come on, SSMS, help me out a little bit here. Just make this video go, like, kind of smoothly. I’ll give you $50.

So asking for order by album title or by track ID in these situations both leads to a physical sorting of the data, or let’s call it a reordering of the data from the order that the index had it stored in. Now, if we say order by all the columns in the nonclustered index (rather than all the columns in the table), we’re leaving surrogate ID out of this. Even though surrogate ID is the clustered primary key, and it is technically included in the nonclustered index, we’re just saying that’s not an important column for our results.

Right. So, just to maybe make it a little bit more obvious: if we were to do all this stuff and also say order by surrogate ID, we would still not need to sort the data. When you create a non-unique nonclustered index on a table that has a clustered index, the clustered index key column, or columns, right?

You can have a multi-key clustered index. They end up in the nonclustered index as, like, invisible final key columns. So we still don’t need to sort data when we add surrogate ID to the order by here. All right.

So, if this index were unique, then surrogate ID would end up as an included column. And then we would probably have to sort the data by surrogate ID, because it would be an include, and includes are useless for ordering, but we’ll talk more about that later. So, this order by is still free, right?

Like all of these columns, there was no sort in this query plan. If we were to do something like this and we were to take the leading key column out of the order by elements and just say order by these three, we are back to having to sort this data because it needs to be reordered from the way the index initially stored it. Right.

Because the index stores data ordered by artist. And then for any duplicates within artist, we are ordered by release year; for any duplicates within release year, we are ordered by album title; and for any duplicates within album title, we are ordered by track ID.

So an easy way to sort of see how this works is to include an equality predicate in our query. Because right now, with this query, we are effectively skipping over the artist column that leads the nonclustered index here. But as soon as we ask for an equality predicate on the artist column, we are able to order by these three columns for free, because we don’t have to reorder anything.

We seek into our nonclustered index. We find every entry where the artist is DMX, and then, because that is an equality predicate for a single value on the leading key of the index, for the next columns over, like release year and album title and track ID, the order is maintained in the index.
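Sketched out, the equality version looks like this; the seek pins the leading key to one value, so the trailing keys are already in the requested order:

```sql
SELECT *
FROM dbo.albums
WHERE artist = N'DMX' /* equality on the leading key */
ORDER BY
    release_year, /* already ordered within the DMX range */
    album_title,
    track_id;
/* no sort operator in the plan */
```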

We don’t have to worry about reordering data in the index, because we just did a single equality seek to get to this. So in the results, we have DMX, and because this is all DMX here, this is all a duplicate value, right? This is just, you know, 127 rows of DMX.

Then release year is in order within this range of duplicate values. So we have 1998 at the beginning and, oops. Come on, scroll bar, stay where I put you.

We have 2012 at the end. So this release year column is all in ascending order for DMX. And then for 1998, we actually have two albums, right?

DMX, what a beast, released two albums in 1998: Flesh of My Flesh, Blood of My Blood, and It’s Dark and Hell Is Hot. And so, we have DMX in order.

We have the release year 1998 in order. And then we have album title. F comes before I, of course, so this is in order.

And then the track IDs, one through 16 and then one through 19, are in order for each of these album titles. What that means is that if we were to search across artist, release year, and album title, and then order by track ID, we would have three equality predicates. We would seek to DMX.

We would seek to the release year. We would seek to Flesh of My Flesh, Blood of My Blood. And then we would have track ID in the exact order that we care about, stored in the index. We don’t have to sort data here, right?

So we get that data in order for free. Inequality predicates do not preserve order, because we have to cross boundaries within the index where the ordering resets, right? So, if we look at this query plan, we are back to having to sort this data, right?

And the reason we have to sort that data is because we are asking for artist, release year, and album title, where the artist equals DMX and the release year is greater than 1998, ordered by album title and track ID. Now, when we had an equality predicate here, it wasn’t a big deal. We didn’t have to re-sort that. But with an inequality predicate, as soon as we cross over from one year to the next... sorry, I asked for this in order by album title.

So as soon as we cross a boundary in release year, say going from 1998 to 1999, the ordering by album title and track ID resets, right? This one and this one. So, this data was not in order the way we wanted it after we crossed a boundary in release year.
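The inequality version, sketched; the range predicate means we cross release year boundaries, and the ordering of the later keys resets at each one, so the sort comes back:

```sql
SELECT *
FROM dbo.albums
WHERE artist = N'DMX'
AND   release_year > 1998 /* inequality: order no longer preserved */
ORDER BY
    album_title,
    track_id;
/* sort operator returns to the plan */
```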

Another thing that’s kind of funny is that sorting with duplicates is often called non-deterministic, because, like, look at these results. If we were to select these columns and then order by track ID, we would get all the track IDs of one in here, like for this chunk. But the order of the other columns is a little bit out of whack.

It’s like 98, 98, 99, 2001, 2003, 2012, 2006. So these track IDs were not sorted perfectly by anything else. If we wanted a deterministic sort order, with tiebreakers and stuff like that, we would need to include a unique column in there.

So if we also ask for a sort by surrogate ID, then we get... we could have asked for release year too, I guess, but that might’ve been a little funky with the two albums that got released in ’98. But if we say order by surrogate ID as well, then we get 98, 99, 2001, 2003, 2006, 2012. So: the way you create your indexes stores the data that you put in the key of the index in a specific order.
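A sketch of the deterministic version; adding a unique trailing column to the order by acts as a tiebreaker among duplicate track IDs:

```sql
SELECT
    artist,
    release_year,
    album_title,
    track_id
FROM dbo.albums
WHERE artist = N'DMX'
ORDER BY
    track_id,
    surrogate_id; /* unique tiebreaker makes the order deterministic */
```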

If you ask for a different column to be ordered by, or a different set of columns to be ordered by, and you don’t have equality predicates that maintain that order, you will have to re-sort data from the order that it’s stored in by the index. The same thing goes for, um, an IN search. A lot of times, the way that SQL Server handles IN searches in the query plan makes it look like it is just two separate seek predicates, but it’s not two seeks with, like, ordered results after that.

Like, when we ran this query and it just said the artist equals DMX, ordered by these three columns, we didn’t have to worry about sorting this data. But when we say IN Blondie comma DMX, we do have to sort that data, because every time we cross a boundary, right? When we go from Blondie to DMX, the release year ordering resets.

So for Blondie, I didn’t put all of Blondie’s albums in here, because let’s face it, not all of Blondie’s albums are worth putting on a playlist. They may have had some good songs later in the career, but at a certain point, the full album experience kind of lost some gusto. So, for Blondie, the release year is perfectly ordered from 1976 down to 1982.

But then as soon as we cross over to DMX, that ordering restarts. So this is 1998 through, it’s going to be, like, 2012. So we have that data in the order that we care about in the index per artist.
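Sketched out, the IN version looks like this; each value gets its own seek, and the trailing key order restarts at each artist boundary, so SQL Server still has to sort:

```sql
SELECT *
FROM dbo.albums
WHERE artist IN (N'Blondie', N'DMX') /* two seeks, two ordered ranges */
ORDER BY
    release_year, /* resets at the Blondie-to-DMX boundary */
    album_title,
    track_id;
```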

But as soon as we cross a boundary to a new artist, that ordering resets, and the year column starts over, ascending again from lowest to highest for that artist. Now let’s look at a slightly more complicated example.

And this is what we’re going to finish up with. Oops, I need to go into the Stack Overflow database for this one, don’t I? So, a slightly more complicated example: we’re going to select the top 40 users from the Users table where reputation is greater than, I think that’s a hundred thousand.

We’re ordering by reputation here, and since reputation isn’t the key of any index on the Users table, SQL Server has to physically sort the data to satisfy that order by.

Now if we create this index on the Users table and we rerun our query, we don’t need to sort that data anymore, because the index that we created is keyed on reputation. So now reputation is stored in the order that our query cares about it being in.

So if we look at this plan, there is no sort, there is just a top. So we go in here, we seek to the top 40 rows that we care about. And since Id is the clustered primary key, we don’t need to do any additional work to sort this. Now if we look at the estimated plan for this query, where we’re adding another top in here, we’re cross applying to a top one; this is going to get the most recent badge for any given user. If we look at this query plan, we’re going to have an eager index spool in here. That’s something that I’ve talked about in this video series, as well as many other times on the channel. And one of the things that a lot of people do when they need to fix an eager index spool is they look at the details of the eager index spool down here. And they say something like, well, I need to seek to the user ID in the Badges table, and I need to output the name and the date column from the Badges table. So they might create an index that looks like this on the Badges table in order to get rid of that eager index spool, which, strictly speaking, is the index that SQL Server created for itself to facilitate faster seeking on the inner side of that nested loops join. If we look at this query now, we do get rid of the eager index spool, but we still need to sort this data, right? We seek in here, but since the date column is in the includes of this index, we do not have an ordered avenue to where this data lives that would make that sorting free. So let’s say we wanted to run a slightly different query: say, select the top one user from Users, and then select their top 1,000 most recent badges. We could do this, and we get Jon Skeet, and I forgot to, wait, I did, what on earth? Oh, yeah, good answer. Okay, so if we look at this, we still need to sort that data, and now our sort even spills a little bit.
So don’t we feel foolish? Now, if we were to change this index definition a little bit and make Date the second key column in the index, we would still get rid of the eager index spool, and we would also be able to navigate the B-tree after the nested loops join without having to sort that data anymore. We still have a top down here, but it’s not a top N sort. Remember that nested loops is a per-row activity. For every row that comes out of the seek here, one row goes into the nested loops join, comes down here, and then SQL Server seeks to that one value. And every time we seek into this table for a given UserId, the Date column is now in order. Because our index is on UserId and then Date, every time we have an equality predicate on UserId — just like with the artists and albums, where we had UserId 22656 — the dates are all in order for UserId 22656. So for every equality predicate, that second key column is in the order that we care about it being in. Remember that our index is now on these two: UserId and then Date. So if we were to select the top 100 and order by UserId and Date, again, this ordering is free — we have a top, but we do not have a sort in the query plan. But if we were to select the top 100 and order by three columns now — UserId, Date, and Name — then because Name is an included column, that data is not stored in any particular order. Includes have no ordering to them; they just exist down at the leaf level. So we end up with a sort back in the query plan. Understanding concepts like this is really, really important for making queries go faster. A lot of queries have either presentation or implied order bys.
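To put the two index options side by side, here’s a sketch of roughly what’s being compared — the index names are made up, but the Badges columns are the ones from the Stack Overflow database used in the video:

```sql
/* Includes-only version: gets rid of the eager index spool,
   but Date lives unordered at the leaf level, so the sort stays. */
CREATE INDEX ix_badges_userid
    ON dbo.Badges (UserId)
    INCLUDE (Name, Date);

/* Date as the second key column: for any equality predicate on
   UserId, rows are already ordered by Date, so the sort disappears. */
CREATE INDEX ix_badges_userid_date
    ON dbo.Badges (UserId, Date)
    INCLUDE (Name);
```

The difference is purely about ordering: both indexes can satisfy the seek and cover the query, but only key columns give the optimizer an order it can rely on.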
You know, a presentation layer order by is something like this. But implied order bys might be there to support certain query plan operators, like stream aggregates and merge joins.

You might need to sort data to match what you’re asking for in a windowing function’s PARTITION BY and ORDER BY clause. You might have a top with an order by in your query. There might be all sorts of reasons why having data in order would make queries go faster. It’s important to understand how equality predicates and inequality predicates can change the way that SQL Server thinks about presenting the data in the order that you’re asking for, and all sorts of other things.
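As a hypothetical illustration of an implied order by: with an index keyed on (UserId, Date), a windowing function whose PARTITION BY and ORDER BY match the index keys can be computed without a sort operator, because rows already arrive in the needed order:

```sql
/* Assuming an index keyed on (UserId, Date): the window's
   PARTITION BY/ORDER BY matches the index order, so the plan
   can number rows off the index without sorting first. */
SELECT TOP (100)
    b.UserId,
    b.Date,
    ROW_NUMBER() OVER
    (
        PARTITION BY b.UserId
        ORDER BY b.Date
    ) AS badge_number
FROM dbo.Badges AS b;
```

Change the window’s ORDER BY to a column that is only an include, and a sort shows back up for the same reason as in the top N examples above.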

You know, fixing sorts is not just a matter of generally making queries go faster. Sorts can ask for very large memory grants. Sorts can spill if they don’t get an appropriate memory grant. Sorts can ask for way, way, way more memory than they end up using.

So there are all sorts of reasons to get the indexing and querying right, so that you don’t end up in a situation with a slow query because of that. But we’ve talked about this long enough now. Thank you for watching. I hope you enjoyed yourselves, I hope you learned something, and I hope you stick around for the next video on windowing functions, where we will experience even more joy than we experienced in this one. So goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Indexed Views And Non-SARGable Predicates



Thanks for watching!

Video Summary

In this video, I delve into the world of indexed views and how they can be used to address non-SARGable predicates in SQL Server queries. We explore two different approaches using a single query as our test case, aiming to optimize performance by creating an indexed view that helps SQL Server process the query more efficiently. Along the way, we also discuss some quirks of query plans and the importance of considering the broader impact on other queries when introducing new indexes or indexed views into your database workload. If you’re interested in supporting my work, consider signing up for a membership starting at just $4 a month — every little bit helps! And remember, if you can’t afford that, sharing this video, liking it, and subscribing are all free ways to help the channel grow.

Full Transcript

Erik Darling here with Darling Data. Hooray! We did it! We survived another day. All together. All of us. Didn’t we now? Everyone except Intel, who still… still can’t find those drivers. Don’t like it. Don’t like it one bit. In today’s video, we’re going to be talking about how you can use indexed views — we’ve got some more indexed view stuff to talk about, I guess — to fix non-SARGable predicates. Now, we’re going to talk about this in two different ways, but with one query. So we’re going to kill two birds with one query. Might be a record. You know, Randy Johnson gets to kill birds with baseballs; I get to kill birds with queries. That’s not fair, is it? If only I were six foot nine. Anyway, let’s do that. Before we get into all that good stuff, let’s talk about how you can make me nowhere near as rich as Randy Johnson. If you would like to support the work that I do on this channel, you can sign up for a membership for as little as $4 a month. Assuming that maybe you are also not as wealthy as Randy Johnson — but if you have an extra four bucks a month, you can do that.

If not, liking, commenting, and subscribing are all free things you can do to help this channel grow. And maybe, if you can’t afford the four bucks, you doing this stuff can fool the algorithm into appealing to people who do have four bucks a month of disposable income. There are all sorts of ways you can help. If you would like assistance with your SQL Server, I have much more reasonable rates than Randy Johnson. And I can do all of this stuff. And, you know, aside from beer gut magazine making me exclude New Zealand from this sales pitch, I am the best SQL Server consultant in the world outside of New Zealand. If you would like probably the best SQL Server training on the internet, you can get all of mine for 150 US dollars. That is a fantastic deal, especially now that all those other Black Friday sales are probably over and jacked back up to thousands and thousands of dollars a year to subscribe. With me, you get everything all at once. It’s wonderful.

No upcoming events until later this year. That’s okay with me. With that out of the way, let’s talk about these things. So I’m in the Stack Overflow database, I hope. And I’ve created these three indexes, I also hope. Yes, look, we got three errors, which means I did everything right. Now, I’ve created these indexes in an attempt to assist this query in being faster. The thing is, there are two things in this query that are going to aid and abet its slowing down. Aid and abet its slowing down, not abed.

Aid and abed — I mean, you can abet a query, I guess, but that might be a little sad. Anyway, there are two things in here making life tough on this query. First, we have wrapped the Score column in this coalesce, because a lot of people make this foolish mistake, thinking that SQL Server can’t deal with nulls correctly. If you have a null in an integer column and you want to know where that integer column is greater than zero, nulls aren’t going to mess SQL Server up. So you don’t need to do this. The other thing that’s going to be tough here is that we have a rather restrictive predicate on this thing. The problem is that SQL Server can’t calculate it until the query has run.
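To make that first point concrete, here’s a hypothetical version of the pattern — the table alias and threshold are stand-ins, not the video’s exact query:

```sql
/* Non-SARGable: wrapping the column in COALESCE for no benefit. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE COALESCE(p.Score, 0) > 0;

/* Equivalent, and SARGable: NULL > 0 evaluates to UNKNOWN,
   so NULL scores are filtered out either way. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Score > 0;
```

Both return the same rows; only the second leaves SQL Server free to use the column directly against an index.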

Right? So when we say HAVING SUM of blah, blah, blah, SQL Server has to calculate this thing. It has to figure out what this is by joining Posts to Comments to Votes, and only then can it calculate this aggregate and filter on it here. We cannot put this in the where clause, because it doesn’t get calculated until the select happens and projects this stuff out. Then, and only then, can we think about things in the having clause. And what’s funny here too — and I don’t know if this is a SQL Server quirk, or if this is purely a logical processing thing that would happen in every database —

I know that recently Oracle made it so that you can reference column aliases in the group by, so we could say group by total score there. We can say order by total score in SQL Server. But if I tried to put total score in the having clause, SQL Server will say: I don’t know what that is.

So we do have to repeat this expression down in here, which, just as a query writer, is annoying. Like, Microsoft, do me a solid — help a little in these things. You know, when people complain about SQL Server and T-SQL being dusty and antiquated dinosaur languages, this is the kind of crap that is just annoying to people who write queries.
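A simplified, hypothetical example of the asymmetry (not the video’s exact query — the table and threshold are just for illustration):

```sql
SELECT
    c.PostId,
    SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
GROUP BY c.PostId
HAVING SUM(c.Score) > 100   /* must repeat the whole expression... */
ORDER BY TotalScore DESC;   /* ...even though the alias is fine here */
```

ORDER BY is evaluated after the SELECT list, so it can see the alias; GROUP BY and HAVING are evaluated before it, so in SQL Server they can’t.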

So Microsoft could do more in this regard. But we run this query, and we have query plans turned on, so we don’t have to be worried about running it twice. This will take about 10 seconds, and we’ll look at the plan briefly, because quite frankly, the plan isn’t terribly interesting.

But here’s what the plan does. We scan the Posts table; that only takes 900 milliseconds. We scan the Comments table; that takes a second and a half.

Well, we don’t scan the table — we scan the nonclustered index — and we scan the Votes table. That takes 1.6… ah, 1.4 seconds. And then, like I said, the query goes on to do other things.

Six seconds, nine seconds. I think maybe this… no, that’s still row mode. So if you want to talk about other stuff that makes query plans real confusing, we could start here.

All right. Let’s look at this. Everything for operator times looks good enough going across to here.

Except — I mean, this doesn’t make any sense, because 1.4 seconds and then 600 milliseconds. So that’s not great.

And then 1.6 seconds and then 1.1 seconds. But like I’ve said, I usually disregard the parallelism operators when I’m looking at query times, because the times often aren’t going to make a lot of sense. Unless you go from a really small number here to a really big number here — then I might pay attention to it.

But going from a big number to a small number, I’m just like: you’re on crack. This is not working out for you. You should think about something else. But let’s focus in over here.

Just this last portion. We have six seconds. And just to make sure — because you might see some operators show up sometimes in batch mode and sometimes in row mode — for batch mode operations, SQL Server does this funny thing where batch mode operators only track their own execution time, not time accumulated from the operators that feed them.

But row mode ones are supposed to be cumulative going from right to left, so these numbers should add up — just get higher across the board. And that does happen here: from 900 milliseconds to 1.1 to 1.2 to 2.8 to six.

So this top section is fine. This section’s a little cracky, going from 1.4 down to 599 milliseconds. And this section is a little cracky, going from 1.6 to 1.1.

But we can at least sort of add those numbers up and get to round about six here. Where things get real good and cracky is when we go from six to nine to 5.9 to 6.6, 6.6 again, to 13.8, to 10.2. I realize that it’s wintertime, January-ish.

But if Microsoft were perhaps looking for issues to work on in a few months, when the next batch of summer interns comes in to work on SQL Server, this might be a good portion of the code to address. Because this doesn’t make any sense. Right?

Especially this last repartition streams. If you look down under this armpit — if I put this arm up a little bit, and we zoom in way down here — this thing ran for 10 seconds.

Right? We can see the 10 right in here. And I had to draw a square on my own, because this hand is what hits the button to make it draw rectangles. So we have 10 seconds over here.

But this gather streams might make it look like this query took a sneaky 14 seconds. If we go and look at the properties, though, and look at query time stats, we can see that this agrees with the final operator in the plan. So things got real weird in here.

No arguments from me. So be careful when you’re reading these query plans. But anyway, this thing takes 10 seconds. And part of why this thing takes 10 seconds is that we have this expression here.

Right? Expr1003, which matches up with where we’re looking for scores in the query down here. So like I said, SQL Server has to calculate this whole thing first.

Sometimes you’ll see filter expressions for having clauses. Sometimes you’ll see them if you’re filtering on a windowing function, like row number, rank, dense rank, blah, blah, blah. Other times you’ll see filters if you do goofy things like left join where some column is null.

Other times, if you write really complicated predicates, SQL Server won’t be able to push those predicates down to when you touch tables. But this is a legitimate need for having to calculate the whole result and then filter out what we don’t want. So let’s create an indexed view.

That takes care of our problems there. Right? We’re going to create this. And shockingly, creating this indexed view is going to take just about the same amount of time that running that query took. That might be something you want to keep in mind if you’re going to create an indexed view.

Because the query plan for creating the indexed view is going to look shockingly like the query plan for running the query. Right? Because we have to do all this stuff.

So you have to be real, real good and mindful, when you’re creating indexed views, that the query plan to create the indexed view is pretty good. Because like we talked about in the last video, modifications are going to start getting painful. Right?

When we have to maintain that indexed view, modification queries that used to be real fast might get real slow. So you have to pay good attention, so that when you add indexed views to your workload, the query that maintains the indexed view is not awful — because otherwise your modification queries are going to be awful. That’s on you, though.

This is the thing you have to know and understand. This is not the fault of the indexed view. This is the fault of you. You have to pay attention to this stuff. You are the query tuner. When you introduce things into the database to make queries faster, you have to take reasonable care and caution that they do not make other queries much slower.

Okay. That’s a you thing. So now, when we run this query, it will be very fast.

Right? If we look at the execution plan for this, now hitting that indexed view and doing all that stuff takes 73 milliseconds. And of course, SQL Server chimes in and says: hey, I think I can reduce the pain of running this by 50%.

Do we need this to be 50% faster? We just went from 10 seconds to 73 milliseconds. Now SQL Server chimes in and is like, Oh, I’ve got an idea.

Stick with me. Another index. Eh? Eh? Eh? Come on. So, coming back to something I’ve said a lot: if you’re the type of nitwit who spends all of their time looking at missing index requests and just adding those in, God should smite you.

That’s the end of it. All right. Cool. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope you had some real fun in this one.

Because there were a few fun moments, I think. There were some interesting things in here that make my day-to-day life just a joy when working with SQL Server. Just a real joy. And the next video — well, clearly, you can see this is number nine.

Number 10 is going to be about using indexes to fix sorts in query plans. And we’re going to have some fun stuff to talk about there too. So thank you for watching, and goodbye for now.
