When Subquery Query Plans Are Suboptimal In SQL Server (And How To Fix Them)

Video Summary

In this video, I return after a three-week hiatus to Paris, sharing my experiences with you as I recorded content before embarking on my trip. The video delves into the nuances of suboptimal subquery plans in SQL Server, explaining how these plans can lead to excessive row returns and poor query performance. I walk through an example where a query initially returns 4.8 million rows, far more than necessary for practical use, highlighting the issues with row goals and estimated versus actual row counts. By tweaking the query and experimenting with different hints and index usage, we explore how to optimize these plans for better performance, demonstrating that sometimes subqueries can be effectively tuned rather than outright avoided.

Full Transcript

Erik Darling here with Darling Data, and this is my first video back after a three-week hiatus in Paris. And I know that it was seamless to you because I recorded three weeks’ worth of content before going away. Intel is back up to its old tricks, not being able to find drivers. Some things never change, I guess. But if I look any different or I sound any different, if I have perhaps developed a slight French accent, that’s why. So anyway, in today’s video, we’re going to get gently back into things and talk about when subquery plans are suboptimal. I know, happens to the best of us, especially when you have to deal with SQL Server being its silly self. But anyway, let’s carry on with a little bit about how you and I can cooperate. If you would like to become a member of this channel, a paid member, you can join like 50 other people who have been so kind as to donate $4 a month to the content that I produce here. There’s a link right down in the video description where you can do that. Very easy, very clickable, like me. If $4 a month is just too rich for your blood, wherever your blood may originate from, you can like, you can comment, you can subscribe, and you can make different numbers go up in different ways

that make me a happy, healthy human being. If you need help with SQL Server, if the things that I talk about during these videos, ring a bell, and you’re like, wow, my SQL Server could use some help in that area. I do all of these things and more. And as always, my rates are reasonable. If you would like some more reasonable stuff from me, you can get all of my training content, performance tuning training content, I don’t have anything else, for about $150 USD. It’s all available at that URL up there, with that discount code down there. And of course, as usual, there’s a very clickable link down in the video description for you to do all that. Upcoming events, there is still SQL Saturday, New York City 2025. Taking place May 10th at the Microsoft offices in Times Square. So if you haven’t been robbed by someone dressed as a Sesame Street character in a while, or someone dressed as Spider-Man in a while, come to Times Square, come to Times Square, get your money’s worth, or lose your money’s worth, I don’t know, whatever. There’s probably a joke about congestion pricing in there somewhere. I just don’t know what it is yet.

But anyway, let’s talk about these suboptimal subquery plans. Now, I’m going to show you the starting version of this query, because there are some important things in there for you to understand about it. But this is not going to be the final query that we end up with. Now, this query runs for about 12 seconds, and it returns 4.8 million rows, which is far more rows than you will ever need to return to an end user for them to do anything reasonable with. Way too many, right? It’s an insane number of rows. Why would anyone ever need that many rows for anything, unless you’re going to put it in an Excel file and do something else with it? It’s just an absurd number of rows.

So this is obviously not a very good example of when subqueries have problems, because we return a lot of rows, and stuff like this part of the plan are completely sensible. So these two index seeks into the votes table, which have absolutely correct estimates, right? Absolutely. We had 100% on both of those, come from this part of the query, right? We have an upvote count and a downvote count. And of course, upvotes are signified by a vote type of 2, and downvotes are signified by a vote type of 3.
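To make that shape concrete, the upvote/downvote part of the query looks something like this. This is a sketch against the standard Stack Overflow database schema, not the exact script from the video, so the table and column names are assumptions:

```sql
SELECT
    u.Id,
    u.DisplayName,
    UpVotes =
        (SELECT COUNT_BIG(*)
         FROM dbo.Votes AS v
         WHERE v.PostId = p.Id
         AND   v.VoteTypeId = 2),   -- 2 = upvote
    DownVotes =
        (SELECT COUNT_BIG(*)
         FROM dbo.Votes AS v
         WHERE v.PostId = p.Id
         AND   v.VoteTypeId = 3)    -- 3 = downvote
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```

Each correlated subquery becomes its own access to the Votes table in the plan, which is why two separate index seeks show up.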

So, like, this is the part of the plan that we’re going to concentrate on for now, or rather for most of the video. But then I just want you to understand down in here is where the rest of the stuff happens. So, like, in the rest of the query, we have another subquery that goes to the badges table, and another subquery that goes to the comments table.

And down here in the query, we join users to posts, right? So there’s some stuff going on in here. But all of that, like, the two things that hit the votes table are up here, right? So, like, the order that we wrote the query in is not the order that SQL Server arranged the query plan in, which is going to come in handy for us to understand later.

And then down in this part of the query, this is where all the rest of the stuff happens. We hit badges, we hit users, we hit posts, and we hit comments, right? And SQL Server, for better or worse, has chosen parallel merge joins for all of these things.

But we don’t necessarily care about this part. Most queries that you’re going to have to tune are going to limit rows in some way, whether it’s by top or offset fetch. And for the sake of typing, we’re going to change that up here to top 5,000.

Oh, not 54,000, top 5,000. And that’s going to be nice and sensible. The thing is, when we change the query to the top 5,000 version, things don’t necessarily get all that much better.

So let’s run this, and let’s wait very patiently for SQL Server’s suboptimal optimizer to come up with a query and query plan for us and return some results. So the query that I ran that returned 4.8 million rows, the query execution time was about 12 seconds. And we had a big parallel plan with lots of joins and stuff.

In this plan, SQL Server has changed its mind about parallelism. It has not changed its mind about this part of the plan, though. Notice we still have a merge join.

The two subqueries that hit votes are still up here. The other funny thing that has now happened to our query is we have very, very bad estimates in here. But these are not bad estimates in the typical sense.

They’re not like SQL Server had bad statistics or out-of-date statistics. Of course, I just created these indexes, so the statistics are full scan up to date. The poor estimates here come from row goals.

Now, whenever you do something like top or offset fetch, and sometimes exists or not exists, there are all sorts of things in SQL, linguistically, that add something called a row goal to your query plan. And if we right-click on these operators, what you’re going to see is not in the thing up here.

This is where it doesn’t show you anything useful. But down in here, we have an estimated rows without row goal. And this estimate is correct, right?

See, this is the right number of rows. What SQL Server made a bad estimate on is how many rows it would have to read out of here in order to meet the row goal that we set by saying top 5,000. And it did that for both of these, right?

So, if we look at this one, this one will also have an estimated rows without row goal, and that estimate is absolutely correct. But where things fell off was, of course, between the estimates and the actuals.

That’s a little bit easier to see in here, where you have the number of rows hiding somewhere. And then the estimated number of rows. We have the actual number of rows read up here.

Sorry about that. Got a little bit lost in my own thoughts. And then we have the estimated number of rows to be read. But this is, of course, without the row goal, which it doesn’t say. But the estimated number of rows is 3,300.

But the actual number of rows is 2,141,819. That is a seven-digit number of rows. So, we read a lot more rows than we thought we would. The row goal is kind of messed up in that.

But none of the other index accesses in here, if we look at the properties, at least, yeah. So, this one does. This one does not.

This one does not. And this one should not either. So, none of these have the estimated rows without row goal property. The posts one does. So, where things get interesting is that we got a serial execution plan, probably when we shouldn’t have.

Now, I’ve created an index on the post table, which it should have used. Because this index fully covers everything that we want to do. But SQL Server costed this index out of existence. Now, let’s look at the estimated cost for this plan.

Right? SQL Server thinks it would cost 48 query bucks. So, my cost threshold for parallelism is 50. So, this costs just under the cost threshold for parallelism.

So, SQL Server was like, ah, not quite parallel plan territory. Also, it didn’t use my nice index on the post table. On the post table, it uses the clustered primary key.

Right? So, for some strange reason. And that strange reason can be revealed by adding a hint to the post table. Now, the index that I created up here is called P.

So, we’re going to, down here, we’re going to say with index equals P. And then, we’re going to rerun this. And I’m going to show you something kind of interesting.

Because sometimes these are the wrestling matches that you have to have with the optimizer in order to get queries to perform well. Now, this query plan is parallel. And this takes, well, I don’t know.

Let’s just say about half as long. The last one was like 9 point something seconds. This is 5 and a half seconds. So, this one’s about twice as fast. Not quite. I realize my percentage math is not great, especially on the spot.

But if we look over here, the part that made this query expensive enough to be eligible for a parallel plan was using the nonclustered index, and SQL Server deciding that it needed to re-sort the data that came out of that index to make this join more efficient.

So, it put the data that came out of our nonclustered index in order by the ID column in the post table. So, it reordered the data.

And now this query plan, actually, I forgot to show you this, has a cost of 596 query bucks. So, this is why SQL Server didn’t choose the nonclustered index in the first place. SQL Server thought that choosing the nonclustered index and sorting that data would be way more expensive than just using the clustered index.

Of course, we can see a big difference in execution time here, right? So, again, about twice as fast when we get a parallel plan. We still have problems up here, though.

The problems that we have up here are that even though we’re doing seeks into these, right, we’re still doing seeks into our index, the row goal is still making a bad guess at the number of rows it’s going to take to satisfy the row goal set by the top 5,000, okay?
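Before reaching for hints that change the plan, it can be worth seeing what the optimizer would do with the row goal taken out of the picture entirely. There’s a documented hint for exactly that, `DISABLE_OPTIMIZER_ROWGOAL`, which you tack onto the end of the statement. The query body here is a placeholder:

```sql
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
ORDER BY u.Reputation DESC
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));  -- optimize as if no TOP row goal existed
```

This doesn’t make the query faster by itself; it just lets you compare estimates with and without the row goal applied.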

So, one thing that you can do if you want to affect how SQL Server is doing things and you’re very confident not only in your indexes but also in the plan shape that you’re after generally is you could say option loop join.
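That hint goes at the end of the statement and applies to every join in the plan, which is exactly why you need to be confident in your indexes and plan shape before using it. A sketch, with a placeholder query body:

```sql
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION (LOOP JOIN);  -- every join in the plan becomes a nested loops join
```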

And what SQL Server will do is give us the plan shape that we would really want from queries that perform sub-queries. In other words, we don’t want SQL Server doing this part of the plan before we’ve done the part of the plan that actually does all our filtering, right?

We don’t want SQL Server doing this stuff. So, if we run this query again with the option loop join hint, we’re going to get the plan shape that we’re after. The plan shape that we’re after comes up here where all the stuff that we’re like doing our initial filtering on, right?

Like this is really the crux of our query. Selecting from users, joining the posts, and then filtering the post table on these creation date columns. So, this is really where we want SQL Server to start and narrow down the rows as much as possible.

And this is the plan shape that we’re after where taking that result, we now do loop joins across all of this stuff. And this query takes about one second, right?

That’s much better than the five seconds that it took when SQL Server chose a bad plan shape. And that’s much faster than the, well, I guess it was the same plan shape. So, we’re going to say this is a much better plan shape generally for our query, right?

And I forget, actually, you know what? We’re going to do a little experiment here. We’re just going to see what happens when I take that index hint out of the mix.

Let’s just do a little experiment because I actually forget what happens. So, this one takes actually 1.6 seconds. So, this one’s slowed down by about half a second.

And SQL Server goes back to scanning the clustered index on the posts table. But we still get a parallel plan, right? So, at least as far as query tuning goes, this is still better than what we were getting before. I’m going to leave the index hint in, because I generally like the speed that I get with it.

If you are not confident that the query plan shape should only ever use loop joins and you’re thinking to yourself, well, I want the optimizer to be free to choose other types of joins. One thing that you can do is set more row goals.

Now, let’s think logically for a minute about how many rows a subquery can return. One, especially when we’re doing an aggregate like count or sum or average or min or max or something.

Those are all things that really just return one row. If a scalar subquery ever returned more than one row, well, guess what? SQL Server would throw an error. So, what we can do, rather than saying, SQL Server, you always have to use loop joins, is say, SQL Server, we want more row goals.

Give us our row goals. And if we put a top one in both of these, SQL Server will finally see the light and will give us a query, give us the query plan shape that we want without having to say you always have to use a loop join.
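In practice that means wrapping the aggregate subqueries in a `TOP (1)`, which is semantically a no-op because the aggregate already returns exactly one row, but it hands the optimizer a row goal. Sketched against the same assumed Stack Overflow schema:

```sql
SELECT
    u.Id,
    u.DisplayName,
    UpVotes =
        (SELECT TOP (1) COUNT_BIG(*)  -- TOP (1) adds a row goal; the aggregate
         FROM dbo.Votes AS v          -- only ever returns one row anyway
         WHERE v.PostId = p.Id
         AND   v.VoteTypeId = 2),
    DownVotes =
        (SELECT TOP (1) COUNT_BIG(*)
         FROM dbo.Votes AS v
         WHERE v.PostId = p.Id
         AND   v.VoteTypeId = 3)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```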

Now, SQL Server does naturally choose loop joins here. But now, if we look down to where we access the votes table, we do have a top for both of these, right? It’s a tiny little top here and a tiny little top here.

And we have the same index seeks that we got before. Now, granted, the estimates in this query are still off a little bit, right? Like, I mean, we’re under by 10% here. We’re over by 43% here.

We’re over by, wow, like 1,700% here. We were, you know, over by 5,500% here. We’re over by 24% here.

Math, math, math, math, math. But we get a fast enough query with a good enough plan shape that we want. So whenever you’re tuning queries or whenever you’re, you know, having to deal with someone who’s like, wow, subqueries always suck.

Never use subqueries. Sometimes they just don’t know enough about databases to be able to say that. And they don’t know enough about query tuning to be able to, you know, make that make that judgment across the board about every subquery ever.

Sometimes you just need to give subqueries a little bit of help. So in the future, whenever you’re looking at LinkedIn or whatever other social media you like, and some data engineer starts posting one of those LLM lists of top 10 things not to do with a query.

And you see things like avoid distinct, and avoid select star, and the same list of things, and avoid subqueries, use CTEs. You can just firmly know in your head that they are complete ding dong idiots, regardless of which giant company they work for.

Specifically, probably meta, which has like the highest quantity of idiot data engineers on the planet. You can say to yourself, hey, you’re wrong about all that stuff. Because Erik Darling taught me that you’re all wrong duty heads.

So you have that to look forward to. And you can, you can point them to this video where I will, I will gladly, I will gladly fight with them. Anyway, sometimes some queries just need a little bit of help.

Sometimes you have to grease the wheels a little bit, but you can always make a query better and faster. And it does not always involve taking subqueries out or adding in CTE or whatever other nonsense people tell you about on the internet.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will pick fights with data engineers online because it’s fun to do because they’re all cock-a-doodie heads.

Anyway, cool. All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Better Filtering With YEAR and MONTH Functions In SQL Server

This week, I’ve got a coupon that offers an even steeper discount on my training than the one at the bottom of the post.

It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even. Good luck finding that price on those other Black Friday sales.

It’s good from December 23rd-27th, or until I remember to turn it off. Hurry along now.

Thanks for watching!

Video Summary

In this video, I delve into the world of better filtering techniques in SQL Server queries, specifically focusing on how to handle year and month functions more effectively. Often, I encounter queries that dissect dates into their constituent parts (year and month) for comparison, which can lead to performance issues and slower execution times. By demonstrating a cleaner approach using the `DATEFROMPARTS` function, I show how to avoid these problems and improve query sargability. This method not only makes your code more readable but also ensures that SQL Server can leverage existing indexes efficiently, leading to faster execution.

Full Transcript

Erik Darling here. Guess who? Darling Data. That’s what science says. In today’s video, we’re going to talk about better filtering with year and month functions. Now, I run into a lot of queries that do something like take a year and a month, very rarely a day, but usually year and month, either as parameters, or sometimes they even go so far as to dissect a date and pull the year and month out. I don’t know why. It’s blasphemous, but they do it. And then they take some column in their table, like a date column or a datetime column, and they say where year of that column equals the year thing, and where month of that column equals the month thing. And without fail, those queries suck and are slow and have problems and I have to fix them. So I’m hoping that by recording this video, I can avoid having to fix those problems in the future. Great. How much would you pay for that? How about four bucks a month? You click the link in the video description, you can sign up for a membership and get all of this for four bucks

a month or more if you choose to be more generous, but four bucks a month is the minimum. Anything less than that, I don’t know. What do you do with it?

I live in New York. Four bucks technically just allows you to take a breath. If you are just a fan of this content, you can like, you can comment, you can subscribe.

I’ll thank you in some other way. If you are looking for help with your SQL Server, perhaps with some performance tuning issues like with wrapping year and month in columns, I am best in the world at all of these things.

You can hire me instead of some idiot who doesn’t know what to do and who gives you bad advice. So that’ll be cool, right? Imagine that you’ll actually get stuff done. If you would like some very high quality, very low cost SQL Server training that teaches you things like I’m about to teach you now and more.

You can get all of my training for about 150 USD with that discount code. And would you believe that a link directly to go to the training site and apply that very discount code is in the video description.

So, again, it’s the end of the year. I’m not doing anything. Talk to me in 2025. With all that out of the way, let us put on our party hats and commence the partying because that is what we are here to do.

So, this is the type of query that I see people writing quite a bit. Oh, is ZoomIt on? No, ZoomIt is just a little bit delayed. This is what I see people doing quite a bit.

Now, I think that part of why they don’t do things correctly is because sometimes figuring out date things is hard. And I admit I also didn’t want to figure out date things.

So, what I did is if ZoomIt will finally just go away, is I looked at Stack Overflow. And Stack Overflow had a pretty good answer for how to figure out if you’re in a leap year. So, here’s what I’m doing.

I have an index on the votes table on creation date. I just didn’t want that to accidentally run again. And then up here, I’m assigning a year and a month. Let’s just pretend these could be parameters.

They could be things that you dissect from a date that gets passed in. Whatever you want to do with it. It doesn’t matter to me. And then in here, I’m figuring out if we are in a leap year. So, if we’re in February, then we start with, to find the last day of the month, we start with 28.

And then if these things are true, we add 1 to the 28 or else we add 0 to the 28, which makes sense. Because you don’t want to add anything if it’s not in a leap year. And then there’s a little bit of figuring out down here to say, if your month is 4, 6, 9 or 11, then you have 30 days in your month.

If you are not month 4, 6, 9 or 11, then you have 31 days. All right. So, we’ve got that part figured out.
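Here’s that last-day-of-the-month logic sketched out in T-SQL, matching the description above; the variable names are mine, not necessarily those in the demo script:

```sql
DECLARE @year  int = 2013,
        @month int = 3;

DECLARE @day int =
    CASE
        /* February: start at 28, add 1 in a leap year */
        WHEN @month = 2
        THEN 28 +
             CASE WHEN (@year % 4 = 0 AND @year % 100 <> 0)
                    OR (@year % 400 = 0)
                  THEN 1
                  ELSE 0
             END
        /* April, June, September, November have 30 days */
        WHEN @month IN (4, 6, 9, 11) THEN 30
        /* everything else has 31 */
        ELSE 31
    END;
```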

Great. Great. Wonderful. So, this is really a much better, oh, why are you moved over? How did that happen?

Oh, that just looks silly, unprofessional, didn’t it? That was terrible. Just fire whoever wrote these scripts. Really, a much better way, a much cleaner way of doing this. And again, this comes down to the concept of sargability, of being able to take whatever your search arguments are and apply them easily to whatever index you have in place.

Of course, for us, we have an index on the creation date column. If we do stuff like this, SQL Server has to run these functions against every row, pull out the month, pull out the year, and figure out if that month and year are equal to the month and year that we passed into the query. If we do this instead, using the wonderful, magnificent `DATEFROMPARTS` function, we can assemble a date out of the year and the month.

And to find the start of the month, we just put in 01. And to find the end of the month, we just use that day variable that we used up there. Now, because I am using local variables, and local variables have lots of performance side effects, particularly usually because of poor cardinality estimation, in order to shortcut that issue, in order to force the parameter embedding option that SQL Server has in its wonderful engine, I am using the option recompile hint here.
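Putting that together, the sargable version looks something like this. I’m computing `@day` with `EOMONTH` here just to keep the sketch self-contained (it handles leap years for you); the exact predicate bounds are my assumption, not the demo script verbatim:

```sql
DECLARE @year  int = 2013,
        @month int = 3;

/* EOMONTH gives the last day of the month, leap years included */
DECLARE @day int = DAY(EOMONTH(DATEFROMPARTS(@year, @month, 1)));

SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@year, @month, 1)
AND   v.CreationDate <= DATEFROMPARTS(@year, @month, @day)
OPTION (RECOMPILE);  -- embeds the variable values as literals at compile time
```

One caveat: if the column has a time component, the `<=` upper bound misses rows after midnight on the last day, so a half-open range ending at `< DATEADD(DAY, 1, DATEFROMPARTS(@year, @month, @day))` is the safer pattern.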

If these are big queries that run occasionally, and you are okay with them recompiling every time, please use a recompile hint. There is nothing wrong with it. If this is code that you want to be production ready, awesome code, you may want to explore two other scenarios. One might be to have a stored procedure that accepts the local variables that you declare and assign values to as parameters, because then they are treated like parameters, because they are passed in to that stored procedure.

Or, you could use dynamic SQL and parameterize dynamic SQL, which is the only safe kind of dynamic SQL, and you could pass in the local variables that you declare and set values to as parameters to that dynamic SQL, thereby transmutating them through the alchemy of parameterization into parameters. Wonderful. Things that you would just never think would work, but they work.
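The parameterized dynamic SQL version, sketched: the local variables get passed to `sp_executesql` as parameters, so inside the dynamic statement they are real parameters with real cardinality estimates. The query body is a placeholder:

```sql
DECLARE @year  int = 2013,
        @month int = 3;

DECLARE @sql nvarchar(MAX) = N'
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@year, @month, 1)
AND   v.CreationDate <  DATEADD(MONTH, 1, DATEFROMPARTS(@year, @month, 1));';

EXEC sys.sp_executesql
    @sql,
    N'@year int, @month int',  -- parameter definitions
    @year  = @year,
    @month = @month;
```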

They work well. And of course, if we run these two queries, I am just going to hit execute on this. These both come back relatively quickly, but the first one comes back relatively slower.

This one takes a full second to run, has to scan the entire index that we created on the votes table. That is this thing right here. And if we hover over the operator here and we look at the tooltip, we actually have something kind of funny happen.

Year and month are really just sort of shortcut synonyms for the date part function. Right? So, you can see that we have evaluated…

Oh, gosh. Where did my cursor go? There it is. The date part month right here and the date part year right here. And we have said where the date part month equals three, just like we assigned to that value.

And the creation date and the year 2013. Oh, that’s nice. We still had to scan that index though.

We needed a parallel query for it to be fast. If we wanted to put these on sort of even footing, and I’ll do that in a second, we could run this query at MAXDOP 1 to see how long this takes. Because down here, we do have a serial query that finishes in 178 milliseconds.

Right? And granted, the plan is a little bit more complex with all this stuff in it. But in this case, it’s okay because each one of these constant scans is really just creating a row for the date from parts function that we’re feeding in here.

And if we look at this, we just have a range seek to for expression 1008, which is the date from parts function that we created from year month 01. And then down in here, we have the end of that for expression 1009, which is the date from parts function that we assembled for year month day. Right?

So when we found the end day of the month, that’s where that was. So let’s actually go back and let’s redo that. Let’s say that we want to put these on equal footing. Right?

We want to make sure there’s no cheating. Parallelism clearly made things faster and better for that first query. Right? So let’s look at just how long, like single threaded each of these takes. Right?

Let’s look at these execution plans. This takes a full five and a half seconds now. So without the benefit of parallelism, this takes 5.6 seconds. Right?

We still have to scan that entire index. It’s still doing the same thing. And this query down here is still able to, in just under 200 milliseconds, do that same work. So if you’re out there tuning queries, and you see something like this, well, stop doing it. One.

The first step, stop. Second step, start. This is a much better way of approaching this. Now, this sort of thing won’t work if you have to do something real goofy, like the year and month from the date value in one column equals the year and month from a date value in another column. Maybe I’ll do a video about that someday, but no promises right now because my time is dedicated to other things.

You might notice that I have about 11 tabs open up there. I’m going to be recording some videos about stuff that I find interesting, about how to write T-SQL queries correctly, how to use different things in T-SQL to write queries in the best way possible. So I’m going to be recording those.

Maybe I’ll get around to that one as soon as I can, but I got a lot of stuff to talk about before I get around to that. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I hope that this finds you in good health because nice, happy, healthy people, they want to buy training and spend time with young, handsome consultants and gosh darn it, that’s me. All right. Cool.

Thank you for watching.


The Great Integer To Bigint Compression Swindle

Thanks for watching!

Video Summary

In this video, I dive into the often-overlooked complexities of converting an integer column to a bigint in SQL Server without downtime, debunking some common misconceptions. I explain why many blog posts and articles might lead you to believe that simply adding compression is enough for such a change, but reality paints a much more intricate picture. I share real-world scenarios where this conversion has caused significant headaches due to dependencies and other constraints that aren’t always immediately apparent. To help viewers navigate these challenges, I also highlight Andy Mallon’s detailed blog post on the subject, offering a practical approach using shadow tables and triggers for minimal disruption.

Full Transcript

Alright, I lied a little bit. I had one video left to record that I forgot about. And I’m going to do that real quick because I just want it off the board. This video is about the great integer to bigint compression lie. There’s a lot of blog posts out there that talk about this neat trick if you want to convert a column from an integer to a big integer with no downtime and all that. You know, oh, it just happens instantly, blah, blah, blah. There’s a lot of caveats to that that don’t get talked about in these blog posts. A lot of people have fooled you into thinking that it’s as simple as just, oh, make an index with compression and all of a sudden it’s done. It’s not quite true. There are a lot of other things that can get in the way of what seems like a very simple thing in all these other blog posts. But before we do that, I need to tell you the truth about a few things. Like the fact that you can become a member of this channel for four bucks a month and support the fantastic unparalleled work that I put forward on YouTube to help educate you about SQL Server stuff.

There’s a link right down there. If you don’t have the four bucks, or more, depending on your feelings of generosity, you can like, you can comment, you can subscribe, and the channel will grow in other important ways. You know, it’s not just money. What do you call it? Influence. My sphere of influence will grow. If you need help with SQL Server stuff, including making columns go from integer to big integer, you can hire me to do all of these things at a very reasonable rate, and I’ll be very happy to help you. So we’ll both have these things to look forward to. If you would like my training, you can get all of it for about 150 US dollars at that site up there using that discount code.

And again, there is a link for that right in this area somewhere, where you can just click on a fully assembled thing and just show up and start using it and watching it and getting smarter and being a better human being and all that other good stuff. SQL Saturday New York City is coming up May 10th at the Microsoft offices in Times Square. If you would like to come see me and a whole bunch of other smart folks talk about SQL Server and data stuff in general, then you can buy a ticket. It’s wonderful. All you have to do is a little search for SQL Saturday NYC 2025. Guarantee you’ll find it. But with that out of the way, let’s talk a little bit about how these people hornswoggle you into thinking that compression makes this change easy. So I’m going to create a table here.

And the, the column that we’re going to care about is this integer column, right? ID. And we’re going to have a clustered primary key on that column. And then we’re going to have some other indexes that reference that column along with one index that does not reference that column at all. All right. So let’s create this table and we’re not even going to put any data in this table because I just want to sort of show you the complexity of what we’re dealing with.
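The actual demo script isn’t visible in text form here, but a minimal sketch of the setup being described, with made-up table, column, and index names, would look something like this:

```sql
/* Hypothetical recreation of the demo: an integer key with a
   clustered primary key, plus nonclustered indexes that reference
   that column, all with page compression already applied. */
CREATE TABLE dbo.CompressionLie
(
    id integer NOT NULL,
    some_date datetime NOT NULL,
    some_string varchar(100) NULL,
    CONSTRAINT pk_compression_lie
        PRIMARY KEY CLUSTERED (id)
        WITH (DATA_COMPRESSION = PAGE)
);

/* Nonclustered indexes that reference the id column */
CREATE INDEX ix_date_id
    ON dbo.CompressionLie (some_date, id)
    WITH (DATA_COMPRESSION = PAGE);

CREATE INDEX ix_string_id
    ON dbo.CompressionLie (some_string)
    INCLUDE (id)
    WITH (DATA_COMPRESSION = PAGE);

/* And one index that doesn't reference the id column at all */
CREATE INDEX ix_date_only
    ON dbo.CompressionLie (some_date)
    WITH (DATA_COMPRESSION = PAGE);
```

Even with everything compressed, the point of the demo is that `ALTER TABLE ... ALTER COLUMN id bigint` still fails, because the primary key and the indexes that reference the column depend on it.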

Now, all of the indexes on this table, the clustered primary key and all of the nonclustered indexes, have compression on them. So this should be immediate and instant. But if we run this ALTER statement and try to make that change. Oops. Whoopsie daisy. You’re going to see this error here. This index, and this index, and this index, and this index are all dependent on this column.

So we can’t just change this column. This is just indexes on the table that we care about. So imagine if we have other stuff that references this column, say foreign keys, indexed views, anything like that. If you have objects created with schema binding, like functions or views or something like that, making this change to this column is far more than a case of just saying, oh, I added compression and now I can do it.

No, it’s not. It’s a lot more involved. If you’ve got a big table and you need to get rid of indexes, or you need to remove a clustered primary key from that table, whoo boy, you’re not going to have fun. So we would need to drop the constraint, right? The primary key constraint, which on a big table, not a good time, right? Converting that table to a heap, and then drop the one, two, three nonclustered indexes that we would care about here, the ones we got the error for. And then now finally we could make this change. But again, this doesn’t take into account any other relational dependencies on this column that might exist in your database. If we had more stuff going on, if this wasn’t just a single table that I’m creating for this demo in my useless crap database.
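Using made-up object names, the “instant” path really ends up shaped like this, and everything but the ALTER itself is fully logged, size-of-data work:

```sql
/* Hypothetical object names; the point is the shape of the work. */

/* Drop the clustered primary key, which converts the table to a heap... */
ALTER TABLE dbo.CompressionLie DROP CONSTRAINT pk_compression_lie;

/* ...then drop the nonclustered indexes that reference the column */
DROP INDEX ix_date_id ON dbo.CompressionLie;
DROP INDEX ix_string_id ON dbo.CompressionLie;
DROP INDEX ix_date_only ON dbo.CompressionLie;

/* Only now can the column change, and only now does compression
   get a chance to make this part metadata-only */
ALTER TABLE dbo.CompressionLie ALTER COLUMN id bigint NOT NULL;

/* Then everything has to go back on: fully logged, size-of-data
   operations that hurt on a big table */
ALTER TABLE dbo.CompressionLie
    ADD CONSTRAINT pk_compression_lie
    PRIMARY KEY CLUSTERED (id)
    WITH (DATA_COMPRESSION = PAGE);

/* ...plus re-creating the three nonclustered indexes */
```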

We would have to do a lot more work to make, to make this happen instantly. Then after, you know, at the end of that, we would have to go and add everything back. This whole thing right here, like this whole thing would be fully logged because you would have to get rid of the clustered primary key. And then adding it back, adding these three indexes back, those would also all be fully logged. So is it really worth going through all the trouble with compression to change a column instantly?

It doesn’t feel very instant to me. And this is something that I’ve had to actually help people with, not just write a blog post about that was like, oh yeah, look, it can do it. No. Doing this in real life is a lot harder and a lot more complicated. If you want to see a good blog post about how to do this with very little downtime, don’t judge me. I don’t use Bing. I don’t use Edge very much on this thing. But if you search for Andy Mallon alter online, Andy has, and I’m going to put this link in the video description.

Andy Mallon has a blog post that I think, I think, how long ago was this? 2017. And he talks about, wait, is this the right one? No, this is rebuild a very large primary key index. Man, Bing sucks. Jesus. All right, there it is. Changing a column from int to big int without downtime, from 2019. Thanks, Andy. So I’m going to put the link to this correct blog post that for some reason Bing thought should be second in the results.

Oh, man. Microsoft keeps thinking it can compete with these things. Like, it thinks Bing can compete with Google. It thinks Fabric can compete with Databricks. It thinks Azure can compete with AWS.

And, oh, boy, you just waste everyone’s time and money with all this nonsense. So Andy has a great post where he walks through creating a shadow table, creating a trigger to move stuff over, and then how to do the swap and migrate data and everything. So I’m going to put this in the blog post because this is a far more reasonable approach to making the change.
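Andy’s post is the place to go for the real details; as a rough sketch only (all names invented here, and this is my paraphrase of the idea, not his code), the shape of the approach is:

```sql
/* Rough outline of a shadow-table migration, with invented names */

/* 1. A new table with the bigint column and matching indexes */
CREATE TABLE dbo.BigTable_New
(
    id bigint NOT NULL,
    some_date datetime NOT NULL,
    CONSTRAINT pk_bigtable_new PRIMARY KEY CLUSTERED (id)
);
GO

/* 2. A trigger on the old table keeps the shadow copy in sync
      while the backfill runs */
CREATE TRIGGER dbo.BigTable_Sync
ON dbo.BigTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    /* remove updated/deleted rows from the shadow copy */
    DELETE bn
    FROM dbo.BigTable_New AS bn
    JOIN deleted AS d
        ON d.id = bn.id;

    /* re-add inserted/updated rows */
    INSERT dbo.BigTable_New (id, some_date)
    SELECT i.id, i.some_date
    FROM inserted AS i;
END;
GO

/* 3. Backfill existing rows in batches, then, in one brief window:
      drop the trigger and swap the table names with sp_rename. */
```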

I’m actually going to copy it right now in front of you because I don’t want to forget. So this is a much more reasonable change that you could make. Rather, a much more reasonable set of steps to take to make the change than just saying, oh, I added the compression.

No, it’s done. Because, man, I’ll tell you. The number of people who have needed to fix that and tried to follow the compression thing and just ended up with a sea of red text errors is huge.

I’ve had to get involved with these situations a bunch of times. No one’s very happy about it because they’re like, I read these blog posts. All these smart people told me all I needed was compression.

But no, no, no, no, no. You need a lot more than that. You need to ditch all those dependencies before compression kicks in to the point where it can help. Now, to be, I mean, not even to be fair to the other blog posts, but to be fair to people who want to make this change, Microsoft, this should just be a metadata-only change anyway.

Like, all you have to do is start accounting for new stuff in the metadata as a big int and, like, and say, like, allocate bytes based on that. And then just say the old stuff is now a big int. But even though you know it’s an int that only takes up four bytes, it doesn’t matter until you, like, rebuild or, like, even maybe even reorg the index or something like that.

Like, there’s just no reason for this to be as painful as it is. It’s pretty stupid, actually, that we have to go through this much trouble and the compression would even, like, come into the picture with all the dependencies and stuff that you still need to get rid of. So, I don’t know.

Maybe eventually someday we’ll get some pain relief in this area because, you know, integers aren’t getting any smaller. Data isn’t getting any smaller. People keep adding new stuff in.

And, boy, this is just going to get worse and worse. So, hopefully, Microsoft addresses this major scalability flaw, stops wasting money on things like video games, search engines, crappy cloud services, and fabric, and lousy AI gamble. I don’t know.

You know, maybe put some money into your core products, and stop treating people who have already bought it like, screw you, you’re already stuck with us. You know, like, help people who have had your product and been paying for it for a while out a little bit. It’s a nice thing to do.

At some point, you’ve got to dance with who brung you. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. Now I’m really taking a break and shutting everything down because I have completed my ring cycle. So, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Saturday New York City 2025 Is Coming Your Way In May!

You Know You Wanna


After a brief hiatus due to some paperwork difficulties, SQL Saturday NYC is back in action this May 10th!

The event will be taking place at the Microsoft offices located at 11 Times Square, New York, NY 10036, with a full day precon session on May 9th.

See you there!


Indexing SQL Server Queries For Performance: Computed Columns

Indexing SQL Server Queries For Performance: Computed Columns



Thanks for watching!

Video Summary

In this video, I delve into the world of computed columns in SQL Server and how they can be used to enhance query performance through indexing. After recently completing a series on writing efficient SQL queries and another on indexing for performance, I decided to bring these concepts together to explore computed columns in detail. I discuss common pitfalls when using computed columns, such as concatenating year and month fields without proper data type handling, and how converting the result to an integer can provide more predictable behavior. Additionally, I explain the importance of making computed columns deterministic for indexing purposes and demonstrate how to create indexes on these columns effectively. By sharing this knowledge, I hope you gain insights into optimizing your SQL Server queries and improving their performance through strategic use of computed columns.

Full Transcript

Erik Darling here, Darling Data, and you may notice that I look a little bit different than I did in my last video because I got a haircut. I don’t know exactly how I feel about the haircut at the moment. It’s not precisely what I requested, but we’re going to see what happens when this thing grows in. I think I need to sharpen myself up a little bit. Oh, that’s blurrier. There we go. I think that’s maybe, that’s a little bit better there. I look like less of a weird roll-on deodorant person with that properly adjusted there. So in this video, we are going to talk about sort of an intersection of concepts. I have most recently been recording a series of videos about indexing SQL Server queries for performance. And before this, I did a series on how to write SQL Server queries correctly. And I have another sort of ongoing, ever developing playlist called A Little About, in which I go into a little detail about various concepts in SQL Server. And this video kind of makes all of these things meet.

It’s a much happier throuple than the one that I described between indexed views, computed columns, and filtered indexes. This one is just about computed columns. So I’m going to call this video Indexing For Performance And Writing Computed Column Queries Correctly. And then it wouldn’t fit on the slide, but in parentheses, A Little About.

All right. So before we talk about this subject over here behind me, let’s talk a little bit about how you can subject me to $4 a month by becoming a member, the supporting monetary, monetary contributing member of the Darling Data family. Become one of my data darlings by clicking on the link in the video description right there. And that will take you to the Become a Member page where you can plug in whatever payment form you’ve currently, I don’t know, whatever you stole out of your mom’s wallet or something.

If you don’t have a mom or a wallet, you can like, you can comment, you can subscribe, and you can help me grow this channel in other ways, big and small, which I appreciate. I appreciate the big and the small. If you need help with SQL Server because you are running into problems with the type of stuff that I talk about in these videos, well, golly and gosh, are you in luck.

Look, I’m a consultant, and in exchange for money, I will solve performance problems on your SQL Server. Crazy, isn’t it? Any one of these things and more.

And as always, my rates, they are reasonable. And again, just because I just exude reasonableness from my being, from all of my pores, through all of my glands, just out every single hole, you can get all of my training for a very reasonable price, about $150. If you go to that URL up yonder and enter in that discount code right next to me there, you can get money off of that, and then you can just have it forever.

And, you know, unlike a haircut, my training will never go out of style, not even moments after you walk out of a barbershop. Upcoming events, of course, we have the one, oh, man, my favorite, my favorite, because I don’t have to go very far, SQL Saturday, New York City, 2025, coming May 10th. Boy, it’s going to knock your socks off.

Might even actually have to give you a pair of socks to knock off, because we’ll knock off both pairs of socks. This one, I have a feeling. But with all that out of the way, let’s party, I’m not sure I sold that one.

But anyway, let’s talk about computed columns here. Now, this does stem a little bit from a video that I made recently about how to search better for, like, dates and years in queries, where, you know, like, I talked about how you can use creation date. And the original query that I had written, because, like, I’ll be honest with you, that’s code that I had sort of, like, had sitting around sort of adopted from, like, something that I worked on a long time ago, where, you know, SQL Server 2012 wasn’t quite popular then.

And so, like, the end of month function wasn’t, like, readily available across every place where it would have been used. So, the end of month function, honestly, it just skipped my mind. So, thank you to COBOL7 for reminding me about the EOMonth function.

So, originally, in my query, I had a bunch of complicated crap to figure out when the, like, leap years and the end of the month and stuff. You can skip over all that if you just use the EOMonth function, like COBOL7 suggested there. Now, the original query that we started with was this one, where year column equals 2013 and month column equals 1.

And absolutely, sure, if you wanted to create computed columns and maintain an additional index on those computed columns, you could do something like this. And adding these computed columns in without the persisted keyword would add them in instantly. There would be no locking, no nothing on there.

They would just act as virtual columns in the table. And then, if you wanted to create a nonclustered index on both of those columns, even not persisted, you could do that very easily with an index that looks like this. Cool. Great.
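Against a Stack Overflow–style table, that setup looks roughly like this (the column and index names here are my guesses, not necessarily what’s on screen):

```sql
/* Non-persisted computed columns: a metadata-only change to add,
   no locking, no data movement */
ALTER TABLE dbo.Posts ADD
    creation_year  AS YEAR(CreationDate),
    creation_month AS MONTH(CreationDate);

/* YEAR() and MONTH() are deterministic, so indexing the
   non-persisted computed columns works without complaint */
CREATE INDEX p_year_month
    ON dbo.Posts (creation_year, creation_month);
```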

What a lot of people end up trying to do, though, is trying to make sort of a calculated field from both of these things. This is where you can run into trouble. This is going to be the first bit of trouble that I’m going to show you.

This is where you can run into a column where if you were to try to directly concatenate things together. So, I have a few columns in the select list here. One where I’m just showing you the year and month that comes out of the creation date.

One where I’m showing you what happens when you just try to concatenate those things together. That’s a terrible arrow, but we’re just going to deal with it. And then one that kind of has an unexpected side effect when just left to its own devices.

And then one that is probably what you’re expecting, but takes a little bit more work by converting the result to an integer. So, if we look at the results of this thing, you’ll notice that I have the year and the month, which comes back as 2008 and 7. But if I try to just concatenate those together, like with this line here, the result is just going to say 2015.

Because what comes out of year and what comes out of month are both integers. And so, SQL Server says, oh, you just wanted to add those together. 2015.

Congratulations, dummy. Okay. Fair enough. Integers, you know, that’s what you do with the plus sign. You add one to the other. So, this isn’t what we want here at all. We could just do this.

Oops. We could just do this. And let me scroll down a little bit in here, actually. We could just do this and say, like, RTRIM(year) + RIGHT('00' + RTRIM(month), 2), so that we get the year portion and then a zero-padded month portion here. And that could be fine.

You know, I’m okay with that. As long as, you know, the data type that comes out of that is what you’re okay with. And any queries that search on that use the correct data type thing.
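A sketch of what’s being shown (the column aliases are mine): YEAR() and MONTH() both return int, so the plus sign is arithmetic addition, not concatenation, until you force a string type into the expression.

```sql
SELECT
    YEAR(p.CreationDate)  AS the_year,
    MONTH(p.CreationDate) AS the_month,
    /* int + int is addition: 2008 + 7 = 2015. Congratulations, dummy. */
    YEAR(p.CreationDate) + MONTH(p.CreationDate) AS oops,
    /* RTRIM implicitly converts the int to a string, so this
       concatenates instead, with a zero-padded month: '200807' */
    RTRIM(YEAR(p.CreationDate)) +
        RIGHT('00' + RTRIM(MONTH(p.CreationDate)), 2) AS year_month
FROM dbo.Posts AS p;
```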

But we can also convert that to an integer. And really, the only limitation on this is that the leading digit can’t be zero padded, right? So, like, if you’ve ever worked with zip codes, you’ll know that storing zip codes as an integer in America, where they’re all numbers, is a mistake because some of them start with zero. Yeah. And you can’t have an integer start with zero.

So, this, I think, would be pretty safe because we’re a long way off from any years that start with zero. I can’t imagine when we would have another year that starts with zero again. But, anyway, the 07 would be safe because, you know, it’s not the leading digit there.

And most people, you know, I don’t know if they, I don’t know, whatever, really whatever search criteria works best for you, just go with it. Most people expect the zero padded month, but if you don’t, whatever. I’m not going to sit here and argue with you.

Where things get a little surprising for some is if we were to run this stored procedure, sp_describe_first_result_set, which is very handy for figuring out what data types are coming out of queries that you write. So, you can figure out, like, oh, do I need to fix this somewhere?

The one where we just have this as a string comes out as a varchar(14). So, we do have to be a little bit careful there. SQL Server does not, like, change that back to an integer or anything else unless we explicitly tell it to.

So, you know, we could do that. But what makes it tricky with the computed column stuff is that we would have to add the computed column like this. Right?

Convert integer, whatever. So, let’s do that. And let’s create two indexes on the votes table. One, we’re going to create on the year month column that we just added. And the other one, we’re just going to create straight on the creation date column that we just added.
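Something along these lines, with placeholder names for the computed column and the indexes:

```sql
/* Computed column that comes out as an integer like 200807 */
ALTER TABLE dbo.Votes ADD
    year_month AS
        CONVERT
        (
            integer,
            RTRIM(YEAR(CreationDate)) +
            RIGHT('00' + RTRIM(MONTH(CreationDate)), 2)
        );

/* One index on the computed column... */
CREATE INDEX v_year_month
    ON dbo.Votes (year_month);

/* ...and one straight on the creation date column */
CREATE INDEX v_creation_date
    ON dbo.Votes (CreationDate);
```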

The thing with SQL Server, and this goes for standard edition and enterprise edition, is that unless you write your query to explicitly reference the computed column that you created, you might be relying on something called expression matching for the optimizer to choose your computed column to use with the query that you write.

So, the sort of unfortunate thing is that if we were to just write our query like this, and we were to say, hey, give me the year and give me the month, like 2013, I mean, the old one is sort of unnecessary, but if we were to just write this query, the best that we would get from this currently is just a scan of the index we created on creation date.

This is not the index. This is V1. This is the index on creation date right here. This is not on the computed column that we created, or rather on the index that we created on the computed column.

We would have to write our query to very explicitly match the definition of the computed column, and have, say, where that equals 2013-01, in order for SQL Server to pick up on the computed column and use it automatically, right?

Which, and I say automatically, but this is far from automatic. This is you taking a pretty big step towards, you know, creating the, rather, rewriting, like creating the computed column, rewriting queries to use it.

But at this point, like if you’re going to do all, like if your query is doing all this stuff, fine, already, fine. But if it’s, you know, if you have to rewrite it to do this, you’re better off just rewriting it to say where year month equals something.
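The two query shapes in play, roughly (the computed column name is a placeholder of mine):

```sql
/* Without a predicate that textually matches the computed column's
   definition, expression matching doesn't kick in, and this query
   just scans the index on CreationDate: */
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE YEAR(v.CreationDate) = 2013
AND   MONTH(v.CreationDate) = 1;

/* Simpler and more reliable: reference the computed column by name,
   so the index created on it can be seeked into directly */
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.year_month = 201301;
```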

Like, don’t go through all this trouble. Now, one place where adding computed columns can get tricky is, let’s say that we wanted to do this.

And I’m going to change this a little bit first, because I just, I want you to see where this can go wrong. So let’s put this on a new line.

And let’s put this on a new line. And let’s put this on a new line. So what I’m going to do is take out this and this to begin with.

So now we’re just going to add, to the Posts table, a computed column defined as ISNULL(LastEditDate, '19000101'), whatever. And if we do this and we try to create an index on the column like this, SQL Server is going to say it cannot be used in an index or statistics because it is non-deterministic.

And the reason why it’s non-deterministic is because we, SQL Server isn’t quite sure what to do with this. All right. We’re not telling it what to do. So it doesn’t really know.

I’m going to drop those computed columns. Actually, I’m not going to be able to drop the one on votes, am I? Because it’s dependent on that index. So we’ll just do that later. I’ll take care, fix it in post. So, but I got rid of that one.

Now let’s put these back in. All right. Let’s say CONVERT(datetime, '19000101'), and we’re going to use style 121 with the convert there. And if we add that in, with that in the right place, now we’ll be able to create an index on that computed column.

And the nice thing is that we don’t have to put the convert part of the computed column into any queries that we expect to use it. If we just say, hey, select count from posts where is null, last edit date, whatever is greater than this, then SQL Server will have no problem using and seeking into the computed column plus index that we created on it to support this query.
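A sketch of the two versions (column, index, and literal values here are placeholders; the exact demo code isn’t in the transcript):

```sql
/* This version adds fine, but creating an index on it fails:
   the implicit conversion of the string literal to datetime is
   non-deterministic, so the computed column can't be indexed */
ALTER TABLE dbo.Posts ADD
    last_edit AS ISNULL(LastEditDate, '19000101');

/* Drop it and pin the type down with an explicit CONVERT using
   style 121, which is deterministic */
ALTER TABLE dbo.Posts DROP COLUMN last_edit;

ALTER TABLE dbo.Posts ADD
    last_edit AS ISNULL
    (
        LastEditDate,
        CONVERT(datetime, '19000101', 121)
    );

/* Now the index creates without complaint, and queries that filter
   on ISNULL(LastEditDate, ...) can seek into it via expression
   matching, without repeating the CONVERT */
CREATE INDEX p_last_edit
    ON dbo.Posts (last_edit);
```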

So there is some finagling and work that you have to do here. Especially whenever you’re doing anything that involves an expression like this, you do need to really help the optimizer understand what to expect out of that column, and make sure that things are being converted to the proper date and all that other good stuff.

So it is possible to do, and it is possible to get expression, expression matching to happen without adding any additional logic into queries that you expect to use things in this case. But you do need to have the, you do need to make computed columns deterministic in order to index them. And if you don’t do that, SQL Server is going to say no index for you.

So there’s that. Anyway, just a little bit about indexing computed columns for performance and writing computed column queries correctly. I hope you enjoyed yourselves.

I hope you learned something and all that other good stuff. This is going to be my last recorded video. I mean, you’re not going to be able to tell because I have video scheduled out for about three or so weeks at this point. But this is, for me, exciting because this is the last video I’m recording for a bit.

I’m going to take some recording time off for the upcoming Christmas and New Year’s holidays. So I’m going to, I’m going to just shut this whole mess down and, and think about other things for a little bit. But when I come back from break, we’re going to, we’re going to talk, we, I have, I have a whole slate of things to talk about.

So, and hopefully there, there, there, there will be some, some news on SQL Server V next that I’ll be able to talk about as well. So that’s, that’s what’s upcoming. But in the meantime, I do wish everyone happy holidays, whatever you’re out there celebrating.

You know, we’re, we’re, we’re a Christmas bunch. So Merry Christmas. If you’re, if you’re feeling Christmassy too. Happy New Year.

Even if the new year you’re celebrating is a little bit after the new year that I’m celebrating. Again, it’s, it’s cool with me. Celebrate, celebrate your new years whenever it makes sense to you. You know, it’s, it’s fine.

It’s just more, more, more, more days on the calendar where it’s okay to drink, which means it’s okay with me too. But anyway, I will see you when I return from my, my, my bon voyage. And until then, you know, keep indexing SQL Server queries for performance and writing queries the correct way in SQL Server.

It’s still, you know, it’s a nice thing to do. Anyway, goodbye.


Indexing SQL Server Queries For Performance: Windowing Functions

Indexing SQL Server Queries For Performance: Windowing Functions



Thanks for watching!

Video Summary

In this video, I delve into improving the performance of windowing functions in SQL Server, focusing primarily on how effective indexing can be. I start by discussing my own experience with YouTube and how you can support my channel through a membership, emphasizing that all content is free but your engagement helps make it better. The bulk of the video then explores specific techniques for optimizing queries using windowing functions, particularly cross-apply and drive join methods. I explain in detail why inducing batch mode can significantly improve performance, especially on standard edition SQL Server, where DOP limitations can be a challenge. Throughout the discussion, I highlight common pitfalls such as mismatched indexes and the importance of attention to detail in index creation. By sharing these insights, my goal is to help you understand how to better leverage windowing functions for faster query execution.

Full Transcript

Erik Darling here with Erik Darling Data, just Darling Data. I think I took the Erik out for a reason. My name is naturally, since birth, this is not an affectation, spelled with a K, and asking people to remember that is apparently difficult, even in the context of, like, my name being in an email right above them. It’s always like, hey Erik with a C, and I’m like, not a lot of attention to detail on your end, is there? Not a lot of giving a crap. So, took the Erik out, kept the Darling Data. Darling is at least reasonably easy to spell, as long as you’re not some AI powered transcription tool, which really loves calling my company Darlene Data. Like Darlene, the rebellious daughter from Roseanne. Not a lot of attention to detail in the AI world, is there? Not a whole lot of that. Anyway, in today’s video, we’re going to talk about improving the performance of windowing functions in SQL Server, mostly from the context of indexing. We’re gonna look at a couple different query forms, though. One using cross apply, the other using a drive join. So, there’s gonna be some interesting stuff in there, but before we talk about that, we need to talk about how Erik Darling derives value from YouTube, or from you. If you would like to support my efforts to bring you all of this wonderful SQL Server content, you can click the link down in the video description to sign up for a channel membership. We’re up to, like, almost 40 people, or, I think, maybe 40 even.

I forget what the last number was, but, 40. Wow, 4-0. Just like my last birthday. Uh, uh, and, you can, you can, uh, support this channel for as little as $4 a month. Uh, all of this content is otherwise free, and if you would like to help me make this channel the wonderful, uh, you know, gigantic, thriving community that it should be, uh, liking, commenting, and subscribing are wonderful ways to also help me do that. Uh, if you, of course, see the things that I talk about, uh, in these videos, and think that, uh, some of them might just apply to your slow-ass SQL Server, you can hire me to do all sorts of consulting-y things. I will make your SQL Server faster in exchange for money. Uh, and, as always, and forever, my rates are reasonable. In the same reasonable vein, uh, is my, my consulting rates, uh, my, my training rates are also, uh, equally reasonable. Uh, if you would like to get all of my, uh, training for, uh, about 150 USD for the remainder of your, you know, life, uh, may, may, may you live forever, uh, you can do that at that URL up there using that discount code over there.

There is also a fully, uh, formed, uh, URL down in the video description if you are more keen on just clicking on one thing than clicking on a thing or typing a thing and typing another thing. I know how the attention to detail goes. Uh, SQL Saturday, 2025, coming your way, May 10th, uh, it will be at the Microsoft offices around Times Square.

So, not only can you get a slice of, slice of Sbarro pizza, you can also get your picture taken with, uh, Elmo wearing a dirty backpack or something. And, um, you know, uh, hopefully, hopefully not offend Elmo or, or dirty backpack Elmo or dirty backpack Spider-Man or, uh, any of the, the dirty backpack Sesame Street characters that, that, that, that, that, that roam around and, uh, will, I mean, for a reasonable rate, take a photo with you. So, uh, yeah, you, you should, you should do that.

Um, I, I would encourage you to do it. It’s, it’s a fun event. We, we don’t invite the dirty backpack Sesame Street characters into the event, so it’s usually pretty safe in there. But with that out of the way, let’s talk about windowing functions in SQL Server.

Now, I am not going to sugarcoat this one bit, as soon as ZoomIt decides to become a responsive and responsible human being. Most of the time when I’m working on a query that has a windowing function in it and it is having performance problems, the first thing I go to is to try to induce batch mode in some way. Either by using some sort of columnstore object in the database.

It doesn’t even have to have any rows in it. It can be a temp table if, you know, if you’re into that sort of thing. Uh, it can be encouraging a higher database compatibility level for the query, um, and maintaining the legacy cardinality estimator for a query.
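One common version of that trick (the temp table name is invented here) is an empty columnstore-indexed temp table joined in on a predicate that can never match, so it changes nothing about the results:

```sql
/* An empty temp table with a clustered columnstore index; it
   contributes no rows, but its presence makes batch mode
   available to the whole plan */
CREATE TABLE #batch_mode
(
    i integer NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

/* Left join on a predicate that is always false, so results
   are unchanged; only the execution mode is affected */
SELECT
    p.OwnerUserId,
    ROW_NUMBER() OVER
    (
        PARTITION BY p.OwnerUserId
        ORDER BY p.Score DESC
    ) AS n
FROM dbo.Posts AS p
LEFT JOIN #batch_mode AS b
    ON 1 = 0;
```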

But most of the time, windowing functions just work better under batch mode. They just do. Um, the window aggregate operator is highly optimized or something.

And it is just, like, you, you just take all the work out of it. You don’t really even have to, like, in most circumstances, you don’t even have to change the indexing because you’ve already got good enough performance just by getting batch mode out of the way. Uh, getting batch mode, uh, involved in the query and getting row mode out of the way.

Sorry, it’s the opposite there. Um, sorry, ZoomIt is being a real weirdo today. But if you’re on standard edition and Microsoft hates you because you did not pay the friendship tax, the enterprise friendship tax, then batch mode is not going to be as favorable of a proposition, because you are limited to a DOP of two for any parallel queries that involve batch mode on standard edition. It’s a real unfortunate situation.

Um, I don’t understand quite why Microsoft needs to stick the thumb in the eye of standard edition users so badly. It’s almost like they’re daring you to go use Postgres. It’s a real stinking shame how much accountants can ruin perfectly good software.

But there are lots of windowing functions out there in the world. They generally fall into three categories.

There are ranking windowing functions like ROW_NUMBER, RANK, DENSE_RANK, and NTILE. There are aggregate windowing functions like SUM, AVG, COUNT, COUNT_BIG, MIN, and MAX. And then there are all sorts of analytic windowing functions like FIRST_VALUE, LAST_VALUE, LEAD, and LAG.
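One of each category, just to put syntax to the names (the column aliases are mine):

```sql
SELECT
    /* ranking: position of each vote within its post */
    ROW_NUMBER() OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate) AS vote_number,
    /* aggregate: a windowed count, no GROUP BY required */
    COUNT_BIG(*) OVER
        (PARTITION BY v.PostId) AS votes_per_post,
    /* analytic: the previous vote's date within the same post */
    LAG(v.CreationDate) OVER
        (PARTITION BY v.PostId ORDER BY v.CreationDate) AS previous_vote
FROM dbo.Votes AS v;
```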

There are a whole bunch more in the boat, but I didn’t feel like listing them all out. So, anyway, let’s pretend that I have a couple indexes on a couple of tables in the Stack Overflow database, namely the Posts table and the Votes table.

And I’m using the bigger tables because I really want to drive performance crazy. Now, uh, we’re going to start with this query and those indexes. And this is the execution plan for that.

It takes about 20 seconds, and we use a loop join here. On the inner side of the loop join, we already have a good index on the Votes table to seek into, so we can get that done pretty quickly.

But what we don’t have is a good index to help with our windowing function otherwise. Inside here, on the Votes table, what we’re cross applying to is all this wonderful stuff: where the PostId column in the Votes table equals the Id column from the Posts table, where VoteTypeId is in these vote types, and where the creation date of the votes is greater than 2008.

So, if we go back to the indexes a little bit on the Votes table, we do have this seekable index on PostId, but none of the columns from either the other part of our WHERE clause or from the specification of our windowing function are in the key of the index.

Now, if you’ve been paying attention to my other videos, specifically the last video I recorded, where I was like, hey, indexes put data in order, something might have clicked in your head: oh, windowing functions put data in order too. Indexes can help windowing functions.

So, let’s go and change the index that we have on the Votes table. What I want you to pay attention to here is that we’ve now put PostId, VoteTypeId, and CreationDate into the key of the index. PostId was there before; we added VoteTypeId and CreationDate, and that should technically help with this.

It doesn’t go as far as we want, because we messed up this part. Again, the whole attention to detail thing. This is the Eric-with-a-C of index creation. You’ll notice in here that even though we can seek into everything we care about, we still have to sort that data. If we hover over the sort, we will see that it orders by VoteTypeId ascending, and then CreationDate descending. So, our index really has to closely match the specification of our windowing function for it to work out.

If we change our index definition, like we have in this window, to be keyed on PostId, VoteTypeId, and then CreationDate, things will work out a lot better for this query. We still have our plenty-seekable index over here, and we no longer have a big, time-consuming, spilling sort on the inner side of this nested loops join.

So, we’ve improved this query in a couple of ways, though not so much from a timing perspective. We’re about the same speed as the previous query, but that query will have asked for a memory grant of 4300 KB, and this query will have asked for a memory grant of 136 KB.
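Sketched out, the two index definitions being compared might look like this. The index names are made up, and I’m assuming the messed-up detail is the sort direction on CreationDate; the point is that the key order and direction have to line up with the OVER clause.

```sql
/* Attention-to-detail failure: right columns, but CreationDate is
   stored ascending while the window spec orders it descending, so a
   Sort still shows up in the plan. */
CREATE INDEX v_eric_with_a_c
ON dbo.Votes (PostId, VoteTypeId, CreationDate);

/* Matching the windowing function specification: partitioning
   column(s) first, then the ORDER BY column in the same direction. */
CREATE INDEX v_erik_with_a_k
ON dbo.Votes (PostId, VoteTypeId, CreationDate DESC);
```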

So, we did not improve the speed of this query so much, but we did improve another metric, specifically the memory grant. Now, one thing that sometimes works, like when trying to get batch mode involved, is to do the windowing function in an inner context, like in here, select from that to generate the whole windowing function, and then apply some predicates outside of where you generate it. But in this case, that doesn’t get us anywhere. It doesn’t actually fix the query in any meaningful way.

So, one thing you might try, or rather the way queries like this tend to naturally get written, is not with CROSS APPLY, but with a join. You can totally do that: you can put a derived table into your query, generate the whole row number in there, and join to that derived table on the outside. Where that differs from the CROSS APPLY version is that with CROSS APPLY, you can push the correlation inside of the apply.

So, you don’t correlate the Votes side to the Posts side out here; you correlate the Votes side to the Posts side inside of the CROSS APPLY. In this query, where I was trying to change that a little bit, I was generating the row number in here and then applying the filter after that.

This didn’t quite work out for me. But the way most people write queries like this is just with a straight join, and of course you can’t put the correlation in here; that doesn’t work with a straight join. You have to do the join and then put the condition in the ON clause out here.

So, in this case, we will have generated this whole result set, then joined the results of it to the Posts table outside of generating it, and then applied some filters to the stuff out here.
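The two shapes being contrasted, with the correlation placed differently; column names are assumed:

```sql
/* CROSS APPLY: the correlation is pushed inside the apply, so the
   row number is computed against each outer row. */
SELECT p.Id, va.CreationDate
FROM dbo.Posts AS p
CROSS APPLY
(
    SELECT v.CreationDate,
           ROW_NUMBER() OVER (PARTITION BY v.VoteTypeId
                              ORDER BY v.CreationDate DESC) AS n
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id  /* correlation lives in here */
) AS va
WHERE va.n = 1;

/* Derived table + join: the whole result set with row numbers gets
   generated first, and the correlation has to go in the ON clause. */
SELECT p.Id, v.CreationDate
FROM dbo.Posts AS p
JOIN
(
    SELECT v.PostId, v.CreationDate,
           ROW_NUMBER() OVER (PARTITION BY v.PostId, v.VoteTypeId
                              ORDER BY v.CreationDate DESC) AS n
    FROM dbo.Votes AS v
) AS v
    ON v.PostId = p.Id  /* correlation out here */
WHERE v.n = 1;
```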

This does okay, but we’re back to a memory grant, and we’re back to a spilling sort. If we wanted this to be better, we would have to change the index that we have on the Votes table to better suit writing this as a join rather than an apply. We would need something that helps the windowing function first, and then helps us join after we have developed the result of the windowing function. If we do this, it actually brings our query down to about 16 seconds.

There are two big things going on here. One is that the type of join changes. If we look at all of the windowing function versions of this, we did change the join type for the one where we did this, but we did not change much else for the better. For all of the cross applies, SQL Server has optimized those cross applies as a nested loops join. That can be good or bad; it is not necessarily a death knell, but this is a significant number of rows coming out of here. Generally, with apply, you want a small result driving seeks into a big result in here. If we had fewer rows coming out of the Posts table, this nested loops join may have been a bit less painful.

But the bigger deal with these queries is that writing this as a straight join to a derived table opens SQL Server up a little bit to other join types, which can be useful. So in the last couple of queries, we used a hash join, and the hash join is just kind of one big seek. For this particular query form, we’ve redone our index as VoteTypeId, CreationDate descending, and then PostId last, so we can generate our whole row number, do all this stuff in here, and then join to the result of that out here.
This helps: SQL Server does just one big seek, generates one big row number, and then filters that row number down in here. Part of the way that I’ve engineered these queries is to have to fully do all the work inside of this, without filtering anything out before we do it. That’s really to show you that the inner part of the query, where we do all of this stuff, is where we need the most help.

Now, you don’t necessarily have to use ROW_NUMBER, or a windowing function at all, in order to find the highest-numbered thing, or the most recent thing, or the least recent thing, or any of that stuff. Something that I try to get into everyone’s head when they’re writing queries is that there are many, many ways to write them, and you don’t have to do the same thing over and over again. You should test different forms of your queries to make sure you are using the fastest one available.

For situations like this, your options are, of course, windowing functions; a TOP (1) with an ORDER BY; and aggregate functions like MIN and MAX. You should test all of those and see which one gives you the best performance. There might even be performance differences in different situations based on the indexing you have available for all these different query forms.

So, let’s take a look at this one real quick. Rather than generate a whole row number and do a bunch of filtering on row numbers, we just say: give me the max creation date for all this stuff, join to that out here, and make sure that batch mode stays involved by using this columnstore helper table.
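That rewrite, sketched with assumed names; the helper is just an empty table with a columnstore index on it to keep batch mode in play:

```sql
/* An empty columnstore object (name made up) to keep batch mode in play. */
CREATE TABLE #bmh
(
    id integer NOT NULL,
    INDEX ccs CLUSTERED COLUMNSTORE
);

SELECT p.Id, mv.latest_vote
FROM dbo.Posts AS p
JOIN
(
    /* Crunch Votes down with MAX instead of generating and then
       filtering a row number. */
    SELECT v.PostId,
           MAX(v.CreationDate) AS latest_vote
    FROM dbo.Votes AS v
    GROUP BY v.PostId
) AS mv
    ON mv.PostId = p.Id
LEFT JOIN #bmh
    ON 1 = 0;  /* contributes no rows, just batch mode */
```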
And now if you look at this query, it finishes in about three seconds. So, I’m all for windowing functions. I’m all for using them when they are useful and appropriate, and when they help you write a query that gets you the right results.

But as soon as you start performance tuning things, you need to keep in mind that there are other query forms that might be faster. Sometimes it might be the windowing function. Sometimes it might be TOP (1), or top-whatever-you-need, with an ORDER BY. Sometimes it might just be a straight-up MIN or MAX aggregate, crunching your data down and using it from there. So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

I hope that you will take my advice about trying many different query forms seriously, because most of the time when I’m tuning queries, I am definitely looking at indexes, but I am also looking at how the query is written. If there are alternative syntaxes available that might better suit the indexes I already have, I want to try those before I go messing around with stuff in the database. A lot of the databases that I work with are really big, and creating indexes can be a pretty painful experience. So, if I can avoid that, I will, by getting other things involved: different query forms, batch mode, and so on. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I will see you in the next video about computed columns.

So, cool. Thank you. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Index Sorting




Thanks for watching!

Video Summary

In this video, I dive into the fascinating world of how indexes store and sort data in SQL Server, particularly focusing on multi-key indexes. We explore why understanding this is crucial for optimizing queries and improving performance. Whether you’re a seasoned SQL professional or just starting out, grasping these concepts can significantly enhance your ability to write efficient T-SQL code. I also share some practical tips on how to leverage indexes effectively, making sure that data retrieval processes are as swift and smooth as possible. Along the way, we take a fun detour through my database of DMX tracks, using it to illustrate key points about index ordering and query optimization in an engaging and relatable manner.

Moreover, I delve into the nuances of equality versus inequality predicates in queries, explaining how they affect the need for sorting or reordering data. By walking you through detailed execution plans and practical examples, I aim to demystify these concepts and equip you with the knowledge needed to make informed decisions when designing your database schema and writing SQL queries. Whether you’re looking to fine-tune existing queries or plan new ones from scratch, this video provides valuable insights that can help streamline your workflow and improve overall performance.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to get on with our indexing series and talk a little bit about how indexes store and sort data in SQL Server and how you can use this knowledge to your advantage to make queries faster and more productive and, I don’t know, I’m sure it will meet some other KPIs along the way. But before we do that, let’s talk about how Erik Darling stores and sorts money. If you would like to support my efforts bringing you this ultra-high quality SQL Server training content, you can sign up for a membership to my channel down in the video description for around $4 a month. Of course, I don’t see the full $4 a month. The IRS has something to say about that. So you can do that. If you, you know, did I use the joke about the bully stealing your lunch money already? Well, I mean, I guess not everyone watches every video, so screw it. The schoolyard bully took your lunch money and you can do that. If you don’t have, you don’t have four bucks, you can like, you can comment, you can subscribe and help me grow this magnificent channel in all sorts of other ways. If you, you know, watch these videos and you’re like, holy cow, that Erik Darling sure does know a lot about SQL Server.

You can hire the best SQL Server consultant in the world, me, to do all this stuff for you. And I would be happy to do that in exchange for money. And as always, my rates are reasonable. Speaking of reasonable, segue right into this beautiful bit. If you would like all of my SQL Server performance tuning training at 75% off, that brings the price down to about $150 for the whole enchilada. You can, you can go to that URL and punch in that coupon code and by the grace of God, check out with a major credit card and we’re all happy people after that.

There is finally an upcoming event that has been announced. SQL Saturday, New York City 2025 will be taking place on May the 10th. What a wonderful day that is.

May in New York City is just a beautiful time of year. You know, it’s not, you have left the freezing grasp of winter and you have not yet entered the sweaty armpit of summer. It’s just a perfect time of year.

So I would encourage anyone who is feeling like having a nice slice of Sbarro pizza to come on down to New York City for the SQL Saturday event. I might even be there, who knows. And with that out of the way, let’s, let’s go have ourselves a little party with SQL Server and I should probably click on the right thing, shouldn’t I?

So, ordered data is great for all sorts of things in databases. Searching, right? If you want to search through data, it’s a lot easier if it’s in an order that allows you to efficiently search through it.

If you want to do a GROUP BY or an ORDER BY, or have a windowing function do some stuff, which we’re going to talk about in the next video, if you couldn’t see that in the tab name right next to this one. There are all sorts of things that having indexes in place to pre-sort your data in a useful way can help with.

But we need to talk a little bit about how indexes store data, especially multi-key indexes. For a single key column index, it’s probably pretty obvious that the index is sorted by that column. But I still get questions from people like: I have this index on column one.

Is that the same thing as this index on columns three, two, one? And I’m like, no. Column one is over here, and in this one, column one is the only key column.

How would that work? It’s a thing that I still get asked. So, those are the fun questions that Erik Darling answers off the cuff.

So, in my database called CRAP, I have created a table called albums. This table actually just has two indexes on it. We have a clustered primary key on a column helpfully called surrogate ID, just so you understand:

It has no meaning outside of being the clustered primary key and getting us all the things that we desire from clustered indexes. You know, like being narrow. That’s a good one, right?

Being unique: it’s an identity, so every value in there, unless we do something silly, is going to be a unique entry. It is ever-increasing, meaning it’s an ever-growing value, unless we do something goofy and need to reseed it, and then it’ll be a negative-growing value or something.

And it is static, meaning that we are not going to be modifying the values in this thing. We are not allowed to directly update an identity column. We can do stuff with IDENTITY_INSERT, but we can’t update the column.

So we have this thing, which has all sorts of great clustered index attributes associated with it. And then we have this nonclustered index down here, which is keyed on artist, release year, album title, and track ID.

I guess we could have swapped release year and album title; it wouldn’t really have made that much of a difference in the table or in the index. But for the purpose of this demo, let’s just pretend this is exactly the way we want things.

Now, what I did was put a bunch of rows into this table. This VALUES clause is a little misleading, because what we end up with in the table actually looks like this: we have the surrogate ID, the artist, the release year, the album title, and then a track ID for each of the tracks that are in the album we care about, right?

So we actually have multiple entries for each album. Now, in rowstore indexes, key columns are stored in sorted order to make it easy to navigate the B-tree that they’re stored in and efficiently locate rows.

They are not stored individually like in columnstore indexes. In columnstore indexes, you really get something like an index on each column that is in the index: a series of little per-column segments within row groups and whatnot.

But way back in the bad old days, a lot of people would use the phone book analogy to talk about indexes. I’m going to update that for the modern generation, whichever one is the most current.

Everyone makes playlists. Everyone loves a playlist. I have YouTube playlists. I have Spotify playlists. I have playlists all over the place. I assume that other people who are far more organized than I am also have playlists in their lives. So let’s go back to what we have on this table.

Oops. Index-wise, I didn’t frame that very well. We have a clustered primary key on surrogate ID, and we have a nonclustered index on artist, release year, album title, and track ID. Now, if I want to say SELECT * FROM albums ORDER BY surrogate ID, of course SQL Server is able to take advantage of our clustered primary key.

It’s only 200 rows, so it’s not a big deal here, performance-wise.

Oops, I hit Control, but it didn’t listen. We scan the clustered index, but we don’t have a sort operator in our query plan. We asked for this in the order of surrogate ID, but since the clustered index put surrogate ID in the order that we wanted, SQL Server does not have to physically implement a sort of this data. The same thing goes if we say SELECT * FROM albums ORDER BY artist.

If we look at the execution plan, we have an index scan, a nonclustered index scan this time; you can see the nonclustered prefix here. It would be nice if SSMS had the ability to show us full object names the way Plan Explorer does, which hasn’t seen any development in who knows how long.

It would be nice if SSMS had this ability, but for now we’re just left with these ellipsis cliffhangers to figure out what happened. But notice there is no physical sort operator in this plan. Let’s just contrast these two plans really quickly.

If we were to ask SQL Server to sort our data by release year, and SSMS is being real picky about that right there, now we do have to physically sort that data, because release year is not the leading key of any index in the table. The clustered primary key is surrogate ID, and the nonclustered index that we have is on artist.

It leads with artist. So asking for this data to be ordered by release year means that we have to re-sort the data from the order that the index initially had it sorted in. The same thing would go if we used any other trailing key, like album title or track ID.
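The pattern so far, with assumed table and column names: leading keys order for free, trailing keys on their own do not.

```sql
/* No Sort operator: these match the leading key of an index. */
SELECT * FROM dbo.albums ORDER BY surrogate_id; /* clustered PK */
SELECT * FROM dbo.albums ORDER BY artist;       /* nonclustered leading key */

/* Sort operator appears: trailing keys on their own need reordering. */
SELECT * FROM dbo.albums ORDER BY release_year;
SELECT * FROM dbo.albums ORDER BY album_title;
SELECT * FROM dbo.albums ORDER BY track_id;
```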

If we look at the execution plans for these, come on, SSMS, help me out a little bit here. Just make this video go, like, kind of smoothly. I’ll give you $50.

So asking for ORDER BY album title, or by track ID, in these situations both lead to a physical sorting of the data, or let’s call it a reordering of the data from the order that the index had it stored in. Now, we can order by all the columns in the nonclustered index, leaving surrogate ID out of it. Even though surrogate ID is the clustered primary key, and it is technically included in the nonclustered index, we’re just saying that’s not an important column for our results.

Right. So, just to make it perhaps a little bit more obvious: if we were to do all this stuff and also say ORDER BY surrogate ID, we would still not need to sort the data. When you create a non-unique nonclustered index on a table that has a clustered index, the clustered index key column or columns, and you can have a multi-key clustered index, end up in the nonclustered index as invisible final key columns. So we still don’t need to sort data when we add surrogate ID to the ORDER BY here. All right.

Now, if this index were unique, then surrogate ID would end up as an included column instead, and then we would probably have to sort the data by surrogate ID, because includes are useless for ordering. But we’ll talk more about that later. So, this ORDER BY is still free, right?

With all of these columns, there was no sort in this query plan. But if we were to take the leading key column out of the ORDER BY elements and just say order by these three, we are back to having to sort this data, because it needs to be reordered from the way the index initially stored it. Right.

Because the index stores data ordered by artist; then, for any duplicates within artist, we are ordered by release year; for any duplicates within release year, we are ordered by album title; and for any duplicates within album title, we are ordered by track ID.

An easy way to see how this works is to include an equality predicate in our query. Right now, with this query, we are effectively skipping over the artist column that leads the nonclustered index here. But as soon as we add an equality predicate on the artist column, we are able to order by these three columns for free, because we don’t have to reorder anything.

We seek into our nonclustered index and find every entry where the artist is DMX. Because that is an equality predicate on a single value of the leading key of the index, the order of the next columns over, release year, album title, and track ID, is maintained in the index.

We don’t have to worry about reordering data in the index, because we just did a single equality seek to get this. So in the results we have DMX, and because this is all DMX here, it’s all one run of duplicate values, right? Just 127 rows of DMX.

Then release year is in order within this range of duplicate values. So we have 1998 at the beginning and, oops, come on, scroll bar, stay where I put you.

We have 2012 at the end. So this release year column is all in ascending order for DMX. And then for 1998, we actually have two albums, right?

DMX, what a beast, released two albums in 1998: Flesh of My Flesh, Blood of My Blood, and It’s Dark and Hell Is Hot. So, we have DMX in order.

We have the release year 1998 in order. And then we have album title; F comes before I, of course, so this is in order.

And then the track IDs, one through 16 and then one through 19, are in order for each of these album titles. What that means is that if we were to search across artist, release year, and album title, and then order by track ID, so we have three equality predicates, we would seek to DMX.

We would seek to the release year. We would seek to Flesh of My Flesh, Blood of My Blood. And then we would have track ID in the exact order that we care about, stored in the index. We don’t have to sort data here, right?
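A quick sketch of that idea (names assumed): pinning leading keys to single values keeps the trailing keys in index order, so the ORDER BY is free.

```sql
/* One equality on the leading key: the next three keys come back
   already ordered within the DMX range. */
SELECT artist, release_year, album_title, track_id
FROM dbo.albums
WHERE artist = 'DMX'
ORDER BY release_year, album_title, track_id;

/* Three equalities: even ORDER BY track_id alone is free. */
SELECT artist, release_year, album_title, track_id
FROM dbo.albums
WHERE artist = 'DMX'
  AND release_year = 1998
  AND album_title = 'Flesh Of My Flesh, Blood Of My Blood'
ORDER BY track_id;
```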

So we get that data in order, for free. Inequality predicates do not preserve order, because we have to cross boundaries within the index, and the order resets when we do. So, if we look at this query plan, we are back to having to sort this data, right?

And the reason we have to sort that data is because we are asking for artist, release year, and album title, where the artist equals DMX and the release year is greater than 1998, ordered by album title and track ID. When we had an equality predicate here, it wasn’t a big deal; we didn’t have to re-sort. But with an inequality predicate, sorry, I asked for this in order by album title.

So as soon as we cross a boundary in release year, say going from one year to the next, the ordering of album title and track ID resets, right? This one and this one. That data was not in the order we wanted after we crossed a boundary in release year.

Another thing that’s kind of funny is that sorting with duplicates is often called non-deterministic. Look at these results: if we were to select these columns and then order by track ID, we would get all the track IDs of one in here, for this chunk, but the order of the release years is a little bit out of whack.

It’s like 98, 98, 99, 2001, 2003, 2012, 2006. These track IDs were not sorted by anything else to break the ties. If we wanted a deterministic sort order, with tiebreakers and stuff like that, we would need to include a unique column in there.

So if we also ask for a sort by surrogate ID, then we get, well, we could have asked for release year too, but that might’ve been a little funky with the two albums that got released in 98. But if we say order by surrogate ID as well, then we get 98, 99, 2001, 2003, 2006, 2012. So: the way you create your indexes stores the data in the key of the index in a specific order.
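The tiebreaker idea, sketched with assumed names:

```sql
/* Non-deterministic: rows that tie on track_id can come back in
   any order among themselves. */
SELECT artist, release_year, album_title, track_id
FROM dbo.albums
ORDER BY track_id;

/* Deterministic: a unique column as the final sort key breaks ties
   the same way every time. */
SELECT artist, release_year, album_title, track_id
FROM dbo.albums
ORDER BY track_id, surrogate_id;
```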

If you ask for a different column, or a different set of columns, to be ordered by, and you don’t have equality predicates that maintain that order, you will have to re-sort the data from the order the index stores it in. The same thing goes for something like an IN search. A lot of the time, the way SQL Server handles IN searches makes the query plan look like it’s just two separate seek predicates, but it’s not two seeks with ordered results after that.

Like when we ran this query and it just said the artist equals DMX, ordered by these three columns, we didn’t have to worry about sorting the data. But when we say IN (Blondie, DMX), we do have to sort, because every time we cross a boundary, right, when we go from Blondie to DMX, the release year ordering resets.

Now, I didn’t put all of Blondie’s albums in here, because let’s face it, not all of Blondie’s albums are worth putting on a playlist. There may have been some good songs later in the career, but at a certain point the full album experience kind of lost some gusto. But for Blondie, the release year is perfectly ordered from 1976 down to 1982.

But then as soon as we cross over to DMX, that ordering restarts. So this goes from 1998 through to 2012. We have that data in the order that we care about in the index, per artist.

But as soon as we cross a boundary to a new artist, the ordering resets, and the year column starts ascending again, from lowest to highest for that artist. Now let’s look at a slightly more complicated example.

And this is what we’re going to finish up with. Oops, I need to go into the Stack Overflow database for this one, don’t I? So, for a slightly more complicated example, we’re going to select the top 40 users from the Users table where reputation is greater than, I think that’s a hundred thousand.

Ordered by reputation descending. Right now, there is no index that has reputation as a key column, so SQL Server has to physically sort the data to find the top 40 rows for the query.

Now, if we create this index on the Users table and we rerun our query, we don’t need to sort that data anymore, because the index we created is keyed on reputation. So now reputation is stored in the order that our query cares about it being in.

So if we look at this plan, there is no sort, there is just a top. We go in here and seek to the top 40 rows that we care about, and since Id is the clustered primary key, we don’t need to do any additional work to sort this.

Now, let’s look at the estimated plan for this query, where we’re adding another top in here: we’re cross applying to a TOP (1) that gets the most recent badge for any given user. In this query plan, we’re going to have an eager index spool. That’s something I’ve talked about in this video series, as well as many other times on the channel. What a lot of people do when they need to fix an eager index spool is look at the details of the spool down here and say something like: well, I need to seek to the user ID in the Badges table, and I need to output the Name and Date columns from the Badges table. So they might create an index that looks like this on the Badges table to get rid of the eager index spool, which, strictly speaking, is the index that SQL Server created to facilitate faster seeking on the inner side of that nested loops join.

If we look at this query now, we do get rid of the eager index spool, but we still need to sort this data, right? We seek in here, but since the Date column is in the includes of this index, we do not have an ordered avenue to where this data lives that would make the sorting free. So let’s say we wanted to run a slightly different query: select the top one user, and then select their top 1,000 most recent badges. We could do this, we get Jon Skeet, and, wait, what on earth? Oh, yeah, good answer. Okay, so if we look at this, we still need to sort that data, and now our sort even spills a little bit.
So don’t we feel foolish? Now, if we were to change this index definition a little bit, and put date as the second key column in the index, then we would still get rid of the eager index spool, and we would also be able to navigate the B-tree after the nested loops join and not have to sort that data anymore. We still have a top down here, but it’s not a top N sort. Remember that nested loops is a per-row activity. For every row that we seek to here, one row comes out, goes into the nested loops join, comes down here, and then SQL Server seeks to that one value. And for every user ID that we seek to in this table, the date column is now in order. Because our index is on user ID and then date, every time we have an equality predicate on user ID, just like with the artists and albums, where we had user ID 22656, the dates are all in order for user ID 22656. So for every equality predicate, that second key column is in the order that we care about it being in. Remember that our index is now on these two, right? User ID and then date. So if we were to select the top 100 and order by user ID and date, again, this ordering is free: we have a top, but we do not have a sort in the query plan. And if we were to select the top 100 but order by three columns now, user ID, date, and name, then because name is an included column, that data is not stored in any particular order. Includes have no ordering to them; they just exist down at the leaf level. So we end up with a sort back in the query plan. Understanding concepts like this is really, really important for making queries go faster. A lot of queries have either presentation or implied order bys.
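A hedged sketch of the improved index and the two TOP (100) queries described above (names are hypothetical):

```sql
/* Date as the second key column: after an equality seek on UserId,
   rows come back already ordered by Date, so no Sort is needed. */
CREATE INDEX ubd
    ON dbo.Badges (UserId, Date)
    INCLUDE (Name);

/* Free ordering: both ORDER BY columns are index keys. Top, no Sort. */
SELECT TOP (100) b.UserId, b.Date
FROM dbo.Badges AS b
ORDER BY b.UserId, b.Date;

/* The Sort returns: Name is an include, stored in no particular order. */
SELECT TOP (100) b.UserId, b.Date, b.Name
FROM dbo.Badges AS b
ORDER BY b.UserId, b.Date, b.Name;
```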
You know, a presentation layer order by is something like this. But implied order bys might be there to support certain query plan operators, like stream aggregates and merge joins.

You might need to sort data to match what you’re asking for in a windowing function’s partition by and order by clause. You might have a top with an order by in your query. There might be all sorts of reasons why having data in order would make queries go faster. It’s important to understand how equality predicates and inequality predicates can change the way that SQL Server thinks about presenting the data in the order that you’re asking for, and all sorts of other things.

You know, fixing sorts is not just a matter of generally making queries go faster. Sorts can ask for very large memory grants. Sorts can spill if they don’t get an appropriate memory grant. Sorts can ask for way, way, way more memory than they end up using.

So there’s all sorts of reasons to get the indexing and querying right so that you don’t end up in a situation with a slow query because of that. But we’ve talked about this long enough now. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I hope you stick around for the next video on windowing functions, where we will experience even more joy than we experienced in this one. So goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Indexed Views And Non-SARGable Predicates




Thanks for watching!

Video Summary

In this video, I delve into the world of index views and how they can be used to address non-sargable predicates in SQL Server queries. We explore two different approaches using a single query as our test case, aiming to optimize performance by creating an index view that helps SQL Server process the query more efficiently. Along the way, we also discuss some quirks of query plans and the importance of considering the broader impact on other queries when introducing new indexes or index views into your database workload. If you’re interested in supporting my work, consider signing up for a membership starting at just $4 a month—every little bit helps! And remember, if you can’t afford that, sharing this video, liking it, and subscribing are all free ways to help the channel grow.

Full Transcript

Erik Darling here with Darling Data. Hooray! We did it! We survived another day. All together. All of us. Didn’t we now? Everyone except Intel, who still… still can’t find those drivers. Don’t like it. Don’t like it one bit. In today’s video, we’re going to be talking about how you can use indexed views (we’ve got some more indexed view stuff to talk about, I guess) to fix non-SARGable predicates. Now, we’re going to talk about this in two different ways, but with one query. So we’re going to kill two birds with one query. Might be a record. You know, Randy Johnson gets to kill birds with baseballs, I get to kill birds with queries. That’s not fair, is it? If only I were six foot nine. Anyway, let’s do that. Before we get into all that good stuff, let’s talk about how you can make me nowhere near as rich as Randy Johnson. If you would like to support the work that I do on this channel, you can sign up for a membership for as little as $4 a month. Assuming that maybe you are also not as wealthy as Randy Johnson, but you have an extra four bucks a month, you can do that.

If not, liking, commenting, subscribing: all free things you can do to help this channel grow. And maybe, if you can’t afford the four bucks, you doing this stuff can fool the algorithm into appealing to people who do have four bucks a month of disposable income. There are all sorts of ways you can help. If you would like assistance with your SQL Server, I have much more reasonable rates than Randy Johnson. And I can do all of this stuff. And, you know, aside from beer gut magazine making me exclude New Zealand from this sales pitch, I am the best SQL Server consultant in the world outside of New Zealand. If you would like probably the best SQL Server training on the internet, you can get all of mine for 150 US dollars. That is a fantastic deal, especially now that all those other Black Friday sales are probably over and jacked back up to thousands and thousands of dollars a year to subscribe. From me, you get everything all at once. It’s wonderful.

No upcoming events until later this year. That’s okay with me. With that out of the way, let’s talk about these things. So I’m in the Stack Overflow database, I hope. And I’ve created these three indexes, I also hope. Yes, look, we got three errors, which means I did everything right. Now, I’ve created these indexes in an attempt to assist this query in being faster. The thing is, there are two things in this query that are going to aid and abet its slowing down. Aid and abet its slowing down, not abed.

Aid and abed. I mean, you can abet a query, I guess, but that might be a little sad. Anyway, there are two things in here making life tough on this query. We have wrapped the score column in this coalesce, because a lot of people make this foolish mistake, thinking that SQL Server can’t deal with nulls correctly. If you have a null in an integer column and you want to know where that integer column is greater than zero, nulls aren’t going to mess SQL Server up. So you don’t need to do this. The other thing that’s going to be tough here is that we have a rather restrictive predicate on this thing. But the thing is, SQL Server can’t calculate it until the query has run.

Right. So when we say having, sum, blah, blah, blah, SQL Server has to calculate this thing, right? It has to figure out what this is by joining posts to comments to votes. And then it can calculate this aggregate, and then we can filter on this aggregate here. We cannot put this in the where clause, right? We can’t, because it doesn’t get calculated until the select happens and projects this stuff out. Then, and only then, can we think about things in the having clause. And what’s funny here too, and I don’t know if this is a SQL Server quirk or if this is purely a logical processing thing that would happen in every database…

I know that recently Oracle made it so that you can reference column aliases in the group by, so we could say group by total score there. We can say order by total score in SQL Server. But if I tried to put total score in the having clause, SQL Server will say, I don’t know what that is.

So we do have to repeat this expression down in here, which, just as a query writer, is annoying. Microsoft, do me a solid, help a little with these things. You know, when people complain about SQL Server and T-SQL being dusty, antiquated dinosaur languages, this is the kind of crap that is just annoying to people who write queries.
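The demo query itself isn’t reproduced in the post, but the restriction looks like this in a made-up aggregate over the same tables (all names and the threshold are illustrative):

```sql
/* ORDER BY can see a column alias; HAVING cannot, so the aggregate
   expression has to be repeated verbatim. */
SELECT   p.OwnerUserId,
         SUM(c.Score) AS TotalScore
FROM     dbo.Posts AS p
JOIN     dbo.Comments AS c
    ON   c.PostId = p.Id
GROUP BY p.OwnerUserId
HAVING   SUM(c.Score) > 1000   /* HAVING TotalScore > 1000 would error */
ORDER BY TotalScore DESC;      /* ...but the alias is fine here */
```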

So Microsoft could do more in this regard. But we run this query, and we have query plans turned on, so we don’t have to be worried about running it twice. This will take about 10 seconds, and we’ll look at the plan briefly, because quite frankly, the plan isn’t terribly interesting.

But here’s what the plan does. We scan the posts table; that only takes 900 milliseconds. We scan the comments table; that takes a second and a half.

We scan… well, we don’t scan the table, we scan the nonclustered index… and we scan the votes table. That takes 1.6 seconds. Ah, 1.4 seconds. And then, like I said, the query goes on to do other things.

Six seconds, nine seconds. I think maybe this… no, that’s still row mode. So, if you want to talk about other stuff that makes query plans real confusing, we could start here.

All right. And let’s look at this. So, everything for operator times looks good enough going across to here.

All right. Except, I mean, this doesn’t make any sense, because 1.4 seconds and then 600 milliseconds. So that’s not great.

And then 1.6 seconds and then 1.1 seconds. But like I’ve said, I usually disregard the parallelism operators when I’m looking at query times, because the times are often not going to make a lot of sense. Unless you go from a really small number here to a really big number here; then I might pay attention to it.

But going from a big number to a small number, I’m just like, you’re on crack. This is not working out for you. You should think about something else. But let’s focus in over here.

Just this last portion. We have six seconds. And just to make sure, because you might see some operators show up sometimes in batch mode and sometimes in row mode: for batch mode operations, SQL Server does this funny thing where batch mode operators only track their execution time in wall clock time, not in CPU and wall clock time.

But for row mode ones, it’s supposed to be cumulative going from right to left. So these numbers should add up, just get higher across the board. And that does happen here, from .9 to 1.1 to 1.2 to 2.8 to six.

So this top section is fine. This section’s a little cracky, right? Going from 1.4 down to 599 milliseconds. And this section is a little cracky, going from 1.6 to 1.1.

But we can at least sort of add those numbers up and get round about six here. Where things get real good and cracky is when we go from six to nine, to 5.9, to 6.6, 6.6 again, to 13.8, to 10.2. I realize that it’s wintertime, January-ish.

But if Microsoft were perhaps looking for issues to work on in a few months, when the next batch of summer interns comes in to work on SQL Server, this might be a good portion of the code to address. Because this doesn’t make any sense. Right?

Especially this last repartition streams. If you look down under this armpit (if I put this arm up a little bit), and we just zoom in way down here: this thing ran for 10 seconds.

Right? We can see the 10 right in here. And I had to draw a square on my own, because this hand is what hits the button to make it draw rectangles. So we have 10 seconds over here.

But this gather streams might make it look like this query took a sneaky 14 seconds. But then if we go and look at the properties and look at query time stats, we can see that this agrees with the final operator in the plan. So things got real weird in here.

No arguments from me. So be careful when you’re reading these query plans. But anyway, this thing takes 10 seconds. And part of why this thing takes 10 seconds is that we have this expression here.

Right? Expression 1003, which matches up with where we’re looking for scores in the query down here. So like I said, SQL Server has to calculate this whole thing first.

Sometimes you’ll see filter operators for having clauses. Sometimes you’ll see them if you’re filtering on a windowing function, like row number, rank, dense rank, blah, blah, blah. Other times, you’ll see filters if you do goofy things like a left join where some column is null.

Other times, if you write really complicated predicates, SQL Server won’t be able to push those predicates down to when you touch tables. But this is a legitimate need to calculate the whole result and then filter out what we don’t want. So let’s create an indexed view.

That takes care of our problems there. Right? We’re going to create this. And shockingly, creating this index is going to take just about the same amount of time that running that query took. That might be something you want to keep in mind if you’re going to create an indexed view.

Because the query plan for creating the indexed view is going to look shockingly like the query plan for running the query. Right? Because we have to do all this stuff.
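The view definition isn’t shown here, so this is only a representative sketch of an indexed view for this sort of HAVING filter. All names are invented, and the wrappers reflect real indexed view rules: SCHEMABINDING and two-part names are required, COUNT_BIG(*) is required when there’s a GROUP BY, and SUM can’t be over a nullable expression.

```sql
CREATE VIEW dbo.TotalScores
WITH SCHEMABINDING
AS
SELECT   p.OwnerUserId,
         SUM(ISNULL(c.Score, 0)) AS TotalScore, /* SUM over a nullable column isn't allowed */
         COUNT_BIG(*)            AS RowsInGroup /* required alongside GROUP BY */
FROM     dbo.Posts AS p
JOIN     dbo.Comments AS c
    ON   c.PostId = p.Id
GROUP BY p.OwnerUserId;
GO

/* The unique clustered index is what actually materializes the view,
   and is the step whose plan looks just like running the query. */
CREATE UNIQUE CLUSTERED INDEX cx
    ON dbo.TotalScores (OwnerUserId);
```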

So you have to be real good and mindful, when you’re creating indexed views, that the query plan to create the indexed view is pretty good. Because like we talked about in the last video, modifications are going to start getting painful. Right?

When we have to maintain that indexed view, modification queries that used to be real fast might get real slow. So you have to pay good attention, when you add indexed views to your workload, that the query that maintains the indexed view is not awful, because otherwise your modification queries are going to be awful. That’s on you, though.

This is the thing you have to know and understand. This is not the fault of the indexed view. This is the fault of you. You have to pay attention to this stuff. You are the query tuner. When you introduce things into the database to make queries faster, you have to take reasonable care and caution that they do not make other queries much slower.

Okay. That’s a you thing. So now when we run this query, though, this query will be very fast.

Right? If we look at the execution plan for this, now hitting that indexed view and doing all that stuff takes 73 milliseconds. And of course, SQL Server chimes in and says, hey, I think I can reduce the pain of running this by 50%.

Do we need this to be 50% faster? We just went from 10 seconds to 73 milliseconds. Now SQL Server chimes in and is like, Oh, I’ve got an idea.

Stick with me. Another index. Eh? Eh? Come on. So, coming back to something I’ve said a lot: if you’re the type of nitwit who spends all of their time looking at missing index requests and just adding those in, God should smite you.

That’s the end of it. All right. Cool. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope you had some real fun in this one.

Because there were a few fun moments, I think. There were some interesting things in here that make my day-to-day life just a joy when working with SQL Server. Just a real joy. And the next video we’re going to talk about… well, clearly, you can see this is number nine.

Number 10 is going to be about using indexes to fix sorts in query plans. And we’re going to have some fun stuff to talk about there too. So thank you for watching, and goodbye for now.


Indexing SQL Server Queries For Performance: Indexed Views And Filtered Indexes


Video Summary

In this video, I dive into the intricacies of using index views in SQL Server as a workaround for limitations with computed columns and filtered indexes. I explore how these three features—computed columns, filtered indexes, and indexed views—should ideally work together but often don’t due to restrictions imposed by SQL Server. By creating an indexed view with a computed column that mimics the behavior of a filtered index, we can optimize queries without running into errors or suboptimal performance. I also discuss how this approach can be particularly useful on standard edition where batch mode limitations make it challenging to achieve optimal query performance.

Full Transcript

Erik Darling here with Darling Data. I don’t know why my hair looks the way it does. It’s very strange to me. It’s clean, it just looks plastic or something. I don’t know what’s happening to me. Anyway, today’s video, we have exciting things to talk about. Actually, we have frustrating things to talk about, as usual. Today’s video, we’re going to talk about how you can use indexed views sort of like a filtered index. That seems sort of obvious, right? So Microsoft has these three things that they put in SQL Server that are really useful. The problem is, these three things don’t get along very well. You have indexed views, you have computed columns, and you have filtered indexes. What’s awesome is that if you create a computed column, you can save your queries all sorts of time calculating crappy expressions, especially if those crappy expressions are in where clauses or joins or something. But you can’t create a filtered index on a computed column. Then you have indexed views, which are wonderful, because you can take this whole complicated query and present it as a table that you don’t have to aggregate every single time, which is wonderful, but you can’t create a filtered index on an indexed view. So it’s like you have these things that should be in this lovely thruple relationship, just getting along: everyone’s laundry is clean, everyone’s got dinner, everyone’s happy, we all get to watch all the TV shows and movies we like. But no, no, we don’t get that. We have, like, everyone in separate bedrooms, angry at each other, right? It’s no good. It’s not a good situation.
So if you like the content that I create, and you would like to support my endeavors, or would like to hear more about non-traditional database relationships, you can become a member of this channel for as little as $4 a month and make me feel great about myself. If you don’t have four bucks a month, the school bully stole your lunch money, it’s gone out of your pocket, then liking my videos, commenting on my videos, and subscribing to my channel makes me feel all comfy cozy too. If you are in need of SQL Server consulting, you might watch these videos, and the topics that I cover in them may seem like things that you could use help with in your SQL Server; I am available to do all of these things at a reasonable rate. Hmm. If you would like some fantastic SQL Server training that is far less expensive than what you will find elsewhere on the internet, you can get all of mine for about 150 USD for the rest of your long, happy life.

And I mean, honestly, these videos make great heirloom gifts too. You can leave these to your kids, your loved ones, maybe even other people’s kids. You can give them away all over the place. They’re great stocking stuffers. Let’s get started.

Well, come on, zoom it.

Kick in, buddy. We’re going to get rid of these two things first, because if we don’t, we might hit some issues. Now I’m going to create a table. Didn’t I just drop you?

What’s happening in the world? Drop table if exists, and then it’s just like, oh, yeah, let’s say we’ll exist all right. Thanks, database.

Selling me down the river here. So I’m going to create a table called index_view, and that table is going to have a computed column in it, where I’m just going to multiply the ID column by two. Now, we don’t even have to put any data into this table to see what happens here.

It’s just very annoying. But I cannot create a filtered index on my computed column, filtering it to where not fizzbuzz equals zero, without SQL Server throwing up on me.

All right. So this is what, this is the error we get. Filtered index n cannot be created on table dbo.index view because the column, not fizzbuzz, in the filter expression is a computed column.

Rewrite the filter expression so that it does not include this computed column. Well, then I wouldn’t have a filter expression, would I? Would I, SQL Server?

There would be no filter expression. You need to get over yourself. Allow the nice people who use you to create filtered indexes on computed columns. I mean, fine.
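Here’s a minimal repro of the restriction, with names loosely following the demo’s setup (the exact DDL isn’t shown in the post, so column and index names are my guesses):

```sql
/* A table with a computed column, per the demo: id times two. */
CREATE TABLE dbo.index_view
(
    id bigint NOT NULL PRIMARY KEY,
    not_fizzbuzz AS (id * 2)
);

/* This fails: a filtered index's WHERE clause
   cannot reference a computed column. */
CREATE INDEX n
    ON dbo.index_view (id)
    WHERE not_fizzbuzz = 0;
```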

If you need some rules in place for that, I get it. But it should be a thing that you can do. Now, we could, of course, create an indexed view, right?

And let’s say that we do that, right? We get real cool. We say, all right, we’re creating this view called computed column, and we’re going to create it with schema binding so that we can index it.

And then we add a unique clustered index to that indexed view. So now we actually have an indexed view, right? We did this. If I try to create this again, it’ll say it already exists, right? So we indexed this view.

It is alive and existing as an indexed view. But I can’t create a filtered index on that indexed view, right? Now, if I go look at what this whole error message says, I’m going to make this a little bit nicer for all of us, because I like to zoom in, and why Microsoft doesn’t pretty print error messages, instead of sending them sprawling across the screen, is beyond me. You know, we all make choices.

But the filtered index, NFB, cannot be created on the object dbo.computed column because it is not a user table. Filtered indexes are only supported on tables.

Indexed views are not tables, even though they are sort of presented as such when you query them. If you are trying to create a filtered index on a view, consider creating an index view with a filter expression incorporated in the view definition.

Well, okay. We can do that, can’t we? All right. So one way you can get around this is, of course, by doing this, right?

So we’re going to change our view. And I’m going to tell you something very funny about indexed views, since I have create or alter here: if you alter an indexed view, it drops the indexes on it.

So if you have an indexed view with a clustered index, and maybe you’ve created other nonclustered indexes on that indexed view in order to make queries that touch the indexed view faster: if you alter that view, your indexes disappear.

You have to add your indexes back. So we’re going to create or alter this view. And we’re going to include where not fizzbuzz equals zero in our where clause here. And now we’re going to create our index.
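Roughly, the workaround being described looks like this. Names follow the demo’s table called index_view and view called computed_column; the exact definitions are my reconstruction.

```sql
/* Bake the filter into the view definition instead of a filtered
   index. Remember: ALTER VIEW drops any indexes on an indexed view,
   so the clustered index has to be created again afterwards. */
CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT iv.id, iv.not_fizzbuzz
FROM   dbo.index_view AS iv
WHERE  iv.not_fizzbuzz = 0;
GO

/* Re-materialize the view. */
CREATE UNIQUE CLUSTERED INDEX cx
    ON dbo.computed_column (id);
```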

We get to re-index that table. Isn’t that fun? It’s great. The problem that we run into sometimes, though, is that with indexed views, for various reasons, especially if you’re on…

So for various reasons, your query, even though it matches the indexed view, might not get matched to the indexed view, right? SQL Server’s optimizer might get hung up in one of a couple places.

One, if you’re on standard edition, you’re probably going to need the no expand hint so that your indexed views get touched. If you’re on any edition of SQL Server, you might have a trivial plan or…

Well, a trivial plan accompanied by simple parameterization of your query, and that might mess things up. And I would say that on almost any edition of SQL Server, you should be using the no expand hint when you touch indexed views, because SQL Server does not automatically create statistics on your indexed views without the no expand hint.

So you’ll have statistics on whatever column you created your unique clustered index on, because you created an index, so you get a statistics object. But if you touch other columns in the indexed view, SQL Server does not auto-generate stats on those unless you use no expand.
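A sketch of what that looks like, assuming the demo’s view name from earlier:

```sql
/* NOEXPAND forces the indexed view to be used as-is, and is also
   what lets SQL Server auto-create statistics on the view's columns.
   Without it, only the stats from the view's own indexes exist. */
SELECT cc.id, cc.not_fizzbuzz
FROM   dbo.computed_column AS cc WITH (NOEXPAND)
WHERE  cc.not_fizzbuzz = 0;
```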

So if we run this query and we ask for not fizzbuzz equals zero, you can see in the query plan that there are a bunch of square brackets in here that I did not type.

And there are a bunch of spaces around periods that I did not type. And we have a little parameter looking thing over here which I certainly did not type. And we have touched our indexed view right here.

Or sorry, we have touched our table called indexed view here. Remember, our table is named indexed view and our indexed view is named computed column. I went through great lengths to make this very clear and straightforward.

And the reason why this happens, we use a couple rocket sciencey trace flags here, 8607 to look at compilation information and 3604 to print that information to the messages tab.
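The exact invocation isn’t shown in the post, but one common way to apply those trace flags at the query level is QUERYTRACEON. Both flags are undocumented, require sysadmin, and belong on test boxes only; the query here is illustrative.

```sql
/* 8607 prints the optimizer's output tree for the query;
   3604 redirects that trace output to the Messages tab. */
SELECT iv.id
FROM   dbo.index_view AS iv
WHERE  iv.not_fizzbuzz = 0
OPTION (QUERYTRACEON 8607, QUERYTRACEON 3604);
```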

We will see over in the messages tab that we got a trivial plan. So it’s possible, when we just get a trivial plan, that SQL Server never went to the point of expression matching our query to the indexed view definition.

SQL Server might have just been like, oh, whatever. I don’t know. This is stupid. But then if we scroll way down, we will have two little messages here. The query is marked as cacheable and the query is marked as safe for auto-parameterization.

Oh, sad. So there are two ways that we can get around this issue, right? We can say where not fizzbuzz equals zero and we can use the no expand hint, right?

And this will allow us to touch our indexed view named computed column. Even though we have all this stuff, now SQL Server is like, oh, right.

Got it. Got it. Or we could say where one equals select one and… Oh, there’s nothing in the messages tab there. But the one equals select one prevents… Actually, I should run these next to each other, right?
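Side by side, the two variants might look roughly like this (names follow the demo’s; the exact queries are my reconstruction):

```sql
/* Variant 1: reference the view directly and force it with NOEXPAND. */
SELECT cc.id
FROM   dbo.computed_column AS cc WITH (NOEXPAND)
WHERE  cc.not_fizzbuzz = 0;

/* Variant 2: query the base table, but tack on a do-nothing scalar
   subquery. Scalar subqueries disqualify a trivial plan, so the query
   gets full optimization, which can expression-match the indexed view. */
SELECT iv.id
FROM   dbo.index_view AS iv
WHERE  iv.not_fizzbuzz = 0
AND    1 = (SELECT 1);
```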

Do this. And let’s look at these execution plans. And these both hit the index view, right? These both hit compute… Oh, go away, tooltip.

No one asked for you. These both hit computed column. The one up here… Actually, the no expand hint does that? No, it doesn’t. Great. So the no expand hint does keep the optimization level trivial and does have this in there.

But SQL Server now trivializes our query to touch the index view. Down here where we use one equals select one, we’re going to have this additional compute scalar in there, but it’s like nothing.

And now we will have full optimization and full optimization will be like, hey, we found an indexed view. Lucky us. Right?

So good news there. So if we were to look at this information, the trace flag information for either of these, you would see that this is safe for auto-parameterization, but we have said to SQL Server, you will not expand our views.

So this was considered safe there. And then if we look at this in the messages tab, we will not see the trivial plan information up here, and we will not see…

Oops, zoom in too well there. We will not see the safe for auto-parameterization message here. So you can do things with indexed views that you cannot traditionally do with filtered indexes and computed columns.

You do… you still cannot create a filtered index on an indexed view, which is annoying as hell, but it is nice at least that you can create an indexed view that has a computed column in the where clause, and then use that in your queries generally to filter to where that computed column hits some values that you really care about. I’ve used this a few times over the years. It’s certainly not something that I need terribly often, but it is something that, when it comes up, is incredibly useful.

So I do hope that you can take something away from this that you will perhaps use in your endeavors, tuning queries and indexes. Stuff like this is especially useful if you’re on standard edition.

You know, a lot of stuff on enterprise edition, just using batch mode is like a good enough shortcut to getting good performance on big aggregate queries without having like this additional index structure to maintain.

So oftentimes I’ll sort of be like, okay, let’s just see where batch mode gets us. If batch mode gets us, you know, a 70, 80, 90% speedup, we’re probably not going to go this far unless we absolutely need to.

But if you’re on standard edition, where batch mode, because Microsoft hates you, is limited to a DOP of 2, you’re going to have a bad time. That query might, at best, break even at DOP 2 using batch mode.

But if you’re running your aggregation query at, like, DOP 4, 6, 8, or higher on standard edition, well, indexed views don’t have that limitation on them. Somehow Microsoft let that one slide.

I don’t know. The bean counters must have figured, hey, no one uses these things anyway. But yeah, if you’re on standard edition, there’s no limitation on DOP when you query indexed views.

So if you have a big parallel plan or something, those might turn out better with an index view than using batch mode. So stuff worth thinking about, stuff worth considering. And as always, it’s worth protesting outside your local Microsoft office to get them to treat standard edition users with a little bit more respect.

You know, with all the 2025 announcements that have started bubbling around, I still haven’t seen anything about standard edition being able to use more than 128 gigs of RAM, which is just a laughable situation in the year that we’re in.

So I don’t know. Maybe, maybe, maybe we’ll get some love on that. But I kind of doubt it, because, you know, you need to pay that friendship tax.

If you don’t pay the friendship tax, Microsoft isn’t going to be your friend. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And, in the next video, we are going to talk one more time about index views, but in a slightly different way, because we have a little bit more to go over with them.

Don’t we? We do. Index views. Who knew? Who knew? Anyway. Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Indexed View Maintenance





Video Summary

In this video, I dive into the intricacies of indexed views in SQL Server, exploring their maintenance and limitations. Indexed views can significantly enhance query performance but come with a set of restrictions that make them less than ideal for certain scenarios. I discuss setting up optimal conditions for index view creation and maintenance, highlighting common issues like subquery limitations and the absence of basic aggregate functions such as `min` and `max`. Additionally, I share practical tips on how to mitigate these challenges by creating supporting indexes, ensuring efficient query performance. This video is part of a series where we’ll explore more ways to leverage indexed views for complex queries and address some of the limitations they present in SQL Server.

Full Transcript

Erik Darling here with Darling Data. And fortunately, I remembered to turn my microphone on at the beginning of this video, so I don’t have to do 15 more takes of this. That’s nice. In this video, and I think the next three videos, we’re going to be talking about, I mean, still indexing SQL Server queries for performance, but there are three things that I want to talk about when it comes to indexed views: indexed view maintenance, how you can use indexed views where you can’t normally create filtered indexes, and, what was the other one? Fixing what would normally be sort of a non-sargable predicate, or making aggregate queries faster, specifically when you have a big query that has a having clause on it. So, the first thing that we have to talk about, right at the outset, is all the restrictions and things you can’t do with indexed views, because there are a lot of them in SQL Server, and they are depressing and painful. So, we already covered this one, so I forgot to click, or maybe, I don’t know, maybe I clicked and nothing happened. It’s hard to tell sometimes. We live in a world of mystery, don’t we?

But if you want to support the work that I do, creating content for this channel, you can click the little link in the video description right down in here somewhere, thereabouts. Sorry, my finger disappeared. If you know me, that happens a lot. And you can become a member of the channel for as little as $4 a month. These fingers don’t disappear, do they? If you don’t have $4 a month to spare, for whatever reason, maybe you spent it all on Spare Ribs, liking, commenting, subscribing, all fantastic endeavors that you can embark upon in your daily life to keep me company on the internet. If the stuff that I’m talking about, say, gets into your head, and you’re like, man, I’m having these problems with my SQL Server, maybe hiring Erik Darling, maybe he can fix these problems with my SQL Server in exchange for money. And yes, I can. I can do that. Everything that you see me do here, I can do on your server. It’s not just this server that these things work on. It’s not this magical microcosm of SQL Server where I can solve these little problems and things get better. No, no, I can do this on your servers too.

Even if they’re in Microsoft’s cloud, which is just notoriously awful. So all sorts of things that can be done here. And as always, my rates are reasonable. If you would like some fantastic SQL Server training, I have it. No one else does. I’ll be honest. I’ve seen everyone else’s. Mediocre at best. You can get all 24 hours of my training for 150 USD. It’s a much better bargain than you’re going to find anywhere else on the internet. Unless you pirate these. And then there’s nothing I can do. But if you go to the link up there and you use the discount code there, you can get all of it. And there’s no subscription. It’s just yours forever.

It’s like when you used to buy a CD or a DVD. You just had it. You didn’t have to, like, pay 10 bucks a month to keep listening or watching. Funny, I guess. Upcoming events for 2025. Again, you will know as soon as I’m allowed to tell you. With that out of the way, let’s talk a little bit about these here indexed view things. So like I said, the first thing we have to talk about is some of the stuff that, like, preempts using indexed views. So really? See, I didn’t notice this when I first cracked this open, but apparently Microsoft is now advertising on their learn page.

That’s cheesy. Who would do that? All right. Kind of weenie. What kind of weenie advertises on their website? But if you’re going to create indexed views, and this actually goes for filtered indexes and computed columns as well, and you expect your queries to use them without any sort of interference, or without throwing errors, you do need to follow these SET options. ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, and QUOTED_IDENTIFIER all need to be set to ON.

Right? This all says ON. And then, of course, NUMERIC_ROUNDABORT, this weirdo. That has to be set to OFF. So that’s probably the easy part. I say probably because there was one time when I was like, I’m going to create a filtered index. It’s going to be dope.
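In T-SQL, that checklist looks like this:

```sql
/* These SET options must be in effect when you create
   (and modify data through) indexed views, filtered
   indexes, and indexes on computed columns. */
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF; /* the one weirdo that has to be OFF */
```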

And this ColdFusion application, which uses, like, the weirdest driver to talk to SQL Server, just started throwing errors everywhere. And I had to get rid of the filter on the index. And that was a funny day. But hey, you live and you learn sometimes, you know.

Mistakes from years past turn us into better people, right? We don’t experience any pain. We learn nothing. You can tell by maybe the number of tattoos that I have that I’ve learned quite a bit.

So there are some other problems with indexed views as well. And there are things that you can’t do inside of them. Some of them are worse than others.

So, you know, CTEs. I don’t really care about CTEs anyway. So screw it. Subqueries.

This is an interesting one. Because, you know, some people do have, you know, very pathological issues with subqueries in general. But this actually doesn’t really fully explain just how many limitations this introduces.

You cannot use exists in an index view definition. You can only use joins. You can only use inner joins.

You might see, if you look down here, there are many other types of relational table joining syntaxes that you cannot use. But subqueries actually prevent the use of saying, like, where exists something in this table that correlates to this other table. Which is wild.

You cannot put an exists clause in a SQL Server Indexed View. Huh. I know. I know. Basic stuff. Right?
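To make that concrete, here’s a sketch of the two shapes, using hypothetical Stack Overflow-style Posts and Votes tables (the transcript mentions both tables, but the exact queries are assumptions):

```sql
/* This EXISTS form is NOT allowed inside an indexed
   view definition: */
SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
);

/* The only relational shape an indexed view accepts is an
   inner join. Careful: a join is only equivalent to EXISTS
   when the join key is unique on the inner side; otherwise
   it multiplies rows, which is why you'd typically group,
   with the required COUNT_BIG(*). */
SELECT p.Id, p.Score, COUNT_BIG(*) AS VoteCount
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
GROUP BY p.Id, p.Score;
```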

There’s some other things in here. So this one is particularly funny to me. You can’t use distinct, but you can group by every single column. What do we think that is?
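A quick sketch of the DISTINCT workaround, again with an assumed Votes table:

```sql
/* DISTINCT is off the table in an indexed view definition: */
SELECT DISTINCT v.PostId
FROM dbo.Votes AS v;

/* ...but grouping by every selected column (with the required
   COUNT_BIG(*)) produces the same distinct set. */
SELECT v.PostId, COUNT_BIG(*) AS RowCnt
FROM dbo.Votes AS v
GROUP BY v.PostId;
```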

Hmm. Hmm. What is that? Funny. Funny, funny, funny. So, derived table expressions. So, if you wanted to, like, put a derived join in your index view, you cannot do that.

So, if you want to say, like, from select from this thing join the result of this select to, like, some base table, you can’t do that. You cannot do self-joins, whatever. I don’t know.

I guess that makes sense. You cannot use table variables, inline table valued functions, or multi-statement table valued functions. This makes sense, and this makes sense. This does not make a lot of sense.

Inline table valued functions are typically very simple things. You can only put, like, a select in there anyway, so that doesn’t make a lot of sense. And then there’s some stuff down here that, like, you know, kind of, like, who cares a little bit? Pivot and unpivot.

Screw you. TABLESAMPLE, system time. Full text search. If you’re using full text search in SQL Server, you deserve every ounce of pain that you experience. That is nonsense. CUBE, ROLLUP.

Okay, whatever. You cannot have HAVING in there, which is actually one of the problems we’re going to solve later. And these are interesting ones, I think. UNION, UNION ALL, EXCEPT, and INTERSECT.

Again, basic relational syntax. Microsoft, what are you doing? What’s happening? Where’s that $250 billion a year going? I mean, I know you’re burning a lot on AI lately, but index views have been around for a long time.

How about, like, this much of an improvement? There’s something, actually. It should be in this page, but I don’t see it immediately.

Yeah. This is what I was looking for. So you have to use COUNT_BIG.

You can’t use COUNT. That’s fine. You can’t use AVG. You have to use COUNT_BIG and SUM as separate columns. But I think the big one in here is this.
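A sketch of the AVG workaround, assuming a Stack Overflow-style Posts table where Score is NOT NULL (the view and index names are made up):

```sql
/* AVG can't go in an indexed view, so you persist SUM and
   COUNT_BIG separately and divide when you query. SUM over
   a nullable expression isn't allowed in indexed views
   either, hence the NOT NULL assumption on Score. */
CREATE VIEW dbo.UserScoreStats
WITH SCHEMABINDING
AS
SELECT p.OwnerUserId,
       SUM(p.Score) AS TotalScore,
       COUNT_BIG(*) AS PostCount
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;
GO

CREATE UNIQUE CLUSTERED INDEX cx_UserScoreStats
    ON dbo.UserScoreStats (OwnerUserId);
GO

/* the average, reconstructed at query time; NOEXPAND is
   what gets the materialized data used on standard edition */
SELECT s.OwnerUserId,
       s.TotalScore * 1.0 / s.PostCount AS AvgScore
FROM dbo.UserScoreStats AS s WITH (NOEXPAND);
```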

Well, actually, probably this. You know, like, one of the single most common things that you can… And this is funny for a reason.

I’m going to tell you in a minute. Some of the most basic stuff that you would want to aggregate in an indexed view are MIN and MAX. And a long time ago, a guy named… decided that it was not worth maintaining a nonclustered index on the indexed view to support MIN and MAX. So we don’t have that option available to us. And, you know, like 20-something years later, I don’t know if we’re ever going to get it.

You can use GREATEST and LEAST in indexed views. Those are newer T-SQL syntax from, like, 2022. You can put those in indexed views all day and everyone’s like, fine.

Great. So there’s a lot of restrictions on index views that are really annoying. And it would be nice if Microsoft invested, like, a little… A fraction of the money that they blow on OpenAI on improving SQL Server just a little bit before, you know, SQL Server slowly turns into COBOL and Fortran and whatever.

But there’s a lot of stuff with indexed views that’s missing. Like, you can’t do things in them that you can do anywhere else. So in this video, we’re going to talk about indexed view maintenance. Because one thing that you do need to consider when you create indexed views is, like…

So, like, under, like, optimal conditions, having an index view shouldn’t be any more maintenance for your query than having another nonclustered index on the table or whatever. But, like, you do have to consider how that index view is assembled.

So let’s look at this query, right? And I’m going to run this update in a transaction so that we don’t actually change all these scores to a higher score and have weird results in the future.

But if you look at the query plan for this, this runs pretty quickly, right? Just a little over half a second. Like, I know, this could be better with an index. We just recorded a view on improving…

Sorry, we just recorded a video. Not a view. We just recorded a video on using indexes to improve the performance of modification queries. So, yes, I know, we could make this better.

But just stick with me for now. Now we’re going to create this view. And this view, I mean, we’re updating the post table here. And this view has the post table in it.

Now, there are plenty of warnings that go along with views that reference more than one table, especially around locking and blocking. Serializable isolation levels will kick in when these views need to get maintained.

So it really does behoove you to make index view maintenance as quick as possible. So we’ve got that index view… Sorry, we’ve got that view in place. Now we have to create a unique clustered index on the view in order to get it to be an index view, a materialized view, if you prefer that parlance.

It’s fine if you do. So this takes 20 seconds or so to complete. I don’t know. Hey, 20 seconds on the nose.
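The demo isn’t reproduced line by line here, but the general shape is something like this (the view name and column list are assumptions, not the exact code from the video):

```sql
/* A schemabound view over Posts joined to Votes. The update
   in the demo changes Posts.Score, so maintaining anything
   grouped on it has to touch Votes too. */
CREATE VIEW dbo.PostVoteCounts
WITH SCHEMABINDING
AS
SELECT p.Id AS PostId,
       p.Score,
       COUNT_BIG(*) AS VoteCount
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
GROUP BY p.Id, p.Score;
GO

/* this is the 20-second part: building the unique clustered
   index is what actually materializes the view */
CREATE UNIQUE CLUSTERED INDEX cx_PostVoteCounts
    ON dbo.PostVoteCounts (PostId);
```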

Look it. Right under my armpit. There it is. 20 seconds. Wonderful. It’s almost like I’ve done this before. But now when we go to run this update query, I just got the estimated plan real quick to show you that it gets a bit more complicated, doesn’t it?

And then when we actually run the query, this thing that went from finishing in 600 and something milliseconds, well, this is going to drag on a bit.

This is going to end up being, I don’t know, probably around like 12 seconds. 12. Look at that. Oh, sorry.

Look at that. 12. 1, 2. Right there. Look at that. 12 seconds. And the query plan explains a bit why. We have some problems in this query plan, don’t we? Indexed view maintenance, if you’re not paying close attention, can be quite painful.

We spend about seven seconds scanning the clustered index of the votes table. SQL Server misunderstands this hash join and this hash aggregate. This adds a fairly significant chunk of time to the query plan.

We go from seven seconds there to almost 11 seconds there. And then, well, another second or so there. And then another sorty, spilly thing here.

So the indexed view maintenance in this case is kind of gnarly. But if we were to pay attention to the query plan that SQL Server has there, there’s actually a missing index request that is helpful for, like, once ever.

SQL Server is like, hey, psst. If we had an index on this post ID column in the votes table, I think the indexed view maintenance would suck less. And SQL Server, for all its faults, is actually right on this one.

If we had an index on the post ID column, that indexed view maintenance would be a lot less terrible. So we created that. This index is there now.
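The fix amounts to something like this (the index name is made up):

```sql
/* The missing index request boils down to an index on
   Votes.PostId, so the view-maintenance branch of the update
   plan can seek into Votes instead of scanning the whole
   clustered index. */
CREATE NONCLUSTERED INDEX ix_Votes_PostId
    ON dbo.Votes (PostId);
```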

And now if we go look at this update, this should only take a couple seconds. And so, like, this went from being 600-something milliseconds; I mean, the time did double, which maybe isn’t great. But this indexed view is quite helpful. And look, there’s even another, hey, if you create this other index, boy howdy, we can speed this up. And of course, this one is sort of right too, because that is half the time we spent there.

But this we knew about. This we knew about before, right? We were aware of the situation when we first ran that update. We knew because we watched my other video on improving modification performance with indexes that if we had an index there, we could make this faster.

So if we were to create that index, this thing would be pretty quick, all in all. Now, indexed views do need to be created for a good reason, because you do introduce this overhead, particularly when that indexed view references more than one table. You know, it can, again, really put a weird emphasis on locking with the serializable stuff.

So you have to be careful there. But indexed views, when properly used and properly indexed for, can be very powerful things. So in the next two videos, we’re going to talk about different ways you can use indexed views to get around some tricky stuff in SQL Server queries and query plans.

And some limitations with other fancy indexing features in SQL Server. So we’re going to do that. I’m going to end this one here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will petition Microsoft to make indexed views not suck, finally. I don’t know.

Something along those lines. Anyway, thank you for watching.
