Annoyances With Filtered Indexes And Computed Columns In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into some of my pet peeves with filtered indexes and computed columns in SQL Server. Specifically, I highlight how the query optimizer’s inability to use inverse logic when working with these features can lead to suboptimal performance, especially on larger tables. For instance, creating a filtered index where `isDeleted = 0` works fine, but trying to use `isDeleted != 1` instead triggers an optimizer error, despite the data being present in the index. Similarly, using computed columns for calculations like `upvotes + downvotes > 10,000` can result in full table scans if you mix up the column order, even though indexing the computed column could theoretically speed things up. These issues underscore the importance of careful query design and planning to ensure that your indexes and computed columns are used effectively.
Full Transcript
Erik Darling here with Darling Data. Getting younger and better looking by the day, I’m told. In this video, we’re going to talk about some of my annoyances with filtered indexes and computed columns. Granted, there are more than this, but this is just one that even a very casual observer may find quite annoying. And mostly, it has to do with the expression matching abilities of SQL Server’s query execution engine. Mostly in that it’s very hard for it to, well, actually, not hard. It is not currently implemented in SQL Server’s query optimizer to be able to use inverse logic and still match expressions to things. It’s really annoying. So we’re going to talk about that. As usual, if you would like to support this channel, there are very low-cost memberships. I think the entry-level one is like $4 a month. And that’s a great way to keep from commercials making their way into the Darling Data ecosystem. If for some reason you’re unable to partake in a $4 a month or more subscription, then great ways to show your support are likes and comments and subscribes because those numbers going up make me just as happy as I’m going to be able to share.
So I think that’s my monthly earnings. My monthly earnings going up, which right now are at $30 and 70-something cents. So, you know, I think the bare minimum for a deposit from YouTube is $100. So every four or so months, I get $100 from this channel. So the other numbers going up is equally as gratifying. We’re not talking millions here. So if you need help with SQL Server from a consulting point of view, from a man with a slightly itchy eye, this is the kind of stuff that I normally do. Health checks, performance analysis, hands-on tuning, responding to SQL Server emergencies, and developer training.
If you need something else, give me a holler. We can talk about something else. But this is the stuff that I typically help clients with. If you need low-cost SQL Server training, you can get all 24 hours of mine for about $150 with the discount code SpringCleaning. Of course, if you just click on the link in the video description, you will get a direct link that applies that coupon code for you. So you can do that.
Some upcoming events. Friday, September 6th, I will be at Data Saturday Dallas doing a full-day pre-con. And November 4th and 5th, of course, I will be at Past Data Summit splitting. Well, I mean, like, I have, like, two pre-cons each day split with Kendra Little.
So November 4th and November 5th, me and Kendra will be double-tag-teaming some pre-con material for all of you lovely folks out there who can show up in Seattle. Now let’s get on with my annoyances with filtered indexes and computed columns. We’re going to start over in my crap database.
And we’re going to create a table called isDeleted. It’s just kind of a transient junk table that I’m going to use to show you my first annoyance with these things. I’m not creating it in the Stack Overflow database because I always forget to get rid of these things.
And then I have a bunch of tables that I don’t want or need in the Stack Overflow database. And we’re just going to stick some data in there using the sys.messages built-in DMV table where SQL Server holds all the error messages that SQL Server has in them. If you’re ever exploring a new or different version of SQL Server, looking for new error messages is a really fun thing to do.
You find all sorts of new things. You also find lots of typos and grammatical errors and incomplete error messages and really unhelpful error messages. The quality of Microsoft everything has slowly been like on this decline.
Some of it more like this. Other of it kind of like a step pattern. But the quality of error messages has really been eating it over the last few years.
Some of the error messages that you get, you’re just like, I can’t possibly do anything with that. How do I fix this? So now we’re going to create an index on this table that obeys all of the rules that filtered indexes need to obey in order for us to be able to use them efficiently.
And mostly that is that we’re creating this filtered index on the column DT. We’re including the isDeleted column so that it is part of the index definition. And we have our where clause to where isDeleted equals zero here because no one needs to see things that are deleted.
So if I were to say zero equals isDeleted like in this one, we would do just fine. I’ll show you all this stuff in a second. But if you use the sort of inverse logic and you say not one equals isDeleted or you say isDeleted not equal to one or anything like this, SQL Server will just say, oh, that’s nice, but I’m just going to scan the clustered index instead of using your filtered index instead.
Now, granted, for a table with 300,000 rows in it, it’s not going to be a big performance hit. This is just to show you the behavior that I’m annoyed with. You can imagine where this would be annoying to you in real life on much bigger tables, particularly where isDeleted equals zero is a fairly sizable portion of the table data.
Or like I talked about in the considerations for soft deletes video, maybe where you’ve set up some sort of view or something in order to get back the data you care about. So we even go a step further here, and if we hint the index on isDeleted, the ISD index on isDeleted, and we tell SQL Server, please use this index. It is a perfectly feasible, usable index to satisfy this where clause.
SQL Server will throw an error, an optimizer error. The query processor, I’m on Zoom at Ketchup. The query processor cannot produce a query plan.
Wow. Wow. It’s just, nah. It freaks out that hard about things. It’s kind of ridiculous that this is still something that people have to deal with here in the year 2024. Of course, if we just say, hey, pretty please, let’s just write this the correct way, then SQL Server will seek into our nonclustered index and find those not deleted records.
So that’s my first annoyance. The optimizer is just not geared to use the inverse logic to still be able to use a filtered index, even though exactly what you’re asking for is in that filtered index. It’s a bit column, right?
There’s no two ways about it. Another thing that I find quite annoying is in relation to computed columns. Now, computed columns have nearly the same problem.
In this case, let’s say that we wanted to look at who has a lot of upvotes in the Stack Overflow database. This is just a short query to show you that there are actually people who up and downvote things a lot, including whoever ID negative one is. That seems like a really powerful user.
And, of course, you know what? This create index statement, I’m going to start running that, but then I’m going to move it up here because it should really be up here because this is the important thing. But if we look at the query plan for this, now I’ve talked about this before where it’s sort of like a sargability thing, right?
Search argumentability. This is one of those patterns that ruins search argumentability for queries, right? If you concatenate two columns, in this case, we’re adding two integer values together.
Even with a perfect index in place on upvotes and downvotes, SQL Server doesn’t track that particular correlation, right? It knows the values in upvotes and they’re in order and it knows the values in downvotes and for, you know, for duplicate values in upvotes, the downvotes are ordered in there. But it’s not tracking what upvotes plus downvotes equals, right?
So there’s no way we can seek to that data. The best that we can do is scan through that data, add upvotes and downvotes together, and then see who has more than 1,000 of them. This should actually be 10,000, I think.
Let’s make this not 10. That’s not a zero. That’s an O. If we run this, we’ll get a slightly lower number. But it’ll still be the same deal. So we scan the entire index that we created on upvotes and downvotes.
We had to read all 2.4 million columns. Sorry, all 2.4 million rows, add those columns together, and then figure out if the result of that matched our, was, you know, was correct for our predicate, right? So this plus this is greater than 10,000, right?
Best index in the world still doesn’t beat that. Now, what you can do is you can create a computed column, and you can, even without persisting it, right? Notice there is no persisted in here.
Even without persisting it, you can index that computed column, right? Another thing that slips by a lot of people is that even if you don’t persist a computed column, you can still get statistics generated on it, which is nice, right? Because, you know, persisting computed columns on big tables can be kind of painful.
But if you go and create an index on that column, then you will, thankfully, write that out. There will no longer be a virtual column in this index. It’s still a virtual column in the clustered index or the table if you don’t have a clustered index.
But in this index, all those values are written down. They are permanente in there. So, if we run this and we search for upvotes plus downvotes is greater than 10,000, we will get a nice index seek into our nonclustered index. It will be quick and fun and painless.
But if we mix up the column order in that and we say downvotes plus upvotes is greater than 10,000, SQL Server can no longer figure out just what the heck to do with itself. We are back to scanning the clustered index, reading all 2.4 some odd million rows, and figuring out how to add those columns together and what they equal.
So, when you’re using computed columns and filtered indexes, do try to write them in a way. And I just want to add, if we tried to hint that nonclustered index up there, we would get the same query processor that we saw with the filtered index example that came first.
So, when you’re making computed columns, you’re making filtered indexes, make sure that whatever queries you want to use them are the basis for those computed columns and filtered indexes. If you have a bunch of different sort of query forms that sometimes look for things on slightly different arrangements, SQL Server will, A, not be able to match them directly to the computed column or filtered index, but, B, you’ll get an optimizer error if you try to tell it to.
So, those are the two things to be very careful about with filtered indexes and computed columns, is making sure that they adhere to the queries as they’re written, or how you’re going to write the queries in the future, or how you’re planning on rewriting the queries to match these things so that they actually work together.
So, those are a couple of things that I find annoying. Actually, it’s kind of one thing that I find annoying about both of them, isn’t it? Same problem, twice.
Same problem, one problem, two different facilities in SQL Server. Two different arrangements where things are not good. So, thank you for watching. Remember, all this stuff, like, subscribe, comment, hire me, all that good stuff.
I have training, and I’m going to be doing live things. If you’ve forgotten any of the details there, you can rewind the video and watch the little housekeeping section at the beginning where I talk about all that stuff.
For now, I’m going to go… Well, apparently, I have a call starting in a couple minutes, so I’m going to go do that, and then I have to record 7 million more videos to make sure that things stay flowing in a nice, even pace.
So, thank you for watching. I’m trying to go pay the bills, as they say. All right. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.