Introducing sp_IndexCleanup!


Video Summary

In this video, I’m Erik Darling from Darling Data, and today we’re unveiling a brand new stored procedure to help you streamline your SQL Server management. sp_IndexCleanup is designed to identify unused and duplicative indexes, which can be major culprits in performance issues like buffer pool contention and lock escalation. With this tool, you’ll get recommendations for tuning queries and indexes and fixing blocking and deadlocking problems, all while making your databases more efficient. The process involves removing unused indexes, deduplicating the leftovers, and adding page compression to make your data smaller on disk and in memory. This not only helps reduce transaction log throughput issues, but also eases the burden of managing too many indexes.

The script works through a series of well-defined rules to identify exact duplicates and superset/subset relationships between indexes, ensuring that you’re making informed decisions about which indexes to keep or remove. It then scripts out all the necessary changes for you, including merging included columns into create statements, disabling unused indexes, and compressing any indexes without page compression. This iterative process helps ensure that your database is optimized for performance while minimizing the time spent on repetitive tasks. Whether you’re dealing with a single table or an entire database, sp_IndexCleanup provides detailed reports to guide your decisions, making it easier than ever to get better at SQL Server management.

Full Transcript

Erik Darling here with Darling Data, and today we have an exciting announcement. We are welcoming a new member to the family, and I don’t mean I’m hiring someone, because who the hell would want to work for me. What I mean is that we are introducing a brand new stored procedure to help you with your SQL Server nonsense. It’s been a while since I’ve written a brand new one of these. I think sometime towards the end of 2024 I spit out sp_HealthParser and sp_LogHunter, which are valued members of the family, but as a wise woman once said, you’re getting too big to cuddle. So I’ve got a new one, and this one is called sp_IndexCleanup. The goal of this stored procedure is to help you identify unused and duplicative indexes. I help people with SQL Server

performance, make recommendations, tune queries and indexes, fix blocking and deadlocking problems, and of course help you get better at SQL Server. Most of my time gets spent in here. And as much as I love tuning queries so they go faster and creating indexes to help queries go faster, part of this analysis is looking at unused and duplicative indexes, because unused and duplicative indexes cause a lot of problems. I’ve recorded a few videos about the problems they cause around buffer pool contention and lock escalation, and the more indexes you have, the more transaction log work you have to do. Having too many indexes, especially of the unhealthy kind, is like bad cholesterol or something: it impacts SQL Server in a number of areas. Most of the databases that I look at have way more data than memory, have only ever had indexes added (it seems to be an append-only operation), and then have lots of bad deadlocking problems. Cleaning up unused and duplicative indexes can help reduce these things. You have fewer indexes, so you have fewer objects competing for buffer pool space. You have fewer indexes, so you have fewer lock escalations.

And if you have transaction log throughput trouble, or you have an AG, where sending all these index changes across to other nodes is a pain, getting rid of these unhealthy indexes is very useful. So the cleanup process typically consists of removing unused indexes, deduplicating the leftovers, and adding page compression.

Right? And adding page compression is great because it makes your data smaller on disk and in memory, so you make better use of the memory that you have. The point of this script is to really clean up all the indexes you have and make them as small as possible. Then you have a very clear sense of whether you need more memory. Now you know, right? Now you can figure it out.
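As a rough illustration of the kind of statement involved (the index and table names here are made up, not anything the procedure emits), adding page compression to an existing index is a single rebuild:

```sql
-- Hypothetical object names, for illustration only.
-- Rebuilding with page compression shrinks the index on disk
-- and in the buffer pool.
ALTER INDEX ix_Users_Reputation
ON dbo.Users
REBUILD
WITH (DATA_COMPRESSION = PAGE);
```

Worth noting: page compression has been available in Standard Edition since SQL Server 2016 SP1; on older versions it was an Enterprise-only feature.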

Now, you don’t want to double work things. So whenever I’m going through this process, right? My job, like the first thing I do is find unused indexes, right? And I’m listed all the unused indexes. And then I grab all the duplicative indexes, either like exact key column matches or indexes that lead with the same key column. And then I have to go and look and see if any of the unused indexes are in there because I don’t want to double work anything. So I’m like, Oh, I can delete this one. I can do this one. And then after you finally do that, you can look at the list of indexes that you have left.
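A minimal sketch of that first step, assuming you only care about reads recorded since the last restart (sp_IndexCleanup’s actual checks are more thorough than this):

```sql
-- Rough sketch: nonclustered indexes with no recorded reads.
-- Usage stats live in a DMV and reset when the instance restarts.
SELECT
    t.name AS table_name,
    i.name AS index_name,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM sys.indexes AS i
JOIN sys.tables AS t
  ON t.object_id = i.object_id
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON  us.object_id = i.object_id
  AND us.index_id = i.index_id
  AND us.database_id = DB_ID()
WHERE i.type = 2 /* nonclustered */
AND   i.is_primary_key = 0
AND   i.is_unique_constraint = 0
AND   ISNULL(us.user_seeks, 0)
    + ISNULL(us.user_scans, 0)
    + ISNULL(us.user_lookups, 0) = 0;
```

The `user_updates` column is the kicker: an index with zero reads and lots of writes is pure overhead.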

The problem is that now that you have this list of duplicative indexes to analyze, you have to look at key columns and key column order. You have to look at included columns. You have to look at whether they have filters on them. You have to look at whether they’re unique or not. You have to compare column sort orders, ascending versus descending. And it’s just a lot of: oh, we have to move this down; oh, this include list goes way off the SSMS screen, let’s bring that down; oh, this one looks like it has the same number; oh, this one has 30 includes and this one has 31 includes, what’s different? It’s just not fun. And after all that, you still have to edit all these create statements to merge the included columns, which means you take the create index statement, do the drop existing, blah, blah, blah, and make sure all the included columns are accounted for. For all the indexes you’re going to get rid of, you have to write alter index, blah, blah, disable. And then there’s fixing all of the other stuff that goes wrong with index creation: fill factors that are really low, and are you using a version of SQL Server where you can create indexes online or not?
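The kind of statement pair you end up hand-writing looks roughly like this (index and column names are hypothetical, invented for the example):

```sql
-- Keep ix_keeper, widen it to absorb the other index's includes,
-- and rebuild in place over the old definition.
CREATE INDEX ix_keeper
ON dbo.Users (LastAccessDate)
INCLUDE (Reputation, DisplayName, CreationDate) /* merged include list */
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);

-- Disable (rather than drop) the duplicate, so a rebuild brings it
-- back easily if something turns out to need it.
ALTER INDEX ix_duplicate ON dbo.Users DISABLE;
```

Doing that by hand for a few dozen indexes, without dropping an included column along the way, is exactly the tedium being described here.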

Do you need to do drop existing? Do you need to add compression? All this stuff that you have to think of every time you do this gets very, very annoying. And I realize I’m standing in the way of some of these words, so I’m just gonna duck down for a moment. Oh, the green screen effect. We’re having a good time with this, aren’t we? So then you get to do it all over again, because index tuning is a highly iterative process, right?

You go through, you make this first set of changes, you see how things look with SQL Server, and then you check in and make sure there’s no new crap. And you can just keep doing these things until time immemorial or something. But even after doing all that, you still don’t have a full list of indexes that are not compressed that you probably should compress, because you have too much data and not enough memory. Especially if you’re on Standard Edition, this is a big deal. Now, I absolutely adore sp_BlitzIndex. I still use it. Well, I mean, I haven’t had anything to contribute to it in a while; I feel like I’ve contributed maybe all that I can to it. But it’s great at analyzing things. It is fantastic at doing the analysis and flagging things that are worth you looking at: unused indexes, what’s duplicative based on either all of the key columns or the leading key columns, and a lot more other stuff.

I’ve written some checks in there that are very, very useful to this day. sp_BlitzIndex was, of course, originally written by Kendra Little, and is now, I guess, happily maintained by the greater open source community of SQL Server. But adding all this code to that would have been a nightmare. There is also a mode in sp_BlitzIndex that will inventory all the indexes in your database and give you scripted create and drop statements. But if you want to add compression, modifying those statements is not the most fun in the world, and it doesn’t do any of this work for you. Now, me being a consultant, I want to spend as little time on these sorts of tedious, repetitive tasks as possible. So I wrote sp_IndexCleanup. It will identify unused indexes. It will warn you if your server uptime is not very long, because you can’t make a very confident decision about unused indexes if your server hasn’t been up for a terribly long time. It will compare index and included columns. It will take into account uniqueness and filters and all that other good stuff. And it will find compressible indexes.

So anything without page compression already, it will find those and mark those and say, hey, you should probably compress this. And it works off a set of well-defined rules for unused indexes, exact duplicate key columns, superset/subset key columns, and unique versus non-unique keys, including unique constraints. One thing this script does that I’m very happy with is that if you have unique constraints on your table, and you also have a nonclustered index on those same columns with some includes, then it will tell you to get rid of the constraint and make the nonclustered index unique. So you can kill two birds with one stone there. It will evaluate column sort directions, so if, say, you’re on the Users table and you have something on Reputation ascending and something on Reputation descending, it won’t call those dupes. And it will also merge include columns into a handy create script for you, with all the options the way that I want them when I do this stuff. So like I said, it scripts out all the actions for you. It’ll merge any missing included columns into keeper indexes. It’ll disable unused indexes and exact dupes or subset indexes. And it’ll compress any indexes without compression. It’ll give you some things to review.
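That constraint-into-index consolidation looks something like this (the object names are hypothetical, not sp_IndexCleanup’s actual output):

```sql
-- Drop the unique constraint, then make the wider nonclustered index
-- unique, so one structure enforces uniqueness AND covers queries.
ALTER TABLE dbo.Users DROP CONSTRAINT uq_Users_AccountId;

CREATE UNIQUE INDEX ix_Users_AccountId
ON dbo.Users (AccountId)
INCLUDE (DisplayName, Reputation)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);
```

Same uniqueness guarantee, one fewer structure to write to on every modification.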

I mean, all of this is for review. It doesn’t automatically do all this stuff; it just gives you the script to do it. So one thing you might have to review as part of the overall results is indexes that have the same key columns, just in different orders: columns A, B, C versus C, B, A or B, A, C or something. You can look at the usage patterns there and figure out if you want one or the other or both or whatever. Indexes that we’re keeping, I leave in the results so you can validate any changes there.

One thing it’ll flag is that this index is a subset of this index, or a superset of this index, so you can go grab those definitions and compare them. It’ll give you all the proposed scripted changes, and it will give you a summary report about all of the actions that you can potentially take at the overall database and table level, and the potential gains from executing on each one of those actions. I don’t do any double working: if indexes are unused, they’re not part of the dedupe analysis. So, again, server uptime is pretty important for this. You want to make sure that you have at least a couple of weeks of uptime, so that you have a good amount of index usage information in there. Any indexes that get merged will already have compression scripted into them, so you won’t get a separate “compress this index” because you already compressed it when you did the merge. And compression recommendations don’t include indexes being merged or indexes being disabled for any reason.
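Checking uptime yourself is easy enough, since index usage stats are cleared on every restart:

```sql
-- How long has the instance been up? Usage-based index decisions
-- need a decent observation window behind them.
SELECT
    osi.sqlserver_start_time,
    DATEDIFF(DAY, osi.sqlserver_start_time, SYSDATETIME()) AS days_of_uptime
FROM sys.dm_os_sys_info AS osi;
```

If that number is small, treat any “unused” verdict with suspicion: month-end or quarterly jobs may simply not have run yet.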

And so let’s talk a little bit about usage. There are things you can filter on. You can filter to a specific database if you want, and you can also filter to a specific table if you want to focus on one table.

There are also some things about your tables and indexes that you can filter on, like a minimum number of reads, a minimum number of writes, a minimum number of rows in the table, and a minimum table size in gigs. The minimum size GB parameter is a decimal, so you can put values like 0.5 in there if you’re interested in something that’s around 500 megs or so.

You can also choose to get all databases, and you can choose to include or exclude databases as part of that. So if you want to get all databases, but you want DB1 and DB2 and not DB3 and DB4, you can tell the script that and it will do it, and you get just the databases that you care about, because me, Erik Darling, I care about you. Here are some sample results for this thing. I realize that my body is in the way of some of these, but that’s okay; this is sort of what the cleanup output looks like.
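A call with those filters might look like this. The parameter names below are my best guess at the shape of the procedure, shown only to illustrate the filtering described above; check the procedure’s help output or the repo at code.erikdarling.com for the real ones:

```sql
-- Hypothetical parameter names, for illustration only.
EXEC dbo.sp_IndexCleanup
    @database_name = N'StackOverflow',
    @table_name = N'Users',
    @min_size_gb = 0.5; /* decimal, so sub-gigabyte thresholds work */
```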

You get the type of script, so whether you’re going to merge, disable, or compress something, and any additional info. Note that in this column we do have the warning that there’s less than 14 days of uptime. Then you get stuff like the database, schema, table name, index name, and the consolidation rule that it got hit with. And then down here we have the target index name and the superseded stuff.

This is what I was talking about: if you need to compare which index won the merge war, which index is getting merged into it, which one is the mergee and which one is the merger, then you can follow these names and do that.

You get the size of the indexes and the number of rows in the index (filtered indexes will show some stuff differently). You get the read and write ratio of these; of course, this is just me ginning up some test stuff, so the reads and writes aren’t very impressive looking here. Then there’s the original index definition, the index as it exists right now. And then there’s this final column, the script: this is what you would run to apply whatever changes the script is recommending. And finally we have the reporting output, where you will see all the things that were analyzed.

This does require a little bit of explanation, which I’ll get to in a minute, but you have the number of objects analyzed. Then you have the database and table level numbers: the database name, tables analyzed, total indexes, indexes you can remove, indexes you can merge, and then some percentages. This first line up here is only the stuff that was analyzed. In my Stack Overflow database right now, the only table that I have a bunch of extra indexes on is the Users table, so that’s why this says tables analyzed: one, and total indexes: 25. If you look at the number of indexes in here, it says 30 in total for all this; that’s by design.

The total indexes number is 25: there are 25 nonclustered indexes on the Users table, and that’s what was analyzed. There are 26 total indexes on the Users table, including the clustered index. This script will not try to tell you to get rid of nonclustered indexes if they match your clustered index or if they match a nonclustered primary key, because that would be stupid, right?

You don’t want to mess with those. Those are more specific things than just some nonclustered indexes that people have right-clicked on the green text to create. You also get some size information. The current size here, again, this line is only for the analyzed stuff.

So this top line is always going to look different from the rest of the results. But for the rest of it, at the database level, you have the size of the database, the size of the tables, and how much space you can save by cleaning up indexes. There are also some guesses at how much space you can save, both min and max estimates, for the total for the database and everything.

And then you have the total rows, and then a breakdown of reads, writes, locks, latches, all this other good stuff. And then at the table level, it’ll show you how many of these things you can save a day. Let me move over to the side a little bit here so you can see better.

How many things you can save a day by cleaning up indexes. So this is the Users table right here. It’s had 499 latch waits, and we could save 25 latch waits a day by cleaning up the nine removable indexes on here and performing our two merge statements. So that’s kind of how this whole thing works. If you want to get it and try it out, you can go to code.erikdarling.com, but let’s do some live demo action on this thing so you can kind of see how it works. And I’m going to show you just a couple of other things while we’re in here. What I’ve done is created these 25 nonclustered indexes on this table, and this is the script that I run to generate some reads against most of them; some of them I do want to be unused, to test the unused rules.

And then this is the stored procedure itself, in all its glory. Like all of my stored procedures, there is a help parameter. If you run with this, it’ll tell you what parameters are currently available. It’ll introduce itself like a proper young gentleman, and it will give you all of the parameters, their data types, their descriptions, their valid inputs, and their default values. And then if you run this just normal (and this runs pretty quick, at least on my machine), these are the results that we just looked at, where you can see all the stuff that we just talked about: you get all of the scripted output here. Lots of fun. And then down here you get the reporting analysis on everything.
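If the help parameter follows the same convention as his other procedures, asking for it is a one-liner (the exact call shape is an assumption on my part):

```sql
-- Print available parameters, descriptions, valid inputs, and defaults.
EXEC dbo.sp_IndexCleanup @help = 1;
```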

So there is lots of stuff in here. It works; I’ve tested it. I’ve actually done unit testing: I created indexes that match rules and extended rules and stuff, and everything seems to be working very well. So I’m excited to get out and try it on more real world servers. I’ve only had a chance to run it on two or three client servers so far, to try and weed out some of the initial “I just wrote this script” problems, because there always are some, and it was very helpful to be able to do that. Anyway, I look forward to you using it and giving me feedback on it. I think that’s just about it for this one. Again, very excited over here in Darling Data land. I hope you enjoyed yourselves. I hope you’ll use the script, and I hope you’ll enjoy using the script. And again, if you run into anything with it, the place to go, or rather the URL to go to, is code.erikdarling.com. This redirects to my GitHub repo, where you can open up issues with the code, propose contributions to the code, ask questions about it, whatever it is you need to do. That URL will lead you to GitHub, where you can do all of those things and get support. Do not email me for support on this. I do not do email support at all, ever. I hate it. I will tell you to go to GitHub. Do not email me directly. Okay.

Okay. Anyway, I think that’s about it here. All right. Yeah. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.


