A Little About How Over-Indexing Can Hurt SQL Server Performance
Video Summary
In this video, I delve into three unexpected ways that over-indexing can negatively impact SQL Server performance—far beyond just slowing down inserts and updates. I explore how having numerous unused or redundant indexes can lead to inefficient buffer pool utilization, causing more frequent disk I/O operations as the same data is repeatedly cached in memory. This not only wastes space but also introduces variable query performance due to constant swapping of data pages between memory and disk. Additionally, I demonstrate how extra indexes can trigger earlier lock escalation attempts during transactions, leading to increased locking overhead on both tables and indexes. Lastly, I showcase how transaction logging can be unnecessarily burdened by redundant indexes, resulting in more extensive log activity for the same operations. With practical demos, I aim to provide clear insights into these often-overlooked performance pitfalls.
Full Transcript
Erik Darling here with Darling Data, of course. And today we are going to talk about three ways that over-indexing can hurt SQL Server performance. Now, I don’t mean it in the sense that a lot of folks think about it, where it’s, oh, I added too many indexes, now my inserts are slow. Or now my updates are slow. Or now my deletes are slow. Sure, that can happen, but that’s the first meme everyone throws out whenever they think about over-indexing. By over-indexing, I don’t necessarily mean the quantity of indexes. What I mean is the indexing strategies that I see a lot while working with clients, where you’ll either have a single column index on every column in the table, or a bunch of indexes that have been added ad hoc over the years that no one has ever reanalyzed to see if they’re still of any use or validity. Some indexes could have been created to support parts of the application that aren’t parts of the application anymore.
Or someone could have added another index later on that SQL Server started choosing over an existing one, and you see usage drop off for that index, sometimes all the way to zero reads. Those indexes just aren’t helping queries go any faster.

And you still have to keep those indexes in sync whenever you modify the base table, because if you didn’t do that, your data would be corrupt. The other pattern is creating indexes on the same sets of columns over and over again: an index on columns A and B with D as an include, then one on columns A, B, and D, then one on columns A, B, and C, and so on, where you just see the same columns indexed over and over again.

Or an index on column A, an index on columns A, B, C, and an index on columns A, B, C with F, G, H as includes. I know all the letters of the alphabet. I can do this all day, standing on my head. But three ways that are sort of unexpected when I talk about them with the folks I work with are how having lots of unused and duplicative indexes can hurt the buffer pool.
How it can lead to earlier lock escalation attempts. Remember that lock escalation is something that is attempted but not always granted. And, of course, transaction logging.
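Before the demos, here is a minimal sketch of the overlapping-index pattern I just described. The table and column names are hypothetical, purely for illustration:

```sql
-- Hypothetical example of the overlapping-index pattern:
-- each index repeats the same leading key columns with minor
-- variations, so they all store nearly the same data.
CREATE INDEX ix_a       ON dbo.SomeTable (ColumnA);
CREATE INDEX ix_ab_d    ON dbo.SomeTable (ColumnA, ColumnB) INCLUDE (ColumnD);
CREATE INDEX ix_abc     ON dbo.SomeTable (ColumnA, ColumnB, ColumnC);
CREATE INDEX ix_abc_fgh ON dbo.SomeTable (ColumnA, ColumnB, ColumnC)
    INCLUDE (ColumnF, ColumnG, ColumnH);
```

Every one of these is a separate copy of largely the same rows, and every one has to be maintained, cached, locked, and logged separately.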
And I have demos to prove it all. I brought receipts. So, the first thing we’re going to look at is how indexing can have you making less efficient use of your buffer pool in SQL Server. The buffer pool, of course, being where SQL Server stores all those lovely data pages that queries want to work with.
SQL Server, being a mildly intelligent database, doesn’t work with pages directly on disk. Any query that wants to read or modify data needs those data pages in memory before SQL Server will start dishing them out to queries to work with.
This applies even to unused indexes, but it’s mostly about indexes that do get read from. If you’re indexing the same columns over and over again with slightly different includes, or the same key columns with one more tacked on at the end each time, you will end up with a lot of the same data in memory over and over. Now, this is a problem because most of the servers I look at, and I’m talking specifically to you, I’m pointing at you, I’m saying you, probably do not have enough memory in your SQL Server to cache the data relevant to your workload.
Remember, every index is a completely separate structure on disk, and so it’s a separate structure when you read it up into memory. Having the same data indexed over and over again means you have the same data in the buffer pool multiple times, which means you have less room for other things in the buffer pool. And that can generally lead to a pretty bad situation.
We are constantly going out to disk, and, well, I’m sure your SAN disks are great, but maybe your SAN network isn’t so hot. You’re just constantly hot-swapping data for one table or index against data for other tables and indexes. And that introduces weird, variable performance issues, because you may have a query that runs very quickly when everything is in memory and then takes a lot longer when you have to go out to disk and read a bunch of pages in.
So the first thing that I do for all of these demos, because I don’t always do them in the same order, is rebuild the Users table so that all its data pages are nice and densely packed. I do some updates and other stuff across these demos that sometimes leaves the Users table in a strange state when I go to run the actual ones. So the first thing I’m going to do is tell you a little bit about the scripts that I’m using here.
So let’s actually create these indexes while I talk through those, and get them off and running. This shouldn’t take too long.
The Users table is rather small, and I have a great home computer for this sort of stuff. So I have two helper scripts that I’m using for this one. One is called What’s Up Indexes.
It is not a terribly expansive piece of code. It just gets some general information about tables and indexes for a specific database. These are available on my GitHub repo.
The link to that will be in the description of the video. So if you feel like also clicking on that link, as well as clicking on like and clicking on subscribe, you can also go get the scripts to do this stuff at home. So if I run this, we’re going to get some very basic information about the indexes in the database that I just created here.
So we are in the Stack Overflow database. We’re looking only at the Users table. And we have four indexes currently on the table, one being the clustered primary key, which is 348 megs and about 44,000 pages.
And then we have the three nonclustered indexes that we created, which, since they’re only on a subset of the columns from the clustered index, are all smaller than it. But they add up to around 212, carry the one, maybe 213 megs total. Again, not huge, but I’m trying to do some quick demos here.
In real life, we would be looking at much, much larger things. But it’s about 212 megs for all of the indexes. Now, the indexes that I have display a pattern that I see quite a bit in client work: one on Reputation with DisplayName and WebsiteUrl as includes, one on Reputation and CreationDate with DisplayName and Location as includes, and one on Reputation, CreationDate, and LastAccessDate with DisplayName, UpVotes, and DownVotes as includes.
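Reconstructing from that description, the three demo indexes would look roughly like this. The index names are my guesses based on how they get referred to later as u1, u2, and u3:

```sql
-- Approximate definitions of the three overlapping demo indexes
-- on the Stack Overflow Users table, reconstructed from the narration.
CREATE INDEX u1 ON dbo.Users (Reputation)
    INCLUDE (DisplayName, WebsiteUrl);

CREATE INDEX u2 ON dbo.Users (Reputation, CreationDate)
    INCLUDE (DisplayName, Location);

CREATE INDEX u3 ON dbo.Users (Reputation, CreationDate, LastAccessDate)
    INCLUDE (DisplayName, UpVotes, DownVotes);
```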
So three indexes that have slightly different includes, but the key columns all share commonalities as far as column order and all that goes. Another helper script that I have in here is called What’s Up Memory, which looks at pages in the buffer pool and which objects are responsible for them. And since I just created these three indexes, the entirety of each index is going to be in memory.
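I’m not reproducing the helper script itself here, but a minimal version of that buffer pool check can be written against sys.dm_os_buffer_descriptors, something like:

```sql
-- A minimal "what's in the buffer pool" sketch: cached pages grouped
-- by object and index in the current database. The container_id/hobt_id
-- join covers in-row and row-overflow data; LOB allocation units would
-- need a join on partition_id instead.
SELECT
    OBJECT_NAME(p.object_id)   AS table_name,
    i.name                     AS index_name,
    COUNT_BIG(*)               AS pages_in_memory,
    COUNT_BIG(*) * 8 / 1024    AS memory_mb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p
    ON p.hobt_id = au.container_id
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE bd.database_id = DB_ID()
GROUP BY p.object_id, i.name
ORDER BY pages_in_memory DESC;
```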
And since I had to read from the clustered primary key to create those indexes, because it was the data source they were built from, that whole thing is currently in memory as well. At the start of this test, what I’m going to do is run a checkpoint and drop clean buffers. I do it twice because I am a superstitious man.
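That cache-clearing step is just these two commands, and it bears repeating that you should only run them on a test machine:

```sql
-- Flush dirty pages to disk, then empty the buffer pool.
-- Test servers only: after this, every query has to read its
-- pages back in from disk, which is exactly what we want to measure.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
```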
But if we run that twice and we look at what’s currently in memory, there is absolutely nothing, which is great and wonderful. And boy, I just couldn’t ask for a better demo so far. So I’m going to run this.
I’m going to look at what’s in memory before I run my queries. Then I’m going to run these three queries that all hit the user’s table in a different way. And then I’m going to look at what’s in memory afterwards.
And the execution plans for this are not terribly important because you’ll see at the very end that we ended up reading good chunks of all three of those indexes up into memory. So the first one uses the index U1 right there. I guess I did win, didn’t I?
I won the SQL Server lottery. The second query uses U2, which is a band that I absolutely hate. And they are not New Wave, no matter what anyone says.
And the third query will, of course, use U3 right there. And note that even though we seeked into all of these indexes, we still had to read a pretty significant amount of the pages in. So this is what’s currently in memory.
We got, let’s see, a 53 plus a 47 plus some more, so about 130-something megs of the indexes currently in memory, out of the 212 megs that the indexes make up altogether. So we have three different indexes with nearly the same data in there.
And we have what I would consider to be wasted buffer pool space because of that. Now, when I see this while working with clients, I usually consider it part of my job to start consolidating indexes. That means obviously dropping unused indexes, if the server has been up long enough to make a confident call on that.
But it also means consolidating duplicate and close-to-duplicate indexes so that we have fewer objects competing for space in the buffer pool. How would I do that here? Well, I wouldn’t run DROP INDEX on a client database unless they were really mean to me and didn’t pay their bills. But what we can do is consolidate those three indexes into one index that has the key columns we care about, plus the included columns that all three of the different indexes had in there.
Again, include column order doesn’t matter. Includes are just window dressing for the indexes. We can put them in any old order we want.
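A consolidated index along those lines, again reconstructed from the narration since he calls it u4 later, might look like:

```sql
-- One index with the widest key column set from the three originals,
-- plus the union of all their included columns. This single structure
-- can satisfy the same seeks as u1, u2, and u3.
CREATE INDEX u4 ON dbo.Users (Reputation, CreationDate, LastAccessDate)
    INCLUDE (DisplayName, Location, WebsiteUrl, UpVotes, DownVotes);
```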
And let’s run that same experiment now with just the single index and see how things turn out. So we start off with nothing in the buffer pool because we cleared everything out. All three queries run and use the u4 index, which makes me euphoric.
Kill me now. And all three are going to use that index because that index has all of the same stuff in there. Note that we did a seek into this index for all of these, starting with the first one.
Let’s just zoom back in on this. All three of these do a seek into the u4 index. That joke was u4 icky.
But now, rather than having, you know, 100-something megs from three different indexes sitting in the buffer pool, we have one index in the buffer pool that takes up about 70 megs. So obviously, we’re doing much better buffer pool utilization because of that.
Now, if you go look at this, remember that the three indexes combined were 213-ish megs altogether. This index all consolidated is 101 megs altogether. And so we have one much smaller data source that can service all three of the queries that we care about.
So, yeah, I don’t know. I guess that’s about that there, right? Cool.
So now we know that consolidating indexes, and that would include getting rid of unused indexes, can help us save space both on disk and in the buffer pool, right? Those precious disks that your SAN administrator gets paid billions of dollars to manage.
Billions. It could be an Oracle DBA making that kind of money. All right.
So the second thing that over-indexing, having too many unused and duplicative indexes around, can hurt is lock escalation. So we’re going to start off.
We’re just going to rebuild the Users table real quick. And the first thing we’re going to do is update this table inside a transaction. I don’t even know why I keep that in there.
We don’t even look at the execution plan of this one. It’s a habit, I guess. And then we’re going to use a different helper script called What’s Up Locks. Since we’re not doing anything right now, there are no locks to assess in here. But once we’re in a transaction with this update, boy, howdy, there’ll be fireworks.
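Again, I’m not reproducing the helper script itself, but a minimal stand-in for it can be built on sys.dm_tran_locks:

```sql
-- A minimal "what's up, locks" sketch: current locks in this database,
-- grouped by resource type (OBJECT, PAGE, KEY, ...), lock mode
-- (IX, X, S, ...), and whether each lock is granted or waiting.
SELECT
    resource_type,
    request_mode,
    request_status,
    COUNT_BIG(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode, request_status
ORDER BY lock_count DESC;
```

Run this from a second session while the demo transaction is open to see the lock counts described below.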
So if we run this with just the clustered index, and we look at what What’s Up Locks gives us, we can see that SQL Server has one intent exclusive lock on the object, which isn’t a blocking-level lock in itself.

It’s more of a pre-lock: it signals intent, and it would be blocked by competing locks, like an object-level shared lock on the table. But the real thing that we care about here are the granted exclusive locks on pages in the primary key.
So that’s just with one index: we end up with 3,394 pages locked. If we add in this index and run through the same thing again, we will see that SQL Server now has a new object to lock, and it takes out more locks. Not Morlocks, like the Time Machine villains, but more locks, as in more locking on the index.
And we end up with some thousands of locks taken and granted, still on the primary key and, come on, mouse cursor, do what I want, now also on the nonclustered index that we just created.

That’s 4,002 key locks there. So we have key locks here and key locks there. Now let’s add in a second nonclustered index, which is, again, a situation I run into with clients a lot: the same columns in a slightly different order.
And they both kind of get used, but no one really knows which queries hit what, or which queries are more important, all that stuff. This takes a bit more analysis and domain knowledge to figure out, like whether you can get rid of one of these.
But if we add in that third index, and then we run this, rather than locking a whole bunch of stuff separately, we now have one exclusive lock on the entire Users object, and it has been granted.
So, again, lock escalation is only attempted. It is not guaranteed to happen. If other competing locks get in there, you may not see lock escalation happen. There are counters in some of the index DMVs that track lock escalation attempts.
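The counters in question live in sys.dm_db_index_operational_stats; a sketch of checking them looks like this (the counts accumulate since the instance last started):

```sql
-- Lock escalation attempts vs. successful escalations per index
-- in the current database, from the index operational stats DMV.
SELECT
    OBJECT_NAME(ios.object_id)                  AS table_name,
    i.name                                      AS index_name,
    ios.index_lock_promotion_attempt_count      AS escalation_attempts,
    ios.index_lock_promotion_count              AS escalations_granted
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
    ON i.object_id = ios.object_id
   AND i.index_id = ios.index_id
WHERE ios.index_lock_promotion_attempt_count > 0
ORDER BY escalation_attempts DESC;
```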
So those are interesting things to keep an eye on. But this is one way that having additional duplicative, or even unused, indexes on your table can hurt performance.
Because, like I said before, even unused indexes need to end up in the buffer pool and need to be locked for modification when you modify the base table. SQL Server can’t play favorites and be like, well, you haven’t been read in a while, we’ll just defer that update to some other time. That doesn’t happen. It does not happen. And so that can end up being sort of a bad time. All right, cool.
So the third and final way that over-indexing can hurt SQL Server is with transaction logging.
So what I’m going to do is create all three of these indexes in one big swoop. And then I’m going to show you what an update looks like from the transaction log’s point of view.
So two things I want to show you. One is that there’s a reason for the checkpoint here. And the reason for the checkpoint is because after we create indexes, we have data about creating those indexes in the transaction log.
If I checkpoint everything in there, go away, SQL Prompt. If I checkpoint everything in there and then go look back at the transaction log, it’s going to be empty, right?
So we don’t have anything in there for this table. Now I’ll go and do this. Again, we’re doing the update in a transaction so that I don’t have to worry about undoing any evils that I’ve done.
If we run this, we’re going to see all three of the nonclustered indexes that I created end up with a bunch of locks and a bunch of additional records in the transaction log.
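One way to peek at per-index log activity like this is the fn_dblog function. It’s undocumented and unsupported, so treat this as a test-server-only sketch:

```sql
-- Count transaction log records per allocation unit using the
-- undocumented fn_dblog function. Each nonclustered index shows up
-- as its own allocation unit, so redundant indexes mean extra rows
-- of log activity for the same update.
-- Test servers only: fn_dblog is unsupported and can be slow.
SELECT
    AllocUnitName,
    COUNT_BIG(*)              AS log_records,
    SUM([Log Record Length])  AS log_bytes
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.Users%'
GROUP BY AllocUnitName
ORDER BY log_records DESC;
```

The checkpoint beforehand matters here: in simple recovery it lets the index-creation log records clear out, so what you see afterward is just the update.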
Now, since this is an update, only indexes that contain the columns being updated will end up in the transaction log.

If this were an insert or a delete, every index would be affected, because every index contains every row in the table. Even if the new values are null or something, a record is added to each index, so you would see log records for all of them.
Now, of course, if you have filtered indexes, and the insert or delete didn’t touch data covered by the filter, then you wouldn’t see anything there. But that’s a bit more of an edge case.
So we have all that. Anyway, those are three ways, ones that I find people are rather surprised by, that having too many unused and overlapping indexes hanging around in your database can hurt SQL Server performance.
If you want to analyze your indexes, I, of course, recommend sp_BlitzIndex. It’s an open source tool that I contributed to. Well, I still contribute once in a while. It’s sort of set in stone at this point, but I contributed a lot of stuff to it in the past that I’m rather proud of.

It was, of course, originally written by the lovely and talented Kendra Little, who wrote a great post today about SQL Server Management Studio being the best SQL Server monitoring tool on the planet, which I heartily agree with, because the rest of them are a bit iffy.
But anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And again, please like and subscribe and all that good stuff.
And I will see you in another video where we will talk about, well, I guess more SQL Server stuff. It’s the way this seems to go, isn’t it? Wish I knew more about something else so I could talk about that.
Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve database performance problems quickly. You can also get a quick, low-cost health check with no phone time required.