A Little About Fill Factor And Fragmentation In SQL Server

A Little About Fill Factor And Fragmentation In SQL Server



Thanks for watching!

Video Summary

In this video, I dive into the often misunderstood concepts of fill factor and fragmentation in SQL Server indexes. Drawing from my experience working closely with clients who rely on outdated practices, I share how setting a 32-bit mindset aside can lead to more efficient database management. I explore why regular index rebuilds might not be as necessary as many believe, especially given modern hardware capabilities that reduce the impact of fragmentation and improve overall performance. Through practical examples, including queries and index maintenance scripts, I demonstrate how fill factor settings can significantly affect both logical and physical page density, and explain why focusing on densely packed pages is crucial for optimal query performance.

Full Transcript

Erik Darling here with Darling Data. Two big capital D’s on the Darling and the Data. Just in case anyone at home was wondering. I’m a little blurry here. Is that too much? That’s too much. There we go. Now we’re nice and crisp. Now you can see the slight blemish on my face, which I’m very embarrassed about. Anyway, in today’s video, we are going to talk a little bit about fill factor and fragmentations. Because I had to talk about this with a client, the nice people who pay me to keep producing this free content. If you would like to hire me so that I can share my wit and wisdom with you and keep making free content, well, you can go to my website and contact me. I suppose that’s probably the best way to do it. I don’t know. I don’t take PayPal or Venmo or Bitcoin or really anything else. Just cash. Cash on the barrel head, pal. So this is a tricky subject because you have to talk a lot of people out of having a 32-bit mindset when it comes to these things.

Because unfortunately, it seems like everyone got their advice about things like fill factor and fragmentation from 20 years ago. And they just refused to change their mind about it. They cling desperately. They bitterly cling to these notions that these are still terribly relevant metrics to apply to SQL Server from a query performance perspective. And there are lots of times when I say to people, well, who’s responsible for tuning indexes at this company? And they say, oh, we have a script that rebuilds them. And they say, great. So who’s responsible for tuning indexes at the company? Because rebuilding indexes is not tuning indexes. Neither is reorganizing indexes, to be quite frank with you. So I’ve got this query here. So I’ve got this query here, which is going to show us all sorts of detailed stuff about a particular index that exists on the users table.

And I’m going to show you some funny things about the old frog lamentation. So my learned opinion, my experienced opinion, which, you know, perhaps anecdotal, but, you know, after many years of anecdotes, I’ve kind of just come to the conclusion that I’m right, is that index rebuilds should be reserved for special circumstances. Like, let’s say that you’ve deleted a lot of indexes. Like, let’s say that you’ve deleted a lot of data from a table. You may want to consider rebuilding indexes to densify all your data pages.

If you need to change something about the indexes. Some index changes do require rebuilding to apply those changes to them. And I guess it’s not technically an index, but if you have a heap that has a lot of forwarded fetches or has a lot of deletes against it, you may want to consider rebuilding that heap to get rid of forwarded record pointers and potentially empty data pages. Remember that when you delete data from heaps, empty data pages are not guaranteed to be deallocated from them.

In that situation, you probably just want to create a clustered index, though, anyway. Now, I started off the way a lot of other people did with the index maintenance stuff. And I was like, you know, we’re going to do it. Ride or die, we’re going to do it.

And so, you know, this was back around, oh, I don’t know, 2008, 2009. Or I don’t know, I guess given computer hardware back then, it made a little bit more sense. But, you know, there was a lot of talk about fragmentation.

And, you know, a lot of people sort of seemed to all say that it was bad. And so, you know, I didn’t know anything. So I said, you know, if all these smart people are saying it’s bad, it must be terrible. So I’m going to go ahead and rebuild this stuff.

And, you know, I guess without really any sense of how to measure for improvements or anything like that, it just seemed like, well, it seemed like it’s maybe, maybe it’s fixing stuff that I don’t know about. Maybe if I stopped doing it, I would just suddenly start having all these problems.

And I would have to start doing it. And I would feel like a real eggy-faced idiot because I stopped doing it and I introduced problems into my SQL Server workload. And then, you know, slowly kind of coming to the conclusion that even though I was doing these index rebuilds regularly, I was still having all sorts of regular performance problems.

There were things that would still go wrong. I would still have slow queries. I would still have all the normal stuff that goes terrible with a generic SQL Server workload. There was still lots of blocking.

There was still lots of deadlocks. There was still parameter sensitivity. There were all these things that would happen regardless of my index rebuilds. And, you know, eventually databases started getting to kind of a size where I no longer had time during my maintenance windows for important stuff, like running dbcc checkdb.

And so, you know, you would kind of reschedule things a little bit. Stop. Maybe stop rebuilding, like, indexes at such low thresholds, like 5%, 30%. Maybe crank those up a little bit and maybe do some other stuff.

And then the less and less that I re… Well, I’m going to stop saying rebuilt. Less and less I did index maintenance, the more I realized that no one was actually complaining about anything new.

I still had all the same problems, but I had more time to do important things. And that’s kind of when it started kicking in that maybe index fragmentation is not my problem. Now, the reason this is 32-bit mentality is because when SQL Server was a 32-bit piece of software, if you can remember back that far into the years, the problem that you had was that you had, like, two or three gigs of memory available for user space stuff.

Now, granted, maybe not so many databases were so much bigger than two or three gigs way back then, you know, or tables or whatever. But that just wasn’t a lot of space for the buffer pool and all the other memory stuff that SQL Server has to do.

And worse, the storage, you know, the often direct-attached storage that was backing a lot of SQL Servers had not yet reached the flash and SSD realm of things. And so you had these spinning disks that just went round and round in circles.

And then if you were doing sequential I.O., all was fine and dandy. But as soon as you had to do random I.O., it was like a record skipping all over the place. And you had to have a little disk head jump up and down and all around in order to go find that data.

And, of course, when there’s a physical movement of a drive, when things have to really shuffle around to do random I.O., well, I mean, there’s certainly a penalty on that.

You don’t have those sorts of penalties with flash and SSD, and you certainly don’t have that penalty when you have a good amount of RAM on your system. So I’m going to talk about something that is connected to all that.

And that’s something that is another 32-bit piece of mentality, in that I still run into a lot of people who set, like, just without any testing or without any, like, real forethought or, you know, really any sort of idea of what the hell they’re doing, will just set fill factor lower and lower.

The even funnier thing that I run into is a lot of people who get fill factor reversed in their head. And they’ll think, I want to leave 20% free space on a page, so they set fill factor to 20, thinking that that will allocate 20% free space, when really it allocates 80% free space.

And so let’s look at some of this stuff. So I created this index on the users table on the reputation column. And we’re going to save these queries for a minute, for a minute from now. And I’ve cut out some of the numbers in between here.

I used to do 5, 10, then count by 10s up to 100, but you just don’t need to stick around for all that stuff. I’ve already been talking for, like, eight minutes, and you haven’t seen any action yet.

And gosh darn it, you deserve some action. Deserve all the hot SQL action you can get. So what I’m going to do is I’m going to rebuild this index on the users table with a fill factor of 5. That’s going to take a second to run.

And if we look at what this query returned before, average fragmentation in percent was 0, and average space used in percent was 99.9, with a lot of numbers after it.

Right? So pretty densely packed pages, doing pretty good there. Pretty happy with that. Now, if we rerun this, if we rerun this query, after setting fill factor to 5, this is very interesting.

In fact, this is one of the most fascinating things that you might ever see in your entire life. Average fragmentation, logical fragmentation, pages being out of order, is incredibly low. But look at the average space used in percent.

It is no longer 99.9, a lot of other numbers. It is 5. So one of the things that I want you to take away from this is that, you know, a lot of people will say, like, oh, well, you know, I’m going to run my index maintenance scripts, you know, once a week or once a month, just to, you know, catch stuff that’s fragmented and rebuild it.

The problem is, you’re not guaranteed to fix the right kind of fragmentation with that mentality. The right kind of mentality is that you should be checking for data pages that are not very densely packed. And you should be looking at those instead.

Because average fragmentation of percent, that’s logical fragmentation. That’s pages being out of order. That has nothing to do with your queries being slow. Average space used in percent can sometimes have an impact on queries being slow if they’re scanning entire indexes.

And I’ll show you exactly what I mean by that. So let’s come back over to this window. And let’s look at these queries.

So we run this. And we look at the results. These queries are very carefully put together to hit 1%, 10%, and 100% of the table.

Right? And if we look at the query plans for this, something kind of funny happens. The first two queries that do seeks are very fast.

There is nothing slow going on here. Then we have this query down here. And there’s something different.

There’s something different about this query. Notice we have an index seek into a non-cl index up here. But here we have a clustered index. Yes.

SQL Server has looked at the size of our nonclustered index and said, that’s huge. That’s bigger than the clustered index. I’m not going to use that one.

I feel very silly using that one. But we can remedy that. We can remedy that. We can fix that, you and I. And we can set fill factor now to 50%, which will reduce the size of the index.

One thing I want to point out over here is the number of pages in this index is 47,277, because we have 95% free space on every single data page. Right?

But if we run this and we set fill factor to 50, we’re going to cut the size of our index by, well, let’s just say in half, because that’s close enough. But look how many pages we have now. We went from 44,000 to 4,800.

Right? Which is pretty good. But these numbers are still very funny to me, because average fragmentation in percent, we are logically very unfragmented, but we are still physically quite fragmented.

And where the physically quite fragmented thing might come into play for you is with your buffer pool. Right? Because your buffer pool is where all these data pages get cached. And if you have a bunch of half-full data pages in your buffer pool, you may not be making the best possible use of your buffer pool.

You may be overly polluting your buffer pool. Now, granted, this is only about 5,000 pages, which is not the end of the world. This is a small index, small table. But this is what we’re going to work with, because I can run the rebuilds to different fill factors faster, so we can get through this video in a reasonable amount of time.

So, now with fill factor at 50%, fill factor at 50%, let’s run these queries again. Let’s look at if they’re any faster or slower.

Well, I don’t know. This one’s still at zero seconds, so this seek to a single row is still fine. This one’s a couple milliseconds faster, so reading, you know, 5,000 versus 40,000 data pages from memory didn’t really mess us up, did it?

It didn’t really do anything. This one, a little bit faster. Few fewer milliseconds on that one. But the big thing here is that when you’re seeking to data, index fragmentation and even page, well, rather, let’s just say, when you’re seeking to data, page density makes a little bit less of a big deal, especially for, you know, when you’re seeking to, like, small chunks of data.

If I were doing an index seek that read the whole table or whole index, it’s a different matter, but, you know, we don’t need to talk about that here. We’ve talked about that before in other videos.

So, now, it’s fairly rare for indexes to become physically fragmented to this point. But I would say that if you do find that you have indexes becoming…

Did that actually show up on the screen? No, it didn’t. I had some weird pop-up to show up, and it was very, very strange to me. But let’s say that you were…

If you were to run a query like this to look for indexes that are physically fragmented to that degree, and you found some, and they were on big tables.

I don’t mean little tables, like 5,000 pages. I mean big tables, like 50,000. I don’t know. Maybe is 50,000 even that big of a deal? Probably not.

500,000, probably. 500,000. If you had big tables with… And the page… Average page space used in percent was at, like, 50% or lower, I would probably encourage you to do something about that.

They may end up at that point again because of the way that data sort of tends to get naturally distributed in indexes over time. But it’s a good idea to fix that every once in a while, especially just to see if it comes back.

If it comes back, you’re wasting time by rebuilding it. If it doesn’t come back, well, then something weird happened at some point. Maybe you did do a big delete. Now, so let’s do this again at 80%, right?

And we fill this up to 80%. We’re going to reduce the size of the table again. So from 50 to 80, we’re going to go from about 5,000 pages to about 3,000 pages. We are now 80% packed full, and we still have very, very low physical fragment…

Or low logical fragmentation, right? We still have a bit of this, right? So I can’t remember if I stated it explicitly before, but all your index maintenance scripts out there in the world, it doesn’t matter if it’s OLA or whatever, are going to look at this number.

And this number and this number do not necessarily track. You can have completely out-of-order data pages that are densely packed full. Likewise, you can have data pages that have, like we just saw, 95% free space on them that are completely logically ordered correctly, right?

So all your index maintenance scripts that go and measure that average fragmentation percent number aren’t going to tell you about the page density number.

They’re not going to tell you about that. You’re not going to be affecting the right stuff. You have to very specifically go and look for this stuff if you want to fix anything of any meaning. So now let’s come back over here, and let’s look at these queries again.

I don’t know why I have the set statistics time stuff. I’m not looking at it at all. Maybe I was just looking at it. Maybe I was looking at that while I was testing things. I forget.

So now this one is still at zero seconds. This one improved by a couple milliseconds. Not anything to write home about. This could be like Windows Update was checking for something at the same time as the last one.

That’s how few milliseconds this thing really improved by. This thing got better by, I don’t know, maybe 10 milliseconds. I don’t even remember at this point. Eight or nine somewhere.

There wasn’t a big difference. And then if we bump this stuff up to 100, and we go look at things, and then just wait for that 20% to disappear, and we run this again, we’re now down to 2,400 pages, right?

Which about makes sense, because when it was at 50%, it was like 4,800 some odd. Now we’re at 2,400 some odd. And now we have very, very good page space used.

And we always had no logical fragmentation, because our pages were already in very good order. They were all very orderly pages, but that didn’t tell us when we had a lot of free space on them.

All right, and if we come run these queries again, we go and we look, and we see what is happening here. This one is still at zero.

This one is still at about 16. And this one is still in the 160s per millisecond. So changing the index fill factor, making SQL Server read more data pages from memory, didn’t really improve much.

It didn’t really mess with anything all that big. Again, this is a smallish table, so we don’t expect to see big drastic shifts and stuff going from like 40,000 pages to 2,000 pages. It’s just not that difficult on a modern CPU to read those different things.

So if you have real big tables, real big indexes, and you want to find some meaningful metric to maybe take some action on those indexes with, you’re going to want to look at this number right here.

You don’t want to look at this number. This number is out 32-bit mentality, that number. You want this number in the rectangle. You don’t want that number with the, I don’t know, sort of weird text drawn on it there.

There. So, some other, another funny thing about fill factor, there’s lots of F’s in these fill factor, fill factor facts, is that SQL Server does not respect fill factor when things are just in their normal operating window.

Fill factor gets set when you create an index, fill factor gets applied when you rebuild or re-organ index. For the rebuild, you can change fill factor too, which is nice.

But otherwise, SQL Server is just going to jam those pages with whatever data it can. It’s going to split pages whenever it needs to, and all that other good stuff. So, like, SQL Server is not pausing the middle of your workload to say, wait, wait, wait, wait, wait.

Eric said this had to be 80% fill factor. Stop filling up that data page right here, right now. Leave 20% full, go on to the next one.

SQL Server just doesn’t do that. That would be a complete waste of time. So, the main metric that will go down if you rebuild indexes that have low page density is reads.

Like I’ve talked about in other videos, logical reads are not a great metric to judge query tuning by. Right. CPU is, right? CPU is what you can tune to bring your cloud bills down.

CPU is how you can get your company to spend less money on the cloud and maybe even give it to you in the form of a raise or a bonus. I don’t know.

Maybe an underling. That would be nice too, right? Maybe you’ll get a secretary out of the deal. But CPU doesn’t really change much at all when you do this stuff.

Changing CPU is more making sure that you have the right indexes and you write your queries in ways that can take advantage of those indexes.

Sometimes even getting batch mode involved is a good thing there. More efficient CPU use. So sometimes when it might make more sense to rebuild than others, if you are somehow on old spinning disks in the year 2024, perhaps logical fragmentation would be an issue to you.

But if you’re just rebuilding indexes to try and fix a problem, you are unlikely to find any joy in that outcome.

It’s also very unlikely that setting fill factor to a lower number is going to have any meaningful difference on your workload. Everyone has a story from 10, 15 years ago about, Oh, I solved this problem by lowering fill factor and preventing page splits and blah, blah, blah, blah, blah, blah, blah, blah, blah.

These are like old war stories. It’s like, I don’t know. It’s like, you said that, I don’t know, you rode a horse and you stuck a spear down the barrel of a tank’s cannon and the tank blew up.

Like, well, good for you. Nowadays, you would probably just have another tank and shoot the other tank. You wouldn’t, you would have a drone and drone strike the tank.

You wouldn’t, you wouldn’t be riding a horse up to a tank because that is not a very modern approach to warfare. Just being honest. It’s not. Kind of ridiculous. So, like, yeah, way before, you know, SSDs and stuff, maybe, maybe stuff like fill factor was a good idea and maybe stuff like logical fragmentation was a good thing to look at, but now it’s just, it’s just not where anyone should be focusing their time.

If you want to look at the big problems your workload is having, you have all sorts of better ways to do that. You have all sorts of better ways to solve those problems. Right?

There’s, there’s just, there’s, there’s no real replacement for good query and index tuning. It’s all, rebuilding indexes does not tune your indexes. It just wastes time and burns your SSDs out faster.

So, when you, when, when you’re looking at your servers or when you’re talking to various support outlets, maybe about a third party application and they start haranguing you about index fragmentation, I don’t know, maybe, maybe point them to this video.

I’m happy to work with software vendors on being less crappy to SQL Server. That’s, that’s, it’s a pretty cushy gig when you do that because you get to tell a whole lot of people that they’re wrong about everything at once and, um, it’s far more effective.

Alleviating the masses of their, of their, of their opiates is a noble endeavor. So anyway, uh, this video has gone on a bit longer than I anticipated, uh, and, and, I don’t know, there’s a lot of green text there that I, I sort of skimmed over and my head’s blocking some of it in here.

So we’re not gonna, we’re not gonna bore anyone any further with that. So, uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. Um, if you like this video, thumbs ups is nice.

Comments is nice. Nice comments are nice. Right? Mean comments are not nice. Uh, if you like this sort of SQL Server content about fill factor and fragmentation facts, uh, then you can subscribe to this channel and you can join nearly 3,000, hang on, I have to double check my numbers now just to make sure I don’t lie to you.

Make sure that we stay honest in these videos. Um, let’s see, where is my channel? Uh, you know what? It’s like, like 3,550 something at this point.

So, uh, you know, however, however many other people. You can, you can join that lovely, lovely queue to get notified when these videos get published. And, um, I don’t know.

I’m gonna, I’m gonna go enjoy the air conditioning now. Uh, I finally got marital approval to put the air conditioners in. So, I did that and, uh, you know, I’m working hard for the SQL Server community.

Putting in big, heavy air conditioners to, so I can record these videos without turning into a sweaty puddle in front of you. I’m sure everyone appreciates.

Anyway, uh, thank you for watching. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



7 thoughts on “A Little About Fill Factor And Fragmentation In SQL Server

  1. Tangentially related… Jeff Moden had a cool presentation a couple of years ago about how clustered index page density is affected by [n]varchar(max) and varbinary(max) columns. The moral of the story was to run EXEC sp_tableoption ‘.’,’large value types out of row’,1 ; after creating the table but before inserting any data.

    1. That was supposed to be EXEC sp_tableoption ‘schema.table’,’large value types out of row’,1 ; but I used angle brackets the first time and that didn’t render properly.

  2. What is your advice when you have a situation with a piece of software that has designed unique identifiers as clustered indexes and have identified that there are page splits and average page space issues causing the true fragmentation that you speak of? Is there a balance or sacrifice that would need to be made to do index maintenance more often or lower fill factor down from 100%?

    1. I think my goal would be to minimize queries using the clustered indexes for data access that requires scans, by adding nonclustered indexes that can do the job instead.

      Like I said in the video, that won’t really even mess up queries that seek into indexes.

  3. Other rule when it comes to maintenance that I kind of blindly follow and see being followed is updating statistics. eg. when some percentage of changes inside the table is made. Would you provide us with some wisdom concerning statistics update, please? Maybe at some future post?

Comments are closed.