Stop Worrying About Duplicate Statistics In SQL Server

Thanks for watching!

Video Summary

In this video, I delve into the peculiar behavior of SQL Server statistics and why deleting duplicate statistics might be a pointless exercise. I share my observations on how people often get fixated on minor performance tweaks that don’t significantly impact their server’s overall performance. By creating a simple utility table and running queries to generate various statistics, I demonstrate that SQL Server only updates one set of statistics even when multiple are created for the same column. This video aims to provide clarity on why focusing on duplicate statistics might be more about performing a micro-optimization than addressing real-world performance issues.

Full Transcript

Hey, it’s Erik Darling with that Darling Data Company who does the SQL Server Consultant Training and Education. And that’s him over there in the corner, losing his mind. In today’s video, which might be the second video I’ve recorded today, there’s no way for you to know. Good luck with that. We’re going to talk about, I mean, this is really just sort of like a funny behavioral thing. I see a lot of questions out and about in the SQL Server internet where people have this very strange fixation with deleting statistics. They’re like, I want to delete old statistics or I want to delete statistics based on something. Right? Like systems, duplicates, duplicates, like this thing, this problem they have where they just refuse to like, uh, buckle down and like do anything that might meaningfully help their server go faster. They get like micro fixated on these dumb things that aren’t going to help you. Right? It’s like when, like, when people have like the most basic ass performance problems and they’re like, gonna look at spit locks. You don’t like for what? Like you haven’t added a single index to any of these tables. Your queries don’t have where clauses. Like, uh, your SQL Server is a single core with four gigs of RAM. Uh, your VM admin hates you. Like you, you just don’t, you’re not, you’re not really gonna help anything by doing that. Um, if you’re, if you’re SQL Server is, uh, is tuned up to the point where you have time to sit back and think, I wonder if duplicate statistics are dragging me down. Uh, I want you to go on vacation or get a new job or maybe learn about a different database thing in the world. I don’t know. Like there, there are a lot of things that I would do, uh, rather than go looking for duplicate or I don’t know, like somehow try to figure out if statistics are going to be a little bit better.

or if the statistics are unused or not. And, um, and delete them because it’s, it’s, it’s just such useless performative garbage that, uh, I, I can’t take it seriously. Every time I see that question come up, I’m like, oh, you just don’t have no idea what you’re doing then, do you? You just, you’re just clueless in the world. It’s floating, floating on the ocean, wherever the breeze takes you. Um, so, uh, this is just kind of a strange little video about, um, like how SQL Server doesn’t update duplicate statistics. All right. So we have that to look forward to. So what I’m going to do, I’m going to create a little utility table called user stats. Uh, the definition of the table, um, the contents of the table have very, very little, uh, importance. Um, it’s just a, it’s just a table that I can kick around without worrying about like messing up data in the actual users table.

Because I need to run some updates. All right. So we just created this table. We just inserted rows into it. I think anyway, we should check the query plan to make sure something actually happened there. And it did. We put all 2.4 million rows in the users table, uh, into the user stats table, but just for a few of the columns, right? We only have ID downvotes, upvotes and account ID, right? That’s all we need for this one. So, uh, let’s look at statistics currently, right? Because this table has a clustered primary key on it. The thing is, we have not yet run a query that would cause SQL Server to generate statistics. Now, we created a table, and we created an index, and we loaded data into that index. If we had created the index after the fact, it would do a full scan, stat sampling of the data in that column, and we would have something here.

But, because we have not queried the table in a way where we had the index first, the data load second, we haven’t run a query to say like where ID equals one. SQL Server has not generated statistics for this index. All right? So, but this is not the, this is not the statistics object that we care about. We’re going to mess with a different column, and we’re going to run this query, where we’re going to be looking for, uh, account IDs within a certain range, and we are going to get this incredibly lucky number back. All right? Incredibly lucky. Uh, Chinese stuff, eight is, eight is a lucky number. Uh, when, when I lived in Chinatown, it was very funny, because like in, in American buildings, uh, the, we always skipped the 13th floor, and the buildings that were like built in Chinatown, they always skipped the fourth floor, because four is unlucky, but eight is very lucky.

So we have a very lucky number here. We have four eights. The only thing that would make this luckier is maybe a fifth eight, but I don’t know if there are rules around how many eights would be an unlucky number of eights, because I’m just not that culturally aware. Um, it was never really explained to me. It’s a little, a little strange. But anyway, uh, with that query run, um, and, uh, we can come back and look at this. And now we see that we have, we still have nothing on the primary key, which is okay.

We are never going to have anything on the primary key, because we’re not going to query and filter on the primary key. But now we have this system statistic called WASIS-OOF-47-BIF-90. Uh, it has all those rows in it. Uh, not all of those rows were included in the sample. Right? That’s a much smaller number than that. And, uh, it has had no modifications against it. All good there. All Gucci all the way down.

Now, I’m going to run, uh, a couple updates against the account ID column, and I’m just going to mangle the hell out of it. All right. And that’s, it’s all well and good. This is going to run for a few seconds. And now we’re going to look at the same statistics thing. And now we can see that we have a whole bunch of modifications.

Now, the only reason why I have where one equals one at the end of these is because if I don’t do, like, I know that there’s an option in SQL prompt to, like, not get yelled at. But when you have, like, a modification query without a where clause, uh, I just think it’s funny to leave that on and have SQL prompt say, oh, where one equals one. Cool where clause. No warning for you. That’s just kind of amusing. Anyway, uh, where was I?

We have now a bunch of modifications against that statistics object. Right? So if we rerun that query, SQL Server will re-update. Oh, sorry. I don’t need to show you the query plan for that. I’m, it’s such a weird, uh, muscle memory thing that I look at the query plan for everything.

I didn’t need to show you that query plan. But if we look at the stats now, uh, we will see, um, some, some rows sampled. Uh, I think that’s a smaller number than before. And, uh, now we’re back to zero modifications because we just, uh, updated those stats, uh, when that query ran. And I guess if you needed any proof about when this video got recorded, there it is. Um, the time is a little bit off because of my server.

For some reason, all my Windows servers installed in Pacific time. I have no idea why that happened. I didn’t choose it. I know I could change it. I just choose not to. Okay. Great. So here’s where the duplicate stats thing comes in. Right? So let’s say we have this index. We’re going to create an index on account ID. Right? We already have a statistics object on account ID, uh, because that’s the thing that we’re querying. Right?

So SQL Server created a system stat on account ID. And now we’re going to create an index on account ID and a statistics object on account ID. We’re going to do both, both things. Right? And now we’re going to run this, uh, not that query. We’re going to run this query again to look at these statistics objects. And I want you to just note, uh, that, you know, we have, uh, now three duplicate things. Right?

We have the system stat on account ID. We have an index on account ID, which produces a statistics object. And we have a custom user statistic on account ID. Right? And, uh, the rows sampled for these are both going to be equal to the number of rows in the table because they were made with, I mean, the index is full scan by default.

And these create stats thing, I added the full scan option to that. So we got that there. Okay. Cool. Let’s run these updates again because these updates are going to take a little bit longer than, uh, before, because now we have, now we have to update the index on account ID and that slows us down a little bit.

Actually makes it, it makes it go twice as slow or half as fast. However you want to, however you want to, uh, however you want to call that. And now if we look at this, we’re going to see all three of these stats objects have a whole bunch of modifications against them. Right? All three of them have modifications. All three.

The thing is if I run this query, you get a count, which finishes pretty quick. And I’m going to remember here that you don’t need to see the execution plan for this count query. Good job, Eric, darling. You did it.

And now we go look at this. Well, what do we have here? SQL Server only updated statistics for one of those.

Great. Great. Love it. Love it. Love to see that happen because we know that SQL Server didn’t update three separate statistics objects, uh, on, on, on the same, the same duplicative field.

Uh, we, we are a little, we, we are allowed to be a little bit disheartened that, uh, SQL Server, we, we, we used to have this nice full sampling of the statistics down here, but now we have this, this boohist, uh, default sampling of the, of the, of the, of the, for the statistics object there.

Not that that’s the end of the world. You know, it just kind of sucks that you go from like the big full scan of stats to like the little default sampling of stats. Now I know that there are settings in SQL Server where you can say, no, no, no.

I want you to maintain like whatever sampling percent I choose every time stats are updated, even if they’re auto stats, uh, because, no, I, I demand that level of control. No.

So you do have that option available to you, whether you use that or not. It’s up to you. You might find a great use for it. You might never find a use for it. Uh, I don’t care unless you’re paying me. Then I care a lot.

It’s like faith no more. So when might duplicate statistics be something that you care about? Well, it’s, it’s really hard to figure out from a query optimization standpoint, um, why or when you might care about, uh, duplicate statistics existing. Uh, it doesn’t really add that much, really doesn’t, it doesn’t add a whole lot to the, the, the query optimization conundrum.

Uh, the one thing that might be kind of interesting is, uh, you know, doing statistics updates maintenance, uh, where, you know, I, I, I do, you know, reasonably agree that it would be maybe not the most beneficial use of time for you to update. Uh, so like, you know, like if you’re, like you’re using maintenance plans or older scripts, you know, and you say, hey, well, you know, look for things with modifications. These are modifications and these would get updated, whether they get updated and used or, uh, whether they take a long time to update and they mess up your maintenance plan window.

I don’t know for this table, definitely not, but in general, uh, it’s just, it’s just really not going to make a difference, uh, to, to the general, general query performance on your server. If you go around and start getting rid of duplicate statistics, SQL Server doesn’t, is, is a little bit smarter than that.

Uh, at least, at least in this one regard, which is, which is, I guess, a nice regard to be smart in. Um, I guess if you’re, if you’re really concerned about maintenance, um, you could, if you wanted to, um, well, I mean, I guess, I guess you could delete duplicate system statistics, let SQL Server recreate any that it might need.

But, uh, if, if, but that would only really make sense on, like, gigantic tables where, you know, um, you know, updating those statistics with, uh, with, with, with, even, geez. The, the, if it’s, if it’s still really slow with the default sampling. Yeah, sure.

I guess. But, like, if you’re using full scanning, you’re like, well, it’s slow. You’re like, well, maybe, maybe if you’re going to be such a control freak that you need to do a full scan on everything, you should be picking specific statistics to do the full scan on rather than just saying, hey, store procedure, go find anything that’s been modified and update it.

Take a little control. Take a little more control, you freak. Take a little more control.

And maybe, maybe I’ll just take the weekend off. Wouldn’t that be nice? Me just hanging out.

So anyway, uh, thank you for watching. Um, I hope you enjoyed yourselves. I hope you learned something. And, um, I will see you in another video at another time.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.