Spawning Monsters
Here we go again, with me promising to blog about something later.
This time it’s an attempt to explain how SQL Server chooses which statistics to update.
It’s not glamorous, and it may even make you angry, but you know.
They can’t all be posts about…
*checks notes*
*stares into the camera*
*tears up notes*
*tears up*
*stares off camera until someone cuts to commercials*
And We’re Back
Let’s start with the query we’re going to use to examine our statistics.
SELECT t.name, s.name, s.stats_id, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter FROM sys.stats AS s JOIN sys.tables AS t ON s.object_id = t.object_id CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp WHERE t.name = 'UserStats';
Right now, the results aren’t too interesting, because we only have a statistics object attached to the Primary Key.
We’re not gonna touch that column. We’re gonna use another column.
This query will get system generated statistics created on the AccountId column.
SELECT COUNT(*) FROM dbo.UserStats AS u WHERE u.AccountId > 1000 AND u.AccountId < 9999 OPTION(RECOMPILE);
By itself, this isn’t very interesting. Let’s create an index, too.
CREATE INDEX ix_AccountId ON dbo.UserStats ( AccountId );
The index created statistics, too. With the equivalent of a full scan! See that rows_sampled column?
I mean, why not, if you’re already scanning the whole table to get the data you need for the index, right?
Right.
I’m gonna use a couple updates to flip values around.
UPDATE u SET u.AccountId = u.UpVotes + u.DownVotes FROM dbo.UserStats AS u WHERE 1 = 1; UPDATE u SET u.AccountId = u.UpVotes - u.DownVotes FROM dbo.UserStats AS u WHERE 1 = 1;
Don’t ask me why I swallowed a fly.
But the WHERE 1 = 1 is enough to get SQL Prompt to not warn me about running an update with no where clause.
Both stats objects have been modified the same number of times.
Let’s run our COUNT query and see what happens!
We can see that only the stats for the index were updated (and with the default sampling rate, not a full scan).
Now let’s create another stats object with FULLSCAN.
CREATE STATISTICS s_AccountId ON dbo.UserStats ( AccountId ) WITH FULLSCAN;
We’ll also go ahead and run an update again.
And then our COUNT query…
SQL Server took two perfectly good fully sampled statistics and reduced them to the default sampling.
This doesn’t hurt our query, but it certainly is annoying to see.
That’s why newer versions of SQL Server allow you to persist the sampling rate.
Latest and Greatest
A lot of the stuff people call “rocket science” about statistics options, like auto create and auto update stats, are there for a reason.
When you let SQL Server make choices, they’re not always the best ones.
Tracking this stuff down and understanding when and if it’s a problem is hard work, though. Don’t flip those switches lightly, my friends.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Related Posts
- Why SQL Server’s OPTIMIZE FOR UNKNOWN Hint Hurts Performance
- SQL Server 2022 Parameter Sensitive Plan Optimization: A Missed Opportunity For PSP To Make A Query Go Faster
- What’s The Best Statistics Sample Rate For SQL Server?
- Trace Flag 3608 Disables Automatic Statistics Creation And Ghost Record Cleanup In SQL Server