I get called in to help with a lot of interesting performance problems. I’m gonna say something that sounds harsh right up front: very few people are prepared for sudden issues.
After the fact (and usually after a reboot that clears out all the useful information) people start doing all sorts of things, like logging sp_WhoIsActive to a table.
Consider a couple of things for a second, though (aside from the fact that rebooting clears out all your important server metrics):
- Nothing captures blocking by default (you need the blocked process report for this)
- The system health session keeps limited information in it about things like deadlocks
- SQL Server doesn’t time-slice wait stats or any other metrics
- The plan cache is usually highly volatile for a number of reasons
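Take blocking as an example: since nothing captures it by default, you have to turn on the blocked process report yourself and collect it somewhere. A minimal sketch of that looks like this; the 5-second threshold and the session name are my own picks, not recommendations:

```sql
/* Turn on the blocked process report with a (hypothetical) 5 second threshold */
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'blocked process threshold (s)', 5;
RECONFIGURE;

/* Capture the reports with an Extended Events session (name is illustrative) */
CREATE EVENT SESSION blocked_process_report ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'blocked_process_report');

ALTER EVENT SESSION blocked_process_report ON SERVER STATE = START;
```

The point is that none of this exists until you set it up, which is exactly why you have to do it before things go wrong.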
With SQL Server, you need to prepare today for emergencies tomorrow.
One way you can get some of this stuff is by enabling Query Store on your important databases.
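Turning it on is a one-liner, plus whatever options make sense for you. A sketch with some commonly-adjusted settings (the database name and values here are placeholders, not recommendations):

```sql
/* Enable Query Store; settings are illustrative, tune them for your workload */
ALTER DATABASE YourImportantDatabase
SET QUERY_STORE = ON
(
    OPERATION_MODE      = READ_WRITE,  /* actively collect data          */
    MAX_STORAGE_SIZE_MB = 1024,        /* cap on-disk footprint          */
    QUERY_CAPTURE_MODE  = AUTO         /* skip trivial, ad hoc noise     */
);
```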
It’s not perfect, and there’s certainly a risk of introducing unacceptable observer overhead in environments with a really high rate of transactions.
But it’s a whole lot better than the plan cache for looking back in time at query performance issues, and starting with SQL Server 2017 it captures high-level wait stats, too.
It’s not a replacement for a monitoring tool, and there’s enough functionality missing that I wrote a stored procedure called sp_QuickieStore to search through it, but it’s better than nothing.
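If you want to poke at those wait stats yourself, they live in `sys.query_store_wait_stats` (SQL Server 2017 and up). A quick sketch of a top-waits query, run in the context of a database with Query Store enabled:

```sql
/* Top wait categories recorded by Query Store for the current database */
SELECT TOP (10)
    ws.wait_category_desc,
    SUM(ws.total_query_wait_time_ms) AS total_wait_time_ms
FROM sys.query_store_wait_stats AS ws
GROUP BY
    ws.wait_category_desc
ORDER BY
    total_wait_time_ms DESC;
```

Note these are bucketed into categories, not individual wait types, which is part of why it doesn’t replace a real monitoring tool.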
The odds of relevant query plans from a performance meltdown still being in the plan cache several days later are nearly zero. People just do too many bad things that make it fill up or clear out:
- They don’t parameterize queries
- They turn on optimize for ad hoc workloads
- They don’t give SQL Server enough memory
- They stack dozens of databases or schemas on one server
SQL Server 2019 added a bunch of switches you can tweak to reduce Query Store’s overhead and control which queries it collects, which is super helpful.
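The big one is the CUSTOM capture mode, which lets you set thresholds for what’s worth capturing. A sketch, with placeholder thresholds you’d want to tune rather than copy:

```sql
/* SQL Server 2019+: only capture queries that cross these thresholds     */
/* (database name and numbers are illustrative assumptions)               */
ALTER DATABASE YourImportantDatabase
SET QUERY_STORE = ON
(
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY =
    (
        STALE_CAPTURE_POLICY_THRESHOLD = 24 HOURS, /* evaluation window  */
        EXECUTION_COUNT = 30,                      /* ran at least 30x   */
        TOTAL_COMPILE_CPU_TIME_MS = 1000,          /* or compile cost    */
        TOTAL_EXECUTION_CPU_TIME_MS = 100          /* or execution cost  */
    )
);
```

This keeps one-off trivial queries from bloating your Query Store data while still catching anything that runs often or burns real CPU.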
I hear a lot of people say they have SQL Server monitoring tools, but they either:
- Are really crappy and impossible to use
- Don’t really monitor SQL Server well
That doesn’t even begin to account for end-user issues, like:
- Broken monitoring tool installs not collecting data
- Software being way out of date, or…
- Them not knowing what to look at because they’ve never opened it
But if you have a good monitoring tool collecting a variety of important SQL Server metrics, you have a much better chance of being able to root cause performance issues:
- Wait Stats
- Query Performance
- Disk Stuff
- Memory Stuff
- CPU Stuff
You get the idea — your monitoring tool should collect a wide enough variety of metrics that common problems don’t totally escape it — but probably not show you every single meaningless metric that you can possibly collect.
As a consultant, I love having these things to comb through to either troubleshoot a past performance issue, or just find candidate queries to tune for clients.
Without them, it makes my job a lot harder — or even impossible sometimes, when data is gone for good — and I have to spend a lot of time waiting for an issue to crop up again to see what was involved, or do a bunch of generalized performance tuning in order to hopefully avoid future issues.
In tomorrow’s post, I’m going to talk about some of the common tools and scripts that I use while consulting (though not how to use them). If you’re talking to anyone in the SQL Server consulting space, these are good tools to have available.
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.