Done Started Something
If you talk to anyone who bills by the hour for SQL Server, one of the first things they’ll ask you is what your wait stats are.
It’s not a bad question; it kills some time, and it lets them gauge how engaged you are with your server.
- Waits can sometimes indicate a severe bottleneck
- Waits can also be “high” on a server that has perfectly acceptable throughput
There are also some problems with wait stats, and performance counters in general: there’s no direct correlation between the counters and the queries that caused them.
Sure, Query Store in 2017+ tracks high level wait stats for queries, but I don’t run into a lot of people using Query Store still.
The other trouble with performance counters is that they just aggregate over time, with no partitioning either by database or time.
Brooklyn Basement
Some things to keep in mind when you’re looking at wait stats:
- How they compare to server uptime
- How many times the wait occurred
- How long each occurrence took to complete on average
If your script or monitoring tool doesn’t report this stuff, it’s leaving some really important stuff out of the picture.
The relationship to server up-time is important, because if all you have to work off of is percentages, you have no idea if your server is constantly grinding on these waits, or if they only happened for a few minutes.
As a thought experiment: say 100% of your waits are on PAGEIOLATCH_SH, and you have 2 hours total of those waits.
How does your opinion of those waits change if the server has been up for:
- 1 hour
- 4 hours
- 24 hours
- 96 hours
Typically, the longer a server has been up, and the less concentrated waits become, the less of a focal point they should be. There’s no magic ratio here, but for me, the closer wait times are to up time, the more I may want to look at them.
Longs and Shorts
Yes, there can be a mix of these, but there’s still two basic ways that wait stats rack up.
- Lots of waits that happen pretty quickly
- Fewer waits that last a long time
Obviously, the source of those waits will impact how much we care about them. Here are some common examples:
- You wait often on storage, but the waits are short (say <10 ms)
- You don’t wait often on locks, but the waits are long (say > 10,000 ms)
It’s pretty trivial to look at your wait stats
- Running queries store them in sys.dm_os_waiting_tasks
- In 2016+, session waits are in sys.dm_exec_session_wait_stats
- Completed waits are stored in sys.dm_os_wait_stats
I’m not gonna reinvent any wheels here, but I will reiterate a couple things:
- You can see which queries are generating waits while they’re running, but…
- If you’re not constantly logging that, it disappears on you
- Once waits are completed, they are faceless and blameless
- Waits in general can be far less valuable when workloads are inconsistent
Attentione!
Tomorrow, I’ll talk through some of the most common waits, and first-line ways to start solving them.
They may not solve every single problem, but they’re generally a good place to start.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.