Everywhere You Look
Finding out about this was probably like when some probably guy probably in a wig or something discovered atoms and was like “ohh wowwe therre are’st atommes ev’rywhislt I doth looketh, happie dais!”.
If that’s not historically accurate, I apologize. I’m a child of divorce and public school systems.
If you want a quick and dirty way to see if your server is having a tough time with checkpoint activity, you can run this to search your most recent error log for relevant messages.
- If you recently restarted your server, you may need to search additional error logs
- If you cycle your error logs frequently, you may need to search additional error logs
- If your server is associated with Cloud Nonsense© these commands may not work
CREATE TABLE #ErikLog ( LogDate datetime, ProcessInfo varchar(50), Text varchar(MAX) ); INSERT #ErikLog WITH(TABLOCK) ( LogDate, ProcessInfo, Text ) EXEC sys.xp_readerrorlog 0, --Just the first log file 1, --SQL Server error log N'FlushCache'; --Search string INSERT #ErikLog WITH(TABLOCK) ( LogDate, ProcessInfo, Text ) EXEC sys.xp_readerrorlog 0, --Just the first log file 1, --SQL Server error log N'saturation'; --Search string SELECT el.* FROM #ErikLog AS el ORDER BY el.LogDate; DROP TABLE IF EXISTS #ErikLog;
You can adjust the first parameter there to look back at older log files, too, if you need to.
So what happened? Was it everything you imagined?
Did You Find Anything?
If you got some hits searching through the error log, it may be a sign that you need to make a change.
- Who can change this? You!
- What change? Target Recovery Interval!
- Where do I change it? At the database level!
- When should I change it? Whenever, my friend!
- Why database level and not at the server level? If your databases are in an AG or something, you want the change to follow them to other servers!
- How do I change it? Like so!
DECLARE @sql nvarchar(max) = N''; SELECT @sql += N'ALTER DATABASE ' + QUOTENAME(d.name) + N' SET TARGET_RECOVERY_TIME = 60 SECONDS;' + NCHAR(13) FROM sys.databases AS d WHERE d.target_recovery_time_in_seconds <> 60 AND d.database_id > 4 AND d.state = 0 AND d.is_in_standby = 0 AND d.is_read_only = 0; SELECT cmd = @sql;
Why Are We Doing This?
Long checkpoints can cause all sorts of weirdball issues while they’re happening, and they don’t just happen when you run a checkpoint command.
Log and differential backups cause them, as well as internal stuff around recovery time.
- SQL 2016 – It Just Runs Faster: Indirect Checkpoint Default
- SQL Server : Large RAM and DB Checkpointing
The good news is that this is the default for databases creating on SQL Server 2016 or better. The bad news is that most of you out there started creating your databases way before that, and you’ve just been restoring them to newer versions without checking out any of the hot new gadgets and gizmos available.
Not that I blame you. Stuff like this doesn’t exactly make headlines, and you’re busy with important stuff, like hemming and hawing over fill factor.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Have you seen any issues making this change to a high OLTP environment?
My hesitation so far changing it is from MS’s warning about it, but I don’t know how prevalent the potential issue is.
The warning is here: https://docs.microsoft.com/en-us/sql/relational-databases/logs/change-the-target-recovery-time-of-a-database-sql-server?view=sql-server-ver15
Nope. Most the environments I make the change in are pretty high throughput OLTP anyway (usually with reporting on top).
Okay good to know thank you Erik!
good series, one thing that might be worth adding is checking tempdb has multiple files to avoid allocation contention issues
I don’t see many servers where there aren’t multiple tempdb files these days, but I do see plenty where the sizing is all screwed up. The thing is that fixing file sizes is kind of a pain without a reboot. I know there are ways to do it, but they often involve impairing prod servers in various ways: dropping buffers, clearing proc and system caches, etc. I don’t wanna tell people to run those.