Scarecrow
I don’t mean to put anyone off from using batching to make large modifications more efficient.
In this case, more efficient doesn’t necessarily mean faster. You still have this complete unit of work to get through.
The whole point is to go a little bit easier on your server:
- Smaller transactions sent to the log file
- Shorter locking durations
- Less build up of the version store (optimistic isolation levels, triggers, etc.)
But there are thing that make batching impossible, or downright tricky.
Atom Bomb
Let’s say you need to update 10 million rows, and you’re using batches of 5000 rows.
What happens when one batch throws an error?
If you were running the whole thing as a single query, the whole thing would roll back. Depending on how far along you’ve gotten, that could be pretty painful (especially for you suckers that don’t have Accelerated Database Recovery enabled).
That also kind of sucks… depending on the error, too. Not all of them mean the update violated some database rule you’ve set up via constraints.
But all that aside, there are some circumstances where maybe the entire thing should fail and roll back.
In those cases, you’ll need something that keeps track of the “before” rows. There are ways to do this within SQL Server that don’t require you to program anything:
- Change Data Capture
- Change Tracking
- Temporal Tables
If you can afford to use those, it might be a pretty “easy” way of tracking changes to your data so that you can roll it back.
But it’s up to you to write code that uses any of those things to find the old values and do the ol’ natty switcheroo.
Batchy Bomb
It is possible to save off all the rows you’ve change to another table, and then reverse the changes.
The table load and setup script is here. To use them, you can do something like this.
Then you can use Michael’s batching code, with an OUTPUT clause to your “save” table.
/*Batch update with output*/ -- From https://michaeljswart.com/2014/09/take-care-when-scripting-batches/ DECLARE @LargestKeyProcessed int = -1, @NextBatchMax int, @RC int = 1; WHILE (@RC > 0) BEGIN /*Starting place*/ SELECT @NextBatchMax = MAX(x.id) FROM ( SELECT TOP (1000) aia.id FROM dbo.as_i_am AS aia WHERE aia.id >= @LargestKeyProcessed ORDER BY aia.id ASC ) AS x; /*Updateroo*/ UPDATE aia SET aia.some_date = DATEADD(YEAR, 1, aia.some_date), aia.some_string = aia.some_string + LEFT(aia.some_string, 1) OUTPUT Deleted.id, Deleted.some_date, Deleted.some_string INTO dbo.as_i_was (id, some_date, some_string) FROM dbo.as_i_am AS aia WHERE aia.id >= @LargestKeyProcessed AND aia.id <= @NextBatchMax; /*Enhance*/ SET @RC = @@ROWCOUNT; SET @LargestKeyProcessed = @NextBatchMax; END;
The only thing I’m doing different here, aside from my own tables, is using OUTPUT to dump the prior row versions into a new table.
That way, if a batch fails, I can roll things back.
Witch Errors
One thing that you should figure out is which errors you want to guard against. Lock and deadlock errors are common ones.
You can do something like this:
DECLARE @do_over tinyint = 0, @game_over tinyint = 5; WHILE @do_over <= @game_over BEGIN BEGIN TRY SET NOCOUNT, XACT_ABORT ON; UPDATE dbo.as_i_am SET some_string = REPLICATE('ack', 1) WHERE id = 138; /*do a thing*/ BREAK; /*exit loop if thing is successful*/ END TRY BEGIN CATCH IF ERROR_NUMBER() IN (1204, 1205, 1222) /*lock and deadlock errors*/ BEGIN SELECT @do_over += 1; WAITFOR DELAY '00:00:10'; END; ELSE BEGIN; /*log some details to a table here*/ THROW; END; END CATCH; END;
This is somewhat incomplete pseudo code, but it looks good enough to blog. Don’t judge me too harshly, just use it to get where you need to go.
Other errors, like primary key, foreign key, constraint, string truncation, and others that indicate data quality issues shouldn’t be retried.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.