The Art Of The SQL Server Stored Procedure: Batching Modifications

Command


The goal of batching modifications is to accomplish large numbers of row modifications with exacerbating locking problems, and being kinder to your server’s transaction log.

There is generally an appreciable difference in transaction time in modifying 1,000 rows and modifying 10,000,000 rows. Go try it, if you don’t believe me.

Probably the canonical post about batching modifications is written by my dear friend Michael J. Swart.

In many cases, that code is good to go right out of the box. Just replace the table and column names.

But you may have other requirements, too.

Control


One post I had an enjoyable time writing was this one, about how to track before and after values when batching updates.

I also talk a little bit at the end about how to retry modifications when they hit deadlocks, but anyway!

This is something most developers don’t think about at the outset of writing batching code: Should I keep all the prior changes if one set of changes produces an error?

Another thing you may need to think about is this: What if data changes in the course of your code?

Lemme give you a couple examples.

Champion


If you do this, which is a subset of my code from the above post:

SELECT
    @NextBatchMax  = 
        MAX(x.id)
FROM
(
    SELECT TOP (1000) 
        aia.id
    FROM dbo.as_i_am AS aia
    WHERE aia.id >= @LargestKeyProcessed
    ORDER BY 
        aia.id ASC
) AS x;

/*Updateroo*/
UPDATE
    aia
    SET aia.some_date = 
            DATEADD(YEAR, 1, aia.some_date),
        aia.some_string = 
            aia.some_string + LEFT(aia.some_string, 1)
    OUTPUT 
        Deleted.id, Deleted.some_date, Deleted.some_string
    INTO
        dbo.as_i_was (id, some_date, some_string)
FROM dbo.as_i_am AS aia
WHERE aia.id >= @LargestKeyProcessed
AND   aia.id <= @NextBatchMax;

Or even this:

/*Updateroo*/
UPDATE
    aia
    SET aia.some_date = 
            DATEADD(YEAR, 1, aia.some_date),
        aia.some_string = 
            aia.some_string + LEFT(aia.some_string, 1)
    OUTPUT 
        Deleted.id, Deleted.some_date, Deleted.some_string
    INTO
        dbo.as_i_was (id, some_date, some_string)
FROM dbo.as_i_am AS aia
WHERE aia.id IN
      (
          SELECT TOP (1000) 
              aia.id
          FROM dbo.as_i_am AS aia
          WHERE aia.id >= @LargestKeyProcessed
          AND   aia.id <= @NextBatchMax
          ORDER BY
              aia.id ASC                
      );

There’s a problem, isn’t there?

Isolation Levels


If you’re using:

  • Read Committed (Pessimistic)
  • Read Uncommitted (Anarchy)

Only the table that is either only locked when the update runs (first example), or only the instance referenced by the update (second example) will ever have exclusive locks taken against it.

The part of the operation in both examples that only reads from the table will take normal shared locks on rows or pages as the reads happen.

Even under Read Committed (Pessimistic), some lousy things can happen:

  • You can miss rows that are updated by other processes
  • You can see rows twice that are updated by other processes
  • You can include rows that have been deleted by other processes

Read Committed is not a point in time read of all the data your query needs.

Heck, even Repeatable Read (Quite Pessimistic) only locks rows as it needs them. That means rows can change ahead of your seek/scan position freely.

For really interesting cases, you might need to use Serializable (Perfectly Pessimistic) to do your batching.

And Stay Out


I ran into a client process recently that actually had to do this.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
/*Updateroo*/
UPDATE
    aia /* You could also do this WITH (SERIALIZABLE) */
    SET aia.some_date = 
            DATEADD(YEAR, 1, aia.some_date),
        aia.some_string = 
            aia.some_string + LEFT(aia.some_string, 1)
    OUTPUT 
        Deleted.id, Deleted.some_date, Deleted.some_string
    INTO
        dbo.as_i_was (id, some_date, some_string)
FROM dbo.as_i_am AS aia
WHERE aia.id IN
      (
          SELECT TOP (1000) 
              aia.id
          FROM dbo.as_i_am AS aia /* You could also do this WITH (SERIALIZABLE) */
          WHERE aia.id >= @LargestKeyProcessed
          AND   aia.id <= @NextBatchMax
          ORDER BY
              aia.id ASC                
      );
COMMIT TRANSACTION;

If you’re deleting a small number of rows, and you have appropriate indexes in place to support your queries finding the data they need to modify, this isn’t painful.

Just be wary of:

  • Triggers
  • Foreign keys
  • Indexed views

Which can make life more interesting than most people care to deal with.

Now I know, a couple posts ago I told you that there’s very little benefit to error/transaction handling if you’re not going to do anything with them — and that I’ve left that out of this pseudo code — you’ll just have to forgive me my transgression here.

I had to save something for tomorrow.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.



One thought on “The Art Of The SQL Server Stored Procedure: Batching Modifications

Comments are closed.