Sitting In A Tree
Transactions and error handling often go hand-in-glove, to make better use of XACT_ABORT
, manually manage rollbacks, and try to make the best use of all those ACID properties that database people seem to care deeply about.
The main things to understand about transactions in SQL Server are:
- There’s no respect for nested transactions
- The default read committed isolation level can be a bad time
- Not everything requires an outer transaction
The lack of nested transactions has been well-documented over the years. Though savepoints do exist, they’re not really the same thing.
For the remaining points, let’s talk a little bit more.
Blocking, Blocking, Blocking
This is a big enough problem with singular modification queries under the default Read Committed pessimistic isolation level.
Trust me. Roughly half of my consulting life is looking at the blocked process and deadlock reports.
If you’re going to take it upon yourself to involve multiple queries in a transaction, it also falls upon you to make sure that:
- You use the appropriate locking hints
- You handle any errors and rollbacks
- You make sure that thing is as short and sweet as possible
Everything you do between BEGIN TRANSACTION
and COMMIT TRANSACTION
will hold whatever locks are required or specified until the bitter end.
Think of the duration of a query, and then the duration of a transaction as a whole, as the surface area.
The larger that surface area gets, the more opportunity there is for it to interfere, or be interfered with, by another surface area with a similar trajectory.
You also need to really understand the isolation level in use, and how that can make results weird.
Let’s take this piece of pseudo code, and assume it’s running under the default Read Committed pessimistic isolation level.
BEGIN TRANSACTION DECLARE @UserToPromote integer = NULL; SELECT TOP (1) @UserToPromote = u.Id FROM dbo.Users AS u WHERE u.Reputation = 1 AND EXISTS ( SELECT 1/0 FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id AND p.PostTypeId = 1 AND p.Score = 0 ) ORDER BY u.CreationDate, u.Id; /*IRL you might bail here if this is NULL or something*/ WITH UserToPromote AS ( SELECT TOP (1) p.* FROM dbo.Posts AS p WHERE p.OwnerUserId = @UserToPromote AND p.PostTypeId = 1 AND p.Score = 0 ORDER BY p.Score, p.CreationDate ) UPDATE utp SET utp.Score += 1 FROM UserToPromote AS utp; COMMIT TRANSACTION;
Leaving aside some of the obvious stuff that a professional performance tuner would avoid (like local variables), and ignoring the fact that I haven’t done any error handling, what’s wrong with this code?
- We only get the first user whose data was currently committed in the Users table
- … And who has a question in the Posts table
- Then we try to update a row in Posts for that user
What that leaves out is:
- The user could delete their profile after we find them
- Someone could vote on their question after we find them
- They could delete their question after we find them
Read Committed is not a consistent snapshot of your data during a transaction. In a highly transaction environment, all sorts of things can change right underneath you.
All Read Committed guarantees is that the data you read was committed at the time it is read. Quite flimsy once you think about it for a moment.
If you want to avoid those changes, you’d need to add hints like UPDLOCK
and SERIALIZABLE
to your select query (and exists subquery) to prevent those changes until the update completes.
Even if you were to do all that work in a single query rather than two, you could run into the exact same issues without those locking hints.
Once those read cursors find the row(s) they want, anything goes until the exclusive locks start being taken.
Don’t Transact Me, Bro
Let’s look at some more psuedo code. It’s a pattern I’ve noticed with more than a few clients now.
BEGIN TRY BEGIN TRANSACTION INSERT dbo.HighQuestionScores ( Id, DisplayName, Score ) SELECT u.Id, u.DisplayName, p.Score FROM dbo.Users AS u CROSS APPLY ( SELECT Score = MAX(p.Score) FROM dbo.Posts AS p WHERE p.OwnerUserId = u.Id AND p.PostTypeId = 1 ) AS p WHERE NOT EXISTS ( SELECT 1/0 FROM dbo.HighQuestionScores AS hqs WHERE hqs.Id = u.Id ); UPDATE hqs SET hqs.LastQuestionBadge = b.Name FROM dbo.HighQuestionScores AS hqs CROSS APPLY ( SELECT TOP (1) b.Name FROM dbo.Badges AS b WHERE b.Id = hqs.Id AND b.Name LIKE N'%Question%' ORDER BY b.Date DESC ) AS b; COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 BEGIN ROLLBACK END; /*Do some error stuff maybe*/ THROW; END CATCH;
The idea of the code is to insert any new users to the HighQuestionScores table. For the sake of completeness, let’s say there’s another query in the mix that would update the high score for existing users too.
It’s just invisible 👻
The problem here is also fairly obvious. There’s absolutely no reason to ROLLBACK new users inserted into the table just because the ensuing update query fails for some reason.
Let’s say that the Badges table was recently modified to accommodate a new, longer, badge name, but the HighQuestionScores table wasn’t.
We would get a truncation error, obviously. But that truncation error should not invalidate the new users inserted at all.
Likewise, if our invisible Score update query produced a bigger integer and failed trying to insert it into an integer column, it should not invalidate new users inserted.
It’s not their fault.
The bottom line: Transactions should only encapsulate queries where correctness would be effected by one failing.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Related Posts
- Indexing SQL Server Queries For Performance: Unpredictable Searches
- Indexing SQL Server Queries For Performance: Equality and Inequality Predicates
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About CTEs
Erik – What is the purpose of the 1/0 in the EXISTS subquery in the first code sample?
I always love reading your stuff. Thank you!
Well, you have to put something in the select list, but what you put in there doesn’t matter.
I see a lot of people doing goofy things, like adding TOP (1) or COUNT inside of (NOT) EXISTS subqueries for no good reason.
Thanks!