Should An Optimistic Isolation Level Be The New Default For SQL Server Databases?

Posted on May 15, 2019May 16, 2022 by Erik Darling

Two Words

Alright, I’ve lied to you already. There are way more than two words involved, here.

I recently had breakfast (JUST BREAKFAST, GOD) with Microsoft’s Most Handsome Man™, and the topic came up.

Partially because there are such fundamental changes coming to SQL Server, and also because it’s already the default in Azure SQL DB.

If I had to name a redeeming quality of Azure SQL DB, that would be it.

Unboxing

I’ve seen pessimistic locking causing problems and heartache on many, many servers.

While it wasn’t totally the fault of the isolation level (query and index tuning was needed, and there was some other silliness), it shouldn’t take that kind of dedication to make It Just Run Faster©.

Possibly the worst side effect is people leaning heavily on reading dirty data (via read uncommitted/nolock) to avoid issues.

You can’t preach about the evils of dirty reads without offering ways to avoid blocking.

Yes, I’m looking at you.

You all cackle at seeing NOLOCK everywhere, but I don’t hear much about solving blocking problems without it.

Standards and Futures

Right now “other major database platforms” offer optimistic locking as the default.There are implementation differences, but the net result is the same.

Readers and writers don’t suffer locking contention, and only fully committed data is read.

SQL Server currently offers optimistic locking via Read Committed Snapshot Isolation (RCSI from here), and Snapshot Isolation (SI from here).

Right now, they both send row versions to tempdb. But in the works for SQL Server 2019 is a feature called Accelerated Database Recovery (ADR from here). The mechanism that makes this possible is a Persistent Version Store (PVS from here) that’s local to the database, not tempdb.

While RCSI and SI could end up sending quite a bit of data to tempdb, which could cause contention there without a lot of precautions and setup work, any potential contention from ADR would be localized.

I know, Microsoft has been addressing tempdb, too. Setup now configures tempdb in a mostly sane way, and trace flags 1117 and 1118 are the default behavior in 2016+. That should make optimistic locking a more comfortable setting for people, but no. No. I hardly ever see it in use.

Dependencies

Breaking changes are hard. I know, there are people out there who depend on pessimistic locking for certain patterns to maintain correctness.

The thing is, it’s a lot easier to have just these processes use pessimistic locking while allowing the rest of the us to not have to sprinkle NOLOCK hints every which where to get a report to run.

An example of a process that requires some locking hints is Paul White’s post about Sequence Tables.

At this point in time, if you’re implementing a queue in SQL Server, you should be the one learning about how to get the kind of locking you need to get it working. Normal people who just want their app to work shouldn’t be the ones staring at articles about isolation levels, locking hints, lock escalation, etc.

Get It, Get It

This is a good problem to get in front of. I’d be quite happy to not have to talk about the reader/writer aspects of locking anymore.

In the same way, it would be nice to not have to talk to users who are endlessly frustrated by locking problems, explain dirty reads, explain optimistic isolation levels, caution them against certain issues, and then tell them to have fun removing all those old lock hints.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

17 thoughts on “Should An Optimistic Isolation Level Be The New Default For SQL Server Databases?”

Jeff Moden says:

May 20, 2019 at 9:37 am

You speak of people cackling about not using WITH (NOLOCK) and not providing solutions otherwise and then talk about how RCSI and SI are the solution. Fortunately, you already know the reason why a lot of people haven’t implemented that as a solution because you wrote…

“While RCSI and SI could end up sending quite a bit of data to tempdb, which could cause contention there without a lot of precautions and setup work.”

The other and best fix has been to fix utterly crap code. 😀 😀 😀

I’m looking forward to 2019 to see how well PVS actually works but my warning would be that is shouldn’t be used as a veil to cover up the rotting meat known as crap code. 😉

As for MS making TF 1117 the unchangeable default for TempDB, that screwed me for some of the stuff I do. Even though there’s a “connect” item (no longer called that but the replacement is hard to remember because I normally don’t use that) on it, a lot of people don’t realize that IDENTITY INSERT causes all the data being copied to materialize in TEMPDB and if all your files are set to grow evenly, you end up with 1.6 worth of required space when trying to copy one 200GB table and you have 8 TempDB files because you end up with a 1.6TB TempDB, which is just stupid. The enforcement of the effects of TF1117 and not being able to temporarily disable it have screwed me.

Let’s hope that the “improvements” they’re making in 2017 thru 2019 (I’m currently using 2016 EE) don’t put the screws to me and others that have no so unusual big stuff to do.
Jeff Moden says:

May 20, 2019 at 9:43 am

p.s. And apologies for the structure and the occasional misspellings in my response… I’ve not yet had sufficient quantities of caffeine this morning and haven’t had a cigarette since Friday night and the crowbar that I’m chewing on keeps getting in the way.
1. Erik Darling says:
  
  May 20, 2019 at 9:49 am
  
  Good luck quitting. I stopped 10 years ago (pack a day for 15 years), and was basically delirious for the first week. I think I spent more money on chewing gum the next three months than I would have on cigarettes.
  
  Friends of mine who have quit more recently have switched over to vaping and tapering down the strength of whatever the hell is in those cartridges.
  1. Jeff Moden says:
    
    March 21, 2022 at 12:01 pm
    
    Update. Still haven’t had even a puff because I know that “I’m just a puff away from a pack a day”. Been there, done that. I haven’t been able to quit nicotine yer. Still using mint flavored nicotine lozenges. 😀
    1. Erik Darling says:
      
      March 21, 2022 at 12:23 pm
      
      I love those lozenges! I can only do the 2mg ones though; 4mg gets me a little too jazzed.
      1. Jeff Moden says:
        
        March 21, 2022 at 12:43 pm
        
        Same here on the 2MG thing. And the mint part of it is another addiction that I have to fight. It turns out that regular peppermint lozenges were a part of the reason for me going into AFIB the time I tried to quit a few years before that.
        
        I also miss going outside for a smoke. I’m trying to do that between lozenges but it’s not working. 🙁
Chad Miller says:

May 21, 2019 at 6:35 am

I disagree RCSI should be the default. I had the unfortunate experience of learning this lesson the hard way. A few years ago in an effort to improve performance I enabled RCSI on database which sees 35K transaction a second. Of course we did this in dev and qa first, but testing didn’t have the same variety and volume of transactions. After turning on in production we encountered performance problems. A Sev A case with Microsoft lasted one week until we turned off RSCI–system back to “normal”. Is this bad code, perhaps? Or it could be RCSI performs terribly on very high-volume systems and you should very cautious about enabling. The other scary thing, RCSI is required for AlwaysOn–won’t using that for this system.
1. Erik Darling says:
  
  May 21, 2019 at 6:39 am
  
  As a wise consultant once said: “I’d be happy to take a look…”
  
  🙂
2. Bojan Jekic says:
  
  March 21, 2022 at 11:07 am
  
  Hi Chad,
  
  Can you share with us what was a main problem when you enabled RSCI?
Alex Friedman says:

May 22, 2019 at 5:30 am

Frankly, I can’t really expect the devs to know how different apps behave differently in the DB after switching some parts to RCSI. Working with optimistic concurrency is very different than the default and well-known pessimistic way, and I’m 100% sure it will cause many cases of “wait, why are we processing the same rows again?!” / “why is the balance now negative?!” / etc.

I’d much rather handle occasional outages due to blocking and improve the queries and indexing (while teaching the devs to do it too) than risk the business processes doing unexpected stuff.

For a new company developing a new product and infrastructure, sure. But changing parts of existing apps is very risky, before even considering the performance implications.
1. Erik Darling says:
  
  May 23, 2019 at 6:09 am
  
  “Processing the same rows again” sounds exactly like the kind of queueing that should be handled via pessimistic locking. My point isn’t that it’s a perfect fit for every situation, only that it’s a far less painful fit in most situations. Everyone who does-this-specific-thing will disagree with me, but it’d be far easier for people doing-specific-things to say “set transaction isolation level read committed” instead of making the rest of the world wrestle with awful blocking issues.
gotqn says:

May 28, 2019 at 1:49 am

Well, knowing that the “Serializable” is also the default isolation level specified by the SQL standard, having Read Committed as default seems a little bit unfair. Now, many things are improved – from software to hardware and may be it is a time for a change (at least for the newly created databases).

The RCSI is the default for SQL database, so I believe soon we will see it as default for SQL Server database, too.
John Nelson #2 says:

June 18, 2019 at 2:55 pm

Paul White’s post about sequence tables isn’t really a thing anymore, is it? I mean, that article was from 2010 and MSSQL now has these sequence objects which kind of eliminates the need for manually handling all that, no?
1. Erik Darling says:
  
  June 18, 2019 at 3:00 pm
  
  Yes and no. I still see people use sequence tables fairly often, because they were implemented pre-2012.
  
  Sometimes, they can’t fully make the switch over because they still have to support older versions (or the code isn’t causing a problem, so they leave it alone).
  
  Mostly though, it’s a good example of how to use locking hints correctly for a specific situation, which I’d rather see be the norm instead of people needing to use NOLOCK.
  
  Thanks!
2. Aaron Bertrand says:
  
  February 4, 2022 at 11:51 am
  
  Also, a serialized sequence table has an advantage over sequence / identity / etc.: you can much better control gaps, if that is important. It shouldn’t be important – in most cases, it’s only important because someone wants it to be important, but there are some instances where a gap is a legal nightmare.
  1. Erik Darling says:
    
    February 4, 2022 at 11:57 am
    
    Just have a process that runs in a loop checking the max used id in each table against the max used id in the sequence table and reset it if there’s a gap right? 😭
    1. SR says:
      
      March 7, 2022 at 11:27 am
      
      Perhaps this is a bad solution, but I created a query to return the next unused ID in a table which could have gaps. We wanted 4 digit numbers, but someone blew that by manually picking 9999, so I couldn’t just use MAX(myKey) and overlook the gaps…
      The SQL Statement uses a self-join and looks like this:
      SELECT MIN(t1.myKey+1) FROM myTable t1 WHERE NOT EXISTS(SELECT NULL FROM myTable t2 WHERE t2.myKey = t1.myKey + 1)
      Cheers!