A Little About Why I Don't Like The Read Committed Isolation Level

A Little About Why I Don’t Like The Read Committed Isolation Level

Video Summary

In this video, I delve into the reasons why I believe that the read committed isolation level is not ideal for most SQL Server databases and advocate for using an optimistic isolation level instead. Starting with a simple example in my Stack Overflow database, I demonstrate how read committed can lead to inconsistent results due to its lack of a consistent view of data. By switching to read committed snapshot isolation, we achieve consistent query results every time. Additionally, I explore the blocking and deadlock issues that can arise when using read committed, especially with concurrent read and write queries. Through an example stored procedure, I illustrate how these queries can block each other, leading to deadlocks. The video concludes by recommending read committed snapshot isolation as a better default setting for all databases, offering practical advice on how to implement it effectively.

Full Transcript

Erik Darling here with Darling Data, fresh off the heels of a great weekend. And I’m going to duck down a little bit so that you can read the full screen here. It says, why I think that read committed… I messed up my green screen a little bit. That’s how fragile these things are. Why I think read committed is not a very good isolation level, and why I think you should be using an optimistic isolation level instead. So, we’re going to start off with a pretty simple case, where you can get strangely incorrect results. Now, the problem that the main three problems that I have with read committed, the default isolation level for all the sorry souls, except the sorry souls who have to use Azure SQL DB. The main problem with it is that it doesn’t give you a consistent view of the data. So, if you have a scan that’s moving along a table, and some modification comes along and updates something over here, that row can move to this side of the scan and the scan will completely miss it.

Likewise, you can have something that the scan reads here, and then after the scan reads it and releases its locks because reads don’t hold on to locks for anything. That’s why you can’t have lock escalation with the read. So, if you have a scan read a row here, and then an update says, oh, I need to change something with this row, and that row moves over here, you can read that row twice. So, there’s not good side effects from not having a consistent view of the data. This isn’t to say that you should haul off and use repeatable read or serializable, but it is to say that there are better options out there in the world, read committed snapshot isolation and snapshot isolation being the two of them.

I do have a slight preference for snapshot isolation, but as far as ease of use and getting queries to be agreeable, read committed snapshot isolation is the easier choice there. So, we’re going to create, oh, actually, you know what, we’re going to go back and we’re going to use the right database first. I just realized we’re in the wrong one for me to show you what happens when things get weird.

Now I have these two tables sitting around in my stack overflow database, look very unprofessional. So, I have this query set up to use this read committed lock hint because I already have read committed snapshot turned on for the crap database. That is already done, that is already in there.

And if we run this query and we look at the results, we’re going to see 110 for all of them. That’s because there is no other concurrent activity going on in the database. And if I take this tiny little query and I put this in a different window, this is just going to update things in a loop forever and ever so I can keep running this query and showing you how goofy things are.

So, I’m going to kick this off and we’re going to come back over here and we’re going to look at what happens when I start running this. So, just on the first execution there, you can see that not all these rows have the same value in them. All right, these all changed.

All right, that one’s 10, this one’s 30, 50, 70, 90, 410, 430. So, and if we keep running this, we’ll see that consistent inconsistency every time this executes. All right.

And if I just quote out this read committed lock hint and I allow my query to use the read committed snapshot. Oops. I allow the query to use the read committed snapshot isolation level.

We will get consistent results on every execution. All right. So, every time you run this, the numbers are the same down the line. We don’t get inconsistent results from our select query.

So, we don’t get weird results moving up and down, which is good. What we want is consistency in our data. All right.

So, that’s the first reason. The second and third reasons are things that I have talked about a few times before. Let me just fix that up a little bit.

Sorry about that, folks out there. Like I said, it’s an early morning for me and my third tiny cup of espresso is still brimming. So, please, please do forgive me. It’s a little bit like being jet lagged.

Except it’s just the entire weekend catching up with you at once. So, that’s that one. The other problems that I have with the read committed isolation level are because of two things that I run into when helping clients with deadlocks and blocking problems quite a bit. And that is when you have read queries, read queries and write queries executing concurrently in the same database where you’re using read committed as the default isolation level.

They can block and deadlock with each other. This is something that not a lot of people realize. This is also something that not a lot of people…

Well, I’m not going to say it. Many people have avoided these scenarios by slathering no-lock hints all over their queries. Even their modification queries.

Just update with no-lock. Okay. That’s a joke. But when you use no-lock, you don’t run into the same necessarily locking problems that you do when you use the default read committed isolation level. This isn’t an argument in favor of no-lock.

This is me making fun of people who use no-lock. Ha ha ha ha. You fools. And this is me saying that you’re better off using an optimistic isolation level where you won’t run into this sort of thing. So the first thing I’m going to show you is this.

Nice. So we have a store procedure here. And we are going to intentionally parameter sniff our store procedure. The first execution we’re going to do of it is going to be for a very small date range.

And let me turn on query plans. See, this one finishes quickly. Right here.

Look how fast this one finishes. All right. Some seven milliseconds or something. Pretty good. Query tuned to perfection. Except now we are going to get parameter sniffed to perfection.

And I just want to point out really quickly that this is SQL Server 2022 running in the 160 compatibility level. But the parameter sensitive plan optimization doesn’t kick in because our only predicate that involves a parameter uses an inequality predicate for the search. So the parameter sensitive plan optimization is only available for a quality predicate.

So we don’t get any help there. So one thing that I want to do is grab this and stick that in this window. And we have to go back over to Stack Overflow.

And I’m going to kick off a very slow version of this store procedure. So doing this for, I think, like a day, let’s call it. This is like the final day of data that exists in the Stack Overflow 2013 database.

If we do it for this date, we don’t find a lot of rows. If we do it for this date, I mean, technically we only need to go back to like 2007 or something. But this is just funny to me.

And it’s also a nice piece of SQL trivia. If you ever want to know why the low value for some date types in SQL Server is 1753.0101, you can search for the name Philip Stanhope. And you can find out way more than you ever wanted to know about calendars.

So I’m going to kick this off. And it’s going to run terribly slowly because it’s going to find a lot more data. And I’m going to run that update query. And we’re going to go look at SP who is active.

And we’re going to see that our select query is blocking our modification query. Here’s our select running happily. And here’s our update sitting around waiting to get these intent exclusive locks so it can do its updating.

This happens because of a sort of funny thing with the query plan. And this thing is done now, so I can show you that. And if you look at the execution plan, you’re going to see that this thing did run for 15 seconds.

And the reason why this blocked the modification query is this portion of the plan. We have an index seek. We have a nested loops join.

And then we have a lookup back to our clustered index. I recorded a video pretty recently about key lookups. So if you’re really unfamiliar with what those are, you’ll want to go watch that video. I promise it’s not hard to find.

It’s the one where I look really, really cute in an Adidas t-shirt. Anyway, let’s get back to business. So what happens is our modification query wants to update the votes table.

And it needs to update, at minimum, the clustered index so that we can change data there and then, you know, push those changes out to any other nonclustered indexes on the table that have the column in it that is being updated. So we have an index seek where we find the rows that we care about here. Every time we find a row that we care about here, we pass it to this nested loops join.

And that nested loops join goes and looks in the clustered index for those rows. Now, remember when I talked about the read committed isolation level not providing a consistent snapshot of the data? So you could miss or see duplicates of rows, right?

So what happens here is SQL Server does some locking magic in order to prevent those inconsistent phenomena or phenomenon, depending on how you speak English. I speak it incorrectly, so I’m actually not sure which one to use there. And what it does is it uses an object level shared lock to lock down the clustered index so that no data can change in that clustered index while it’s trying to locate rows that it’s finding from the index seek.

All right, so every row that comes out, we go look. And the entire time that’s happening, the clustered index has an object level shared lock on it, which is preventing those intent exclusive locks from the update from coming in. Cool.

We got that. So that’s how read queries can block write queries in the read committed isolation level. Another thing that can happen is we can run into a situation for absolutely similar reasons where read queries and write queries can deadlock with each other for sort of similar reasons. Now, I have a select query that runs in a loop and just looks for anything with the vote type ID of 8 in the votes table.

And what I’m going to do as soon as that index is done is, well, I can do this before the index is done. I’m going to stick this into this window here. This window is getting a lot of reuse, right?

Doing a good job recycling things here. So we have this update query that is going to just flip the vote type ID column back and forth for a single row in the votes table. Just one row.

All it’s doing is changing vote type ID from 8 to whatever. And because our select query has a semi-covering index, it will have a key lookup in that plan and we’ll end up with the same object level shared lock issue that we saw before. Now, sometimes we end up with a little bit of getting lucky with the timing.

So I got to kick that off and kick that off. Oops, I didn’t select the whole thing. And I probably have to go back over here and rerun this now.

There we go. All right. So while this update is executing, we ran the select query and the select query ended up in a deadlocked state with the other query. All right.

So that update query and this select query have deadlocked with each other and we have, well, our select query has been murdered, essentially. That happens again because there’s a key lookup in the select query, the object level shared lock, and the locks that the modification query are trying to take are fighting with each other. And then, like, we have the nonclustered index and it needs to update that and read from that.

And this one’s trying to select this and lock this. And just because of the way that those things sort of do their little murder dance, this select query ends up being the victim of a deadlock. Usually, SQL Server will try to choose who should be the deadlock victim by looking at which transaction would be the easiest to roll back.

And typically, select queries are the easiest to roll back because there’s no transaction logging to deal with. So this is obviously not ideal. Again, you know, a certain level of no lock hinting would prevent these issues from happening.

And also, sort of interestingly, there are other things that would fix it. You know, we could use an optimistic isolation level, again, read committed. If we had a covering index so that we didn’t have to do those lookups, then we could avoid that situation as well.

The slight danger here is that sometimes, you know, these select queries are doing, you know, select star or the equivalent of select star. It would be really onerous to create very wide indexes to solve all those scenarios when there are typically better options that fix it without you having to, you know, adjust many indexes, add lots of columns to them as include, stuff like that. There’s also some other things that we could do.

If we wrote the query and, you know, sort of like wrote our own sort of lookup and we used a, like we did a self join or used a hash or merge join hint, then we would be able to avoid that scenario because it’s really the lookup in there that causes the problem. Anyway, that’s a little bit about the read committed isolation level, why I sort of don’t like it very much, why I think read committed snapshot isolation is a better isolation level. It should be the default, it should be your default for all your databases.

If it’s not, just flip the switch, it’s the worst that happens. Everyone loves you, just like they love me. Anyway, if you turn on read committed snapshot isolation today, I will send you an Adidas t-shirt.

Promise. It’s only good for May the 1st of 2023. So I don’t want to get an email in like a year saying I did it and I want my Adidas t-shirt because the offer has an expiration date.

Anyway, thank you for watching. I hope you learned something. I hope you enjoyed yourselves. Remember, like and subscribe and all that stuff.

I do appreciate likes and subscribes. Views are also nice, but, you know, I want to make sure you keep viewing. So there’s that.

Anyway, I’m going to can this one, upload it, and start talking about something else because I have agreed to go roller skating today. And this might be the final piece of work that I do for SQL Server because I don’t have a lot of confidence in me on wheels. That’s why I don’t have a driver’s license either.

Anyway, time to go. Goodbye.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.