A Little More About Isolation Levels In SQL Server
In this video, I talk about some of the misguided expectations that Read Committed has associated with it, and query patterns to watch out for under Read Committed Snapshot Isolation.
If you’d like to see my full day performance tuning session, here are upcoming dates:
Video Summary
In this video, I dive into the often-overlooked world of SQL Server isolation levels, particularly focusing on read committed snapshot isolation (RCSI) and its alternatives. With a humorous twist, I highlight common misconceptions about isolation levels and demonstrate how they can lead to issues like race conditions under different scenarios. By walking through practical examples and real-world queries, I aim to clarify the nuances of these settings and help you make informed decisions for your database environments. Whether you’re dealing with complex query patterns or just looking to avoid potential headaches, this video offers valuable insights into optimizing SQL Server performance without overcomplicating your setup.
Full Transcript
Erik Darling here with Darling Data, a surrogate style of Beardgut Magazine. Long, beautiful relationship. Today I want to talk about isolation levels a little bit because something that keeps coming up, and the most annoying things about isolation levels keep coming up, things that I really loathe having to repeat or try to do. tell people about over and over again. And we’re gonna do that today. Because Friday, there’s nothing like a good Friday talk about isolation levels. So first, no isolation level is perfect for everything all the time. I wish that one was because then we could all just use that and we could stop having these conversations. But usually, SQL Server, where we’re going to go, we’re going to go to the next level of the world. So we’re going to go to the next level of the world. The things that I have to talk people out of is one using no lock hints, because that’s where you get bad data potentially under concurrency. And the other one is the switch between read committed, the default pessimistic isolation level in SQL Server, and read committed snapshot isolation, which is an optimistic isolation level in SQL Server. The reason why that’s the usual choice is because most people is because most people have to like, like, like, don’t want to like, like, like, you can use snapshot isolation, but then every query would have to ask for snapshot isolation in some way, either like, like, like, when it like, if it’s an application, when it connects to SQL Server, if it’s a stored procedure, you have to add it to the code, stuff like that. And that just, you know, unless you have a lot of control, unless you have a lot of time and patience, to figure out which queries you want to use an optimistic isolation level, then that’s kind of a tougher one to talk folks into. That also assumes that you have control over those things, right? Like some, if you have a third party application, you might not, your only hope might be to use read committed snapshot isolation, which kicks in for every read query that comes in and hits the database. So that’s the most common choice. The thing is, every time I talk about switching over, where someone will go to some due diligence, and they’ll read some blog posts where all they can talk, they think that there are bugs in SQL Server with read committed race conditions and things like that. And that’s really not the case. The case is that there are certain query patterns you have to watch out for where you might hit race conditions.
Now, I’m not saying that read committed snapshot isolation is perfect, because again, it’s not. But for servers that I look at, where there is a lot of bad blocking between readers and writers, and a lot of deadlocking between readers and writers, read committed snapshot isolation is perhaps the safest way to solve all of those problems in one go. You’ll still have write queries block one another, but the read queries fighting with write queries and write queries fighting with read queries goes away completely. So it is a great isolation level for most SQL Server workloads. And most mature database platforms out of the box use some form of multi version multi version concurrency control by default, right? Even Azure SQL DB uses it by default, because Microsoft probably like, like, hey, we’re going to release this product, we’re going to manage it. We don’t want people complaining about blocking all the time, because then they have to go solve blocking problems. And that’s not fun, because that’s query tuning, index tuning, stuff like that. So read committed, the pessimistic isolation level has a lot of problems that one might consider to be race conditions, if one were to be really concerned about application and query concurrency. So here’s the first example, and I’m going to figure out which way I have to turn. There we go. That should get us in the frame.
Or get get all the text in the frame rather. So if you have a query that just finished reading what used to be row C, but it’s now a ghost record because it was deleted. And your query just finished reading it and your query just finished reading it and has now moved on to row D, that row C will still show up in your query results because your query already grabbed it. It doesn’t disappear from the results. All right. And these slides are all from my foundations of SQL Server performance tuning class. I delivered it a pass. I’m delivering it a couple more times. Data tune in Nashville and up in Boston in May, as part for the New England SQL Server user group. If you’re in either of those areas, it would be a pleasure to see you.
The second reason why read committed isn’t really all that promising of an isolation level is let’s just pretend that again, we just finished reading row C, but then it gets updated. And now we have two row H’s. Our query will show one letter C and two H’s in the results. Right? That’s not great either. Right? That would seem like a race condition. That would seem like a bug. But that’s the way read committed works. Read committed the non-snapshot pessimistic isolation level. It takes very brief read locks on things, but data can change on either side of those locks whenever it wants. Right? Because those locks don’t hold on for very long. There’s no lock escalation with read queries, at least without hints or whatever. But all of this stuff is open to change as soon as those locks get released.
Another reason why it’s not great is because similar situation. Let’s say we’re reading row E. Row F gets updated to be another value C. So now we have two C’s over here. But all our query will see is one C and no F. Right? And those are, again, something that could very much be interpreted as a race condition in your queries if you are really concerned about concurrency. Now, this is stuff that read committed snapshot isolation fixes. But we have to talk about some other stuff first.
The first thing we have to talk about are some query patterns in under read committed the pessimistic isolation level that can also cause things that look like race conditions, but are really just, again, the lack of promise that read committed has as far as what data is going to return. So if you do something like this, right, you, you know, in your store procedure and your query, whatever it is, if you set some variable value equal to something based on a select, locks on that select, unless you add locking hints into transaction, don’t hold on once that query is done.
Actually, even like once it finds that row, like data, data in the table can change all over the place. So if you were to take this, like whatever this gets set to and use it to like, you know, insert into another row, use it to like find data that you’re going to update, that data could be completely irrelevant by the time your query gets to it.
Again, under concurrency, if you’re just running it in isolation, everything’s going to look great every time. But under concurrency, the data in there could change really quickly. Another pattern that could have similar effects as a race condition, big air quotes on that, is if you dump data into a temp table and then you use that temp table to go update things, because whatever’s in that temp table is maybe invalid by the time you go to do that update.
So these are things that a lot of folks don’t think about when it comes to matters of concurrency. And this is a lot of the lack of understanding about the promises that Reid committed the pessimistic isolation level makes are way overblown. A lot of people think that Reid committed the pessimistic isolation level behaves like snapshot isolation, like where your query takes a picture of the data and it’s perfect, or like serializable where nothing can change while your query’s reading the data, right?
Because everyone thinks that what it’s returning is this magnificent piece of data, but all it really promises is blocking and deadlocking with modification queries. And all it promises is that the data that it read was committed at the time that the read happened.
So remember, just like in the slides, modifications can happen all around it. The only thing that’s guaranteed is if you hit a lock, your query will wait for that lock to release before reading that. That’s the only real guarantee that Reid committed the pessimistic isolation level makes.
Now, the next thing I got to talk about is query patterns that might exhibit what seems like a race condition under an optimistic isolation level. But a lot of the times when you hit this, these are also things where there is some potential for these as well under a pessimistic isolation level. They’re just a little bit more rare.
They’re actually pretty rare under even an optimistic isolation level because you have to write really dumb queries for stuff like this to happen. So I’m in my database crap. At least I hope I am.
Home is where the crap is. And I’ve got this table called dinner plans. And I’m going to populate that table with, well, when I wrote this demo originally, there were a bunch of people who I thought I was going to have dinner with at pass.
It turned out the only person I had dinner with was Kendra. But that was nice anyway. It was a great, great, actually, it was like a few dinners with Kendra. She’s like the only person who would hang out with me.
Whatever. Not bad company, though. And so right now in that table, our dinner plans table, this is what things look like. You got a list of people.
And for some reason, seat number one is free. But the other five seats in the table are all taken. So we’re going to make sure that RCSI is off for our crap database. And we’re going to look at, I have these queries set up in the other two windows over here.
But we’re going to look at just real quick what the query does. So this is the kind of sort of dumb query pattern that would exhibit a race condition under RCSI. And it might not necessarily hit under a pessimistic isolation level, but could still happen if things got weird enough.
So what we’re doing is updating a table alias, our table dinner plans, which is aliased as DP. If you are not a very mature person, you want to make jokes, go ahead. But it’s dinner plans, nothing more.
So we’re updating our dinner plans table. And for some reason, rather than just doing a regular update, we’re doing this exists check through the base table to look for an ID where the seat is free. So this is where things get interesting because we have two references to the table.
We have one for the update and one for the select. Only the reference for the update will ever have the type of exclusive locks on it that will prevent a read query under a pessimistic isolation level from executing and getting data. The reference in the select portion, that can read whatever.
So if you have a no lock hint in there, you’re screwed. If you, you know, if in that inner query, you know, any of the stuff happens that we talked about, that like where data can change around where the reads under the pessimistic read committed isolation level happens, you could still hit what feels like a race condition. All right.
So let’s go look at what happens when we do this. So I’m going to say begin tran. I’m going to run that. And we’ve output this. And now I’m going to come over here and I’m going to run this.
And this is going to get blocked. Right. This query is now blocked because we have this update in a transaction, updating dinner plans. And this query wants to update dinner plans and read from dinner plans.
We come back over to this first window and we commit this. This query will come back and return no results. All right.
Because that other query blocked it, updated that row to find a free seat. This query did not find a free seat when it went to run. So let’s commit this now so that we don’t have anything weird going on.
Let’s make sure this is fully committed. Now let’s change the crap database to turn read committed snapshot isolation on. All right.
This takes a second to run. That’s okay. It’s worth it. Okay. So now that’s turned on. Good. If we repeat that same demo, right, we’re going to run this. Oh, you know what?
I didn’t reset things. Let me admit that. Totally forgot to reset the table between runs so that that didn’t fail. And the joys of remembering stuff.
All right. So now let’s run that. And see, this one finds this. All right. And now let’s run this. Now this is still going to get blocked because that update is still happening.
On the other window, right? This transaction is still not committed. But because of the way an optimistic isolation level works, when this query did its update, the last known good version of the row got sent to either if you’re not using accelerated database recovery, it gets sent to tempdb.
If you’re using accelerated database recovery, it gets sent to the version store, local to the user database you’re in. And now, because this query is going to read a versioned row that this query is updating, when we, right, this one found this seat, right, and we commit this, now we come over here, and what did we find?
We found a seat. We know we found a seat. We found a seat. Because this thing got updated to the reverse Eric down here. All right.
So now if we commit this, we’ll have a little bit of an awkward situation. Because forward Eric will think that he got a seat at the dinner table, but backwards Eric will have the golden ticket and say, I get to sit there.
You don’t get to sit there. And then forward Eric and reverse Eric will, like, matter and anti-matter fight. And, I don’t know, some sort of universe death probably will happen. So these are the kind of query patterns that can cause things that, again, look and feel like race conditions under RCSI that you wouldn’t necessarily hit under, read committed, the pessimistic isolation level.
But, again, the promises that read committed, the pessimistic isolation level make are really flimsy. So, under most circumstances, for most query workloads where people aren’t writing completely idiotic queries, or if you have no-lock hints everywhere anyway, you’re probably better off using an optimistic isolation level because there’s far less, like, room for error than there is when you’re using no-lock hints.
And there’s far less pain than if your read queries are blocking and deadlocking with your write queries all the time. Right?
So, like, this is most queries function better than any database using an optimistic isolation level. If you have queries that don’t, if you have queries that need to read the most up-to-date version of data, just keep in mind that those queries are going to be subject to blocking and deadlocking.
If you want to enable read committed snapshot isolation and you want to have certain queries, not use row versioning, there’s a perfectly good read committed lock hint you can add to those queries or other locking hints that would make sense for those queries.
But just like a direct update like this wouldn’t have the problems that we were looking at. The problem really is the subquery doing the select reads a version of the row that it looks like it wasn’t supposed to because that should have been taken.
But, you know, again, it takes pretty high concurrency for you to find these problems. And it takes pretty stupid looking queries for you to find these problems. Right?
So, in this case, you know, that self-join, completely unnecessary. If you write modification queries that do things like that, you kind of deserve what you get. That’s not a smart way to write queries.
But there are times when you would have to write a query sort of like that, like a different query pattern. Like in this one, it’s particularly stupid because it’s just one table that we want to update and there’s no reason to do a subquery to touch another table.
But if there were like a different table where like we had to like update from like, you know, some reservation list or like a guest list or like, you know, like a list of reservations where you could possibly like go, you know, go to different restaurants or whatever.
Those are circumstances when having a subquery would be necessary. But I just want to remind everyone that like those subqueries under read committed could also read some weird data.
Like data could change before or after and because this table is the only one. It’s not going to have any sort of exclusive locks taken in there or around it. This wherever the select is.
So if you join tables together to do updates and there’s a no lock hint on like one of the on the table that’s not getting updated or you’re just using read committed the pessimistic isolation level.
You can still see weird stuff that’ll look and feel like race conditions under a pessimistic isolation level. Well, actually specifically under read committed the pessimistic isolation level. Serializable and repeatable read offer way more guarantees.
They also offer way more blocking. So you have that to look forward to. Anyway, this was a little bit longer than I expected. And like I said, it’s Friday and got some stuff I got to go do.
So I’m going to go do that. I’m going to get this started uploading as usual. I hope you enjoyed yourselves.
I hope you learned something. I hope that if you are not currently using an optimistic isolation level and you have a lot of problems with locking and blocking, you’ll consider using an optimistic isolation level. I hope that if you are slathering your queries with no lock hints, some bizarre cargo culting about saying that’s a best practice that you’ll consider using an optimistic isolation level and removing those.
If you enjoyed the video, pretty pleased give it a like. If you enjoy this sort of technical SQL Server content, feel free to subscribe to my channel. I’m always happy to have new folks coming in and learning stuff.
And I don’t know. Gosh, I think that’s it. I can’t think of a single other thing to say. Happy Friday. Hope everyone has a great weekend.
Thank you for watching and I’ll see you in the next video. Have a good one.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.