Learn T-SQL With Erik: Read Committed, Not So Committed
Video Summary
In this video, I delve into a scenario where a query running under the read committed locking isolation level in SQL Server appears to return impossible results, challenging common misconceptions about read committed. I explain that read committed only ensures that your query reads data as it was at the time of taking shared locks, not that other queries cannot modify this data before or after your query runs. Through a detailed example involving a query to find top-scoring posts and their answers on Stack Overflow, I demonstrate how concurrent updates can cause unexpected results due to the timing of lock acquisitions during execution. This video is part of my ongoing effort to clarify complex SQL Server concepts for beginners and experienced users alike.
Full Transcript
Erik Darling here with Darling Data. You’re welcome. In this video, we’re going to continue. We’re going to look at a situation where a query running under the read committed locking isolation level appears to return impossible results to us, the end user. And we’re going to do this because some people still have very, very strange misgivings and misunderstandings about read committed locking isolation. And what exactly read committed means. When you get down to it, read committed only means that your query at the time that it attempted to take the shared locks that it requires in order to read rows and pages and whatnot, the data at the time that it was read was what was committed in the database. That does not mean that other queries cannot interact with that data and change that data before and after your query has been submitted. And that’s the data has released those shared locks. All 23 hours of the beginner content is available to you. Currently, there’s a link down in the video description for you to buy it. The presale price is still 250 bucks with the value doubling, skyrocketing to $500 after the summer months when my brain has officially stopped sweating and we can start thinking again. So anyway, let’s take a look at this query. Now, the goal here is to find the top score of the query.
So we’re going to find the top scoring post that is a post type ID to his questions. And what we want to find are all the other answers. We want to find all the answers for the question that has the top scoring answer to it. So we’re selecting star from the post table. We’re going to have an index hint in here just for demo stability. And we’re saying where the parent ID of whatever we find, right, or the parent ID of the things that we want to find equals the parent ID for the highest scoring answer to a question. So all the other answers to the question essentially. Now, because we have a sub query in this query and this, right, we have the post table referenced once here and once here.
This really means that the post table is going to be read from twice. Under locking read committed, that means that there are, I mean, many opportunities for reads to happen at all different points in time because the shared read locks are going to happen for one reference to the post table, right? And we’re going to take those and quickly release them. And then we are going to join one reference to the post table to another reference to the post table on the parent ID column. And that’s going to be another opportunity for shared read locks to slowly be taken and released as they get the data that they need.
So let’s look at the execution plan for this thing. I’ve already run it ahead of time. The first thing I actually want to do though, is show you over here that when we run this unfettered without anything happening around it, without anything else interacting with the data, all the results that get returned over here have a post type ID of two.
Okay. Everything, everything next to me is a two. This whole column is all twos because that’s what we were looking for here. So if you, your, if your post has a parent ID, that means it was an answer to a question. If your post does not have a parent ID, that means it was the question, right?
Questions belong to answer. If your post, let’s try that again. If your post has a parent ID, that means it was an answer. So it was like, it’s like, it’s like the parent ID is the question ID that ties your answer to the question.
There we go. So the execution plan for this features two things. Well, I mean, I guess we have three opportunities in here to read strange data from the post table, don’t we? We have a clustered index scan here and this takes about five seconds.
So a single thread scans the clustered index and gets all the rows. The next thing we do is sort all of those rows. After we sort all of those rows, which we get, we have one of, there’s one row that comes out of this because we find the top one scoring answer to a question here.
Right. So we do that. We scan things, we find the top scoring thing, and then we join what comes out of this to a nonclustered index on the post table. And we find the nine, in this case, there are nine other answers to the question that we found with the top score here. So we go find the nine rows here that we care about.
And then over here, we touch the post table a third time because we need to do a lookup in order to get the other columns that we need to satisfy the presentation section of the query. So we have this whole output list here, right? This is all things that we’re going to do. We’re going back to the clustered index to get these columns after we’ve done a seek to find the parent IDs that we care about here.
So there are really three opportunities to read data from the post table. And they all happen at slightly different times. Like because there’s a sort, all the 11 million rows that we need to find in the post table, right?
We have to get all those rows to the sort operator. Then the sort can happen. Then we do a nested loops join to the lookup.
And then we have another sort here, which means all the rows have to show up here. And then we can start doing a lookup here. So really, there’s like one, two, three different points in time that we’re reading data from anyway. And during these three different points in time, all sorts of other things can interact with the data underneath you.
So let’s do this. Let’s go over what we’re going to do in the other windows. Let me scroll down here first.
So this is the one that we’re going to run when we start doing things. I’m just going to tuck this down here so it’s a little tidier on the screen in case I need to zoom in again. Over here, what we’re going to do is we are going to, this is the last row in the post table.
This is the highest post ID. And what we’re going to do is start a transaction and update the post table. And all we’re doing is adding one millisecond to the last activity date for this last row in the post table.
Right? There are no rows after this. This is the highest assigned ID. The ID is the clustered primary key.
So this is the end of the table. And then what we’re going to do is run this. So let’s pretend that this is all happening very quickly. Like in the database, it’s not me running things in separate tabs. This is a database with a fair amount of concurrency.
Right? And so we have things running at all different times. And those things need to be able to run and do things. And we don’t want to like have like a train schedule for all these queries where it’s like, you go. All right.
You go. Update. Yep. You go now. Right? Like that’s not happening. These queries are all just getting sent in and run and trying to do their thing while other queries are getting sent in and trying to do their thing. So let’s begin this transaction and run this update.
And now let’s run this query. Now, this query took about five seconds before. This query will not take five seconds this time.
Right now, this query is blocked by that update. Right? Because like one of the things that we have to do in this query plan is scan. We have to scan.
We have to touch the clustered index twice. We have like the first thing that we did was scan through the clustered index and then sort all that data. Right now, this query has read everything in the clustered index except that last row because that last row has a lock on it. Right?
So everything got read except that last row. So this select query running under recommitted locking is blocked from taking a shared lock on that last row by the update in the other window. So while this update is holding this lock open, let’s come over here.
And what we’re going to do is we are going to update a very coincidental nine rows, nine or ten anyway. And we’re going to set their post type ID to one. Coincidentally, these are the other rows that we need to, or rather the other rows that get returned by our query is the other answers to the question that have, that belong, they’ll belong to the question with the highest scoring answer and all of stack overflow.
This query runs successfully. Now, if we come over here and we roll this back, we don’t have to commit this. We can just say, oops, I didn’t mean to add it.
I didn’t mean to add that millisecond there. We can roll this back. And as soon as we roll this back, we get results immediately. If we look at the execution plan, well, things look a little bit different. This scan of the cluster index now took one minute and 28 seconds.
It took a minute and 28 seconds because it was like the five seconds that it normally ran for, plus the like 25 or so seconds that I was trying to explain things to you. What’s neat now is if we go to the properties of the select and we go to the weight stats, the top weight that we’re going to see is LCKMS. Right.
So we can see that this query waited a long time on a lock. That lock, that shared lock, LCKMS, it’s trying to take a shared lock, was blocked by the update query that started a transaction and said, go outside and smoke a cigarette. I don’t know.
Maybe grab a beer for lunch. I don’t know. We’ll see what happens. There’s a halal cart. I don’t know. Get crazy out there. But what’s very interesting now is if we look at the results of this query, even though we specifically requested that only things with a matching parent ID, right?
So only, only, again, only answers have a parent ID. That’s the questions don’t have a parent ID, right? And post type ID is answers and post type, post type ID two is answers and post type ID one is questions. Even though this is what we asked for.
And even though the first time we ran this query, we got all post type ID two back. If we look over here now, magically, mystically, all of these post type IDs were one, which seems impossible. Because a post type ID of one, it cannot happen for an answer.
Post type ID two is answer. We looked for other answers and we found them. They all have the same parent ID too. We were not, oh boy, oh boy, oh boy.
Questions do not have a parent ID. Questions are the parent. So something terrible has happened here. And this sort of stuff can happen all the time under any sort of application concurrency using locking read committed. So be careful out there.
Before you go and you start thinking that read committed locking is some sort of snapshot point in time, amazing, consistent, confident. Doesn’t it sound confident, right? Read committed.
What do you read? I read committed data. I don’t read uncommitted data. I’m committed data only. It sounds like an LLM level of confidence from read committed, but we do not get that. Read committed has very weak guarantees.
Those guarantees do not coincide with the start of a statement or the start of a transaction. They are just a mishmash of various points in time when your query was allowed to take shared locks. So again, be careful out there.
Your queries might be returning weird stuff all the time. This isn’t even NOLOC, which is the surprising thing. Like you would expect this sort of thing using NOLOC or read uncommitted, but not read committed.
It’s committed to reading committed data. Well, maybe not so much. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
This is a great post Erik. Gonna celebrate with a beer for lunch.
If it’s not bad you could have one more for dessert.