Learn T-SQL With Erik: A.C.I.D.
Video Summary
In this video, I delve into the concept of ACID properties in SQL Server databases, specifically focusing on how they apply to T-SQL operations. Starting from the basics, I explain each of the four ACID properties—Atomicity, Consistency, Isolation, and Durability—and provide clear examples to help you understand their practical implications. While the video doesn’t include demos, it does offer a thought-provoking example that illustrates how isolation can be misunderstood in SQL Server. By exploring scenarios where queries might not behave as expected due to read-committed isolation, I aim to highlight the importance of understanding and managing transactional integrity. Additionally, I touch on durability, explaining its significance and the potential caveats when dealing with features like delayed durability or in-memory databases. This video is part of my comprehensive T-SQL course, which will soon be expanding with advanced material after summer. If you’re looking to enhance your SQL Server skills, now might be a great time to join at the presale price before it doubles in value!
Full Transcript
Erik Darling here with Darling Data, and we’re going to continue on with the teaser content from my T-SQL course. All 23 hours of the beginner stuff out there published, ready for you to start consuming. The advanced stuff will be dropping after the summer. The course is still on the presale price of $250, and it will literally just absolutely double in value once the advanced material pumps out. So I would strongly suggest that you get in on this now while it’s at the cheaper price because I realize how price and cost sensitive a lot of people are. So in this video, we’re going to talk about acidity because acidity in the context of databases and more specifically in SQL Server is not quite the level of acidity is not quite the level of acidity that you might receive if you read a highly academic Wikipedia article that outlines all of the effects of the ACID, which is Atomicity, Consistency, Isolation, and Durability. So we’re just going to cover those. There’s no demos in here. It’s just sort of a one kind of example query to give you some food for thought on that.
So let’s look at this one on exactly how isolated a query executing might be. So Atomicity means each transaction either happens or it doesn’t. SQL Server by default runs in auto commit mode. So if you want to select, it’s sort of like in the background SQL Server on its own without you typing anything. It’s sort of like saying, you know, begin transaction, run the select commit. Same with inserts, updates, and deletes. It’s auto committing everything. So you don’t have to declare an explicit transaction for every piece of code you write. You can, of course, declare a specific and explicit transaction. But when you do that, like the, you know, assuming that you do it for a reason that is not just putting one query into an explicit transaction, there are multiple queries in a transaction. It is for the purpose of having those queries succeed or fail as a single unit. We do not want to leave those queries in a strange, lingering, stranded state.
But let’s say that you’re running an insert query and you’re trying to insert two rows. The first row goes fine, but the second row causes a, I don’t know, say primary key or foreign key violation, you know, constraint violation, something along those lines. That just means that, like the first row that even though that one was okay, isn’t going to still be in the table. That statement will roll back and the successfully inserted row will get removed.
Zooming out a little though, all atomicity means in a database is that each transaction will succeed or fail as a unit. It does not guarantee that other queries can’t interact with the underlying data while they’re running. Right? Like, you know, while you are inserting your two rows, someone else might insert a different two rows and someone else might delete a row somewhere else in the table.
And someone else might update a row somewhere else in the table. And someone might run a no lock query and select your two half inserted rows, right? A row and a half is in there or something.
Consistency for SQL Server is especially disappointing. Because the only thing that the C gets you is a guarantee that your constraints are obeyed. By constraints, I mean primary key, foreign key, unique, not null, and check.
Default constraints, I mean, well, I suppose they will be obeyed. They hardly qualify here for, I forget if I covered it in the teaser material for inserts. But if not, I covered in the full material.
So you should go buy that. But, you know, just don’t expect too much out of the consistency thing, at least in the context of SQL Server. However, isolation is the toughest one because most people are completely confused about what isolation means, like what isolation levels guarantee and at what point they provide the various guarantees that they make.
So let’s say that you have a code pattern that looks like this. You begin a transaction. You declare a local variable.
We’re just going to call this next user. And when you go to assign this local variable of value, which is going to be the ID column from the users table, you have to access, in the context of this query, you have to access the users table twice. If you’re, like, we don’t have any walking hints on here.
We don’t have any isolation level hints on here. So just assume that we’re using the default locking read committed isolation level. That doesn’t really guarantee much.
We’re going to talk way more about that in some of the other videos. But just understand that, like, you know, when you read, when you say select from, that’s one read of the users table. And when you say select from here, this is another read of the users table.
These two reads are not going to show data from the same point in time. Read committed is not a point in time view of data. So you might read this data from the users table in the outer context or the inner context at a different time from the inner context or the outer context.
And you still, if you’re scanning through the data, you have to read along those pages. All sorts of stuff can happen to your data as you’re reading along those pages. Again, it’s not a snapshot of the data.
It’s just you’re just taking little read locks as you go along and do stuff. As you’re taking those little read locks, all sorts of stuff can happen all around you. So, like, you know, just if we were to look at the query plan for this, just hit control and L.
We have one scan of the users table here. That’s going to happen at one point. This is going to run like the, like you have another scan down here.
But this isn’t going to run at precisely the same time because we have to get all this data, aggregate it, right? Scan the table, all two point something million rows, aggregate it, gather the streams, aggregate it again, distribute the streams again, and then go into a nested loops join so that every row that comes out of this goes in here and then reads the data.
And since that loop join is just going like this over and over again, these are all at different points in time too. Again, there’s no snapshot happening. So if the goal here is to find the top, like the most recent accessed user in the users table and then go assign them some award, right?
Say, hey, you’re the most recent user. You win a million dollars or reputations. You can go buy a cool stack overflow windbreaker or canoe or Zippo or something.
You know, like you don’t have a guarantee that that person actually was, like when this update ran, the person who should get it. Because all sorts of stuff can happen all around you. Someone else could log in right after you ran this query and this, whatever value you assigned to this would no longer be valid.
So this is where you would have to think about increasing the isolation of this query. Perhaps you would need serializable. Perhaps you would need updlock, right?
Perhaps you would need to do something to protect this data when you read from it both times in order to make sure that this date, like nothing else sneaks in and does something to this data, changes the validity of the value that you assign to that next user when you go and assign some stuff here.
Durability is an easy one, right? Durability, this one is very simple. What this means is that once a transaction commits, you won’t lose it.
There are some caveats around features like delayed durability, but in, you know, I think the in-memory stuff kind of makes it might make this a little weird too. But in most cases, unless you experience data corruption, committed transactions will be safe.
So just like talking, like thinking about the ACID stuff in the context of SQL Server, you know, A is most likely the strongest guarantee. C is a very iffy proposition.
I, you know, as you think about like, you know, depending on the isolation level you’re using, the I might not mean a damn thing, especially if you’re using read uncommitted or nolock. I means nothing.
I might not, I might as well not even be there. I might as well just be ACID. And durability is, you know, up there with A as far as strength goes, though there are some caveats to durability depending on database settings and, you know, the reliability of your underlying storage hardware.
So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. In the next video, we’re going to talk a little bit more about isolation levels and stuff like that. So hold on to your bucket hats for that.
Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.