Learn T-SQL With Erik: Unfortunate Deadlocks Under Read Committed Locking



Video Summary

In this video, I dive into the world of deadlocks in SQL Server, specifically focusing on a unique example that showcases how read queries can deadlock with write queries under the default Read Committed isolation level. This is part of my 24-hour T-SQL course, which includes beginner-level content designed to help you understand and manage these complex issues. I demonstrate this through a simple yet illustrative scenario involving two tables, `deadlock_one` and `deadlock_two`, each with 5,000 rows. By running transactions that update one table while selecting from the other, we observe how SQL Server resolves deadlocks by terminating the transaction that has done less work. This example highlights the importance of understanding isolation levels beyond just read committed, as they can significantly impact performance and stability in your database systems.

Full Transcript

Hey, Erik Darling here with Darling Data. And we are going to continue on with some of the teaser material from my T-SQL course, this one about deadlocks, in which I’m going to talk a little bit about them and show you a fun example of one that is a little bit different from the type of deadlock that you would maybe see in other examples, like deadlock examples for beginners. This is, of course, part of the 24 hour, or so, 23, 24 hours of content of beginner material that I have published already. The course is still at a pre-sale price of $250, and it is going to double in value to $500 when the advanced stuff drops after the summer, so, buy now, I guess, is your call to action there. Anyway, let’s go talk a little bit about deadlocks. Now, man, you know, one of my least favorite parts of the job is dealing with deadlocks because, well, many of them are very simple and just like, we didn’t have an index there, did we? But deadlocks can get quite complicated very quickly. Most examples of them, including the one that we’re going to look at today, are a two query deadlock, but I’ve run into plenty of situations where there’s a deadlock with a single query on a single table, and you can see that from a parallel execution plan or something, and you can even run into them with two queries on a single table, depending on what the query plans look like and what the indexes are doing.

We’ll talk about that in a couple of videos when we talk about, talk a little bit more about how crummy of an isolation level read committed is. But deadlocks are different from blocking because blocking does not represent a situation with a permanent impasse, right? Like, like as soon as whatever query is like leading the blocking finishes or is committed or is rolled back, everything else can make progress.

In deadlocking situations, you have reached this sort of circular embrace, this Mobius strip infinity symbol of queries, where one or more queries cannot make progress until something intervenes, right? Like there is no hope for this traffic jam until SQL Server wakes up, looks around, says, kaboom, and knocks one of them out of the executing query queue. So, uh, blocking does beget deadlocks though.

So if you are dealing with deadlocks on your system, you should not just focus on deadlocks. You should also focus more broadly on blocking. Uh, if you’re going to set up, uh, like, you know, an extended event to get deadlocks, you should also set up an extended event to get the blocked process report, because then you will not only have the deadlocks from the XML deadlock report, but you’ll also have the sort of wider view of all the stuff that’s blocking.
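As a sketch of what that setup might look like — session and file names here are placeholders I made up, but the event names and the threshold option are real:

```sql
/* A sketch: one event session that captures both the deadlock report
   and the blocked process report. The blocked process report only fires
   once you set a threshold, in seconds. Names are placeholders. */
EXECUTE sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXECUTE sys.sp_configure N'blocked process threshold (s)', 5;
RECONFIGURE;

CREATE EVENT SESSION blocking_and_deadlocks
ON SERVER
    ADD EVENT sqlserver.blocked_process_report,
    ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file
    (SET filename = N'blocking_and_deadlocks');

ALTER EVENT SESSION blocking_and_deadlocks
ON SERVER
STATE = START;
```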

And maybe why things are leading to deadlocks, and not just, you know, being like, oh, well, we just had this blocking chain and now this thing comes along and screwed everything up. So just sort of generally, uh, SQL Server will choose whatever transaction has done the least amount of work to kill off when, uh, it wakes up and detects a deadlock. You can, of course, set deadlock priority to a number from negative 10 to positive 10.

If you want to give queries higher or lower priority, as far as who, uh, you know, who will survive the great deadlock wars. So, uh, I’m going to create two tables. One is called deadlock one and one is called deadlock two.

And I’m going to put about 5,000 rows into both of them. And this is just what the tables look like. Now there’s a, you know, pseudo random date and value data in here. And, uh, what we have is just IDs from one to 5,000 in both of them.
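A rough sketch of a setup like that, where only the table names and row counts come from the video — the column names are assumptions, and the pseudo random date and value data is simplified away:

```sql
/* Sketch: two identical tables with ids 1 through 5,000.
   Column names other than id are assumptions. */
CREATE TABLE dbo.deadlock_one (id integer PRIMARY KEY, some_value integer);
CREATE TABLE dbo.deadlock_two (id integer PRIMARY KEY, some_value integer);

INSERT dbo.deadlock_one (id, some_value)
SELECT TOP (5000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    1
FROM sys.messages AS m;

INSERT dbo.deadlock_two (id, some_value)
SELECT d1.id, d1.some_value
FROM dbo.deadlock_one AS d1;
```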

That is the, that is the not random part of this at all. Now, what I’m going to show you as far as the deadlocks go is a little bit different from what a lot of other people will show you when they want to show you your first deadlock. Most of the time it’ll be between like, you know, uh, updates or deletes or inserts like modification queries, which is fine.

Modification queries can absolutely deadlock with each other, but under SQL Server’s default read committed locking isolation level, which is, uh, as far as I’m concerned, um, one of the most garbage isolation levels to ever exist, you can see deadlocks just between read queries and write queries. So what I’m going to do is, uh, I’m going to start a transaction.

And these are the two examples that we’re going to use. We have a transaction called T1 that is going to, uh, update deadlock one where ID equals one, right? So we have the clustered primary key on ID equals one.

So we seek to this immediately, and then we are going to get a count from joining both of the tables together on that ID column. Uh, the second query that is part of the deadlock is just going to be in a transaction called T2. I didn’t name these for any particular reason other than to differentiate them in the windows.

Uh, and then in this one, we are going to update deadlock two and then get a count from both tables. So we’re kind of doing the same thing in both of these. So in T1, if I come over here and begin the transaction and run the update and in, uh, this window for T2, I just try to run the whole thing.
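The two sessions described above might look roughly like this; the updated column is an assumption, and the DEADLOCK_PRIORITY line is optional, just to show the setting mentioned earlier:

```sql
/* Session one, transaction T1: update deadlock_one, then read both tables. */
SET DEADLOCK_PRIORITY 0; /* optional; ranges from -10 to 10, higher survives */

BEGIN TRANSACTION T1;
    UPDATE d1
    SET d1.some_value = 2
    FROM dbo.deadlock_one AS d1
    WHERE d1.id = 1;

    /* run this part after session two has started its update */
    SELECT COUNT_BIG(*) AS records
    FROM dbo.deadlock_one AS d1
    JOIN dbo.deadlock_two AS d2
        ON d1.id = d2.id;
COMMIT TRANSACTION T1;

/* Session two, transaction T2: the mirror image, against deadlock_two. */
BEGIN TRANSACTION T2;
    UPDATE d2
    SET d2.some_value = 2
    FROM dbo.deadlock_two AS d2
    WHERE d2.id = 1;

    SELECT COUNT_BIG(*) AS records
    FROM dbo.deadlock_one AS d1
    JOIN dbo.deadlock_two AS d2
        ON d1.id = d2.id;
COMMIT TRANSACTION T2;
```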

And then I come back over to T1 and I run the select portion of this. Eventually one of these queries will get knocked off. Which one?

I don’t know. Hey, it was this one. I don’t have to switch windows. Right. But now this is a deadlock that occurred between read queries and write queries. Again, this can happen under, uh, isolation levels other than read committed. Of course it can; if we were under serializable or repeatable read, we could have the same situation occur.

Because the shared locks that the read queries are trying to take are also going to interfere with the exclusive locks that the update queries are trying to take. So not only do we have the updates blocking the reads, but now we have the reads blocking the updates. And now we have a sort of, uh, little X shape thing where our shared locks and our X locks are messing with each other.

And they are not having a good time. Someone had to die at this party. And that’s, that is not a good party.

As far as I’m concerned, there are far better ways to throw a fun time than killing someone. Right. It’s not pleasant. So, uh, this is, again, behavior that will happen under isolation levels where read queries and write queries compete with each other: so read committed locking, uh, repeatable read, and serializable.

These kinds of deadlocks don’t occur if you use a row versioning isolation level, like read committed snapshot isolation or snapshot isolation, or if you use, uh, an uncategorized isolation level, like read uncommitted. Of course, if you use read uncommitted, you deserve all the pain that you get in life.

And maybe you should go to that kind of party. I don’t know. I’m, I’m, I have mixed feelings about you. Anyway, uh, just a little bit about deadlocks here. Thank you for watching.

I hope you enjoyed yourselves and I will see you over in the next video where we will talk about, uh, exactly why, uh, I have very mixed feelings about this read uncommitted isolation level. So, all right, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Lock Compatibility



Video Summary

In this video, I dive into the basics of locking in SQL Server, covering essential concepts for beginners while also touching on some interesting nuances that might surprise even experienced DBAs. Starting off with a brief overview of my Learn T-SQL with Erik course, which is currently available at a presale price of $250 before it increases to $500 post-summer, I then delve into the fundamental types of locks—exclusive (X) and shared (S)—and how they interact within different isolation levels. I explain lock compatibility in detail using practical examples, such as how SQL Server handles row-level deletions and subsequent reads from the same table. The video concludes with a teaser on deadlocks, setting up for an upcoming discussion that will explore this complex topic further.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we are going to talk about locking a little bit. Again, you know, sort of basic beginner concepts, but there are some interesting things that we’re going to go over along the way. This is, of course, all teaser material from my Learn T-SQL with Erik course. There’s about 23 hours of it available now. All the beginner material is out. It’s at the presale price of $250, and that will go up to $500 when the advanced material has dropped, after the summer. But for now, let’s go talk a little bit about locking in SQL Server. Now, broadly, you can categorize locks into two categories. You have exclusive or X locks, which occur when your queries are going to modify data. You can, of course, add locking hints that will change the locking behavior of your queries. For select queries, you can add an UPDLOCK hint. In other databases, it’s more common to see the select for update pattern, where you’re going to select some rows, or a row or something, and then do something to change that row later. And, of course, there are exclusive locks when you’re going to change data and shared locks, S locks, when you’re going to read data. There are many, many different lock subcategories, too.

So, if you ever looked at the sys.dm_os_wait_stats view and just searched for wait types that start with LCK, at least in SQL Server 2022, there are 64 different subtypes that you can run into as far as locks go, which is a pretty fair amount of subtypes. But they all mean the same thing, that something has taken a lock, right? Or, something has waited on taking a lock. There we go. That’s probably the more accurate way of saying it.

Exclusive locks are quite aptly named, of course, because they are taken in a way that will prevent other exclusive locks from being taken. There are some caveats to this with, I guess, you know, the snapshot isolation level, but we’re going to talk about those way later on. Exclusive locks will also block shared locks, assuming that the locks are taken at a granularity that prevents the shared lock from reading the row or rows that it cares about.

There are, of course, caveats to this as well when it comes to isolation levels, not just snapshot, but also read committed snapshot isolation and, of course, the read uncommitted isolation level, which, you know, you can just read any old thing that comes along. Shared locks are a bit different. But, you know, in general, shared locks don’t accumulate in a way that lock escalation occurs. SQL Server cares about lock escalation because locks are managed with memory.

SQL Server tries to be courteous and respectful about your memory. Sort of. You know, we all know SQL Server is a memory pig.

But because of that, locks may escalate from the row or page up to the object, table, or partition level. Locks do not escalate from row to page. That intermediate step does not exist.

You go from row to object, table, or partition, or page to object, table, or partition. You do not go from row to page. Right?

I just want to make that very, very clear. In SQL Server, the ability for locks to be acquired when other locks exist is called lock compatibility. I’ve tried to put together a table here to show you if a lock can be taken.

That’s the request mode. If another lock has been granted on that data. Right?
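For reference, here is a simplified slice of that kind of compatibility table, covering just shared (S), update (U), and exclusive (X) locks; the full matrix in the documentation covers many more modes, including intent locks:

```sql
/* Can the requested lock be granted while another session
   already holds a lock on the same data?

                granted:  S      U      X
   requested:   S         yes    yes    no
                U         yes    no     no
                X         no     no     no
*/
```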

So this table is, I don’t know, for me, it’s pretty useful to figure out, hey, if this lock exists, can I take this other lock? SQL Server kind of does this pincer movement when it’s trying to figure out if the locks that it wants to take are available to be taken. So, you know, if you’ve ever looked at, you know, block process or lock acquired stuff, you might see a lot of, like, database level shared locks because that’s just queries going into the database to say, hey, I need to use this.

We can all share it. And then as queries get down to very specifically what data they actually want to work with, that’s when they start checking to make sure that, like, whatever locks they want to take are, like, available to be taken. They’re not blocked.

Like, nothing else has that lock or nothing else has a competing lock from this table that would prevent it from going through. Just a very simple example of that is if we have a table called, well, we have a table called two rows. It’s not an if.

We definitely have one called two rows, and there are definitely two rows in it. If we come over here and we just run this query, we’ll see that there are two rows and one column in the entire table, IDs one and two. And, like I talked about in the last video, this is sort of just a reinforcement point here: if you are going to begin an explicit transaction, or if you are going to modify data in your database, you need to do everything you can to make that as short a window of time as possible.

I realize that not everything can be made as quickly as you would like it, but there are lots of things you can do to make these queries go as quickly as possible. The reason for that is because you want to reduce the surface area of your modification queries as much as you can, particularly if you’re using the default locking read committed isolation level. The longer you hold locks for, right, which can be exacerbated by long-running, like, explicit transactions, or just long-running auto-commit transactions from a single query trying to change data, the bigger that surface area is, the more chance there is of it causing problems for other things trying to work with that same data.

It doesn’t matter if it’s read queries or write queries, because they can both be interfered with. Of course, you know, NOLOCK hints can get you around a lot of that, but that’s not what you want. And row versioning isolation levels can help with that, but they are not a free ride.

So even if, you know, you’re aware of things that you can do to sort of, like, remediate blocking situations that have negative performance impacts for you, like, there still might not be, like, a great answer aside from, like, you know, like, working to make those locks last for a shorter amount of time so that fewer things mess with it. Anyway, just a very simple, like, example of the sort of pincering movement of can I take this lock or not. Let’s say that we open a transaction and we delete the row with ID 1 from our table called 2 rows.

We haven’t committed this transaction yet, which, you know, you can, that can lower your opinion of me if you’d like. We are, right now, we are being bad SQL Server practitioners and we are leaving this transaction open. But if we try to select from the table where ID equals 2, because we have an index on this table that allows us to seek precisely to the row that we care about, we are able to still read ID equals 2.

Because ID equals 2 is a different row than ID equals 1. Now, over here, if we come back to the delete query, the execution plan for this is very simple. If we hover over the clustered index delete, we’ll see the seek predicate where we have deleted the row where ID equals 1.

Right? That’s this thing here. So, because we have, because we are able to seek to a value in this unique index, SQL Server is like, well, great.

Right? I can read stuff around that. So, this is the way that, like, you know, the lock compatibility stuff works where, you know, even though the row for ID 1 has that exclusive lock on it, the row for ID 2 has nothing on it.

And SQL Server is like, well, cool. I can go take that. Of course, if we try to select the row for ID 1, this will immediately be hung up.

We are not able to read this. Right? We can’t do anything with this right now because, guess what, that delete still has that transaction open and the lock still held. We can, of course, get around this with the no lock hint.
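Put together, the demo above might look something like this; the table and column definitions are assumptions based on what the video describes:

```sql
/* Sketch of the two_rows demo. */
CREATE TABLE dbo.two_rows (id integer PRIMARY KEY);
INSERT dbo.two_rows (id) VALUES (1), (2);

/* Session one: delete a row, and leave the transaction open. */
BEGIN TRANSACTION;
    DELETE tr
    FROM dbo.two_rows AS tr
    WHERE tr.id = 1;

/* Session two: this succeeds, because id 2 is a different row
   than the one holding the exclusive lock... */
SELECT tr.id FROM dbo.two_rows AS tr WHERE tr.id = 2;

/* ...this blocks, because the delete still holds its lock on id 1... */
SELECT tr.id FROM dbo.two_rows AS tr WHERE tr.id = 1;

/* ...and this reads around the lock, returning a view of data
   that may get rolled back. */
SELECT tr.id FROM dbo.two_rows AS tr WITH (NOLOCK) WHERE tr.id = 1;
```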

But is this what we want to see? Is this correct? What if we come over here and we change our mind? Are we okay with the fact that this thing read a row that we thought we might want to delete but is now back in the table?

And these are the sort of questions that we have to start asking ourselves when we start, you know, festooning our queries with no lock hints is, are we okay with, like, you know, like, granted, if a transaction is going to succeed, this might be fine. This might be correct. But if there’s any chance that that transaction might change its mind or fail or any number of other things, is the fact that we saw no row here for that okay?

Is this really the behavior that we’re after? In the next video, we’re going to talk about deadlocks, which is like locking plus. Right?

It’s just like, not only is it blocking, but we also have this now problem where two queries get stuck because they both want to do things that the other one has a lock on. So thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you over in the next video where we’ll talk about deadlocks. All right. Goodbye. Goodbye.


SQL Server Performance Office Hours Episode 27




To ask your questions, head over here.

What are stupid things to monitor in SQL Server?
INTERVIEW QUESTION: what’s your sql weakness?
If you were a sandwich, what kind of sandwich would you be? Would you eat yourself?
I am not an expert and have a hard time telling when what someone is saying is BS. What are some signs?
Is there a way to tell if rows are locked before trying to modify them?

Video Summary

In this video, I had a fantastic Office Hours session with the Darling Data community! We dove into some fascinating topics, including what to monitor in SQL Server and which metrics are actually useful for troubleshooting. I shared my insights on avoiding meaningless stats like page life expectancy and context switches, advocating instead for more actionable metrics such as wait stats and query store data. We also tackled a few fun interview questions—my weakness is clearly math, and the complex algorithms that others can craft so effortlessly! The sandwich question was a highlight, with roast beef and Swiss on rye being my go-to choice. Lastly, we discussed how to spot misleading information in technical content, particularly when dealing with large language models. Overall, it was an engaging session full of great questions and lively discussions—thanks for tuning in!

Full Transcript

Erik Darling here with Darling Data, and I have a very, very exciting Office Hours episode for you, I’ll tell you what. It’s going to be twice as exciting as the last one. I have no substantive facts to back that up. I just have a feeling in my bones. My bones feel excited. But anyway, if you would like to ask your own questions for these Office Hours episodes, there’s a link right there. It’s down in the video description. You click on it, you ask the question. It’s wonderful. Everyone gets a chance. If you like this channel content and you want to sign up for a membership, that is also down in the video description. It’s a grand thing to do. For as few as $4 a month, you can make me happy. If you need SQL Server consulting help, health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and of course, training your developers so they stop giving you heart attacks and other problems, then I am the man for the job. And of course, as always, my rates are reasonable. My performance tuning training, again, that link, that discount code there, the everything bundle becomes yours for life for about $150 US. And of course, the thing that I have been working so very, very hard on to the point where I barely sleep anymore is my, I’m kidding, I sleep very well, is my new T-SQL course. All 23 hours of the beginner content is out and available. You can go grab it, watch it, to your heart’s content. And of course, it is available at the pre-sale price of $250. And that will double in value to $500 when everything is done.

Red Gate. The sweethearts at Red Gate are taking me all over the world this summer. New York City, August 18th to 19th. Dallas, September 15th to 16th. And Utrecht, the Netherlands, October 1st and 2nd. This is all part of the Pass on Tour series of events leading up to Pass Data Community Summit, not Comumity Summit. It’s a funny thing that happened once upon a time. And that is in Seattle, November 17th to 21st. But with that out of the way, let’s do the office hours voodoo that we do so well. We got some real brain acres today here. This is going to be a purse swinger. Let’s see. What are stupid things to monitor in SQL Server?

Anything that you would look at and think, what do I fix? Page life expectancy? Stupid. Beat it. Anything that’s a queue length, like bin it. Context switches. What are you going to do? Stick to stuff that is, I guess, things that when you see them, you’re like, oh, I can tie that to something. Wait stats are great. You know, having query store on is great. Look at query store. Like, I have a very good stored procedure called sp_QuickieStore that you can use to look in there and see your worst queries in query store. It’s wonderful. Blocked process and deadlock reports. Wonderful things to look at so that you can monitor what’s actually going wrong on your server and not just have this wall of numbers and wonder, oh, okay. Well, what caused the context switch?

Why did PLE drop? Who cares? Like look at things that give you something to fix and work on. Like, like, like if you want to like gather a bunch of numbers and just be like, hey, I guess I made them go down. Cool. Right. That’s, that’s something you can do. Otherwise, you need something tangible to go work on. Right. You need a, like, you need something to do. Right. Fill your day with joy.

All right. Interview question. What’s your SQL weakness? Oh God. Math. I am not a math person. Uh, I, I read, uh, articles that other people write where they do like all sorts of crazy math algorithm calculation stuff in their queries. And I’m like, I don’t know. How do you know how to do that? Did you go to school? What? These, these numbers are foreign to me. There’s, there’s like a, like Greek letters and stuff. And I’m like, oh, well, smarty pants over here. I just make the queries go faster. You write the math. I’ll make them, I’ll make it fast. That’s, that’s my SQL weakness. All right. Oh, this is, this is the, this is the best question that’s ever been asked in office hours. If you were a sandwich, what kind of sandwich would you be? Would you eat yourself? Oh, any day of the week. Uh, so I mean, my, my favorite sandwich in the world is of course, uh, the, the roast beef and Swiss on rye. Uh, you gotta have the deli mustard on there.

No weird mustard, uh, prefer a Dijon of some variety and, uh, we are good to go. And boy, I tell you, I take a bite of that right now. All right. Uh, oh boy, this is, this could be a long one. I am not an expert and have a hard time telling when someone is saying BS. I assume we’re talking about, about SQL Server. Uh, what are some signs? Um, I don’t know. I mean, obviously if, if you detect, uh, through your advanced knowledge of what LLM output looks like, uh, that would be the first sign.

Um, you know, uh, like anything that is, you know, a bunch of emojis, um, you know, like random percentages, like, and numbers, like, uh, like I made my CPU 38% happier last week. Like, Oh, okay. Yeah. Great. Um, but like, if you’re, I don’t know, like if you’re reading a blog, like, I don’t know, stuff that sticks out to me is like, especially when someone is like, start, like starts off with like a problem statement. And it tells you they’re going to solve this. Like I have this query that runs slow, uh, and we’re going to make it faster in this blog post.

And then like the blog post drags on and like, like you can tell the query is not getting any faster, but like they start talking about other metrics and like start like just like, like referring to things like, like, oh, this query runs for 1.3 seconds. So now we’re going to do this thing. And, and look, we reduced logical reads and look, the query cost is lower. And you’re like, the query didn’t get any faster. What happened? Hey, where are you going? And like, stuff like that. Um, but you know, I honestly, like most of the BS comes from LLMs these days.

So really just if, if someone like, if like something like, like emoji filled nonsense shows up, just skip over it, right? Just leave it alone. It’s not, it’s, it’s like commodity information that you can get like anywhere. Like you could go ask an LLM for this thing and it would give it to you. Like this, this person did nothing to provide it to you.

Uh, so like, I don’t know that that’s, that’s, that’s kind of what I’m on the lookout for these days. Um, you know, like I suppose it’s different if someone is like presenting something and saying stuff out loud. Cause you know, I’ve, I’ve certainly sat in presentations where like the person presenting was completely wrong about something.

But like, I mean, I don’t know, like, like, like either someone else wrote the presentation and they’re just reading it or like, like whatever, like wherever they got this thing from was, was wrong in the first place. Like their source of information was bad and you’re just sitting there, but like, they are very, but like it’s on the slide and they are very confident that it’s true. So, uh, that, that situation is a lot harder, uh, you know, cause you walk away from that, like, you know, this person was presenting live at a conference.

Like how they had to know what they were talking about, but there’s not a lot of like the conference tech review of like the materials beforehand. A lot of that’s just like, yeah, go like you really, you’ll talk for free. Come on, do something.

Uh, so, you know, that, that situation is harder, but, um, you know, just in general, you know, like pay it, like kind of like pay attention, like try to follow along. If they jump around a lot and there’s a lot of like, but look over here now. Oh, look at this.

Look, that Santa’s way too fat for that chimney. Uh, that’s when you can kind of get a sense that like, you know, they’re, they are misleading you in some way. Uh, oh, whoa, boy. All right.

Technical question here. Uh, is there a way to tell if rows are locked before trying to modify them? So, uh, you could, of course, like, you know, interrogate various DMVs and look for, uh, unfriendly locks. But at the row level, that would be tremendously difficult, uh, if not impossible.

And, um, you know, like it would come at some expense trying to like interrogate these views all the time. Uh, probably what I would do is just set a lock timeout. And like, like throw a retry loop around the code.

Like you don’t want your code sitting there. Like, you know, like if exists like this. Oh, wait, hang on a second. Like doing that.

Uh, or like while exists, wait for something. Um, like, setting the lock timeout, you know, set it for like, I don’t know, a second or two. And if you can’t get a lock in that time, like go into some sort of retry thing.

Uh, cause like the lock timeout sends back a very specific error number. So you can use the error number function to say, oh, like if the lock timeout expired, I hit this error number. It’s like 12 something.

Uh, and then you get that error. And if that’s the error that you got from it, and it wasn’t like a primary key violation or some other failure, then you would just be like, oh, cool. My lock timeout expired.

Like I’ll wait for, you know, a second, maybe with some exponential backoff math in there, and then retry it. So I wouldn’t try to check if rows are locked before trying to modify them. I would just, you know, I would just go for it and try it.
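A hedged sketch of that pattern; the table, column, and retry numbers are all made up, but SET LOCK_TIMEOUT and error number 1222 (lock request time out period exceeded) are real:

```sql
/* Sketch: don't check for locks up front; set a lock timeout,
   try the modification, and retry only on the timeout error. */
SET LOCK_TIMEOUT 2000; /* give up waiting on a lock after two seconds */

DECLARE @tries integer = 0;

WHILE @tries < 3
BEGIN
    BEGIN TRY
        UPDATE st
        SET st.some_column = 1
        FROM dbo.some_table AS st
        WHERE st.id = 1;

        BREAK; /* it worked; stop retrying */
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 1222
            THROW; /* not a lock timeout; re-raise the real error */

        SET @tries += 1;
        WAITFOR DELAY '00:00:01'; /* back off a little, then try again */
    END CATCH;
END;
```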

And then, with the lock timeout set, let your query expire and go do something. Anyway, uh, that is five questions. Count them two, three, four, five, six.

Uh, I think we’re good here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and please keep these great questions coming. Uh, I particularly enjoy ones about sandwiches. So more sandwich questions are better.

All right. Thank you for watching.


Learn T-SQL With Erik: Serializable Is Not A Snapshot



Video Summary

In this video, I dive into the nuances of SQL Server’s serializable isolation level and how it impacts data consistency and blocking scenarios. We explore a practical example where two select queries within the same transaction see different views of the data, because shared locks only protect data that has already been read. This session is part of my comprehensive 23-hour T-SQL course, now available at a presale price of $250, doubling in price once the course is complete. I urge you to grab this valuable resource at the discount rather than paying more later. Let’s demystify serializable together and understand why, for most scenarios, it is not a substitute for the snapshot isolation levels.

Full Transcript

In this video, you and I, you and me, we’re gonna get real serializable together. This is of course a small taste of the material that I have in my T-SQL course. 23 hours of beginner content is all available now. It’s still at the presale price of $250, which will double to $500 when the course is done. So, as usual, your job is to save $250 and buy that thing now, before you regret it later and have to give me $500. Then I light a cigar with it and walk away laughing in my fur coat or something. I don’t know. Whatever. Leave me alone. So we’re gonna talk about serializable because a lot of developers have this weird view of isolation levels, that they really isolate everything. And that’s still not exactly true, even with serializable. Serializable does come kind of close, because it forces read queries to take and hold shared locks on data. It may take a lock on a single key if you’re using an equality predicate, or a range of keys if you’re searching for a range of values, with greater than and less than predicates and so on.

All those range-of-values things. So, what serializable does provide you with is what most people would call an unchanging view of data. It is still not, necessarily, a snapshot of the data within a transaction. What I’m going to show you today is different queries within the same transaction seeing data from different points in time. What serializable doesn’t guarantee is when that view of the data started. All you know is that that’s how the data looked when the query finished. That might be good enough for you. If it’s not, you might need to look at the snapshot isolation level. Read committed snapshot isolation does also come reasonably close to that, but snapshot isolation is the snapshotiest, especially if you need transaction-level consistency and not just statement-level consistency. That’s pretty much the place where read committed snapshot isolation and snapshot isolation differ. But we’ll talk about that more in depth later.

Uh, the view of data that serializable provides is not necessarily from when the transaction began, especially if the query was blocked. And that’s what we’re going to look at in these demos. So what we’re going to do is just drop a table, create it, and stick some rows in it. You can see here, if zoom will cooperate, that we have the odd numbers from one through 11 put into this table, called surly eyes, which is named after one of the best bars in Boston.

In case you’re wondering where the name came from. It’s getting towards the end of the day here. All right. So let’s make sure there’s no weird open transactions in here.
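A setup along those lines might look like this. This is a sketch, not the course’s exact script, with an assumed table and column name:

```sql
/* Assumed demo table: a primary key on id,
   holding the odd numbers from 1 through 11. */
DROP TABLE IF EXISTS dbo.surly_eyes;

CREATE TABLE dbo.surly_eyes
(
    id integer PRIMARY KEY
);

INSERT dbo.surly_eyes (id)
VALUES (1), (3), (5), (7), (9), (11);
```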

And we’re going to run through these demos in two different ways. The first way I’m going to run this (we actually need to switch this back to seven first) is: we’re going to start a transaction.

We’re going to set the isolation level to serializable, and then we’re just going to run a select from that table where ID equals seven. Now, this table does have a primary key on the ID column, so we can just seek right to whatever we care about.

We’re not scanning pages or locking the whole table or anything. So let’s run this and grab ID seven. We get that very easily, right?

But now we have this transaction open, and serializable has taken a lock, right? It’s going to hold a shared lock on ID seven. If we come over here, what we’re going to find is that we are still allowed to insert ID eight into the table, right?

So we can put ID eight in, because serializable only took a key lock on ID seven; it doesn’t need a lock to keep ID eight from changing. But now, if we come over here and try to run this, this select query is going to get blocked, because it’s looking for where ID is less than or equal to 11.

Remember, 11 was the last key in our table, so it’s essentially just looking for everything up to 11. This query is now blocked, and this would happen under read committed as well.

This query gets blocked because the insert has an open transaction and is holding a lock on ID eight, so we can’t read that row while that’s going on. But if we commit this insert, right?

We do that, and this all finally commits. What I think surprises a lot of people is that the serializable transaction, with two select queries in it that ran at two different points in time, essentially saw two different views of the data.

Right? So this thing ran and it found ID seven. That’s not really all that weird. But what people find weird is that when we run this query to look for everything where ID is less than or equal to 11, not only does this query see the ID eight that we inserted, but also the ID 10 that we inserted: the even numbers alongside the odd numbers.

So this is what kind of freaks people out, because they’re expecting this unchanging, consistent, snapshot view of data, which is not what serializable provides.

Again, it’s only what the data looked like when the query finished. When that query began, and any locks that it ran into along the way, can certainly change what it ends up seeing. So let’s make sure that we don’t have anything weird going on here.
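Laid out as one script, the sequence above looks roughly like this. It’s a sketch using the assumed table name, with the other session’s work shown in comments describing how the steps interleave in the video:

```sql
/* Session 1: serializable transaction with two selects. */
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

    /* Takes and holds a shared key lock on id 7 only. */
    SELECT se.id
    FROM dbo.surly_eyes AS se
    WHERE se.id = 7;

    /* Session 2 now runs, and is allowed through,
       because only id 7 is locked:
       BEGIN TRANSACTION;
       INSERT dbo.surly_eyes (id) VALUES (8); */

    /* This select blocks behind session 2's open transaction.
       Once session 2 inserts 10 and commits, this returns
       8 and 10 as well: rows that did not exist when
       this transaction began. */
    SELECT se.id
    FROM dbo.surly_eyes AS se
    WHERE se.id <= 11;

COMMIT TRANSACTION;
```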

Uh, make sure this is all fully committed, so when we refresh the table over here, we don’t get blocked and embarrass ourselves. So, let’s do this.

Let’s start a transaction and insert ID eight here. Now let’s come back over here and actually change this query, for a little bit of texture, and run both select queries.

Let’s actually just run the whole thing at once, right? And now we have these two select queries running. This one is blocked looking for ID eight, because ID eight is held onto by that one. But now I can run this query and insert ID 10.

Not only did the query up here finally find ID eight, because it ran after the lock was released, but this query once again saw both eight and 10. So the main thing here is that, under serializable, select queries will take and hold shared locks much, much longer than they will under read committed, the default locking isolation level for all SQL Servers except Azure SQL Database.

Serializable will take and hold shared locks where read committed doesn’t. And what that sometimes does is lead people down the path of thinking that there’s some internal mechanism scheduling these queries, queuing them up.

Like when you see the blog post diagrams of: here is T1, it is scheduled for now. It’s like a train schedule.

Here is T2, it is scheduled for then. But that’s really not what happens. These queries all run and attempt to take these locks, but they’re all working off the same live table data. There’s no snapshot-consistent view of this data that’s stored anywhere.

So even two select queries in the same transaction can encounter different rows when they run. Again, this might be good enough for you, if you even need this high a level of isolation, but it is still not a snapshot view of the data.

And once again, you would still need a row versioning isolation level in order to obtain that. Anyway, that’s enough about serializable. You probably don’t need it.

You probably don’t want it. When you use serializable, you certainly prolong blocking, especially from read queries, and you very, very certainly increase your likelihood of encountering deadlocks, with read queries taking and holding shared locks in ways that are often unfamiliar to people used to working with read committed, the locking isolation level, or to just festooning their queries with NOLOCK hints every which way.

So I hope you enjoyed yourselves. I hope you understand serializable a little bit better now. And I will see you over in the next video where we’re going to, we still have a, we still have some ground to cover with the transactions and isolation level stuff.

So we’ll, we’ll, we’ll get to something a little bit less isolating than serializable in the next one. Anyway, thank you for watching.


Learn T-SQL With Erik: Transactions and Blocking and Confusion



Video Summary

In this video, I delve into the world of transactions in SQL Server and explore how they can lead to blocking issues that might seem perplexing at first glance. We walk through a practical example where an update and insert operation within a transaction block a simple select query, highlighting why using `NOLOCK` hints might not always be the best solution. I also discuss the limitations of the `sp_whoisactive` stored procedure and the `BLOCKING_PROCESS_REPORT`, showing how they can sometimes mislead you about what’s actually causing the blocking. The video emphasizes the importance of considering transaction length and isolation levels to avoid unnecessary complications, urging viewers to keep transactions as short and focused as possible for better performance and reliability.

Full Transcript

Erik Darling here with Darling Data. And we’re going to, of course, have some more T-SQL laughter, fun, joy. You know, all the stuff that comes along with writing T-SQL, really, aside from the pain, confusion, annoyance, all that other stuff. We’re going to have the fun part, right? So again, all 23 hours of the beginner content is out there, live for you to start consuming, digesting, mulling over. And it is all available at the pre-sale price of 250 US dollars. That’s good for life. But the price is going up to 500 bucks when the advanced material drops. So buy it now, I guess, is the call to action there. So in this video, we’re going to talk a little bit about transactions in SQL Server. And specifically, when it comes to dealing with blocking, how what you see, and the way you see it, can be very, very confusing and misleading.

So, uh, I have a transaction here, and within this transaction we have two statements, right? We have an update and we have an insert. And over in this window, we have a select query that’s going to try to select data. Well, I guess it worked there. Look at that. Hey, it worked. We’re not running anything yet. Good job, us. We have a select query that’s going to try to select some data from the users table. All right, good. So let’s come over here and begin our transaction. And now let’s update this one row in the users table. And let’s pretend that that update worked very, very successfully. And now let’s insert a row into the users subset table. If we come over here and try to run this query (and again, we’re not going to be using any NOLOCK hints, right?). Well, I mean, I can show you: if we add WITH (NOLOCK), da, da, da, da.

And we run this. Now we get this back. Of course, over here we updated the age column, so let’s add the age column in, so we can see exactly what hell NOLOCK has wrought. We can see that we have updated Jon Skeet to be 21 years old. Maybe this is great for Jon Skeet. I don’t know if Jon Skeet liked being 21, but Jon Skeet is now a 21-year-old software developer. But without the NOLOCK hint, we just get blocked forever and ever, right? Because that other query has a lock, and it’s not doing anything bad to anyone. It’s not doing anything illegal, right?

We’re annoyed. We’re definitely imposed upon. We are aggravated by being blocked. But that other query, that other transaction, is not doing anything that it’s not supposed to be doing. If we come over here and run sp_WhoIsActive, the results are going to be somewhat confusing, because what it looks like is happening is: an insert into the users subset table is blocking a select from the users table, right? And we can see that I’ve now been talking for almost 30 seconds while this poor query has been blocked.

Now, if we use the @get_locks parameter, sp_WhoIsActive will enumerate all the locks. I don’t recommend using this on a system with a lot of blocking going on, because it can take a long time. If there are a lot of blocking sessions, well, it’s a cursor that does this, plus some XML stuff.

So don’t use @get_locks if you’re in a real blocking crisis. But if we come over here and look at the locks that the select query is trying to take, it’ll say: yeah, we’re trying to take a lock on the users table. We’re just in there. But up here, in the locks XML for this one, we’ll see two different things.

We’ll see that we have locks on the users table, from that first update that ran in the transaction, and we have locks on the users subset table. So we can at least dig in a little bit further. But if you’re running sp_WhoIsActive without the @get_locks parameter, you’re not going to have that information, right?

We don’t have that handy XML thing over here. All we see is this insert blocking the select. And this can be very confusing, because it’s a completely different table.
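For reference, the calls from this demo look something like this. sp_WhoIsActive is Adam Machanic’s free procedure, and @get_locks is one of its real parameters; just remember the warning above about using it during a heavy blocking episode.

```sql
/* Quick look at current activity, no lock detail. */
EXEC dbo.sp_WhoIsActive;

/* Same, but with the locks XML for each session included.
   This enumerates locks with a cursor, so it can be slow
   when many sessions are blocked. */
EXEC dbo.sp_WhoIsActive
    @get_locks = 1;
```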

And how could this completely different table be blocking a select from this other table? It’s not a very fun situation to be in. So if we just roll this thing back, right?

We say: hey, you know, screw it. We made a mistake there. Let’s get rid of that. Now we can run this, and we’ll see Jon Skeet’s age is back to null. Jon Skeet, he’s null years old.

He’s not 21. But what’s even worse is the blocked process report. Now, the blocked process report is considered by Microsoft to be a best-effort report of what was blocking. Unfortunately for us, what we see in the blocked process report is the same thing that we saw in sp_WhoIsActive.
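If you want the blocked process report at all, you have to opt in: set a threshold, in seconds, for how long blocking must last before a report fires, then capture the reports with an Extended Events session. The configuration part is documented syntax and looks like this, with five seconds as just an example value:

```sql
EXEC sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;

/* Emit a blocked process report for any
   blocking that lasts five seconds or more. */
EXEC sys.sp_configure
    @configname = 'blocked process threshold (s)',
    @configvalue = 5;
RECONFIGURE;
```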

What we have is what looks like an insert into the users subset table blocking a select from the users table, right? That’s all this stuff over here from users. Now, in real life, if you were to run sp_HumanEventsBlockViewer, my wonderful free stored procedure, you would look at this and you would say: Erik might be on crack.

Erik might have done something weird in this. But it was not me. I did not do this. And you might start looking at the users subset table and wondering: is there a foreign key to users?

Is there a trigger? Is there some indexed view that holds these things together? No, it’s just the blocked process report being annoying, right?

Because all this stuff happened in the scope of a transaction, and the locks on the users table from the earliest statement in the transaction are still being held. Even though we have moved on and done something else in the transaction, that earlier statement is what’s blocking our select query. No wonder people just put NOLOCK everywhere.

This stuff sucks, right? But this is where a row versioning isolation level would help you, because then your select wouldn’t get blocked, and it wouldn’t read wrong data. But, you know, in the blocked process report, if we scroll way over here, click on this, and scroll past all the stuff that got added in SQL Server 2022, then we will at least be able to see some more information about what went on in here.

Way earlier in the XML, there’s this object ID and some other stuff. And depending on how nice the blocked process report is feeling, sometimes you’ll get an actual object ID that works, and sometimes you’ll just get some weird randomish number.

Where did this come from? I don’t know. So, do either of these objects have this object ID? Nope. Certainly do not. So here we are.
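When the report does cough up an object ID, checking it is a one-liner; the ID below is a placeholder, and when you get one of those weird randomish numbers, this simply returns NULL:

```sql
SELECT
    object_name =
        OBJECT_NAME(582293452); /* placeholder object ID from the report */
```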

Anyway, the blocked process report, while it can be very useful to show you generally what’s going on with blocking, is sort of not your friend. But there’s a larger message here, and that is that something a lot of people misunderstand about transactions is lock lifetime.

So that update ran, right? We started a transaction. We ran that update.

We hung around for a minute, smoked a few cigarettes. Then we inserted some data into another table, smoked another few cigarettes, and then tried to run that select query. The whole time that we were being cool, partying, smoking cigarettes, drinking champagne, doing all that fun stuff, those locks were being held from the update. The more stuff you put in a transaction, and the longer that stuff takes, the longer those locks hold, the more of a chance you have of causing problems.

Again, people just use NOLOCK everywhere. They’re like: I don’t want these problems. But this would cause problems for other competing modification queries too, right?

Other updates, inserts, deletes, merges, all that other good stuff. That would get blocked as well. So think about transactions, and the length of time that a transaction is open.

It’s sort of like the surface area of your query. The bigger that surface area is, the more problems it can cause, and the more problems can be caused to it.

So if you’re in the habit of using transactions, don’t shove everything in the world in there. It’s gonna hurt you, right? Especially because you’re using SQL Server, and the default isolation level is read committed.

Right? The alternative is reading a bunch of wrong data with NOLOCK. So, when you’re using transactions, please make sure to pay special attention to which queries you’re putting in there. Make sure that they all actually need to be logically grouped together.

Make sure that they all actually need to succeed or fail together. Tune those queries as diligently as possible. Do not put anything in there that does not need to be in there.

Do not put any extraneous calls in there, especially calls outside of the database. I worked with one client years ago who was, in a transaction, doing a bunch of stuff, and then also using xp_cmdshell to call an executable that would write a file, then upload that file.

And then do some other stuff. All that extraneous non-database work inside the transaction made for a real bad time. So be very, very careful when you start using explicit transactions, because the lock lifetime on those runs from when you take the locks, right?

From when that update took locks, until we hit rollback there. So be very, very careful. Be very, very diligent with your transactions.
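As a sketch of what short and sweet might look like, here’s a hypothetical tightly scoped transaction; the table and column names are invented, not taken from the demo:

```sql
SET XACT_ABORT ON; /* any error rolls the whole thing back */

BEGIN TRANSACTION;

    /* Only the statements that genuinely need to
       succeed or fail together go in here. */
    UPDATE u
    SET u.age = 21
    FROM dbo.users AS u
    WHERE u.id = 22656;

    INSERT dbo.users_subset (id, display_name)
    SELECT u.id, u.display_name
    FROM dbo.users AS u
    WHERE u.id = 22656;

COMMIT TRANSACTION;

/* Logging, file writes, and any xp_cmdshell adventures
   happen out here, after the locks are released. */
```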

They should be as short and sweet as possible. All right. That’s probably enough here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video.

We’re going to, uh, I think we’re going to talk about the serializable isolation level next. Cause boy, boy, howdy. She’s a humdinger that one. That’s serializable. Anyway, thank you for watching.

You’re very, you’re very, you’re very smart.


Learn T-SQL With Erik: Row Versioning vs Locking Isolation Levels



Video Summary

In this video, I dive into the fascinating world of isolation levels in SQL Server, specifically focusing on the differences between locking and row versioning isolation levels. I share my experience of briefly testing out a Wario voice at the start, which didn’t quite hit the mark as I was still digesting lunch. However, things are about to get lively as we explore how read-committed snapshot isolation and snapshot isolation can offer more efficient query execution compared to traditional locking methods. I also highlight why Microsoft’s decision to default to read committed snapshot isolation in Azure SQL Database is a step forward, encouraging viewers to consider the benefits of row versioning isolation levels for their own databases.

Full Transcript

Eats-a-me, Erik! That was my best Wario voice. I don’t know, I don’t think I did very well. We’re going to talk a little bit about isolation levels in this video. I just want to briefly describe the difference between locking and row versioning isolation levels.

If I seem a little sleepy, it’s because I just ate lunch and I’m like, just like a bear full of salmon right now. Anyway, I promise it’ll be very lively in a moment. As soon as that fish oil kicks in, you see what it’s like. My skin starts to glow and my hair gets shiny. All 23 hours of the beginner content is up and running and out there for you to start learning from.

It is all available at the presale price of $250. And all this content is going to double in value after the summer when the advanced material drops out of my head. So I would, again, encourage you to buy early and save yourself some money. All right, let’s talk about isolation levels. In SQL Server, isolation levels can be broadly defined as locking, row versioning, and of course, read uncommitted.

The locking isolation levels are read committed, repeatable read, and serializable. The row versioning isolation levels are read committed snapshot isolation and snapshot isolation. And then we have this uncategorized mess, this monster: read uncommitted, which is a synonym for NOLOCK, right?

NOLOCK and read uncommitted: same thing. There’s no difference except where you write them. Both still take schema stability locks, and both will read data from uncommitted transactions, commonly referred to as dirty reads. This is the one that you generally want to avoid, though most of you have done quite a poor job, failing grades, at avoiding read uncommitted slash NOLOCK.
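The synonym-ness is literal: the hint and the session-level setting do exactly the same thing, just written in different places (the table here is assumed):

```sql
/* Session-level: every query in the session reads uncommitted data. */
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT u.id
FROM dbo.users AS u;

/* Back to the default... */
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

/* ...but the table hint overrides it for this table reference. */
SELECT u.id
FROM dbo.users AS u WITH (NOLOCK);
```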

The locking isolation levels work directly with data in tables and indexes, acquiring shared or exclusive locks as data is read. That’s an important thing to understand, because the locks don’t get taken all at once, en masse. I guess you could specify something like a TABLOCK hint and have SQL Server do that, but under most circumstances, as you read data, the locks are acquired.

You know, even a TABLOCK isn’t guaranteed to be taken the second you ask for it, because you might have competing locks that would prevent it. Read queries under locking isolation levels might be blocked by modification queries while waiting to take their shared locks on rows. They may also block modification queries, under certain conditions where shared locks are held until the statement has finished executing.

This is particularly true of serializable and repeatable read, but it is also quite definitely true under read committed. Read queries can indeed block write queries. I will have examples of that later.

Depending on the strictness of the locking isolation level, shared locks may either be held until the transaction is finished, or released very quickly. Under read committed, you’re just kind of chewing along: you take your little shared lock, you get whatever data you need, whether it’s rows or pages, and then you let that shared lock go and carry on.

Both repeatable read and serializable will hold those shared locks, to prevent modification queries from changing data that they have already read. That doesn’t mean that they’re looking forward and trying to jump ahead to take locks; as they read data, that’s when it all goes down.

The row versioning isolation levels are a little bit different. They will read copies, or versions, of locked rows, which can of course result in fewer detrimental blocking and deadlocking scenarios. I am a big proponent of the row versioning isolation levels, you know, especially for all of you SQL Server people out there fawning over Postgres, who just can’t wait to leave SQL Server and get over to Postgres land.

Postgres uses multi-version concurrency control by default, which means row versioning, right? Except they do it in the worst way possible: they keep all their versions in the table, and you have to clean that up. You have to vacuum that mess up. So you’re going to have a whole new set of problems once you get over to your free database.

But just to understand: in SQL Server, the implementation of the row versioning isolation levels is, I believe, far superior. It’s still not free, right? You have to version the rows. You have to read from potentially large stores of versioned rows, even for queries that just have to go and check whether there are versions of rows they might need to read.

Adding 14-byte version pointers to your rows, so that you can follow the version stuff, isn’t free either. So there’s a lot that’s still not free about it; even though it’s better, it’s still not free. But I think the non-locking isolation levels, the row versioning isolation levels, are generally better and less troublesome than your locking alternatives.

And I think the very easy way to tell that this is true, at least in the context of SQL Server (we’re not talking about Postgres anymore), is to consider that, by default, SQL Server uses read committed locking as its isolation level. This is of course different in Azure SQL Database, which uses read committed snapshot isolation by default, though you can turn it off. But nearly every single application I see, nearly every query that someone sends me to say, hey, what’s going on with this thing, is just absolutely festooned with NOLOCK hints, right?
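For the record, opting a database into row versioning is documented syntax and looks like this; the database name is a placeholder, and the ROLLBACK IMMEDIATE clause is one blunt way to get the exclusive access the read committed snapshot switch needs:

```sql
/* Snapshot isolation: enabled here, but sessions still have to
   opt in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT. */
ALTER DATABASE StackOverflow /* placeholder name */
SET ALLOW_SNAPSHOT_ISOLATION ON;

/* Read committed snapshot: changes what plain old read committed
   means for every query in the database. */
ALTER DATABASE StackOverflow
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE; /* kick out other sessions so the switch can happen */
```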

NOLOCK everywhere. People have such a hard time with read committed, the default locking isolation level, that they’re like: screw it, I’d rather have wrong results than deal with this.

Right. Just NOLOCK hints everywhere. There’s years of warnings about the perils of NOLOCK. All the blog ink and blood that has been spilled and drank and spilled again. You can’t keep people away from it.

It is one of the most toxic database relationships that I have ever seen in my life. And, like I said, also consider what happened when it came time for Microsoft to offer a SQL Server to the world that they were hosting, that they were managing. They’re like: we’re in charge.

We can do this, right? That’s a separate subject, but they at least got this part right. They were like: no, we’re hosting that. You know what?

We don’t want to deal with a bunch of stupid reader-writer blocking and deadlocking. This is so annoying. Why would anyone want to deal with this? Meanwhile, that’s what they’d been forcing the general public to deal with for the last 30-something years. Right.

When Microsoft got their hands on it, they were like: nah, we’ve got better things to do. So they made the default isolation level read committed snapshot isolation. A lot of this is just to set the stage for some somewhat simpler statements. And we’re actually going to close this one out here.

Under row versioning isolation levels, read queries are not blocked by write queries. They go read versions of the rows, copies of the rows that are locked. Also, read queries do not block write queries.

Another good thing, right? It’s almost like using NOLOCK, but better, because there are no dirty reads. And also, deadlocks do not occur between read queries and write queries. There are, of course, some edge cases and caveats that complicate those statements a little bit.

Things like foreign key cascading, or maintaining indexed views that reference multiple tables, can do weird stuff to the row versioning isolation levels. But those are kind of out-there edge cases. They’re not everyday stuff that’s going to be happening to you.

It’s not happening every three seconds in your database, forcing you to use NOLOCK hints. Under the locking isolation levels, read queries can be, and quite frequently are, blocked by write queries. Read queries can also block write queries, and deadlocks can occur between read queries and write queries.

So there is a lot of bad to be said about the default situation in SQL Server. I think Microsoft could have made a better choice back in, say, 2005, when the row versioning isolation levels were introduced to the world. Perhaps not making all of your existing databases use a row versioning isolation level, but making any new databases use one by default.

You could still turn it off if you wanted, sort of like with Azure SQL Database: it’s on by default, but you can opt out of it. Had that happened, I think SQL Server would be in a much shinier, happier place than it is today, where people still have all sorts of unwarranted fears and misconceptions about row versioning isolation levels.

Perhaps from reading terrible blog posts about them that involve marbles and whatnot. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video where we will talk about, uh, I forget, but it’s still transactions and isolation level. So it’ll be a good time because that’s what we do here. We have a good time.

If you can’t have a good time, what can you have? Nothing. All right. Thank you for watching.


Learn T-SQL With Erik: A.C.I.D.



Video Summary

In this video, I delve into the concept of ACID properties in SQL Server databases, specifically focusing on how they apply to T-SQL operations. Starting from the basics, I explain each of the four ACID properties—Atomicity, Consistency, Isolation, and Durability—and provide clear examples to help you understand their practical implications. While the video doesn’t include demos, it does offer a thought-provoking example that illustrates how isolation can be misunderstood in SQL Server. By exploring scenarios where queries might not behave as expected due to read-committed isolation, I aim to highlight the importance of understanding and managing transactional integrity. Additionally, I touch on durability, explaining its significance and the potential caveats when dealing with features like delayed durability or in-memory databases. This video is part of my comprehensive T-SQL course, which will soon be expanding with advanced material after summer. If you’re looking to enhance your SQL Server skills, now might be a great time to join at the presale price before it doubles in value!

Full Transcript

Erik Darling here with Darling Data, and we’re going to continue on with the teaser content from my T-SQL course. All 23 hours of the beginner stuff out there published, ready for you to start consuming. The advanced stuff will be dropping after the summer. The course is still on the presale price of $250, and it will literally just absolutely double in value once the advanced material pumps out. So I would strongly suggest that you get in on this now while it’s at the cheaper price because I realize how price and cost sensitive a lot of people are. So in this video, we’re going to talk about acidity, because acidity in the context of databases, and more specifically in SQL Server, is not quite the level of acidity that you might expect if you read a highly academic Wikipedia article that outlines all of the properties of ACID, which is Atomicity, Consistency, Isolation, and Durability. So we’re just going to cover those. There’s no demos in here. It’s just sort of one kind of example query to give you some food for thought on that.

So let’s look at this one on exactly how isolated an executing query might be. So Atomicity means each transaction either happens or it doesn’t. SQL Server by default runs in auto commit mode. So if you run a select, it’s sort of like in the background, SQL Server on its own, without you typing anything, is saying, you know, begin transaction, run the select, commit. Same with inserts, updates, and deletes. It’s auto committing everything, so you don’t have to declare an explicit transaction for every piece of code you write. You can, of course, declare a specific and explicit transaction. But when you do that — assuming that you do it for a reason that is not just putting one query into an explicit transaction, and there are multiple queries in the transaction — it is for the purpose of having those queries succeed or fail as a single unit. We do not want to leave those queries in a strange, lingering, stranded state.
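A minimal sketch of what that looks like — the table and column names here are made up for illustration, since no demo script is shown:

```sql
/* Auto commit: SQL Server wraps this single statement in its own
   implicit begin/commit without you typing anything */
UPDATE dbo.accounts
SET balance = balance - 100
WHERE account_id = 1;

/* Explicit transaction: both statements succeed or fail as one unit */
BEGIN TRANSACTION;

    UPDATE dbo.accounts
    SET balance = balance - 100
    WHERE account_id = 1;

    UPDATE dbo.accounts
    SET balance = balance + 100
    WHERE account_id = 2;

COMMIT TRANSACTION;
```

If anything goes wrong between the BEGIN and the COMMIT, a ROLLBACK undoes both updates together, which is the whole point of grouping them.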

But let’s say that you’re running an insert query and you’re trying to insert two rows. The first row goes fine, but the second row causes a, I don’t know, say a primary key or foreign key violation, you know, constraint violation, something along those lines. That just means that the first row, even though that one was okay, isn’t going to still be in the table. That statement will roll back, and the successfully inserted row will get removed.
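Here’s a minimal sketch of that statement-level rollback, using a throwaway temp table:

```sql
CREATE TABLE #numbers (id integer PRIMARY KEY);

INSERT #numbers (id) VALUES (1);

/* One statement inserting two rows: the second row violates the
   primary key, so the whole statement rolls back — including the
   row (2) that would have been fine on its own */
INSERT #numbers (id) VALUES (2), (1);

SELECT id FROM #numbers; /* still just the original row: 1 */

DROP TABLE #numbers;
```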

Zooming out a little though, all atomicity means in a database is that each transaction will succeed or fail as a unit. It does not guarantee that other queries can’t interact with the underlying data while they’re running. Right? Like, you know, while you are inserting your two rows, someone else might insert a different two rows and someone else might delete a row somewhere else in the table.

And someone else might update a row somewhere else in the table. And someone might run a no lock query and select your two half inserted rows, right? A row and a half is in there or something.

Consistency for SQL Server is especially disappointing. Because the only thing that the C gets you is a guarantee that your constraints are obeyed. By constraints, I mean primary key, foreign key, unique, not null, and check.

Default constraints, I mean, well, I suppose they will be obeyed, but they hardly qualify here. I forget if I covered it in the teaser material for inserts, but if not, I covered it in the full material.

So you should go buy that. But, you know, just don’t expect too much out of the consistency thing, at least in the context of SQL Server. However, isolation is the toughest one because most people are completely confused about what isolation means, like what isolation levels guarantee and at what point they provide the various guarantees that they make.

So let’s say that you have a code pattern that looks like this. You begin a transaction. You declare a local variable.

We’re just going to call this next user. And when you go to assign this local variable a value, which is going to be the ID column from the users table, you have to access, in the context of this query, the users table twice. Like, we don’t have any locking hints on here.

We don’t have any isolation level hints on here. So just assume that we’re using the default locking read committed isolation level. That doesn’t really guarantee much.

We’re going to talk way more about that in some of the other videos. But just understand that, like, you know, when you read, when you say select from, that’s one read of the users table. And when you say select from here, this is another read of the users table.

These two reads are not going to show data from the same point in time. Read committed is not a point-in-time view of data. So you might read the data from the users table in the outer context at a different time than in the inner context, or vice versa.

And you still, if you’re scanning through the data, you have to read along those pages. All sorts of stuff can happen to your data as you’re reading along those pages. Again, it’s not a snapshot of the data.

It’s just you’re just taking little read locks as you go along and do stuff. As you’re taking those little read locks, all sorts of stuff can happen all around you. So, like, you know, just if we were to look at the query plan for this, just hit control and L.

We have one scan of the users table here. That’s going to happen at one point. And then you have another scan down here.

But this isn’t going to run at precisely the same time because we have to get all this data, aggregate it, right? Scan the table, all two point something million rows, aggregate it, gather the streams, aggregate it again, distribute the streams again, and then go into a nested loops join so that every row that comes out of this goes in here and then reads the data.

And since that loop join is just going like this over and over again, these are all at different points in time too. Again, there’s no snapshot happening. So if the goal here is to find the top, like the most recent accessed user in the users table and then go assign them some award, right?

Say, hey, you’re the most recent user. You win a million dollars or reputations. You can go buy a cool stack overflow windbreaker or canoe or Zippo or something.

You know, like you don’t have a guarantee that that person actually was, like when this update ran, the person who should get it. Because all sorts of stuff can happen all around you. Someone else could log in right after you ran this query and this, whatever value you assigned to this would no longer be valid.

So this is where you would have to think about increasing the isolation of this query. Perhaps you would need serializable. Perhaps you would need updlock, right?

Perhaps you would need to do something to protect this data when you read from it both times, in order to make sure that nothing else sneaks in and does something to this data — changes the validity of the value that you assign to that next user — when you go and assign some stuff here.
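To reconstruct the shape of that pattern — the table, column, and variable names here are hypothetical, since the demo script isn’t shown — one hedged way to protect the read is with UPDLOCK and SERIALIZABLE hints:

```sql
BEGIN TRANSACTION;

    DECLARE @next_user integer;

    /* Under plain locking read committed, nothing stops another session
       from changing the "most recent" user between this read and the
       update below; UPDLOCK and SERIALIZABLE hold locks until commit */
    SELECT TOP (1)
        @next_user = u.Id
    FROM dbo.Users AS u WITH (UPDLOCK, SERIALIZABLE)
    ORDER BY u.LastAccessDate DESC;

    UPDATE u
    SET u.Reputation += 1000000
    FROM dbo.Users AS u
    WHERE u.Id = @next_user;

COMMIT TRANSACTION;
```

The hints are one possible fix, not the only one; raising the session’s isolation level would be another.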

Durability is an easy one, right? Durability, this one is very simple. What this means is that once a transaction commits, you won’t lose it.

There are some caveats around features like delayed durability, and, you know, I think the in-memory stuff might make this a little weird too. But in most cases, unless you experience data corruption, committed transactions will be safe.
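For reference, here’s roughly what the delayed durability caveat looks like in practice — the database name is a placeholder:

```sql
/* Database-level: ALLOWED lets individual transactions opt in;
   FORCED applies it to everything. Commits can return before the
   log records are flushed to disk, so a crash can lose the tail
   of recently "committed" work */
ALTER DATABASE YourDatabase
SET DELAYED_DURABILITY = ALLOWED;

/* Transaction-level opt-in, when the database is set to ALLOWED */
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
```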

So just like talking, like thinking about the ACID stuff in the context of SQL Server, you know, A is most likely the strongest guarantee. C is a very iffy proposition.

And, you know, as you think about it, depending on the isolation level you’re using, the I might not mean a damn thing. Especially if you’re using read uncommitted, or nolock — I means nothing.

The I might as well not even be there. It might as well just be A.C.D. And durability is, you know, up there with A as far as strength goes, though there are some caveats to durability depending on database settings and, you know, the reliability of your underlying storage hardware.

So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. In the next video, we’re going to talk a little bit more about isolation levels and stuff like that. So hold on to your bucket hats for that.

Anyway, thank you for watching.


SQL Server Performance Office Hours Episode 26




To ask your questions, head over here.

Some of my reporting queries finish a little faster with NOLOCK hints even when nothing is going on in the database. Any idea why?
I get the argument that new features sell better than fixed up old features, so how come highly upvoted User Voice items for new features have sat around for years with no action?
There are a lot of applications out there that do horrible things to SQL Server. Why don’t software vendors hire people like you to fix things?
What is batch mode not a good fit for?
Is it always possible to tune queries to save money in the cloud with smaller hardware?

Video Summary

In this video, I delve into some intriguing questions submitted by viewers during my Office Hours session. We explored topics ranging from why certain queries perform better with no lock hints to the challenges of getting new features implemented in SQL Server. I also touched on the dynamics between software vendors and database performance experts, offering insights on how to optimize cloud costs through query tuning and hardware reduction strategies. Whether you’re a seasoned DBA or just starting out, there’s plenty here to help you navigate the complexities of SQL Server optimization.

Full Transcript

Erik Darling here with Darling Data, and we have another exciting rendezvous with Office Hours. My favorite. That’s where I get to be a real man of the people and answer some questions that you submit. If you want to submit your own question, you can go to that link, which is down in the video description. If you feel so inclined and you like the material that I produce on this YouTube channel, you can also sign up for a membership and support me with as few as $4 every month. If you need SQL Server Consulting help, I am out there in the world, working with clients, helping people, tuning queries, doing all sorts of stuff. And as always, my rates are reasonable. If you want to get in on my performance tuning training, you can get all 24 hours of it for about $150 buckaroos — that is at that link with that discount code. That is for the everything bundle that covers my performance tuning training. If you would like to buy my new T-SQL course, that is also available there. All 23 hours of beginner content is done and recorded. This course will double in value from $250 up to $500 after the advanced material comes out after the summer months.

So save yourself some money now and buy that while you can. Redgate is being a bunch of real sweethearts and dragging me out of my house to varying degrees this summer. New York City, August 18th and 19th. Dallas, Texas, September 15th to 16th. And Utrecht in the Netherlands, October 1st and 2nd. This is all part of the PASS on Tour events that they are putting on, little mini PASS events. And this is all leading up to PASS Data Community Summit in Seattle, November 17th to 21st. But it is time to go answer some questions now. It is time to have a bit of question asking and answering fun. All right. So first up, here is an interesting one. Some of my reporting queries finish a little faster with no — let’s just call that a nolock hint.

Even when nothing is going on in the database. Any idea why? Ah, I have an idea. So when you use nolock hints, aside from the fact that you are requesting the read uncommitted isolation level, which will allow your queries to ignore locks taken by modification queries that are executing concurrently, you also allow the SQL Server storage engine to access data a little bit differently. It can use what’s called an allocation order scan, using IAM pages rather than index pages to read through stuff. And that can sometimes be faster.

So, um, like that, that is most likely why. I have an old video about it where there was a really dramatic difference between a query with nolock hints and a query without it, with nothing else running on the server. Uh, part of it was reading, uh, like from a cold cache — reading from disk into memory. Um, you might look at the difference between those, and you might see the, uh, the Ordered property of the index scan operator being false.
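A sketch of that comparison — the table name is hypothetical; the thing to check is the Ordered property on the scan operator in the actual plan:

```sql
/* Normal scan: the storage engine follows the index's logical
   page chain */
SELECT COUNT_BIG(*) FROM dbo.Posts;

/* Read uncommitted: the storage engine is free to do an
   allocation-order scan driven by IAM pages instead, which can
   be faster for big scans (Ordered: False on the scan operator) */
SELECT COUNT_BIG(*) FROM dbo.Posts WITH (NOLOCK);
```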

Now keep in mind, this is for index scans. Index seeks don’t really get that behavior because you’re seeking along a B tree, but scanning a big index will, will definitely see that. All right. Next up here we have, oh boy, getting, getting personal here. I get the argument that new features sell better than fixed up old features.

So how come highly upvoted User Voice items for new features have sat around for years with no action? Well, apart from the fact that, um, Microsoft seems to have the attention span of a squirrel that got into a bag of meth — um, you know, every release, they have a certain amount of budget that they can spend, like people’s time and salary, to develop stuff.

And, you know, some, some narratives get pushed a lot harder than others, you know, like, um, turning pink because of some weird light thing here. But like, like, you know, just like, if you look at like a lot of, a lot of recent releases, you know, like some narratives got pushed really hard that ended up just being like nowhere’sville. You know, like, um, you had big data clusters, which are a complete flop, right?

They’re, they’re completely deprecated now. Uh, ledger tables, no one cared, right? Like blockchain stuff. Um, you know, it’s like, there’s just a lot of things that, they get poked out there that, you know, are driven by these sort of like checkbox items to, you know, make sure that you have the new shiny thing in your database that people that like executives and C-levels currently care about.

And not a lot of that stuff aligns with what users care about. You know, from what I know, talks about, like, improving partitioning come up, uh, like, every release cycle, and everyone’s just like, well, I mean, whatever. Uh, you know, the argument is, like, this stupid circular argument.

It’s like, like, well, the feature kind of sucks. So no one, like, yeah, like we could fix it. And like, yeah, but no one uses it. So it wouldn’t really have that much impact.

It’s like, well, no one uses it cause it sucks. Right. It’s like, you made it painful to use. Of course, no one uses it. Right. Like, of course it doesn’t have wide adoption. You made it miserable. Right. So, you know, uh, don’t expect that to change. Uh, I guess that’s the bottom line.

Unless you are like Walmart or something, or SAP, and you, uh, swing a big purse at Microsoft and you really want to push for stuff, you probably have the worst voice in the room, the quietest voice in the room. Um, and, like, that User Voice site is just a black hole for ideas anyway.

So like, unless you’re reporting a serious bug, uh, good luck. Good luck out there. There are a lot of applications out there that do horrible things to SQL Server.

Why don’t software vendors hire people like you to fix things? You know, that’s a great question. Um, I have actually worked with a number of software vendors. I am always happy to do so.

Uh, and it is always nice to improve products as a whole. Uh, I have noticed that this trend has actually picked up a bit, uh, since, like, software as a service has become more and more ingrained in what a lot of companies are offering. You know, when it was like, you know, we sell you a piece of software and you install it on a SQL Server that you’re responsible for.

You couldn’t trust anyone to install anything and do anything right. Like you had no idea what version or a, like you could, you could put out a best practices guide, right? And you could say like, here’s a setup checklist.

Here’s the stuff you should do. Like here, like the bare minimum requirements here, like, you know, hardware requirements, depending on database size, stuff like that. There is no guarantee that anyone’s going to follow any of that. So, uh, the software as a service thing has, um, has, has gotten me hired by more software vendors because now they’re responsible in house for like infrastructure, the code, the indexes, everything.

And so I have been doing more work like that, but, uh, I would love to do additional work like that. So if you know of a software vendor who you think, uh, should, should hire perhaps an Erik Darling in the world who has, who has very reasonable rates, uh, to improve their software offering, you, you, you feel free to send them. Well, I don’t, I don’t have a resume.

You can send them to my website and that’s, that’s probably as good as it’s going to get as far as, uh, resumes go. Uh, let’s see. I seem to have lost my Zoomit cursor there. We’re going to try that again.

Uh, so here’s a, here’s an interesting question. What is batch mode not good for? Well, I think, you know, uh, I assume this question is because I’ve been, you know, uh, doing some videos lately about stuff that batch mode is good for — writing batch mode friendly queries, you know, like, how to do stuff, how to think like batch mode, stuff like that.

So, uh, batch mode is typically like, you know, good for like your heavy lifters, right? It’s like your, your, your bigger queries that have to like go through a lot of stuff. If you just have to like find a few rows and like send a few rows along and, you know, stuff like that.

If it’s like, think like, like, you know, OLTP ish workloads, uh, batch mode isn’t generally a great fit for stuff that you do in there. Like typically where batch mode comes into, into play are, you know, when you have to do like big scans of tables and you have to aggregate a lot of rows down, uh, or, you know, you’re like, you know, you have like, you have to like hash join a lot of rows or hash aggregate a lot of rows, stuff like that. If a lot of your query plans are like single threaded, like little loopy join things anyway, that’s, that’s really not a good fit for batch mode.

Obviously modification queries are not a great fit for batch mode, things like that. But, you know, um, I am happy that Microsoft has stopped locking some of the intelligent query processing stuff behind batch mode, because that was kind of a drag. Uh, but I do wish that batch mode on rowstore was a little bit more lenient with where it would kick in.
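To make that concrete, here’s a hedged sketch — the table name is hypothetical — of the kind of query that tends to benefit versus the kind that doesn’t:

```sql
/* Heavy lifter: big scan plus hash aggregate over millions of rows —
   the kind of shape where batch mode on rowstore tends to kick in */
SELECT
    p.OwnerUserId,
    COUNT_BIG(*) AS posts,
    SUM(p.Score) AS total_score
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;

/* OLTP-ish point lookup: a few rows through a seek and maybe a
   loop join — batch mode has nothing to offer here */
SELECT p.Score
FROM dbo.Posts AS p
WHERE p.Id = 999;
```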

But anyway, uh, let’s go on to the next one here. Cause this is, this is, this is a kind of a fun thought bubble question, isn’t it? Uh, is it always possible to tune queries to save money in the cloud, like with smaller hardware?

So I mean, I assume that means like by reducing hardware. So I would say generally, yes, but it really does depend on where you’re starting. If you’re already not on, uh, if like, if the hardware that you’re on is like sufficient for what you’re doing and, you know, like you, you’re thinking to yourself, I need to add hardware to keep up with this.

Then yeah, absolutely. Um, but if you’re, if you’re at a point where like, you know, you have like terabytes and terabytes of data and, you know, and like the, like maybe like, you know, high double digits, low single digits memory. And, you know, depending on kind of like some, you know, user concurrency parallelism stuff, like a small number of CPUs, it’s, it gets harder to, uh, you know, tune things so that, uh, so that you can reduce hardware for that.

You can certainly tune things to make better use of the hardware you currently have, which can give you a better idea of what you might need to increase stuff to. So like, like one way to think about it is, you know, like if, if you’re on like a smaller server and you know, you need to go to bigger hardware, it could certainly pay off to do a lot of query and index tuning. So that, uh, you can make a more, like you can make a more like good conservative choice about the next hardware step up.

So like, instead of like quadrupling hardware, maybe only like double or like, you know, 1.5 X your hardware. So you don’t have to go like, you know, like nuclear and get like, you know, like four terabytes of memory or something. Right.

Like not that, you know, a lot of, a lot of, a lot of cloud instances don’t even go up that high, like managed instance piece of crap. But, uh, anyway, um, like, you know, like it is, it is always possible, but really it depends. Like if you, if you have like, you know, like if you just went like to the absolute biggest server you, you could possibly find, uh, and like, you know, no one’s really doing the, no one, no one’s doing, or no one has historically done like meaningful query and index tuning.

Then yes, absolutely. You know, you can, you can bring a lot of stuff down that way. But, uh, you know, a lot of the stuff that I see is sort of painfully like under, under provisioned, uh, from a hardware perspective as a starting place.

And that just makes it a lot, that this makes it a lot harder to do stuff that can make enough of an impact to say, oh, now you can reduce hardware further. It’s like, cause you’re already spending like, you know, some tiny little instance size already, you know, and you’re, you’re expecting, you know, this miraculous 24 seven mission critical 30,000 batch request plus reporting workload to, to all of a sudden fly on that thing. It’s, it’s a little bit tougher of a proposition.

Um, so, you know, there are things you can do, obviously, aside from the generic index and query tuning. Um, one of the biggest things that I push is compressing indexes, which makes them smaller on disk and in the buffer pool. So you make better use of those expensive, IOPSy disks that you pay too much for, and you also make better use of the memory on the server.
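A hedged sketch of what that looks like — the object names are placeholders, and it’s worth running the estimate procedure before rebuilding anything big:

```sql
/* Estimate how much PAGE compression would save for a table first */
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'Posts',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';

/* Then rebuild the index compressed: smaller on disk and in the
   buffer pool (ONLINE = ON needs an Enterprise-equivalent edition) */
ALTER INDEX PK_Posts_Id
ON dbo.Posts
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```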

Um, so, you know, there are of course times when you can absolutely hit home runs with this, but there are also times when, uh, you put in all the effort and you just allow the server at that size to tread water a little bit longer. So, always? No. But generally, you can do meaningful things to improve the situation almost regardless of what the starting point is. But, uh, there are certainly some outlier starting points that I’ve seen where, uh, you know, you could tune to your heart’s content and

You wouldn’t move the needle enough to, um, allow for a downgrade of hardware. Anyway, that’s about good for this one. We have answered five questions, me and the voices.

Uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video. All right. Thank you.


A Little About Row & Range Window Functions vs Batch Mode in SQL Server



Video Summary

In this video, I delve into the nuances of window functions and how they interact with rows and range modes in SQL Server. I explore the differences between using `ROWS UNBOUNDED PRECEDING` versus `RANGE UNBOUNDED PRECEDING`, explaining why the former was historically slower due to on-disk spooling, while the latter could be more efficient. By comparing these methods through practical examples and query plans, I demonstrate how modern SQL Server versions and features like batch mode can significantly improve performance for window function queries. Whether you’re working with large datasets or just looking to optimize your T-SQL code, this video offers valuable insights into leveraging these powerful tools effectively.

Full Transcript

Hey, it’s Erik Darling, Darling Data, your number one source for SQL Server content on the internet. I hope. If you go anywhere else, I mean, I can’t say what you’ll get will be correct or high quality. So, careful out there. Anyway, in today’s video, we’re going to talk about rows and range versus batch mode. Now, if this all sounds terribly foreign to you, you may not work with window functions very much. Back in the, let’s say, pre-batch mode days of window functions, SQL Server had two different ways of sort of giving you a window of things to look at. You can specify a certain set of row elements to be in there, or you can specify a range of row elements to be in there. Now, back in the bad old days, you had to be very careful to specify rows, because rows was a lot faster. If you used range, you got this terrible on-disk window spool, which was sometimes very, very slow. Let’s just say range was much slower than rows.

But now, with batch mode, all that kind of goes away. So we’re going to look at examples of that in this video here today. But before we do, if you would like to sign up for a channel membership to support the best SQL Server content on the internet, there is a link in the video description below. And if you want to ask me questions privately that I will answer publicly on my Office Hours episodes, there is a link right up there for you to do that exact thing with. Otherwise, you can just do the usual YouTube stuff and keep me happy in different ways. If you need SQL Server consulting help, I am here for you, baby. I got you no matter what you need. You know, within reason. I mean, I don’t want to like carry a server somewhere for you. You got to do what you got to do. I’ll rent a dolly if I have to. And as always, my rates are reasonable.

My performance tuning training is about $150 for life. If you go to that link and use that discount code, that’s the everything bundle. And if you would like to pick up my new T-SQL course, all of the beginner content is recorded and published and available. It is currently on sale for the pre-sale price of $250, which will double in value to $500 once the advanced material is out there in the world. And some other fun summer stuff coming up for me. Redgate is taking me on tour with the PASS mini events. That’s going to be New York City, Dallas, and Utrecht.

And of course, all that leads up to PASS Data Community Summit, November 17th to the 21st. So I look forward to seeing you at as many of those events as humanly possible. But with that out of the way, let’s talk about this window function stuff. Now, what I’m talking about when I say rows versus range is specifically this part of the window function, where you say ROWS UNBOUNDED PRECEDING.

Now, I don’t technically have to say AND CURRENT ROW, because that’s the default when you do that. So if you just say ROWS UNBOUNDED PRECEDING, then you will just get a running sum of all the prior rows up to the current row. Now, I’ve got this query hinted.
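Reconstructed as a sketch — the table here is hypothetical, since the demo script isn’t shown — the frame clause in question looks like this:

```sql
/* Running total: for each row, sum everything from the start of the
   window up to the current row. ROWS UNBOUNDED PRECEDING implies
   AND CURRENT ROW, so spelling that part out is optional */
SELECT
    u.Id,
    u.Reputation,
    SUM(u.Reputation) OVER
    (
        ORDER BY u.Id
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM dbo.Users AS u;
```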

Well, let’s make sure query plans are turned on. I’ve got this query hinted to use optimizer compatibility level 140. The reason for this is because my current database, which is on SQL Server 2022, is in database compatibility level 160. Being on Developer Edition, which is an Enterprise-equivalent SKU, and being on SQL Server 2022 in database compatibility level 160 means that I get the batch mode on rowstore feature automatically, without having to do any strange things there.

If my database compatibility level were 140, I would have to use this hint to say 160 in order to show it off. So we’re just doing this in reverse this time. But when I run this query and we look at the ROWS UNBOUNDED PRECEDING version, we are going to get back some results.
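The hint being described is presumably the USE HINT flavor — a sketch of how it would be tacked onto the demo query:

```sql
/* Appended to the query, this makes the optimizer behave as if the
   database were at compatibility level 140, which predates batch
   mode on rowstore — handy for before/after comparisons */
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));
```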

And they take a second because there’s a lot of rows. There’s about a million rows that come out of this. But this all takes just about a second to run.

You can see over here, just one second. It makes life easy. And, you know, things in here are not too terrible. If we run that same query, but we use RANGE UNBOUNDED PRECEDING — we’ve changed this from rows to range — then things are going to look a little bit different.
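The range variant is the same sketch with one keyword changed — again, the table name is hypothetical:

```sql
/* Same result here because the ORDER BY column is unique, but RANGE
   historically forces an on-disk window spool (a work table) in row
   mode execution, which is where the extra time goes */
SELECT
    u.Id,
    u.Reputation,
    SUM(u.Reputation) OVER
    (
        ORDER BY u.Id
        RANGE UNBOUNDED PRECEDING
    ) AS running_total
FROM dbo.Users AS u;
```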

And if we look at the query plan, you could maybe feel that that took a little bit longer. That took twice as long, at two seconds. Now, seeing part of the reason why does require us running both of these at the same time.

I’ve turned statistics IO on for this because it’s a slightly easier way of seeing where that other second of time went. So now, if we look at the query plans next to each other, we will see just about that same pattern here, right, where it’s one second versus two seconds. And so the range one is twice as slow.

And if we look at the statistics.io results, notice that there’s this work table. Let’s frame that up a little bit better. There we go. And for the rows query, there is a work table with a scan count of zero and a logical reads of zero.

But for the range version, there is a work table with a scan count of 1,058,840 and logical reads of 5,430,748. So in the first query, where we used rows, that work table was there, but nothing really happened to it. It’s almost like a many-to-many merge join: when you actually do have a many-to-many match, there’s a work table, and you’ll see a lot of activity there.

But when SQL Server thinks a merge join might be many-to-many and it turns out not to be, the work table doesn’t actually see anything. Now, let’s contrast this situation with database compatibility level 160, where the batch mode on rowstore feature kicks in.

I’m going to run this one, right? We’re going to take a look at how long this takes. And this, I mean, like we still have to display a million rows, so it may not feel quite as snappy as you’d hope, but it still runs very quickly.

We’re at just under a second in total for this. So we improved from the non-batch mode version by about, I don’t know, a couple hundred milliseconds or something. That’s not really the thing that is my big deal for this query.

The bigger deal for these queries is that the `RANGE UNBOUNDED PRECEDING` version does not suffer and take twice as long when things run in batch mode. It takes just about the same amount of time as the rows version. To talk a little bit about why, what I want to do is just get the estimated plan for this.

And if you look at what happens in here, we have the clustered index scan, we have a repartition streams, we have a sort, we have a segment and another segment, and then a window spool. The window spool over there, let’s clear these out so it’s just one thing in the middle that we care about. The window spool is where that work table occurs.

When this ends up on disk with range under row mode execution, it’s a bad time, right? This is where that extra second and that work table live. Things change when we run these in batch mode.

And if I run these together, we’re going to get a different query plan. It’s a lot more compact and it has different operators in it. So if we run both of these, they’re both going to take around 800 milliseconds to run. Again, there’s like a million rows that come out.

So things are, you know, not quite perfect. But if you look at these, we still have the clustered index scan, and then we have a sort to put data in order. But then we have this window aggregate.

And this window aggregate doesn’t have an on-disk work table behind it, whether you use the default, which is range, or go out of your way to specify rows. And if we look at the messages tab at the work table for both of these, we’re going to see a scan count of zero and logical reads of zero. So neither one of these did any of that on-disk window spool work, because in the query plan, we have the window aggregate rather than the window spool operator.

So this is a sign that your window function query ran in batch mode: you see the window aggregate, and a lack of the usual segment, sequence project, and similar operators, including the window spool operator, which is where that hidden work table occurred in the other queries.

So if you are using a SKU of SQL Server and a database compatibility level where you can get batch mode, or if you are allowed to play some tricks on your queries, which I’ve explained in many other videos, to get batch mode to happen without that, then you have a little bit less to fear from window functions that need the range behavior, because the range behavior is slightly different from the rows behavior.
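One of those tricks, covered in other videos rather than this one, is the empty filtered nonclustered columnstore index, which makes batch mode available to queries against a plain rowstore table. A sketch, with a hypothetical table and column name:

```sql
/* The contradictory filter keeps the index permanently empty,
   so it costs essentially nothing to maintain, but its mere
   presence lets the optimizer consider batch mode operators
   for queries that touch this table */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode_trick
ON dbo.votes (id)
WHERE id = -1 AND id = -2;
```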

With batch mode execution, you no longer have to deal with that pesky on-disk window spool. So, some light at the end of the tunnel out there for the big data analysis folks. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where we’ll talk about something else interesting. I promise. All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: A Neat Thing with MERGE and OUTPUT

Learn T-SQL With Erik: A Neat Thing with MERGE and OUTPUT


Video Summary

In this video, I dive into an intriguing aspect of T-SQL using the `MERGE` statement and its `OUTPUT` clause, demonstrating something you can’t achieve with a regular `INSERT`. I walk through creating sample data in SQL Server Management Studio 21 (version 21.something.6.3) to illustrate how we can modify our target table by adding 10 to the ID column, incrementing dates by one day, and replacing values in another column—all while retaining visibility into what exactly was inserted. This feature is particularly powerful for complex processes where you need detailed insights into the changes made during a merge operation, allowing you to output this information to a different table for historical tracking or auditing purposes. I also highlight why trying to replicate this functionality with a standard `INSERT` statement results in errors, emphasizing the unique benefits of using `MERGE`.

Full Transcript

Hey! Hello, friends! My friends, my only friends, Erik Darling here with Darling Data. And we are going to do some more super fun T-SQL learning today. We’re going to talk about kind of a neat thing that you can do with merge and the output clause that you cannot do with a regular insert and the output clause. So I hope that is majestically entertaining for you as you take time out of your busy day and schedule to watch this video. All right. So let’s get over to SQL Server Management Studio 21, version 21.something.6.3. There have been 15 updates in the last week or so, so I don’t know exactly which version I’m on. But first, let’s create some sample data to work with. We’re going to create a table called the target and a table called the source. Don’t ask me where I got these clever naming conventions from, because it took a whole lot of, you know, really, like I needed a higher muse. There was a lot of creative effort that went into those. But we’re going to insert 20 rows into this table called the source. And the data that we inserted into here is going to look, oh, that’s a good interesting keystroke, is going to look like this, right? So it’s the numbers one through 20.

So we’ve got the numbers 1 through 20 in the id column, a bunch of dates in the some_date column, and this sort of half of a Christmas tree, or half of a triangle, however you prefer to call it, in the some_thing column, right? So we just have this nice sort of shape in there, right? Pythagoras would be proud. What’s that? It’s a triangle. Nope, it’s a bloop shape. All right. Cool. So let’s make sure that there is no data in the target table, just in case. And what I’m going to do in this merge query is something that you cannot normally do with an insert query, which I’ll show you next: we are going to merge into the target table. I do not have a serializable hint here because we are only doing an insert. Serializable is only necessary if you have multiple actions assigned to your merge statement.
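A hypothetical reconstruction of that setup; the table and column names are guessed from the audio, and `GENERATE_SERIES` is fair game since the demo is on SQL Server 2022:

```sql
CREATE TABLE dbo.the_target
(
    id integer NOT NULL,
    some_date date NOT NULL,
    some_thing varchar(20) NOT NULL
);

CREATE TABLE dbo.the_source
(
    id integer NOT NULL,
    some_date date NOT NULL,
    some_thing varchar(20) NOT NULL
);

/* 20 rows: ids 1 through 20, a fixed date, and a growing
   run of As (the "half of a Christmas tree" shape) */
INSERT dbo.the_source (id, some_date, some_thing)
SELECT
    gs.value,
    CONVERT(date, '20250625'),
    REPLICATE('A', gs.value)
FROM GENERATE_SERIES(1, 20) AS gs;
```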

So like insert plus update, the upsert pattern, is the most common, but deletes would qualify as well. And we’re going to say using the source, but on `1 = 0`. So essentially there’s no match here, right? We’re making the implicit assumption that no rows will match between these two. And then we’re just going to say when not matched by target, then insert into id, some_date, some_thing.

And we’re going to say, and we’re going to use the values clause here, because that’s usually what you do with the merge and the insert. But we’re going to tweak what goes into the table a little bit. So we’re going to add 10 to the id column. We’re going to add a day to the sum date column. And we’re going to replace the bloop shape of As with a bloop shape of Bs.

All right. And then down here, we’re going to have our output clause. And in the output clause, there’s something you can do with merge that you cannot do with standard modification queries: the `$action` column. This `$action` column will tell you what came out of the merge. So if you write a different type of merge that maybe has multiple actions, it will tell you whether each row was an insert, an update, or a delete, which is very helpful.

One thing that annoys me about output is that you can’t use the `column = expression` alias syntax. You have to say `AS`: this as this, this as this, this as this. I guess the other way would be confusing for some people. But then I’m going to alias things, just so we can see where the different columns start.

This is where columns from the inserted table begin. And then this is the other thing that you can’t do with a normal insert query: look at the source values, right? So we’re going to have a set of columns here called source values.

And this is where they start, this `s.*`. And if we look at what happens when I run this whole thing, we get back the output clause results. And, this is the really helpful thing, you can see the source values, which you cannot normally see, right?

So this is where things started. It was the numbers 1 through 20. Every date was 2025-06-25, right?

And then this bloop shape was all A’s going down. But the values that we actually inserted look like this, right? Well, these are different now.

We can see that the ids go from 11 to 30 rather than 1 to 20. The some_date column is all 2025-06-26 rather than 2025-06-25. And our bloop shape has taken on the form of all Bs rather than all As. So this has a lot of interesting uses, I think, that not a lot of people get into, especially because as you write more and more complicated processes that use merge, you might need visibility into this stuff.
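Putting all of that together, the merge described above looks something like this, using the same hypothetical names as before:

```sql
MERGE dbo.the_target AS t
USING dbo.the_source AS s
    ON 1 = 0  /* never matches, so every source row is an insert */
WHEN NOT MATCHED BY TARGET
THEN INSERT (id, some_date, some_thing)
     VALUES
     (
         s.id + 10,                       /* shift the ids up by 10 */
         DATEADD(DAY, 1, s.some_date),    /* bump the dates a day */
         REPLACE(s.some_thing, 'A', 'B')  /* the As become Bs */
     )
OUTPUT
    $action            AS what_happened,
    'inserted values:' AS marker_i,
    inserted.*,
    'source values:'   AS marker_s,
    s.*;  /* the part a plain INSERT can't do */
```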

And you might even want to output this stuff to a different table to sort of retain a history of things as well. But notice that if we try to do this same thing with a normal insert and we write our output clause, this thing has a little squiggly under it. And even though this little squiggly thing should come from here, SQL Server is not able to, I guess, look forward enough to output values from this, right?

We’re not able to look in here and output any values from that. So this will just give us an error. SQL Server will say, no, we can’t do that.

The column prefix TS does not match with a table name or alias name, even though it’s just, it’s right here. But SQL Server is not able to get into that. So this is just, you know, something that annoyingly, very annoyingly, you can do with merge, but you can’t do with a normal, like, modification.

We’re using insert as an example here, but this is something you can’t do with a normal insert, update, or delete either. You can’t see that source data. You can only see the changed data, which, well, for updates you can see the before and after, but for deletes, you know.
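For contrast, the insert version that fails might look like this; the source alias reference in the output clause is the part SQL Server rejects:

```sql
/* This errors: OUTPUT on a plain INSERT can only reference the
   inserted pseudo-table, not the s alias from the SELECT below,
   so SQL Server complains that the column prefix doesn't match
   a table name or alias name used in the query */
INSERT dbo.the_target (id, some_date, some_thing)
OUTPUT inserted.*, s.id  /* <- the squiggly, and the error */
SELECT
    s.id + 10,
    DATEADD(DAY, 1, s.some_date),
    REPLACE(s.some_thing, 'A', 'B')
FROM dbo.the_source AS s;
```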

Anyway, looking at the source rows in there can be very useful for some query patterns. Just a neat thing that you can do with output and merge, I think. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video, where we will continue to express and enjoy ourselves in the T-SQL query language. Oh, that needs a repair: the Transact-SQL query language. All right. Thank you for watching.
